In this issue I will address everything related to converting from GraphX to GraphFrames.
I've been looking into our options for switching to GraphFrames. Although DataFrames look likely to improve the solution, there are both advantages and disadvantages to weigh. And since Pregel is an iterative process, DataFrames may be an even better fit than Datasets.
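The sketch below makes Pregel's iterative structure concrete. It is a minimal, single-machine Python model of the superstep loop, built from a vertex program, a send-message function and a merge-message function. As an example it computes connected components by propagating the smallest vertex ID. This is only for discussion. It is not the GraphX or GraphFrames API, and the names `pregel` and `connected_components` are hypothetical. It also simplifies GraphX's behaviour: every edge is re-evaluated on each superstep, rather than only the edges next to vertices that received messages.

```python
from typing import Any, Callable, Dict, List, Tuple

def pregel(
    vertices: Dict[int, Any],
    edges: List[Tuple[int, int]],
    initial_msg: Any,
    vprog: Callable[[int, Any, Any], Any],
    send_msg: Callable[[int, Any, int, Any], List[Tuple[int, Any]]],
    merge_msg: Callable[[Any, Any], Any],
    max_iterations: int = 20,
) -> Dict[int, Any]:
    """Hypothetical, simplified Pregel loop (not the GraphX API)."""
    # Superstep 0: every vertex runs the vertex program on the initial message.
    values = {vid: vprog(vid, val, initial_msg) for vid, val in vertices.items()}
    for _ in range(max_iterations):
        # Every edge may emit messages; messages to the same vertex are merged.
        inbox: Dict[int, Any] = {}
        for src, dst in edges:
            for target, msg in send_msg(src, values[src], dst, values[dst]):
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:
            break  # no messages were sent, so every vertex has halted
        # Only vertices that received a message run the vertex program again.
        values = {
            vid: vprog(vid, val, inbox[vid]) if vid in inbox else val
            for vid, val in values.items()
        }
    return values

def connected_components(vertex_ids: List[int], edges: List[Tuple[int, int]]) -> Dict[int, int]:
    # Treat the graph as undirected by adding each edge in both directions.
    both_ways = edges + [(dst, src) for src, dst in edges]
    return pregel(
        {v: v for v in vertex_ids},  # each vertex starts labelled with its own ID
        both_ways,
        initial_msg=float("inf"),  # neutral element for min()
        vprog=lambda vid, val, msg: min(val, msg),
        # Send the source's label only if it would lower the destination's label.
        send_msg=lambda s, s_val, d, d_val: [(d, s_val)] if s_val < d_val else [],
        merge_msg=min,
    )

if __name__ == "__main__":
    print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
    # prints {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Each pass of the `for` loop is one superstep, and a label moves only one hop per superstep. The number of passes over the edges therefore grows with the length of the paths the labels have to travel. That repeated work is why the choice of data representation between iterations matters.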
Using DataFrames or Datasets could open the door to further optimizations. DataFrames would probably also help with memory usage, which is currently one of our biggest issues (see image below). In addition, DataFrame queries are optimized by Spark's Catalyst optimizer, which we would benefit from.
Last but not least, according to Databricks' documentation, switching between the two approaches might not be too difficult:
What’s more, as you will note below, you can seamlessly move between DataFrame or Dataset and RDDs at will - by simple API method calls - and DataFrames and Datasets are built on top of RDDs.
Things that should be taken into account
Consider which framework is best suited to a DataFrame-based solution.
Consider creating our own framework.
To-Do list 👍🏻
Create a dataframe branch in which to implement the proposed solution.
Reproduce some benchmarks comparing the RDD and DataFrame implementations.