Adapt to dataframes #4

angelip2303 · 2022-11-14T13:19:55Z

In this issue I will address everything related to converting from GraphX to GraphFrame.

I've been looking into our options for switching to GraphFrames. Although using dataframes should improve the solution, the switch has both advantages and disadvantages. And since Pregel is an iterative process, dataframes, and datasets too, are probably an even better fit.

Using dataframes or datasets might be a good way to optimize the solution. Dataframes would probably also help with memory usage, which is one of the biggest issues we currently have (see image below), and they would let us rely on Spark's Catalyst optimizer.

Last but not least, switching between solutions might not be too difficult, according to Databricks' documentation:

What’s more, as you will note below, you can seamlessly move between DataFrame or Dataset and RDDs at will - by simple API method calls - and DataFrames and Datasets are built on top of RDDs.
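As a rough illustration of the quote above, here is a minimal sketch of the round trip between an RDD, a DataFrame and a Dataset. The `Vertex` case class, the `RoundTrip` object and the sample data are hypothetical and only serve the example; none of them come from our code base. It assumes a local Spark session with the Spark SQL module on the classpath.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical vertex type, used only for this illustration.
// It is declared at top level so Spark can derive an encoder for it.
case class Vertex(id: Long, label: String)

object RoundTrip {
  // Moves the same data RDD -> DataFrame -> Dataset -> RDD and returns it sorted by id.
  def roundTrip(spark: SparkSession): Array[Vertex] = {
    import spark.implicits._ // brings toDF and the case-class encoders into scope

    val rdd: RDD[Vertex] =
      spark.sparkContext.parallelize(Seq(Vertex(1L, "a"), Vertex(2L, "b")))

    val df: DataFrame = rdd.toDF()          // RDD -> DataFrame (untyped rows)
    val ds: Dataset[Vertex] = df.as[Vertex] // DataFrame -> typed Dataset
    val back: RDD[Vertex] = ds.rdd          // Dataset -> RDD

    back.collect().sortBy(_.id)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rdd-df-roundtrip")
      .getOrCreate()
    try roundTrip(spark).foreach(println)
    finally spark.stop()
  }
}
```

Each conversion is a single method call, which suggests that a dataframe branch could keep the existing RDD-based code paths reachable while the benchmarks are being set up. Whether the conversions are cheap enough inside an iterative Pregel loop would still need to be measured.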

Things that should be taken into account

Consider which Framework is more suitable for the dataframe solution.
Consider creating our own Framework.

To-Do list 👍🏻

Create a dataframe branch where the proposed solution is implemented.
Reproduce some benchmarks comparing RDDs and dataframes.

angelip2303 mentioned this issue Nov 14, 2022

Roadmap and meta-information #1

Open

3 tasks

angelip2303 self-assigned this Nov 14, 2022

angelip2303 added the documentation and enhancement labels and removed the documentation label Nov 14, 2022
