Adapt to dataframes #4

Open
1 of 2 tasks
Tracked by #1
angelip2303 opened this issue Nov 14, 2022 · 0 comments
Labels
enhancement New feature or request

Comments


angelip2303 commented Nov 14, 2022

In this issue I will address everything related to converting from GraphX to GraphFrames.

I've been looking into our options for switching to GraphFrames. Although using dataframes should clearly improve the solution, there are advantages and disadvantages to consider. And since Pregel is an iterative process, dataframes and datasets are probably an even better fit.
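To make the Pregel point a bit more concrete, here is a minimal sketch of why the algorithm maps onto table-style operations. It does not use Spark or GraphFrames at all: plain Python dictionaries and lists stand in for the vertex and edge tables, and the toy data, the function names `superstep` and `pregel`, and the choice of min-label propagation (connected components) are all assumptions made for illustration, not this project's code.

```python
from collections import defaultdict

# Hypothetical toy tables; in a dataframe solution these would be two dataframes.
vertices = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}  # vertex id -> current label
edges = [(1, 2), (2, 3), (4, 5)]           # (src, dst) rows


def superstep(vertices, edges):
    """One Pregel-style superstep: send messages, aggregate them, update vertices."""
    # "Join" each edge with the vertex values and send the label both ways.
    inbox = defaultdict(list)
    for src, dst in edges:
        inbox[dst].append(vertices[src])
        inbox[src].append(vertices[dst])
    # "Group by" receiving vertex, reduce with min, and keep the smaller label.
    return {v: min([label] + inbox.get(v, [])) for v, label in vertices.items()}


def pregel(vertices, edges, max_iterations=10):
    """Repeat supersteps until no vertex changes or the iteration cap is hit."""
    for _ in range(max_iterations):
        updated = superstep(vertices, edges)
        if updated == vertices:  # no vertex changed: converged
            break
        vertices = updated
    return vertices


labels = pregel(vertices, edges)
```

Each superstep is essentially a join between the edge and vertex tables followed by a group-by aggregation, which is the kind of query a dataframe engine is designed to plan and optimize.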

Using dataframes or datasets might be a good approach for optimizations. They would probably also help with memory usage, which is one of the biggest issues we currently have (see image below), and they would let us rely on Spark's Catalyst optimizer.

image

Last but not least, switching between the two solutions might not be too difficult, according to Databricks' documentation:

What’s more, as you will note below, you can seamlessly move between DataFrame or Dataset and RDDs at will - by simple API method calls - and DataFrames and Datasets are built on top of RDDs.


Things that should be taken into account

  • Consider which framework is more suitable for the dataframe solution.
  • Consider creating our own framework.

To-Do list 👍🏻

  • Create a dataframe branch where the proposed solution is implemented.
  • Reproduce some benchmarks comparing RDDs and dataframes.
angelip2303 self-assigned this on Nov 14, 2022
angelip2303 added the documentation and enhancement labels and removed the documentation label on Nov 14, 2022