Performance improvement walk extraction #96
Comments
Holy smokes, this is huge (a speedup of an order of magnitude)! Thank you so much @MartinBoeckling. Would you be willing to create a PR from this so you can be credited as a contributor when this is added? You could put this in a simple example file and we can work from there. At first sight, it does seem that the igraph approach uses quite a bit more memory? Perhaps we should support both alternatives for those with less RAM (although 0.5 GB is still not much at all). BTW: if your thesis gets published, I would be very interested in reading it! What will the title be so I can keep an eye open for it?
Sure, I can definitely create a pull request with a sample implementation. Regarding memory consumption, I was able to reduce it to 200 MB by using tuples. An implementation with NumPy arrays might reduce it further.
@GillesVandewiele, regarding the pull request for this feature: should I also add tests and documentation, or should I just create the pull request with the example file of the igraph implementation?
Hi @MartinBoeckling. Documentation would definitely be nice, but I already appreciate the PR a lot! We could take care of docs & tests later. The dev branch would indeed be ideal! Thank you
Created pull request #122, so I am closing this issue.
Reopening this issue. I created a separate branch to update and test some aspects.
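As a rough illustration of why storing walks as tuples can help, the sketch below compares the shallow container size of the same walks held as lists and as tuples. The walk contents and the walk count are hypothetical, it only measures the containers (not the shared strings), and it does not reproduce the 200 MB figure mentioned above.

```python
import sys

# Hypothetical walk: alternating entity and predicate labels.
hops = ["alice", "knows", "bob", "worksAt", "acme"]

# Arbitrary number of walks, chosen only for illustration.
n_walks = 10_000
walks_as_lists = [list(hops) for _ in range(n_walks)]
walks_as_tuples = [tuple(hops) for _ in range(n_walks)]


def container_bytes(walks):
    """Sum the shallow size of each walk container.

    The label strings are shared between walks, so they are not counted.
    """
    return sum(sys.getsizeof(walk) for walk in walks)


list_bytes = container_bytes(walks_as_lists)
tuple_bytes = container_bytes(walks_as_tuples)
print(f"lists:  {list_bytes} bytes")
print(f"tuples: {tuple_bytes} bytes")
```

In CPython a tuple has a smaller header than a list and no spare capacity, so the per-walk saving adds up when millions of walks are kept in memory. Actual savings depend on walk length and on how the labels themselves are stored.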
Hi @bsteenwi, |
Hi @MartinBoeckling, I had to make a lot of changes based on the code in your pull request to make it work... |
Hi @bsteenwi , |
🚀 Feature/ Enhancement
Increase walk extraction speed
I have experimented with the pyRDF2Vec package for my thesis and appreciate that it is available. During my thesis I ran into the described performance limitations for large knowledge graphs. Because I still faced memory issues and long computation times after tuning the walk extractor's parameters, I partially reimplemented the walk extraction with the Python igraph package, which made the walk extraction considerably faster. Since I benefited from the reduced computation time, I thought it would be useful to share my approach.
Additional context
The igraph package is partially implemented in C and provides multiple graph algorithms, such as breadth-first search and depth-first search. A performance comparison can be seen in the picture attached below: the pyRDF2Vec approach took around 53 minutes, while the igraph-based approach took around 2 minutes. The comparison was measured with the following Google Colab document on the FB15k train dataset. Since I did not convert the knowledge graph object to an igraph object, I read the CSV file in directly.
If additional information regarding the implemented approach is necessary, feel free to reach out.
Solution
A possible solution approach can be found here: