Memory Consumption #105

Merged
merged 6 commits into from
Dec 16, 2017

mauricioaniche
Owner

This PR focuses on writing a memory consumption test for RepoDriller. This way we can measure the impact of our changes in terms of memory usage.

@mauricioaniche
Owner Author

Do not merge it. I am using this PR as a playground.

@mauricioaniche
Owner Author

@davisjam this simple commit prints memory consumption statistics after analyzing 1k commits of Rails. It prints them to the console, so we can see them in the Travis log. Although definitely not scientific, it will give us a first impression of the differences:

Max memory (median): 585518760 bytes (~558 MiB)
Min memory (median): 86717528 bytes (~82 MiB)

This is what we get now (82 to 558 MB), which is quite a lot. We should check how well the features we implemented to reduce memory consumption actually work.
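A minimal sketch of this kind of measurement, assuming the visitor samples the heap in use (`totalMemory() - freeMemory()` from `java.lang.Runtime`) once per visited commit. `MemorySampler` and its methods are hypothetical names, not the PR's actual code, and how the PR aggregates samples into the "(median)" figures is not reproduced here. Assuming the raw values above are bytes, integer division by 2^20 gives the 82 and 558 figures quoted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: records heap usage samples and reports extremes.
public class MemorySampler {
    private static final long BYTES_PER_MIB = 1024L * 1024L;
    private final List<Long> samples = new ArrayList<>();

    // Records the heap currently in use: memory the JVM has claimed minus what is free.
    public void sample() {
        Runtime rt = Runtime.getRuntime();
        samples.add(rt.totalMemory() - rt.freeMemory());
    }

    // Lets callers feed in values recorded elsewhere (e.g. copied from a CI log).
    public void addSample(long bytes) {
        samples.add(bytes);
    }

    // Both throw NoSuchElementException if no sample was recorded.
    public long max() { return Collections.max(samples); }
    public long min() { return Collections.min(samples); }

    // Converts bytes to whole mebibytes (truncating).
    public static long toMiB(long bytes) {
        return bytes / BYTES_PER_MIB;
    }

    public static void main(String[] args) {
        MemorySampler sampler = new MemorySampler();
        for (int i = 0; i < 1000; i++) {
            // Hypothetical stand-in for visiting one commit.
            byte[] work = new byte[64 * 1024];
            work[0] = 1;
            sampler.sample();
        }
        System.out.printf("Max memory: %d bytes (%d MiB)%n", sampler.max(), toMiB(sampler.max()));
        System.out.printf("Min memory: %d bytes (%d MiB)%n", sampler.min(), toMiB(sampler.min()));
    }
}
```

For example, `MemorySampler.toMiB(585518760L)` returns 558 and `MemorySampler.toMiB(86717528L)` returns 82.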

For now, I think I'll merge this into master. Then you rebase your PR onto it, and we wait for Travis to report the memory usage. What do you think?

@davisjam
Contributor

Nice job, this will be a useful way to evaluate memory overheads.

The 558 MB peak is presumably because RepositoryMining.processRepo keeps all of the ChangeSets it uses in memory at once, rather than streaming them. While #100 helps, it doesn't eliminate the "high water mark".
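To illustrate the difference between keeping everything in memory and streaming, here is a sketch with a generic loader standing in for ChangeSet creation. `ChangeSetStreaming`, `loadAll`, and `stream` are hypothetical names; this is not RepoDriller's `processRepo`. The eager version materializes all n elements before the first visit, so its high water mark grows with n, while the lazy iterator produces one element per `next()` call and lets earlier ones become garbage.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

public class ChangeSetStreaming {
    // Eager: every element is built before the caller sees the first one,
    // so all n are reachable at the same time.
    public static <T> List<T> loadAll(int n, IntFunction<T> loader) {
        List<T> all = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            all.add(loader.apply(i));
        }
        return all;
    }

    // Lazy: each element is built only when next() is called; once the caller
    // drops its reference, that element is eligible for garbage collection.
    public static <T> Iterator<T> stream(int n, IntFunction<T> loader) {
        return new Iterator<T>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < n;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return loader.apply(index++);
            }
        };
    }

    public static void main(String[] args) {
        int[] loads = {0};
        IntFunction<String> loader = i -> { loads[0]++; return "changeset-" + i; };

        loadAll(1000, loader);
        System.out.println("eager loads before visiting: " + loads[0]); // 1000

        loads[0] = 0;
        Iterator<String> it = stream(1000, loader);
        System.out.println("lazy loads before visiting: " + loads[0]); // 0
        it.next();
        it.next();
        System.out.println("lazy loads after two visits: " + loads[0]); // 2
    }
}
```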

I'm confused about why your visitor finds different amounts of memory used at different times. In RepositoryMining.processRepo all of the ChangeSets are in memory before the visitor is called, and they remain so until after the final call to the visitor. Since your visitor isn't changing the amount of memory in use, I don't see why you would see more than one number. Can you think of any reason for this?

@mauricioaniche
Owner Author

@davisjam, not really... My best guess is that the list of ChangeSets accounts for those ~82 MB. We create Commit objects on the fly, but I suppose the GC takes some time to reclaim them; there's nothing we can do here, as tuning the GC can be harder than it looks.

Maybe the best thing is to plot a histogram, since the GC hopefully runs from time to time. Does that make sense?
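A minimal sketch of bucketing memory samples into equal-width bins before plotting, assuming the samples are already collected as doubles (e.g. in MB). `MemoryHistogram` is a hypothetical name, and the values in `main` are made up for illustration.

```java
import java.util.Arrays;

public class MemoryHistogram {
    // Counts samples into `bins` equal-width bins spanning [min, max].
    public static int[] histogram(double[] samples, int bins) {
        if (samples.length == 0 || bins <= 0) {
            throw new IllegalArgumentException("need at least one sample and one bin");
        }
        double min = Arrays.stream(samples).min().getAsDouble();
        double max = Arrays.stream(samples).max().getAsDouble();
        double width = (max - min) / bins;
        int[] counts = new int[bins];
        for (double s : samples) {
            // If all samples are equal, width is 0 and everything goes to bin 0.
            int idx = width == 0 ? 0 : (int) ((s - min) / width);
            if (idx == bins) {
                idx = bins - 1; // the maximum value belongs to the last bin
            }
            counts[idx]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] made_up = {1.0, 1.5, 2.0, 5.0, 9.0, 10.0};
        System.out.println(Arrays.toString(histogram(made_up, 3))); // [3, 1, 2]
    }
}
```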

@davisjam
Contributor

You could also run the GC before you collect your memory statistics.
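A sketch of that suggestion using only `java.lang` APIs. `System.gc()` is a request, not a guarantee: the JVM may delay or ignore it (HotSpot ignores it when run with `-XX:+DisableExplicitGC`), so the post-GC figure approximates live data rather than measuring it exactly. `GcBeforeMeasuring` and its methods are hypothetical names.

```java
public class GcBeforeMeasuring {
    // Heap bytes in use right now, including garbage not yet collected.
    public static long usedNow() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Heap bytes in use after requesting a collection; closer to the live set,
    // but only if the JVM honors the System.gc() hint.
    public static long usedAfterGc() {
        System.gc();
        return usedNow();
    }

    public static void main(String[] args) {
        // Create short-lived garbage, similar to objects built on the fly per commit.
        for (int i = 0; i < 10_000; i++) {
            byte[] garbage = new byte[1024];
            garbage[0] = (byte) i;
        }
        long before = usedNow();
        long after = usedAfterGc();
        System.out.printf("before GC: %d bytes, after GC: %d bytes%n", before, after);
    }
}
```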

@davisjam
Contributor

4ec77fe: Useful, but I think you should also run the GC first to get a more precise value.

@mauricioaniche
Owner Author

That will be the next test!

@mauricioaniche
Owner Author

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  87.18  238.10  381.10  369.90  475.80  669.10 
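The line above has the shape of R's `summary()` output. For reference, a sketch computing the same six numbers in Java, assuming R's default quantile rule (type 7, linear interpolation between order statistics). `SampleSummary` is a hypothetical name, and the input in `main` is made up.

```java
import java.util.Arrays;

public class SampleSummary {
    // Quantile of an already-sorted array using linear interpolation
    // between order statistics (R's default, quantile type 7).
    static double quantile(double[] sorted, double p) {
        double h = (sorted.length - 1) * p;
        int lo = (int) Math.floor(h);
        int hi = Math.min(lo + 1, sorted.length - 1);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }

    // Returns {min, 1st quartile, median, mean, 3rd quartile, max}.
    public static double[] summary(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] s = samples.clone(); // do not reorder the caller's array
        Arrays.sort(s);
        double mean = Arrays.stream(s).average().getAsDouble();
        return new double[] {
            s[0], quantile(s, 0.25), quantile(s, 0.5), mean, quantile(s, 0.75), s[s.length - 1]
        };
    }

    public static void main(String[] args) {
        double[] r = summary(new double[] {4, 1, 3, 2});
        System.out.println(Arrays.toString(r)); // [1.0, 1.75, 2.5, 2.5, 3.25, 4.0]
    }
}
```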

[Chart: memory]

As we can see, the GC works like hell. :) I do not know how to explain the two blocks... maybe the GC learned something and improved?

But still, we can plot the same chart for your branch. Can you merge it there, @davisjam?

davisjam mentioned this pull request Dec 15, 2017
mauricioaniche merged commit 6a94a46 into master Dec 16, 2017
mauricioaniche deleted the memorytest branch December 16, 2017 21:16

2 participants