Make ChangeSet more full-featured #100

A ChangeSet now includes a message as well as author/committer objects modeled on JGit's PersonIdent. This is a breaking change for anyone who uses ChangeSet. The payoff is that on complex git-based repositories, users can now better handle the distributed nature of git. The time-based CommitRanges provided by RepoDriller now use the committer time when doing time comparisons. This is what the user generally intends: "the time at which this commit entered the repository". Addresses mauricioaniche#96.
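To illustrate why the committer time matters for time-based ranges, here is a minimal sketch. It is not RepoDriller's actual code: the class names `PersonSketch` and `FullChangeSetSketch`, their fields, and the two range-check methods are hypothetical stand-ins for the richer ChangeSet described above.

```java
import java.time.Instant;

// Hypothetical stand-in for an author/committer object, loosely modeled on
// the idea of JGit's PersonIdent (name, email, timestamp).
final class PersonSketch {
    final String name;
    final String email;
    final Instant time;

    PersonSketch(String name, String email, Instant time) {
        this.name = name;
        this.email = email;
        this.time = time;
    }
}

// Hypothetical ChangeSet carrying a message plus separate author and committer.
final class FullChangeSetSketch {
    final String id;
    final String message;
    final PersonSketch author;
    final PersonSketch committer;

    FullChangeSetSketch(String id, String message,
                        PersonSketch author, PersonSketch committer) {
        this.id = id;
        this.message = message;
        this.author = author;
        this.committer = committer;
    }

    // Range check on the committer time, inclusive on both ends.
    boolean inRangeByCommitter(Instant from, Instant to) {
        return !committer.time.isBefore(from) && !committer.time.isAfter(to);
    }

    // Same check on the author time, shown only for contrast.
    boolean inRangeByAuthor(Instant from, Instant to) {
        return !author.time.isBefore(from) && !author.time.isAfter(to);
    }
}
```

A commit written on Oct 1 but committed to the repository on Oct 19 falls inside an Oct 15 to Oct 31 range under the committer-based check and outside it under the author-based one.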

Address mauricioaniche's comments. Note: We are defaulting to "probably not the value the user wants". Is this a good idea?

Mark this deprecated. Existing analyses should migrate to getAuthor().time or getCommitter().time. I suspect everyone should use getCommitter().time.

Addresses concerns on mauricioaniche#100. This version still maintains two copies of a ChangeSet, one in the Commit and one in the list of ChangeSets that survived the filter. This is because the SCM interface doesn't have a way to pass in (and save a reference to) a ChangeSet. Only the ID is passed at the moment.

Per comments on mauricioaniche#100, don't duplicate the ChangeSet when creating a Commit. This is an optimization. Add an SCM.getCommit(ChangeSet cs) API. Refactor RepositoryMining to use this API. Deprecate the SCM.getCommit(String id) API.
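The shape of this change can be sketched as follows. This is an illustration only, not the project's real interface: `ScmSketch`, `CommitSketch`, `ChangeSetRef`, and `InMemoryScm` are hypothetical names, and the in-memory lookup stands in for a real repository read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, trimmed-down ChangeSet: just an id and a message.
final class ChangeSetRef {
    final String id;
    final String message;

    ChangeSetRef(String id, String message) {
        this.id = id;
        this.message = message;
    }
}

// Hypothetical Commit that keeps a reference to the ChangeSet it was built from.
final class CommitSketch {
    final ChangeSetRef changeSet;

    CommitSketch(ChangeSetRef changeSet) {
        this.changeSet = changeSet;
    }
}

// Hypothetical SCM interface with the old and new lookup methods side by side.
interface ScmSketch {
    /** Old style: only the id is passed, so a second ChangeSet must be built. */
    @Deprecated
    CommitSketch getCommit(String id);

    /** New style: the caller's ChangeSet is reused, not duplicated. */
    CommitSketch getCommit(ChangeSetRef cs);
}

final class InMemoryScm implements ScmSketch {
    private final Map<String, String> messagesById = new HashMap<>();

    void add(String id, String message) {
        messagesById.put(id, message);
    }

    @Override
    @Deprecated
    public CommitSketch getCommit(String id) {
        // Allocates a fresh ChangeSet, which duplicates the caller's copy.
        return new CommitSketch(new ChangeSetRef(id, messagesById.get(id)));
    }

    @Override
    public CommitSketch getCommit(ChangeSetRef cs) {
        // Shares the caller's instance, so only one copy stays reachable.
        return new CommitSketch(cs);
    }
}
```

With the ChangeSet-based overload, the Commit and the filtered list point at the same object, so the per-commit metadata is held once rather than twice.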

Problem: We currently construct all ChangeSets in memory and then process each one. We maintain pointers to all of the ChangeSets during this process, so the overall memory footprint only increases due to the CommitVisitors. See RepositoryMining.processRepo. The "full-featured ChangeSet" PR increases the size of each ChangeSet, and there is concern that this may lead to OOM issues on small machines.

Optimization: Discard the pointer to each ChangeSet after we process it. If a CommitVisitor does not save the Commits it visits, then a ChangeSet can be GC'd after a worker processes it.

Longer-term suggestions:
1. If we stream ChangeSets instead of constructing them up front, this concern goes away entirely.
2. If we make the amount of data stored in a ChangeSet tunable, the old ChangeSet memory footprint can still be achieved.
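One way to picture the "discard the pointer after processing" optimization is to drain a queue instead of iterating over a list that stays fully populated. The sketch below is an assumption about the general technique, not the code in RepositoryMining.processRepo; `DrainingProcessor` and `processAndDiscard` are hypothetical names.

```java
import java.util.Queue;
import java.util.function.Consumer;

final class DrainingProcessor {
    // Removes each element from the queue before visiting it, so the queue
    // no longer holds a reference once the visitor returns. If the visitor
    // does not retain the element either, it becomes eligible for GC.
    static <T> int processAndDiscard(Queue<T> pending, Consumer<T> visitor) {
        int processed = 0;
        T next;
        while ((next = pending.poll()) != null) {
            visitor.accept(next);
            processed++;
        }
        return processed;
    }
}
```

The caller must also avoid holding another collection that references the same elements, otherwise nothing is freed. With several workers, a thread-safe queue such as `java.util.concurrent.ConcurrentLinkedQueue` could be drained the same way.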

Commits on Oct 19, 2017

Commits on Oct 20, 2017

Commits on Oct 24, 2017

Commits on Oct 26, 2017

Commits on Nov 1, 2017

Commits on Dec 16, 2017
