Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MonthlyCommits is unsafe #96

Open
davisjam opened this issue Oct 18, 2017 · 1 comment
Open

MonthlyCommits is unsafe #96

davisjam opened this issue Oct 18, 2017 · 1 comment

Comments

@davisjam
Copy link
Contributor

davisjam commented Oct 18, 2017

Example: I cloned the popular libuv library and tried to extract monthly commits. I ended up with 16 commits even though the project has had regular commits since 2011 (expected commits = 6 years x 12 months/year = 72 commits).

Explanation: The date being used by RepoDriller uses the author commit time rather than the committer commit time. Suppose someone forks libuv and adds commits. Years later they submit a PR with the fix. Taking a real example, suppose a commit authored in 2015 was merged in 2017. When MonthlyCommits encounters this commit, it will accept it as "at least X months before" the last commit (which was in 2017). This 2015 commit will then cause all of the remaining commits from 2017 and 2016 to be skipped.

Suggested fix: Add both author and committer time fields to ChangeSet. Use the committer time in MonthlyCommits.

Initial exploration: Changing from getAuthorIdent to getCommitterIdent in GitRepository.convertToDate seems to do the right thing: 16 -> 77 commits found, and the timestamps are well-ordered.

Thanks @ayaankazerouni for pointing out the difference between author and committer idents.

@davisjam
Copy link
Contributor Author

davisjam commented Oct 18, 2017

Almost done. @mauricioaniche I'm extending ChangeSet to include message and author/committer objects modeled on JGit's PersonIdent. This will break anyone who uses ChangeSet. Concerns?

See the branch MoreFilters in my fork if you want a sneak preview. Still fixing a test failure.

davisjam added a commit to davisjam/repodriller that referenced this issue Oct 18, 2017
A Changeset now includes a message as well as author/committer objects
modeled on JGit's PersonIdent.

This is a breaking change for anyone who uses ChangeSet.
The payoff is that on complex git-based repositories, users can now better handle the distributed nature of git.

The time-based CommitRanges provided by RepoDriller now use the committer time when doing time comparisons.
This is what the user generally intends: "the time at which this commit entered the repository".

Addresses mauricioaniche#96.
davisjam added a commit to davisjam/repodriller that referenced this issue Oct 19, 2017
A Changeset now includes a message as well as author/committer objects
modeled on JGit's PersonIdent.

This is a breaking change for anyone who uses ChangeSet.
The payoff is that on complex git-based repositories, users can now better handle the distributed nature of git.

The time-based CommitRanges provided by RepoDriller now use the committer time when doing time comparisons.
This is what the user generally intends: "the time at which this commit entered the repository".

Addresses mauricioaniche#96.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant