Make ChangeSet more full-featured #100

Open
wants to merge 11 commits into
base: master
from

Commits on Oct 19, 2017

  1. Make ChangeSet more full-featured

    A ChangeSet now includes the commit message as well as author and committer objects
    modeled on JGit's PersonIdent.
    
    This is a breaking change for anyone who uses ChangeSet.
    The payoff is that on complex git repositories, where a commit's author time and committer time can differ (for example after a rebase or cherry-pick), users can better handle the distributed nature of git.
    
    The time-based CommitRanges provided by RepoDriller now use the committer time when doing time comparisons.
    This is what the user generally intends: "the time at which this commit entered the repository".
    
    Addresses mauricioaniche#96.
    davisjam authored and Jamie Davis committed Oct 19, 2017
    Commit 95f9814
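
A minimal sketch of the richer ChangeSet and a committer-time range, assuming a plain Java model. `Person`, `TimeRanges.since`, and the use of `java.time.Instant` are hypothetical illustrations, not RepoDriller's or JGit's actual classes or signatures:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JGit's PersonIdent: a name, an email, and a timestamp.
final class Person {
    public final String name;
    public final String email;
    public final Instant time;

    Person(String name, String email, Instant time) {
        this.name = name;
        this.email = email;
        this.time = time;
    }
}

// Hypothetical sketch of a ChangeSet that carries message, author and committer.
final class ChangeSet {
    private final String id;
    private final String message;
    private final Person author;
    private final Person committer;

    ChangeSet(String id, String message, Person author, Person committer) {
        this.id = id;
        this.message = message;
        this.author = author;
        this.committer = committer;
    }

    String getId() { return id; }
    String getMessage() { return message; }
    Person getAuthor() { return author; }
    Person getCommitter() { return committer; }
}

final class TimeRanges {
    private TimeRanges() {}

    // Keeps ChangeSets whose committer time is at or after 'since', i.e.
    // "the time at which this commit entered the repository".
    static List<ChangeSet> since(List<ChangeSet> all, Instant since) {
        List<ChangeSet> kept = new ArrayList<>();
        for (ChangeSet cs : all) {
            if (!cs.getCommitter().time.isBefore(since)) {
                kept.add(cs);
            }
        }
        return kept;
    }
}
```

With this rule, a commit authored in 2016 but rebased onto the branch in 2017 passes a `since(2017-01-01)` filter, because only its committer time is compared.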

Commits on Oct 20, 2017

  1. Commit 08d27cb
  2. Preserve existing behavior in pre-defined CommitRanges

    Address mauricioaniche's comments.
    
    Note: the pre-defined CommitRanges therefore keep defaulting to a timestamp that is probably not the one the user wants.
    Is this a good idea?
    davisjam committed Oct 20, 2017
    Commit 4b88f68
  3. Restore getTime to ChangeSet for compatibility

    Mark this deprecated.
    
    Existing analyses should migrate to getAuthor().time or getCommitter().time.
    I suspect everyone should use getCommitter().time.
    davisjam committed Oct 20, 2017
    Commit 6ba0cf8
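
A sketch of the deprecated `getTime` compatibility shim. It assumes the legacy value corresponds to the author time; that is an assumption, not something the commits confirm. `Person`, its public `time` field, and the `Instant` type are hypothetical choices that mirror the `getAuthor().time` spelling above:

```java
import java.time.Instant;

// Hypothetical identity object; the public 'time' field mirrors getAuthor().time.
final class Person {
    public final String name;
    public final String email;
    public final Instant time;

    Person(String name, String email, Instant time) {
        this.name = name;
        this.email = email;
        this.time = time;
    }
}

final class ChangeSet {
    private final String id;
    private final Person author;
    private final Person committer;

    ChangeSet(String id, Person author, Person committer) {
        this.id = id;
        this.author = author;
        this.committer = committer;
    }

    String getId() { return id; }
    Person getAuthor() { return author; }
    Person getCommitter() { return committer; }

    /**
     * Kept only so existing analyses still compile.
     * ASSUMPTION: the legacy timestamp is the author time; check the real code.
     *
     * @deprecated use {@code getAuthor().time} or {@code getCommitter().time}.
     */
    @Deprecated
    Instant getTime() {
        return author.time;
    }
}
```

Callers of `getTime()` get a compiler deprecation warning, which nudges them toward choosing author or committer time explicitly.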

Commits on Oct 24, 2017

  1. Embed ChangeSet within Commit

    Addresses concerns on mauricioaniche#100.
    
    This version still maintains two copies of a ChangeSet, one in
    the Commit and one in the list of ChangeSets that survived the filter.
    
    This is because the SCM interface has no way to pass in
    (and save a reference to) a ChangeSet; only the ID is passed at the moment.
    davisjam committed Oct 24, 2017
    Commit 91f3bee
  2. Optimization: Avoid ChangeSet duplication

    Per comments on mauricioaniche#100, don't duplicate the ChangeSet when creating a Commit.
    This is an optimization.
    
    Add an SCM.getCommit(ChangeSet cs) API.
    Refactor RepositoryMining to use this API.
    Deprecate the SCM.getCommit(String id) API.
    davisjam committed Oct 24, 2017
    Commit 78537b3
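
The two `getCommit` overloads can be sketched as follows. The new overload reuses the caller's ChangeSet, while the deprecated id-based one has to rebuild it. `InMemorySCM`, the `diff` field, and the map-based storage are hypothetical stand-ins for a real git backend:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal ChangeSet: commit id plus message.
final class ChangeSet {
    final String id;
    final String message;

    ChangeSet(String id, String message) {
        this.id = id;
        this.message = message;
    }
}

// A Commit embeds a reference to its ChangeSet instead of copying its fields.
final class Commit {
    private final ChangeSet changeSet;
    private final String diff; // hypothetical stand-in for the commit's modifications

    Commit(ChangeSet changeSet, String diff) {
        this.changeSet = changeSet;
        this.diff = diff;
    }

    ChangeSet getChangeSet() { return changeSet; }
    String getHash() { return changeSet.id; }
    String getDiff() { return diff; }
}

interface SCM {
    // New style: the caller hands over the ChangeSet it already holds.
    Commit getCommit(ChangeSet cs);

    /** @deprecated only the id is passed, so implementations must rebuild the ChangeSet. */
    @Deprecated
    Commit getCommit(String id);
}

// Toy in-memory backend used only to illustrate the two call paths.
final class InMemorySCM implements SCM {
    private final Map<String, String> messages = new HashMap<>();
    private final Map<String, String> diffs = new HashMap<>();

    void add(String id, String message, String diff) {
        messages.put(id, message);
        diffs.put(id, diff);
    }

    @Override
    public Commit getCommit(ChangeSet cs) {
        // Reuses the caller's ChangeSet: no second copy is created.
        return new Commit(cs, diffs.get(cs.id));
    }

    @Deprecated
    @Override
    public Commit getCommit(String id) {
        // Builds a fresh ChangeSet: this is the duplication the new API avoids.
        return new Commit(new ChangeSet(id, messages.get(id)), diffs.get(id));
    }
}
```

Because the new overload stores the reference it was given, the ChangeSet in the filtered list and the one inside the Commit are the same object.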

Commits on Oct 26, 2017

  1. Commit ee75977

Commits on Nov 1, 2017

  1. Minimize time spent at max ChangeSet footprint

    Problem:
    We currently construct all ChangeSets in memory and then process each one.
    We keep pointers to all of the ChangeSets throughout this process, so the
    overall memory footprint never shrinks; it only grows as the CommitVisitors allocate.
    See RepositoryMining.processRepo.
    
    The "full-featured ChangeSet" PR increases the size of each ChangeSet,
    and there is concern that this may lead to out-of-memory (OOM) errors on small machines.
    
    Optimization:
    Discard the pointer to each ChangeSet after we process it.
    If a CommitVisitor does not save the Commits it visits, then
    a ChangeSet can be GC'd after a worker processes it.
    
    Longer-term suggestions:
    1. If we stream ChangeSets instead of constructing them up front, this
       concern goes away entirely.
    2. If we make the amount of data stored in a ChangeSet tunable, the
       old ChangeSet memory footprint can still be achieved.
    davisjam committed Nov 1, 2017
    Commit 1b7affc
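
The "discard the pointer" optimization can be sketched with a shared work queue. Each worker removes a ChangeSet from the queue before visiting it, so afterwards only the visitor can still reference it. `DrainingMiner`, the queue, and the thread count are hypothetical; the real RepositoryMining.processRepo may be structured differently:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical minimal ChangeSet for the sketch.
final class ChangeSet {
    final String id;

    ChangeSet(String id) {
        this.id = id;
    }
}

final class DrainingMiner {
    private DrainingMiner() {}

    /**
     * Workers poll ChangeSets off the queue, so the queue drops its reference
     * as soon as a ChangeSet is handed out. The caller must not keep its own
     * list of all ChangeSets, or nothing becomes collectable.
     */
    static void process(Queue<ChangeSet> pending, int threads, Consumer<ChangeSet> visitor) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                ChangeSet cs;
                // poll() removes the head; once 'cs' is reassigned, a ChangeSet
                // the visitor did not store is unreachable and can be GC'd.
                while ((cs = pending.poll()) != null) {
                    visitor.accept(cs);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

If a visitor stores the ChangeSets it sees, they stay reachable and the saving disappears, which matches the caveat in the commit message.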

Commits on Dec 16, 2017

  1. Commit 1c60d27
  2. Commit a0ace05
  3. Commit 921d9cf