Add and update commit frequency metrics #173
This all looks really neat; kudos! I left two comments, only one of which (the `count_files()` one) I think needs to be considered.
@@ -213,3 +204,32 @@ def find_and_read_file(repo: pygit2.Repository, filename: str) -> Optional[str]:

    # Decode and return content as a string
    return blob_data.decode(detect_encoding(blob_data))


def count_files(tree: Union[pygit2.Tree, pygit2.Blob]) -> int:
First off, I think this is a great use of recursion!
Now, why the `Union` type for the `tree` argument? It looks like it'll only ever be called recursively with `pygit2.Tree` objects, and if someone were to pass a `pygit2.Blob` as the initial element it looks like the function would attempt to iterate over it first, which I'm not sure is a defined operation for that type.
I could see a version of this function where you first check if `tree` is a `Blob` and return 1 (presumably because a blob could be thought of as a tree with 1 element), or a version that only takes a `Tree` and deals with `Blob`s non-recursively as you're doing now.
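To make the two options concrete, here is a minimal sketch of both. It assumes small stand-in `Tree` and `Blob` classes (hypothetical, not the real pygit2 types) so it runs without a repository, and the function names `count_files_tree_only` and `count_files_any` are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Blob:
    """Hypothetical stand-in for pygit2.Blob (a single file)."""
    name: str


@dataclass
class Tree:
    """Hypothetical stand-in for pygit2.Tree (a directory of entries)."""
    name: str
    entries: list[Union["Tree", Blob]] = field(default_factory=list)

    def __iter__(self):
        return iter(self.entries)


def count_files_tree_only(tree: Tree) -> int:
    """Option A: accept only a Tree and count blobs without recursing into them."""
    total = 0
    for entry in tree:
        if isinstance(entry, Tree):
            total += count_files_tree_only(entry)
        else:
            total += 1
    return total


def count_files_any(obj: Union[Tree, Blob]) -> int:
    """Option B: treat a Blob as a tree that contains exactly one file."""
    if isinstance(obj, Blob):
        return 1
    return sum(count_files_any(entry) for entry in obj)
```

With option A the signature can drop the `Union` entirely; with option B the `Union` is justified because a blob is handled before any iteration happens.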
    repo_path: pathlib.Path,
    files: list[dict],
    branch_name: str = "main",
    dates: Optional[list[datetime]] = None,
Pretty neat that you can build up a commit history in a test git repo using just this function!
Since `files` is now a list of commits which contains a dict of files per commit, I wonder if it makes sense to change the name, e.g. to `commits`?
This one's debatable and may be more work than it's worth, but I see that `dates` is a parallel array that gets zipped with the `files` to produce dates for each commit. Perhaps it might make sense to have the commit dates be inlined in the `files` data structure, e.g. something like:
commits = [
    {'commit_date': Optional[datetime] = None, 'files': dict }, ...
]
where the current date is used for each if `commit_date` wasn't provided?
FWIW, I'm fine keeping this all as-is, since in your test cases it seems you're mostly adding files in a single commit and my suggestions above would make those test cases more verbose.
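As a rough sketch of what the inlined shape could look like in practice: the `CommitSpec` type and the `resolve_commit_dates` helper below are hypothetical names, not part of this PR, and the sketch assumes each file dict maps a filename to its text content.

```python
from datetime import datetime
from typing import Optional, TypedDict


class CommitSpec(TypedDict, total=False):
    """Hypothetical per-commit entry: an optional date plus the files to write."""
    commit_date: Optional[datetime]
    files: dict[str, str]


def resolve_commit_dates(
    commits: list[CommitSpec], now: Optional[datetime] = None
) -> list[tuple[datetime, dict[str, str]]]:
    """Pair each commit's files with its date, falling back to the current time."""
    fallback = now or datetime.now()
    return [
        (commit.get("commit_date") or fallback, commit["files"])
        for commit in commits
    ]


commits: list[CommitSpec] = [
    {"files": {"README.md": "hello"}},  # no date given: fallback is used
    {"commit_date": datetime(2024, 1, 1), "files": {"main.py": "print('hi')"}},
]
```

This removes the need to keep a separate `dates` list aligned with the commits, at the cost of a slightly more verbose entry for the common single-commit case.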
Description
This PR adds metrics surrounding commits and commit frequency. Along the journey towards this work I found that some functionality wasn't working as expected (or maybe wasn't labeled clearly in the context of these changes). As a result I took time to fix things and evolve the work towards consistency. This also included making modifications to a few tests and the `repo_setup` function, which is becoming more and more important to creating and implementing tests for these changes.
For commit frequency I used `commits per day` as a calculation to help avoid inconsistencies when it comes to weeks, months and years. I could also see how it might be better to perform a different type of calculation here; let me know if something seems more useful.
Closes #157
Closes #158
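For reference, one plausible reading of a commits-per-day calculation is sketched below. This is an assumption, not necessarily the formula used in this PR: it divides the number of commits by the inclusive number of days between the first and last commit, and the function name `commits_per_day` is hypothetical.

```python
from datetime import datetime


def commits_per_day(commit_dates: list[datetime]) -> float:
    """Average commits per day over the span from first to last commit (inclusive)."""
    if not commit_dates:
        return 0.0
    # Count whole days between the earliest and latest commit, plus one so that
    # a history confined to a single day has a span of one day, not zero.
    span_days = (max(commit_dates) - min(commit_dates)).days + 1
    return len(commit_dates) / span_days
```

Using days as the unit sidesteps the varying lengths of months and years; a weekly or monthly figure can still be derived by multiplying the result afterwards.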