Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete Aardvark field method refactor #199

Merged
merged 3 commits into from
Jul 19, 2024

Conversation

ehanson8
Copy link
Contributor

Purpose and background context

Refactors remaining Aardvark field methods to align with the conventions developed since this class was initially created.

How can a reviewer manually see the effects of these changes?

Run the following command to see that the Aardvark class transform still transforms a source file:

pipenv run transform -o output/gismit.json -i  tests/fixtures/aardvark/aardvark_record_all_fields.jsonl -s gismit

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:
* Update Aardvark class to use current field method conventions for the remaining fields

How this addresses that need:
* Add field methods and associated private methods for format, identifiers, languages, links, locations, notes, provider, publishers, rights, subjects, and summary
* Refactor get_links and get_source_link method to avoid error if links field is None
* Update unit tests with current conventions

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-285
Comment on lines +378 to +379
@classmethod
def get_rights(cls, source_record: dict, source: str) -> list[timdex.Rights] | None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should be broken into submethods but I wanted to get your thoughts to see if it's worth it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate to say it, but I think I'd agree. I could see methods like:

  • get_access_rights()
  • get_license_rights()
  • get_rights_and_rights_holders()
  • get_gismit_rights() + get_gisogm_rights() (and perhaps these triggered by what source logic)

An intersting one for sure though, it's like a big collapsing of multiple fields into one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep down, I knew it had to be done 🙃

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 Nice-nice

Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's looking great! Left a couple of comments, and agree with your suggestion about breaking get_rights() into sub-methods. But feel comfortable approving with any follow-up changes on these fronts.

Comment on lines 58 to 61
if links := cls.get_links(source_record):
url_links = [link for link in links if link.kind == "Website"]
if len(url_links) == 1:
return url_links[0].url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good reminder on how fuzzy things get over time... I don't think we'd ever expect to have multiple kind=Website coming back from gismit or gisogm records?

If we did, then we might want another scenario handler for when len() > 1, but if not, I think it's good as-is.

UPDATE: keeping comment, but after looking more closely at get_links(), I think we're okay, given that kind=Website is only set for a single possible key in the dct_references_s JSON/dictionary values.

Comment on lines +378 to +379
@classmethod
def get_rights(cls, source_record: dict, source: str) -> list[timdex.Rights] | None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate to say it, but I think I'd agree. I could see methods like:

  • get_access_rights()
  • get_license_rights()
  • get_rights_and_rights_holders()
  • get_gismit_rights() + get_gisogm_rights() (and perhaps these triggered by what source logic)

An intersting one for sure though, it's like a big collapsing of multiple fields into one.

* Create submethods for get_rights
* Rename get_dates submethods for consistency
@ehanson8 ehanson8 requested a review from ghukill July 16, 2024 15:48
@ehanson8
Copy link
Contributor Author

Prompted a little more refactoring than expected so re-requesting review

Comment on lines +380 to +389
rights: list[timdex.Rights] = []
kind_access_to_files = "Access to files"
rights.extend(cls._get_access_rights(source_record))
rights.extend(cls._get_license_rights(source_record))
rights.extend(cls._get_rights_and_rights_holders(source_record))
if source == "gisogm":
rights.extend(cls._get_gisogm_rights(kind_access_to_files))
elif source == "gismit":
rights.extend(cls._get_gismit_rights(source_record, kind_access_to_files))
return rights or None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 Looks great! The sub-methods are clear, this higher level method is clear, easy to follow.

Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I had one non-blocking suggested syntax change, but feel free to ignore if you find that it is less readable. 🤓

if len(url_links) == 1:
return url_links[0].url
if links := cls.get_links(source_record):
url_links = [link for link in links if link.kind == "Website"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Non-blocking] Could be updated to:

if (links := cls.get_links(source_record)) and (url_links := [link for link in links if link.kind == "Website"]
):
   return url_links[0].url

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, thanks!

* Refactor get_source_link method
* Update dependencies to address new safety vulnerability
* Updates to address linting errors
* Add XMLTransformer.record_is_deleted unit test
@ehanson8 ehanson8 merged commit 0e67da5 into main Jul 19, 2024
5 checks passed
@ehanson8 ehanson8 deleted the TIMX-285-aardvark-fmr-the-wrong-trousers branch July 19, 2024 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants