Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gdt 54 aardvark transform #108

Merged
merged 5 commits into from
Dec 18, 2023
Merged

Gdt 54 aardvark transform #108

merged 5 commits into from
Dec 18, 2023

Conversation

ehanson8
Copy link
Contributor

Helpful background context

This is the first of several PRs to build out the Aardvark class with methods for each relevant TIMDEX field. I'm intending for this to be a template for the future PRs with examples of more complex methods (get_subjects) and calls in the get_optional_fields method for more straightforward values (languages).

How can a reviewer manually see the effects of these changes?

Updates to the CLI are needed before this can be tested manually.

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Includes new or updated dependencies?

YES

Why these changes are being introduced:
* This is the initial structure for the Aardvark transform class. The class will be expanded with new methods in subsequent commits.

How this addresses that need:
* Add jsonlines to Pipfile
* Add fixtures for aardvark and generic JSONLines files
* Update argument type hinting for Transformer and JsonTransformer classes to clarify expected content types
* Update JsonTransformer.parse_source_file method to use jsonlines library
* Add Aardvark class with get_main_titles, get_source_record_id, record_is_deleted (in progress), get_optional_fields (in progress), and get_subjects methods and corresponding unit tests

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-54
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think it looks really good. I find it easy to follow the XML vs JSON transformer types. And while we may or may not move to more field-specific methods, I think breaking out complex ones like get_subjects is a good approach (and consistent with other transformers).

Left some questions, which I think are largely correlated with how closely we want this new Aaardvark transformer to be coupled with an "MITAardvark" record coming from GeoHarvester. Maybe that coupling is tightented through naming, maybe some is assumptions in data that will always be present. Did not explicitly approve or request changes, but curious your thoughts on that.

Otherwise, think looks like a great start and will setup nicely for the next fields to map.

transmogrifier/sources/transformer.py Show resolved Hide resolved
transmogrifier/sources/transformer.py Show resolved Hide resolved
transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hiya' Eric! Thank you for scoping out the skeleton for the MITAardvark transformer. I had a couple of questions!

Is my understanding correct that this is intended to be a "scaffolding"-level PR with future PRs (under the same ticket GDT-54) for the actual implementation? I plan to do a second pass after some clarification!

transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
transmogrifier/sources/json/aardvark.py Show resolved Hide resolved
* Update json_records fixture to aardvark_records for more accurate unit tests
* Rename Aardvark > MITAardvark to unify terminology across repos
* Update get_main_titles method to reflect it is a required field
* Update Aardvark method docstrings to provide greater context
* Add Transformer._transform method to minimize code duplication between JsonTransformer and XmlTransformer methods
@ehanson8
Copy link
Contributor Author

@ghukill @jonavellecuerdo Pushed updates, let me know what you think!

Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

I left a comment about the docstring for Transformer._transform(), but it's just a suggestion, and not required. Approved with or without changes to that docstring.

transmogrifier/sources/json/aardvark.py Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Show resolved Hide resolved
@ehanson8 ehanson8 merged commit 1aba987 into main Dec 18, 2023
5 checks passed
@ehanson8 ehanson8 deleted the gdt-54-aardvark-transform branch December 18, 2023 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants