Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Austenem/CAT-960 Fix EPIC redirects #3583

Merged
merged 10 commits into from
Oct 29, 2024
Merged

Conversation

austenem
Copy link
Collaborator

Summary

Update support entity and processed/component dataset redirects to go only to the appropriate primary datasets. If the primary dataset does not exist, a 404 error will be shown. This addresses an issue that caused non-unified pages for EPIC datasets to be shown.

Design Documentation/Original Tickets

CAT-960 Jira ticket

Testing

Unit tested helper function and manually checked for regressions in redirects with EPIC, centrally processed, component, and primary datasets. There are currently no EPIC datasets with a parent processed dataset on dev that can be used to check the success case for this update.

Once EPICs fitting this case are released, this update should be tested again.

Note: the EPIC dataset linked in the Jira ticket (0739bdc773cd2c294e5bdb3fed1b4cb4) is missing a primary dataset ancestor, and so cannot be used to test this update. It can, however, be used to test the error case, which is successful (previously the page redirected to a processed dataset ancestor, and it now redirects to a 404 page).

Checklist

  • Code follows the project's coding standards
    • Lint checks pass locally
    • New CHANGELOG-your-feature-name-here.md is present in the root directory, describing the change(s) in full sentences.
  • Unit tests covering the new feature have been added
  • All existing tests pass
  • Any relevant documentation in JIRA/Confluence has been updated to reflect the new feature
  • Any new functionalities have appropriate analytics functionalities added

Copy link
Collaborator

@john-conroy john-conroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job always including tests!

Comment on lines 86 to 108
def find_earliest_dataset_ancestor(client, uuid):
dataset = client.get_entities(
'datasets',
query_override={
"bool": {
"must": {
"term": {
"uuid": uuid
}
}
}
},
non_metadata_fields=['hubmap_id', 'uuid', 'immediate_ancestors', 'entity_type']
)

# If no dataset is found or it has no ancestors, return the current dataset UUID
if not dataset or not dataset[0].get('immediate_ancestors'):
return uuid

# Traverse through immediate ancestors to find the earliest dataset ancestor
for ancestor in dataset[0]['immediate_ancestors']:
if ancestor.get('entity_type') == 'Dataset':
uuid = find_earliest_dataset_ancestor(client, ancestor.get('uuid'))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't depend on immediate_ancestors since it's somewhat deprecated and we should also avoid recursion and multiple requests if possible.

Can we use a query that returns the dataset which has processing === 'raw', does not have the field ancestor_counts.entity_type.Dataset, and its uuid is included in the given dataset's ancestors_ids? Let me know if you need help drafting the query.

@austenem
Copy link
Collaborator Author

I spent a bit of time working on updating the util tests, and then realized that style of testing (mocking the client response) isn't applicable to this new approach. I removed the tests, but if there's a better approach/a way to verify this query works apart from manually testing redirects - which look fine so far - I'd be interested in it!

@john-conroy
Copy link
Collaborator

You could test the redirect with a couple of e2e tests.

redirectedFromPipeline=entity.get('pipeline')))
# Check whether the oldest found ancestor exists and is of an expected type. 404 is
# preferable to a page that shows only a support entity or processed/component dataset.
if not earliest_dataset or should_redirect_entity(earliest_dataset[0]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to call should_redirect_entity again?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was originally to check that the retrieved entity was not actually a processed/support dataset - with the updated query, this shouldn't be necessary.

Copy link
Collaborator

@john-conroy john-conroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@austenem austenem merged commit 236afbf into main Oct 29, 2024
8 checks passed
@austenem austenem deleted the austenem/cat-960-fix-epic-redirects branch October 29, 2024 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants