Austenem/CAT-960 Fix EPIC redirects #3583

austenem · 2024-10-28T20:35:36Z

Summary

Update support entity and processed/component dataset redirects to go only to the appropriate primary datasets. If the primary dataset does not exist, a 404 error will be shown. This addresses an issue that caused non-unified pages for EPIC datasets to be shown.

Design Documentation/Original Tickets

CAT-960 Jira ticket

Testing

Unit tested helper function and manually checked for regressions in redirects with EPIC, centrally processed, component, and primary datasets. There are currently no EPIC datasets with a parent processed dataset on dev that can be used to check the success case for this update.

Once EPICs fitting this case are released, this update should be tested again.

Note: the EPIC dataset linked in the Jira ticket (0739bdc773cd2c294e5bdb3fed1b4cb4) is missing a primary dataset ancestor, and so cannot be used to test this update. It can, however, be used to test the error case, which is successful (previously the page redirected to a processed dataset ancestor, and it now redirects to a 404 page).

Checklist

- Code follows the project's coding standards
  - Lint checks pass locally
  - New CHANGELOG-your-feature-name-here.md is present in the root directory, describing the change(s) in full sentences.
- Unit tests covering the new feature have been added
- All existing tests pass
- Any relevant documentation in JIRA/Confluence has been updated to reflect the new feature
- Any new functionalities have appropriate analytics functionalities added

john-conroy

Great job always including tests!

john-conroy · 2024-10-28T20:58:31Z

context/app/utils.py

+def find_earliest_dataset_ancestor(client, uuid):
+    dataset = client.get_entities(
+        'datasets',
+        query_override={
+            "bool": {
+                "must": {
+                    "term": {
+                        "uuid": uuid
+                    }
+                }
+            }
+        },
+        non_metadata_fields=['hubmap_id', 'uuid', 'immediate_ancestors', 'entity_type']
+    )
+
+    # If no dataset is found or it has no ancestors, return the current dataset UUID
+    if not dataset or not dataset[0].get('immediate_ancestors'):
+        return uuid
+
+    # Traverse through immediate ancestors to find the earliest dataset ancestor
+    for ancestor in dataset[0]['immediate_ancestors']:
+        if ancestor.get('entity_type') == 'Dataset':
+            uuid = find_earliest_dataset_ancestor(client, ancestor.get('uuid'))


We shouldn't depend on immediate_ancestors, since it's somewhat deprecated, and we should also avoid recursion and multiple requests if possible.

Can we use a query that returns the dataset which has processing === 'raw', does not have the field ancestor_counts.entity_type.Dataset, and its uuid is included in the given dataset's ancestors_ids? Let me know if you need help drafting the query.
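A minimal sketch of the kind of query described above, written as an Elasticsearch bool query in a Python dict. The field names `processing`, `uuid`, and `ancestor_counts.entity_type.Dataset` are taken from the comment. The helper name `build_raw_ancestor_query` and its `ancestor_ids` parameter are hypothetical, and the sketch assumes the given dataset's ancestor IDs have already been read from its document. It is an illustration, not the query that was merged.

```python
def build_raw_ancestor_query(ancestor_ids):
    """Build a bool query for the raw (primary) dataset among the given ancestors.

    `ancestor_ids` is assumed to be the list of ancestor UUIDs stored on the
    dataset being redirected (hypothetical parameter, not from the PR).
    """
    return {
        "bool": {
            "must": [
                # Only raw datasets qualify as primary datasets.
                {"term": {"processing": "raw"}},
                # The candidate must be one of the given dataset's ancestors.
                {"terms": {"uuid": list(ancestor_ids)}},
            ],
            "must_not": [
                # Exclude datasets that themselves have a Dataset ancestor.
                {"exists": {"field": "ancestor_counts.entity_type.Dataset"}},
            ],
        }
    }


query = build_raw_ancestor_query(["abc", "def"])
```

The resulting dict could be passed as `query_override` to a single `client.get_entities('datasets', ...)` call like the one in the diff above, which would avoid both the recursion and the repeated requests.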

austenem · 2024-10-29T16:00:26Z

I spent a bit of time working on updating the util tests, and then realized that style of testing (mocking the client response) isn't applicable to this new approach. I removed the tests, but if there's a better approach/a way to verify this query works apart from manually testing redirects - which look fine so far - I'd be interested in it!

john-conroy · 2024-10-29T16:05:16Z

You could test the redirect with a couple of e2e tests.

john-conroy · 2024-10-29T18:15:50Z

context/app/routes_browse.py

-                        redirectedFromPipeline=entity.get('pipeline')))
+        # Check whether the oldest found ancestor exists and is of an expected type. 404 is
+        # preferable to a page that shows only a support entity or processed/component dataset.
+        if not earliest_dataset or should_redirect_entity(earliest_dataset[0]):


Why do we need to call should_redirect_entity again?

This was originally to check that the retrieved entity was not actually a processed/support dataset - with the updated query, this shouldn't be necessary.
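With the entity type check dropped, the remaining decision reduces to whether the query returned anything. A minimal sketch of that logic, under stated assumptions: `pick_redirect_uuid` is a hypothetical helper, and `raw_ancestors` is assumed to be the list of entity dicts returned by the raw-ancestor query. This is not the code in routes_browse.py.

```python
def pick_redirect_uuid(raw_ancestors):
    """Return the UUID to redirect to, or None if the caller should respond with a 404.

    `raw_ancestors` is assumed to be a (possibly empty) list of dicts, each
    with a 'uuid' key, as returned by the raw-ancestor query.
    """
    # No primary dataset ancestor was found: signal the error case.
    if not raw_ancestors:
        return None
    # The query already restricts results to raw datasets, so no further
    # entity type check is applied here.
    return raw_ancestors[0].get("uuid")
```

A route handler could then abort with a 404 when the helper returns `None` and redirect to the returned UUID otherwise.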

john-conroy

Looks good to me!

austenem added 6 commits October 25, 2024 12:58

add recursive search for primary dataset redirect

ac10bce

add unit tests

0e42e62

continue redirect tests

1c9c285

add to tests and add error case

8cb9cc1

adjust docs

c8de748

add changelog

2c6124c

austenem requested review from NickAkhmetov and john-conroy October 28, 2024 20:35

john-conroy reviewed Oct 28, 2024

adjust query and remove util tests

79b2074

rename util function

6371c99

austenem requested a review from john-conroy October 29, 2024 16:03

add e2e tests

83a1760

john-conroy reviewed Oct 29, 2024

remove unnecessary entity type check

9334d2d

john-conroy approved these changes Oct 29, 2024

austenem merged commit 236afbf into main Oct 29, 2024
8 checks passed

austenem deleted the austenem/cat-960-fix-epic-redirects branch October 29, 2024 19:19
