Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add remaining GET endpoints to parity check #1847

Merged
merged 11 commits into from
Oct 25, 2024

Conversation

ethax-ross
Copy link
Collaborator

@ethax-ross ethax-ross commented Oct 23, 2024

Jira-3617
Jira-3619

Context

We want to expand the parity check to include all other GET endpoints so that we can compare the results. This also involves supporting individual GET resource endpoints, which require a different id in the path for each provider.

Changes proposed in this pull request

  • Optimise the ResponseComparison deletion on re-run

Deleting the existing parity check results can be slow as they can be very large records in the database. We don't care about callbacks or validation here, so we can optimise the deletion with delete_all.

  • Support checking an individual resource

We need to be able to request an individual resource as part of the parity check. The difficulty with this is that we need a valid ID for each lead provider. Instead of specifying these manually, I've opted to allow dynamic substitution by evaluating code from the YAML file.

We can now specify an ID attribute in the path with :id and a corresponding option in the YAML file that, when evaluated (in the context of the Client) should return a valid id for the lead provider.

  • Add remaining GET endpoints to YAML file

Add all the listing and individual GET resource endpoints to the YAML file.

  • Convert CSV responses to hexdigest

The CSV responses are very large and cause issues when trying to insert into Postgres. To be able to include them for now we are converting the response CSV to a hexdigest that can be stored and compared. If we want more details we will have to make the requests manually to inspect the responses.

  • Stop paginating on first match

In order to finish the parity check as soon as possible, we stop paginating responses as soon as we encounter a discrepency. A mismatch in a earlier page is likely to effect later pages anyway, so this is all we will be interested in (and other issues would be surfaced on future runs when its fixed).

  • Remove CSV endpoints

These cause issues with timeouts on the database; I'm not sure why as we hash the large CSV response so it shouldn't cause issues.

Remove CSV endpoints; if we want to include them we can add another ticket to make them work.

  • Persist formatted path to ResponseComparison

As the id we grab for the lead provider is random, we need to persist it in the ResponseComparison to be able to replicate the request.

Add id to request_path before saving (avoids adding a new attribute to ResponseComparison).

  • Pretty format JSON in diff

When we output the JSON diff we want to pretty format it or we end up with a single line which is hard to compare.

  • Deep sort responses

If the response is JSON we deep sort the resulting hash so that we can compare the contents irrespective of order (as the new serializers in NPQ reg may serialize the fields in a different order).

Guidance for review

Best reviewed by-commit.

I left the CSV hexdigest commit in as we will still need it if we bring the CSV endpoints in again (we return all results, which would be a really big diff/thing to put into the database without hashing it).

Results Results (expanded) Updated diff
screencapture-0-0-0-0-3000-npq-separation-migration-parity-checks-2024-10-23-13_22_16 screencapture-0-0-0-0-3000-npq-separation-migration-parity-checks-2024-10-23-13_22_36 screencapture-0-0-0-0-3000-npq-separation-migration-parity-checks-response-comparisons-15-2024-10-23-16_14_57

Deleting the existing parity check results can be slow as they can be very
large records in the database. We don't care about callbacks or validation
here, so we can optimise the deletion with `delete_all`.
We need to be able to request an individual resource as part of the parity
check. The difficulty with this is that we need a valid ID for each lead
provider. Instead of specifying these manually, I've opted to allow dynamic
substitution by evaluating code from the YAML file.

We can now specify an ID attribute in the path with `:id` and a corresponding
option in the YAML file that, when evaluated (in the context of the `Client`)
should return a valid id for the lead provider.
Copy link

Add all the listing and individual GET resource endpoints to the YAML file.
The CSV responses are very large and cause issues when trying to insert into
Postgres. To be able to include them for now we are converting the response CSV
to a hexdigest that can be stored and compared. If we want more details we will
have to make the requests manually to inspect the responses.
In order to finish the parity check as soon as possible, we stop paginating
responses as soon as we encounter a discrepency. A mismatch in a earlier page
is likely to effect later pages anyway, so this is all we will be interested in
(and other issues would be surfaced on future runs when its fixed).
paginate: true

"/api/v1/participants/npq/:id":
id: 'User.includes(:applications).where(applications: { lead_provider: }).order("RANDOM()").limit(1).pick(:ecf_id)'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make sure those are accepted applications?

@@ -87,7 +91,11 @@ def url(app:)
def formatted_path
return path unless options[:id] && path.include?(":id")

path.sub(":id", eval(options[:id]).to_s) # rubocop:disable Security/Eval
id = eval(options[:id]) # rubocop:disable Security/Eval
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious about this, why wouldn't we able to find an id 🤔 ideally all should be there?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so I put this in when I was finding some returning no results, but it was down to a bug in one of my queries; I'll remove this

Copy link
Collaborator

@cwrw cwrw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, some minor comments 🙌

These cause issues with timeouts on the database; I'm not sure why as we hash
the large CSV response so it shouldn't cause issues.

Remove CSV endpoints; if we want to include them we can add another ticket to
make them work.
As the id we grab for the lead provider is random, we need to persist it in the
`ResponseComparison` to be able to replicate the request.

Add `id` to `request_path` before saving (avoids adding a new attribute to
`ResponseComparison`).
When we output the JSON diff we want to pretty format it or we end up with a
single line which is hard to compare.
If the response is JSON we deep sort the resulting hash so that we can compare
the contents irrespective of order (as the new serializers in NPQ reg may
serialize the fields in a different order).
@formatted_path ||= begin
return path unless options[:id] && path.include?(":id")

path.sub(":id", eval(options[:id]).to_s) # rubocop:disable Security/Eval
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing eval from the yaml file content, can we pass in instructions instead?
eg:

  "/api/v1/npq-applications/:id":
    id: 'random_lead_provider_ecf_id`

And then in Client have method to translate to action:

def process_instruct(n)
  case n
  when "random_lead_provider_ecf_id"
    lead_provider.applications.order("RANDOM()").limit(1).pick(:ecf_id)'
  end
end

ecf_id = process_instruct(options[:id])

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm somehow prefer it in the yaml file 🤔 otherwise we'll have a huge list in the client

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer it out of the YAML file; easier to test, avoids eval and less duplication

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok fair enough, I can see it looks better 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for the meme

paginate: true

"/api/v1/npq-applications/:id":
id: 'lead_provider.applications.order("RANDOM()").limit(1).pick(:ecf_id)'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible feature, option to run the single request multiple times? eg: "run_count: 100"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah one for the future maybe as this would have knock on effects to how we display/store the results

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, i'm thinking we could do it now for free by duplicating the same URL like this:

  • /api/v1/npq-applications/:id#1
  • /api/v1/npq-applications/:id#2
  • /api/v1/npq-applications/:id#3

lol

Instead of evaluating the YAML contents directly, we can specify a method in
the YAML file and call in the `Client`. This also has the benefit of being able
to more easily test the possible `id` options.
Copy link

sonarcloud bot commented Oct 24, 2024

Copy link
Contributor

@leandroalemao leandroalemao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 tks @ethax-ross 👍

@ethax-ross ethax-ross added this pull request to the merge queue Oct 25, 2024
Merged via the queue into main with commit 37ca1c6 Oct 25, 2024
18 checks passed
@ethax-ross ethax-ross deleted the 3617-support-get-single-resource branch October 25, 2024 06:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants