-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add remaining GET endpoints to parity check #1847
Conversation
Deleting the existing parity check results can be slow as they can be very large records in the database. We don't care about callbacks or validation here, so we can optimise the deletion with `delete_all`.
We need to be able to request an individual resource as part of the parity check. The difficulty with this is that we need a valid ID for each lead provider. Instead of specifying these manually, I've opted to allow dynamic substitution by evaluating code from the YAML file. We can now specify an ID attribute in the path with `:id` and a corresponding option in the YAML file that, when evaluated (in the context of the `Client`) should return a valid id for the lead provider.
Review app deployed to https://npq-registration-review-1847-web.test.teacherservices.cloud/ |
a4b7ba3
to
218490f
Compare
9d0192a
to
2b08acf
Compare
Add all the listing and individual GET resource endpoints to the YAML file.
The CSV responses are very large and cause issues when trying to insert into Postgres. To be able to include them for now we are converting the response CSV to a hexdigest that can be stored and compared. If we want more details we will have to make the requests manually to inspect the responses.
In order to finish the parity check as soon as possible, we stop paginating responses as soon as we encounter a discrepency. A mismatch in a earlier page is likely to effect later pages anyway, so this is all we will be interested in (and other issues would be surfaced on future runs when its fixed).
2b08acf
to
885afa4
Compare
config/parity_check_endpoints.yml
Outdated
paginate: true | ||
|
||
"/api/v1/participants/npq/:id": | ||
id: 'User.includes(:applications).where(applications: { lead_provider: }).order("RANDOM()").limit(1).pick(:ecf_id)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make sure those are accepted
applications?
@@ -87,7 +91,11 @@ def url(app:) | |||
def formatted_path | |||
return path unless options[:id] && path.include?(":id") | |||
|
|||
path.sub(":id", eval(options[:id]).to_s) # rubocop:disable Security/Eval | |||
id = eval(options[:id]) # rubocop:disable Security/Eval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious about this, why wouldn't we able to find an id 🤔 ideally all should be there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so I put this in when I was finding some returning no results, but it was down to a bug in one of my queries; I'll remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great, some minor comments 🙌
These cause issues with timeouts on the database; I'm not sure why as we hash the large CSV response so it shouldn't cause issues. Remove CSV endpoints; if we want to include them we can add another ticket to make them work.
As the id we grab for the lead provider is random, we need to persist it in the `ResponseComparison` to be able to replicate the request. Add `id` to `request_path` before saving (avoids adding a new attribute to `ResponseComparison`).
When we output the JSON diff we want to pretty format it or we end up with a single line which is hard to compare.
If the response is JSON we deep sort the resulting hash so that we can compare the contents irrespective of order (as the new serializers in NPQ reg may serialize the fields in a different order).
885afa4
to
ffb3de2
Compare
@formatted_path ||= begin | ||
return path unless options[:id] && path.include?(":id") | ||
|
||
path.sub(":id", eval(options[:id]).to_s) # rubocop:disable Security/Eval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing eval from the yaml file content, can we pass in instructions instead?
eg:
"/api/v1/npq-applications/:id":
id: 'random_lead_provider_ecf_id`
And then in Client
have method to translate to action:
def process_instruct(n)
case n
when "random_lead_provider_ecf_id"
lead_provider.applications.order("RANDOM()").limit(1).pick(:ecf_id)'
end
end
ecf_id = process_instruct(options[:id])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm somehow prefer it in the yaml file 🤔 otherwise we'll have a huge list in the client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer it out of the YAML file; easier to test, avoids eval
and less duplication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok fair enough, I can see it looks better 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for the meme
config/parity_check_endpoints.yml
Outdated
paginate: true | ||
|
||
"/api/v1/npq-applications/:id": | ||
id: 'lead_provider.applications.order("RANDOM()").limit(1).pick(:ecf_id)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possible feature, option to run the single request multiple times? eg: "run_count: 100"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah one for the future maybe as this would have knock on effects to how we display/store the results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, i'm thinking we could do it now for free by duplicating the same URL like this:
/api/v1/npq-applications/:id#1
/api/v1/npq-applications/:id#2
/api/v1/npq-applications/:id#3
lol
Instead of evaluating the YAML contents directly, we can specify a method in the YAML file and call in the `Client`. This also has the benefit of being able to more easily test the possible `id` options.
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 tks @ethax-ross 👍
Jira-3617
Jira-3619
Context
We want to expand the parity check to include all other GET endpoints so that we can compare the results. This also involves supporting individual GET resource endpoints, which require a different
id
in the path for each provider.Changes proposed in this pull request
Deleting the existing parity check results can be slow as they can be very large records in the database. We don't care about callbacks or validation here, so we can optimise the deletion with
delete_all
.We need to be able to request an individual resource as part of the parity check. The difficulty with this is that we need a valid ID for each lead provider. Instead of specifying these manually, I've opted to allow dynamic substitution by evaluating code from the YAML file.
We can now specify an ID attribute in the path with
:id
and a corresponding option in the YAML file that, when evaluated (in the context of theClient
) should return a valid id for the lead provider.Add all the listing and individual GET resource endpoints to the YAML file.
The CSV responses are very large and cause issues when trying to insert into Postgres. To be able to include them for now we are converting the response CSV to a hexdigest that can be stored and compared. If we want more details we will have to make the requests manually to inspect the responses.
In order to finish the parity check as soon as possible, we stop paginating responses as soon as we encounter a discrepency. A mismatch in a earlier page is likely to effect later pages anyway, so this is all we will be interested in (and other issues would be surfaced on future runs when its fixed).
These cause issues with timeouts on the database; I'm not sure why as we hash the large CSV response so it shouldn't cause issues.
Remove CSV endpoints; if we want to include them we can add another ticket to make them work.
As the id we grab for the lead provider is random, we need to persist it in the
ResponseComparison
to be able to replicate the request.Add
id
torequest_path
before saving (avoids adding a new attribute toResponseComparison
).When we output the JSON diff we want to pretty format it or we end up with a single line which is hard to compare.
If the response is JSON we deep sort the resulting hash so that we can compare the contents irrespective of order (as the new serializers in NPQ reg may serialize the fields in a different order).
Guidance for review
Best reviewed by-commit.
I left the CSV hexdigest commit in as we will still need it if we bring the CSV endpoints in again (we return all results, which would be a really big diff/thing to put into the database without hashing it).