Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cost (for methods cromwell), add explicit example #278

Merged
merged 5 commits into from
Apr 2, 2024
Merged

Conversation

sjfleming
Copy link
Member

I didn't have a clue how to get cromshell cost to work. @SHuang-Broad helped me out.

I added the information I needed to the README.md so others can get it to work.

There also seemed to be a bug in cromshell cost: the code expected a table column called partition_time but the table here (Methods cromwell server v77 location)
https://console.cloud.google.com/bigquery?project=broad-dsde-methods&organizationId=548622027621&j=bq:US:bquxjob_85a71e8_16fa747f47d&page=queryresults&ws=!1m10!1m4!4m3!1sbroad-dsde-methods!2sMethods_billing_dump!3sgcp_billing_export_v1_009C7D_923007_219A6F!1m4!1m3!1sbroad-dsde-methods!2sbquxjob_85a71e8_16fa747f47d!3sUS
has a column called export_time instead.

When I changed the code from partition_time to export_time, I was able to get the command to work and see my job costs.

@sjfleming sjfleming added Cromshell 2 Issues related to Cromshell 2.0 bug Something isn't working labels Jan 19, 2024
@sjfleming
Copy link
Member Author

The test errors I'm getting appear to be #277

The github actions seem to be failing when installing the repo, unless I'm misunderstanding the logs.

@sjfleming sjfleming marked this pull request as ready for review January 19, 2024 18:10
README.md Outdated
{
"cromwell_server": "https://cromwell-v77.dsde-methods.broadinstitute.org/",
"requests_timeout": 5,
"bq_cost_table": "Methods_billing_dump.gcp_billing_export_v1_009C7D_923007_219A6F"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer we hide these values from the open internet, just in case...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so how about the following:
"<bq_dataset_hosting_the_billing_datatable>.<the_billing_datatable>"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do that if we need to yeah

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(But if there's no risk in exposing these, I'd like the example to be 100% explicit. Because I never would have foudn those values.
So... is there a risk?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or... better... if we included directions for how to find that table.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't know what those directions are

@@ -182,15 +182,15 @@ def create_bq_query(detailed: bool, bq_cost_table: str) -> str:
AND task.key LIKE "wdl-task-name"
AND wfid.key LIKE "cromwell-workflow-id"
AND wfid.value like @workflow_id
AND partition_time BETWEEN @start_date AND @end_date
AND export_time BETWEEN @start_date AND @end_date
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at our cost tabe and do see this key in it, instead of the partition_time key. But I wonder why it was partition_time at the beginning, starting from the shell version. Did Google change things underneath us?

export_time is documented here, but not partition_time
https://cloud.google.com/billing/docs/how-to/export-data-bigquery-tables/detailed-usage

And confusing the situation, even more, are the two other *time values: usage_start_time and usage_end_time. And now I wonder, should we use them instead of export_time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good questions. I do remember some email about google changing things. I think this is what happened, but I can't verify it.

Not sure about usage_end_time versus export_time

@bshifaw
Copy link
Collaborator

bshifaw commented Feb 22, 2024

The unit/integration tests are failing for an unrelated reason and is being fixed here. Once that PR is merged, updating your branch from main should pass your tests

Copy link
Collaborator

@bshifaw bshifaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pulled in the changes to fix the previous git action test fails.
Now, just need to update the unit test script that checks the for the expected query. Update it the "export_time"
https://github.com/broadinstitute/cromshell/blob/f1344e208a8a0a1375651f70ce8e0261ff6747b9/tests/unit/test_cost.py#L26C21-L26C35

WHERE value LIKE @workflow_id AND partition_time BETWEEN @start_date AND @end_date

@sjfleming
Copy link
Member Author

I think it's ready pending @SHuang-Broad 's comment above about hiding the values from the README. Not sure what the consensus is there. I see Steve's point, but also if the example is less explicit, I would never be able to figure out how to use cost, because I don't know how to find the location of the bq table.

@SHuang-Broad
Copy link
Contributor

@sjfleming
How about this:

The URL for accessing the BigQuery cost table is formatted as 
https://console.cloud.google.com/bigquery?project=<your_gcp_billing_project>

where you can see the table like displayed in the following figure (actual project hidden)
Screenshot 2024-03-29 at 2 29 18 PM

Click on the table itself (in this case, gcp_billing_...) and open the "DETAILS" tab, you'll find the exact path to the table in the "Table ID" section.
Screenshot 2024-03-29 at 2 33 59 PM

@sjfleming
Copy link
Member Author

Thanks @SHuang-Broad !

Alright how about this README? (Tried to just use one picture, but I think these are super helpful.)

@SHuang-Broad
Copy link
Contributor

Looks good to me! Thanks for getting all these together!

@bshifaw bshifaw merged commit 9c1b901 into main Apr 2, 2024
6 checks passed
@bshifaw bshifaw deleted the sf_cost branch April 2, 2024 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Cromshell 2 Issues related to Cromshell 2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants