Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(deletes): bulk delete consumer #6510

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

Conversation

MeredithAnya
Copy link
Member

@MeredithAnya MeredithAnya commented Nov 5, 2024

context:
The work for bulk deleting in snuba has so far included the following:

what's left:
Now that we have the topics created we can finish up the consumer side.

  • Add the consumer logic to snuba (This PR)
  • Add the consumer deployment to S4S region in the ops repository
  • Set up datadog alerts/metrics and other observability

this PR:
It's a bit of a larger PR but it can be reviewed in a couple sections:

  • The main consumer logic and strategy
    • This has the logic to create the strategy factory for the consumer and composes the strategy steps. The strategy.py file has the details for actually executing the delete query
  • The batching.py file - I have an arroyo PR that makes this file obsolete but in the meantime I don't think it needs to block this PR since it will be easy to remove after
  • The formatters are going to be the only code that someone will need to write in the future when deploying the deletions consumer for a different storage. How one formats the conditions for the DELETE query is up to that logic.

Copy link

codecov bot commented Nov 5, 2024

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
2684 1 2683 5
View the top 1 failed tests by shortest run time
tests.web.rpc.v1.test_endpoint_trace_item_attribute_names.TestTraceItemAttributeNames test_basic
Stack Traces | 0.358s run time
Traceback (most recent call last):
  File ".../rpc/v1/test_endpoint_trace_item_attribute_names.py", line 133, in test_basic
    assert res.attributes == expected
AssertionError: assert [name: "a_tag_000"\ntype: TYPE_STRING\n, name: "a_tag_001"\ntype: TYPE_STRING\n, name: "a_tag_002"\ntype: TYPE_STRING\n, name: "a_tag_003"\ntype: TYPE_STRING\n, name: "a_tag_004"\ntype: TYPE_STRING\n, name: "a_tag_005"\ntype: TYPE_STRING\n, name: "a_tag_006"\ntype: TYPE_STRING\n, name: "a_tag_007"\ntype: TYPE_STRING\n, name: "a_tag_008"\ntype: TYPE_STRING\n, name: "a_tag_009"\ntype: TYPE_STRING\n, name: "a_tag_010"\ntype: TYPE_STRING\n, name: "a_tag_011"\ntype: TYPE_STRING\n, name: "a_tag_012"\ntype: TYPE_STRING\n, name: "a_tag_013"\ntype: TYPE_STRING\n, name: "a_tag_014"\ntype: TYPE_STRING\n, name: "a_tag_015"\ntype: TYPE_STRING\n, name: "a_tag_016"\ntype: TYPE_STRING\n, name: "a_tag_017"\ntype: TYPE_STRING\n, name: "a_tag_018"\ntype: TYPE_STRING\n, name: "a_tag_019"\ntype: TYPE_STRING\n, name: "a_tag_020"\ntype: TYPE_STRING\n, name: "a_tag_021"\ntype: TYPE_STRING\n, name: "a_tag_022"\ntype: TYPE_STRING\n, name: "a_tag_023"\ntype: TYPE_STRING\n, name: "a_tag_024"\ntype: TYPE_STRING\n, name: "a_tag_025"\ntype: TYPE_STRING\n, name: "a_tag_026"\ntype: TYPE_STRING\n, name: "a_tag_027"\ntype: TYPE_STRING\n, name: "a_tag_028"\ntype: TYPE_STRING\n, name: "a_tag_029"\ntype: TYPE_STRING\n, name: "consumer_group"\ntype: TYPE_STRING\n, name: "customer"\ntype: TYPE_STRING\n, name: "environment"\ntype: TYPE_STRING\n, name: "http.status_code"\ntype: TYPE_STRING\n, name: "sentry.category"\ntype: TYPE_STRING\n, name: "sentry.environment"\ntype: TYPE_STRING\n, name: "sentry.name"\ntype: TYPE_STRING\n, name: "sentry.op"\ntype: TYPE_STRING\n, name: "sentry.platform"\ntype: TYPE_STRING\n, name: "sentry.release"\ntype: TYPE_STRING\n, name: "sentry.sdk.name"\ntype: TYPE_STRING\n, name: "sentry.sdk.version"\ntype: TYPE_STRING\n, name: "sentry.segment.name"\ntype: TYPE_STRING\n, name: "sentry.segment_name"\ntype: TYPE_STRING\n, name: "sentry.service"\ntype: TYPE_STRING\n, name: "sentry.status"\ntype: TYPE_STRING\n, name: "sentry.status_code"\ntype: TYPE_STRING\n, name: "sentry.thread.id"\ntype: TYPE_STRING\n, name: "sentry.thread.name"\ntype: TYPE_STRING\n, name: "sentry.trace.status"\ntype: TYPE_STRING\n, name: "sentry.transaction.method"\ntype: TYPE_STRING\n, name: "sentry.transaction.op"\ntype: TYPE_STRING\n, name: "sentry.user"\ntype: TYPE_STRING\n, name: "thread.id"\ntype: TYPE_STRING\n, name: "thread.name"\ntype: TYPE_STRING\n] == [name: "a_tag_000"\ntype: TYPE_STRING\n,\n name: "a_tag_001"\ntype: TYPE_STRING\n,\n name: "a_tag_002"\ntype: TYPE_STRING\n,\n name: "a_tag_003"\ntype: TYPE_STRING\n,\n name: "a_tag_004"\ntype: TYPE_STRING\n,\n name: "a_tag_005"\ntype: TYPE_STRING\n,\n name: "a_tag_006"\ntype: TYPE_STRING\n,\n name: "a_tag_007"\ntype: TYPE_STRING\n,\n name: "a_tag_008"\ntype: TYPE_STRING\n,\n name: "a_tag_009"\ntype: TYPE_STRING\n,\n name: "a_tag_010"\ntype: TYPE_STRING\n,\n name: "a_tag_011"\ntype: TYPE_STRING\n,\n name: "a_tag_012"\ntype: TYPE_STRING\n,\n name: "a_tag_013"\ntype: TYPE_STRING\n,\n name: "a_tag_014"\ntype: TYPE_STRING\n,\n name: "a_tag_015"\ntype: TYPE_STRING\n,\n name: "a_tag_016"\ntype: TYPE_STRING\n,\n name: "a_tag_017"\ntype: TYPE_STRING\n,\n name: "a_tag_018"\ntype: TYPE_STRING\n,\n name: "a_tag_019"\ntype: TYPE_STRING\n,\n name: "a_tag_020"\ntype: TYPE_STRING\n,\n name: "a_tag_021"\ntype: TYPE_STRING\n,\n name: "a_tag_022"\ntype: TYPE_STRING\n,\n name: "a_tag_023"\ntype: TYPE_STRING\n,\n name: "a_tag_024"\ntype: TYPE_STRING\n,\n name: "a_tag_025"\ntype: TYPE_STRING\n,\n name: "a_tag_026"\ntype: TYPE_STRING\n,\n name: "a_tag_027"\ntype: TYPE_STRING\n,\n name: "a_tag_028"\ntype: TYPE_STRING\n,\n name: "a_tag_029"\ntype: TYPE_STRING\n,\n name: "http.status_code"\ntype: TYPE_STRING\n,\n name: "sentry.category"\ntype: TYPE_STRING\n,\n name: "sentry.name"\ntype: TYPE_STRING\n,\n name: "sentry.segment_name"\ntype: TYPE_STRING\n,\n name: "sentry.service"\ntype: TYPE_STRING\n]
  At index 30 diff: name: "consumer_group"\ntype: TYPE_STRING\n != name: "http.status_code"\ntype: TYPE_STRING\n
  Left contains 20 more items, first extra item: name: "sentry.environment"\ntype: TYPE_STRING\n
  Full diff:
    [
     name: "a_tag_000"
    type: TYPE_STRING
  - ,
  -  name: "a_tag_001"
  + , name: "a_tag_001"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_002"
  + , name: "a_tag_002"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_003"
  + , name: "a_tag_003"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_004"
  + , name: "a_tag_004"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_005"
  + , name: "a_tag_005"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_006"
  + , name: "a_tag_006"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_007"
  + , name: "a_tag_007"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_008"
  + , name: "a_tag_008"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_009"
  + , name: "a_tag_009"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_010"
  + , name: "a_tag_010"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_011"
  + , name: "a_tag_011"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_012"
  + , name: "a_tag_012"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_013"
  + , name: "a_tag_013"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_014"
  + , name: "a_tag_014"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_015"
  + , name: "a_tag_015"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_016"
  + , name: "a_tag_016"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_017"
  + , name: "a_tag_017"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_018"
  + , name: "a_tag_018"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_019"
  + , name: "a_tag_019"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_020"
  + , name: "a_tag_020"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_021"
  + , name: "a_tag_021"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_022"
  + , name: "a_tag_022"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_023"
  + , name: "a_tag_023"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_024"
  + , name: "a_tag_024"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_025"
  + , name: "a_tag_025"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_026"
  + , name: "a_tag_026"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_027"
  + , name: "a_tag_027"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_028"
  + , name: "a_tag_028"
  ? +
    type: TYPE_STRING
  - ,
  -  name: "a_tag_029"
  + , name: "a_tag_029"
  ? +
    type: TYPE_STRING
  + , name: "consumer_group"
  - ,
  -  name: "http.status_code"
    type: TYPE_STRING
  + , name: "customer"
  - ,
  -  name: "sentry.category"
    type: TYPE_STRING
  + , name: "environment"
  - ,
  -  name: "sentry.name"
    type: TYPE_STRING
  + , name: "http.status_code"
  - ,
  -  name: "sentry.segment_name"
    type: TYPE_STRING
  - ,
  + , name: "sentry.category"
  + type: TYPE_STRING
  + , name: "sentry.environment"
  + type: TYPE_STRING
  + , name: "sentry.name"
  + type: TYPE_STRING
  + , name: "sentry.op"
  + type: TYPE_STRING
  + , name: "sentry.platform"
  + type: TYPE_STRING
  + , name: "sentry.release"
  + type: TYPE_STRING
  + , name: "sentry.sdk.name"
  + type: TYPE_STRING
  + , name: "sentry.sdk.version"
  + type: TYPE_STRING
  + , name: "sentry.segment.name"
  + type: TYPE_STRING
  + , name: "sentry.segment_name"
  + type: TYPE_STRING
  -  name: "sentry.service"
  + , name: "sentry.service"
  ? +
  + type: TYPE_STRING
  + , name: "sentry.status"
  + type: TYPE_STRING
  + , name: "sentry.status_code"
  + type: TYPE_STRING
  + , name: "sentry.thread.id"
  + type: TYPE_STRING
  + , name: "sentry.thread.name"
  + type: TYPE_STRING
  + , name: "sentry.trace.status"
  + type: TYPE_STRING
  + , name: "sentry.transaction.method"
  + type: TYPE_STRING
  + , name: "sentry.transaction.op"
  + type: TYPE_STRING
  + , name: "sentry.user"
  + type: TYPE_STRING
  + , name: "thread.id"
  + type: TYPE_STRING
  + , name: "thread.name"
    type: TYPE_STRING
    ,
    ]

To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard

MeredithAnya added a commit that referenced this pull request Nov 5, 2024
Adds better typing to the delete pipeline as well separates some code
out from #6510. To avoid circular
imports, it's easier to have `bulk_delete_query` file almost import from
`delete_query`, and never have `delete_query` import from
`bulk_delete_query` (where the ConditionsType was before)
@MeredithAnya
Copy link
Member Author

For the batching file I added, I've also added it into arroyo itself: getsentry/arroyo#390 though I don't think that should be a blocker for this PR. Just was something I thought would clean up the duplicate work - if folks think it makes sense to add to arroyo

Copy link
Member

@onkar onkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. Left a few comments and questions.

snuba/cli/lw_deletions_consumer.py Show resolved Hide resolved
snuba/cli/lw_deletions_consumer.py Show resolved Hide resolved
snuba/cli/lw_deletions_consumer.py Show resolved Hide resolved
snuba/consumers/consumer_builder.py Show resolved Hide resolved
snuba/lw_deletions/batching.py Show resolved Hide resolved
snuba/lw_deletions/formatters.py Outdated Show resolved Hide resolved
snuba/lw_deletions/formatters.py Outdated Show resolved Hide resolved
snuba/lw_deletions/strategy.py Show resolved Hide resolved
snuba/lw_deletions/strategy.py Outdated Show resolved Hide resolved
snuba/lw_deletions/strategy.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants