Fix tableExists() method #32510

Open
wants to merge 3 commits into
base: master

Conversation

NimzyMaina

Fix for tableExists() method that causes a Spanner Change Stream consumer to be unable to recover from a restart.

fixes #32509



Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @kennknowles for label java.
R: @damondouglas for label io.
R: @nielm for label spanner.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Contributor

Reminder, please take a look at this pr: @kennknowles @damondouglas @nielm

- + "WHERE t.table_catalog = '' AND "
- + "t.table_schema = '' AND "
- + "t.table_name = '"
+ + "WHERE t.table_name = '"
Contributor

Instead of removing the filtering altogether, can you fork the code depending on this.isPostgres() (see getPartition below for an example)?

For GoogleSQL (the else branch) you can leave the query as is.
For Postgres, simply remove t.table_catalog and keep only t.table_schema = 'public'.
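
The suggested fork could look roughly like the sketch below. It is not the code from this PR: the class name `TableExistsQuery`, the method `build`, and the `SELECT ... FROM information_schema.tables AS t` prefix are assumptions made for illustration, since the diff only shows the WHERE clause. It keeps the string concatenation seen in the diff.

```java
// Hypothetical sketch; the class and method names are illustrative, not Beam's API.
final class TableExistsQuery {

  // Assumed query prefix: the PR diff only shows the WHERE clause.
  private static final String SELECT_PREFIX =
      "SELECT t.table_name FROM information_schema.tables AS t ";

  /** Builds the existence check for the metadata table, forking on the dialect. */
  static String build(boolean isPostgres, String metadataTableName) {
    if (isPostgres) {
      // Postgres dialect: drop the table_catalog filter, keep the default schema.
      return SELECT_PREFIX
          + "WHERE t.table_schema = 'public' AND "
          + "t.table_name = '" + metadataTableName + "'";
    }
    // GoogleSQL dialect: the filter as it was before this PR.
    return SELECT_PREFIX
        + "WHERE t.table_catalog = '' AND "
        + "t.table_schema = '' AND "
        + "t.table_name = '" + metadataTableName + "'";
  }

  public static void main(String[] args) {
    System.out.println(build(true, "metadata_table"));
    System.out.println(build(false, "metadata_table"));
  }
}
```

Note that the schema name is a string literal here, so it takes single quotes; in the Postgres dialect double quotes would denote an identifier.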

Author

@dedocibula

Okay. If we take that approach, we need a way to specify the metadata table's schema name in the options, since "public" is only the default. With the Postgres dialect, someone can specify a custom table_schema. What are your thoughts?

Contributor

Right, what you are referring to are named schemas (https://cloud.google.com/spanner/docs/named-schemas). I believe that can be addressed in a separate issue as it has to be handled for both dialects and tested. I would keep the scope of this fix to the Postgres regression.

Today's Cloud Spanner Postgres syntax allows creating either "table" or "public"."table"; both are added to the default (public) schema. Anything else, such as "schema"."table", requires creating a named schema first, so my proposal should be sufficient to unblock this use case.

Author

@dedocibula I'm not sure how to handle the tests now that the code forks by dialect. Could you advise?

Author

@dedocibula please advise

Contributor

Oh sorry, missed this. So it seems there are two test files in which you could add this:

  1. https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dao/PartitionMetadataDaoTest.java
    This has mostly unit tests, which currently run only under the GoogleSQL dialect (see setUp). We could probably add parallel tests here for Postgres. That said, the actual engine evaluating the queries is mocked out, so the only thing that comes to mind is to add a verification that the transaction is invoked with working SQL (partial example).

  2. https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/it/SpannerChangeStreamPostgresIT.java
    This is an end-to-end integration test that actually runs a pipeline. It should be possible to add another test case that runs two pipelines in sequence using the same parameters, although for this type of change that might be a bit excessive. I would suggest starting with the first one.
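
As a rough illustration of the first option, the sketch below records the SQL that a DAO hands to its transaction and checks the dialect-specific filter. It deliberately avoids Mockito and the Spanner client so that it stands alone: `RecordingRunner`, `MetadataDao`, and `capturedSql` are hypothetical stand-ins, not classes from PartitionMetadataDaoTest, and the query text is an assumption based on the diff in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Beam or Spanner client classes.
final class TableExistsSqlCheck {

  /** Stand-in for the mocked transaction: records SQL instead of executing it. */
  static final class RecordingRunner {
    final List<String> executedSql = new ArrayList<>();

    boolean returnsRows(String sql) {
      executedSql.add(sql);
      return true; // Pretend the table was found; no engine evaluates the SQL.
    }
  }

  /** Illustrative DAO whose tableExists() forks on the dialect. */
  static final class MetadataDao {
    private final RecordingRunner runner;
    private final boolean isPostgres;
    private final String tableName;

    MetadataDao(RecordingRunner runner, boolean isPostgres, String tableName) {
      this.runner = runner;
      this.isPostgres = isPostgres;
      this.tableName = tableName;
    }

    boolean tableExists() {
      String filter = isPostgres
          ? "WHERE t.table_schema = 'public' AND "
          : "WHERE t.table_catalog = '' AND t.table_schema = '' AND ";
      return runner.returnsRows(
          "SELECT t.table_name FROM information_schema.tables AS t "
              + filter
              + "t.table_name = '" + tableName + "'");
    }
  }

  /** Runs tableExists() once and returns the single statement it issued. */
  static String capturedSql(boolean isPostgres) {
    RecordingRunner runner = new RecordingRunner();
    new MetadataDao(runner, isPostgres, "metadata_table").tableExists();
    if (runner.executedSql.size() != 1) {
      throw new IllegalStateException("expected exactly one statement");
    }
    return runner.executedSql.get(0);
  }

  public static void main(String[] args) {
    String postgresSql = capturedSql(true);
    // The verification: the Postgres statement filters on the public schema only.
    if (!postgresSql.contains("t.table_schema = 'public'")
        || postgresSql.contains("table_catalog")) {
      throw new AssertionError("unexpected Postgres statement: " + postgresSql);
    }
    System.out.println("Postgres statement: " + postgresSql);
  }
}
```

In the real test file the same check would presumably be expressed against the existing mocks; this version only shows the shape of the assertion.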

Contributor

github-actions bot commented Oct 4, 2024

Assigning a new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @Abacn for label java.
R: @johnjcasey for label io.
R: @nielm for label spanner.


Contributor

Reminder, please take a look at this pr: @Abacn @johnjcasey @nielm

Contributor

Assigning a new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @damondouglas for label java.
R: @chamikaramj for label io.
R: @nielm for label spanner.


Contributor

Reminder, please take a look at this pr: @damondouglas @chamikaramj @nielm

Contributor

nielm commented Oct 25, 2024

waiting on author

Development

Successfully merging this pull request may close these issues.

[Bug]: Unable to Restart Google Spanner Change Streams Consumer due to tableExists(table_name) bug
3 participants