Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to preserve sequence number generators #34

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

JQuezada0
Copy link
Contributor

No description provided.

src/main.rs Outdated
Comment on lines 356 to 372
fn current_sequence_value_query(&self, sequence: &Sequence) -> String {
format!(
r#"SELECT COALESCE(MAX("{column_name}"), 1)::text FROM {schema}."{table_name}""#,
schema = &self.schema,
column_name = sequence.column_name,
table_name = &self.name,
)
}
fn restore_sequences_query(&self, sequence: &Sequence, current_sequence_value: u64) -> String {
format!(
r#"SELECT SETVAL('{schema}."{sequence_name}"', {current_sequence_value}) FROM {schema}."{table_name}""#,
schema = &self.schema,
sequence_name = sequence.sequence_name,
current_sequence_value = current_sequence_value,
table_name = &self.name,
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the SqlString trait here to escape, plus a couple of other things:

Suggested change
fn current_sequence_value_query(&self, sequence: &Sequence) -> String {
format!(
r#"SELECT COALESCE(MAX("{column_name}"), 1)::text FROM {schema}."{table_name}""#,
schema = &self.schema,
column_name = sequence.column_name,
table_name = &self.name,
)
}
fn restore_sequences_query(&self, sequence: &Sequence, current_sequence_value: u64) -> String {
format!(
r#"SELECT SETVAL('{schema}."{sequence_name}"', {current_sequence_value}) FROM {schema}."{table_name}""#,
schema = &self.schema,
sequence_name = sequence.sequence_name,
current_sequence_value = current_sequence_value,
table_name = &self.name,
)
}
fn current_sequence_value_query(&self, sequence: &Sequence) -> String {
format!(
r#"SELECT COALESCE(MAX({column_name}), 1)::text FROM {qualified_table_name}"#,
column_name = sequence.column_name.sql_identifier(),
qualified_table_name = self.sql_identifier(),
)
}
fn restore_sequences_query(&self, sequence: &Sequence, current_sequence_value: u64) -> String {
format!(
r#"SELECT SETVAL('{schema}.{sequence_name}', {current_sequence_value}) FROM {qualified_table_name}"#,
schema = &self.schema.sql_identifier(),
sequence_name = sequence.sequence_name.sql_identifier(),
qualified_table_name = self.sql_identifier(),
)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to get a sequence's last value by querying it like a table. Also, we can use sql_value to construct the argument to SETVAL.

Suggested change
fn current_sequence_value_query(&self, sequence: &Sequence) -> String {
format!(
r#"SELECT COALESCE(MAX("{column_name}"), 1)::text FROM {schema}."{table_name}""#,
schema = &self.schema,
column_name = sequence.column_name,
table_name = &self.name,
)
}
fn restore_sequences_query(&self, sequence: &Sequence, current_sequence_value: u64) -> String {
format!(
r#"SELECT SETVAL('{schema}."{sequence_name}"', {current_sequence_value}) FROM {schema}."{table_name}""#,
schema = &self.schema,
sequence_name = sequence.sequence_name,
current_sequence_value = current_sequence_value,
table_name = &self.name,
)
}
fn current_sequence_value_query(&self, sequence: &Sequence) -> String {
let qualified_sequence_name = format!(
"{}.{}",
self.schema.sql_identifier(),
sequence.sequence_name.sql_identifier()
);
format!(r#"SELECT last_value::text FROM {qualified_sequence_name}"#,)
}
fn restore_sequences_query(&self, sequence: &Sequence, current_sequence_value: u64) -> String {
let qualified_sequence_name = format!(
"{}.{}",
self.schema.sql_identifier(),
sequence.sequence_name.sql_identifier()
);
format!(
r#"SELECT SETVAL({qualified_sequence_name_as_value}, {current_sequence_value}) FROM {qualified_table_name}"#,
qualified_sequence_name_as_value = qualified_sequence_name.sql_value(),
qualified_table_name = self.sql_identifier(),
)
}

https://www.postgresql.org/docs/12/functions-sequence.html

src/main.rs Outdated Show resolved Hide resolved
from pg_class sequences
join pg_depend ON pg_depend.objid = sequences.oid
where sequences.relkind = 'S'
) seqs ON seqs.tbl_id = pg_class.oid AND seqs.col_id = pg_attribute.attnum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostgreSQL sequences don't have to be associated with a column. They're all free-standing, but can be "owned" by a table, but I think that only means that they're dropped when the table is dropped (see OWNED BY in the docs for CREATE SEQUENCE).

I think sequences should be considered as a separate top-level object, and not grouped in with tables. They exist in the same namespace, so the same include/exclude rules can be applied to select which sequences to restore.

Copy link
Contributor Author

@JQuezada0 JQuezada0 Apr 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kinda ties into why I didn't use last_value. I originally ran it against a few sequences and found that the last_value doesn't always line up with the max(col); sequences can be shared, can be depended upon by columns, and as you mentioned they can be owned by tables or not, so their scope may or may not be tied to a single table.

For both figuring out what the current value of the sequence should be, and which sequences actually even get their current value preserved, I stuck to the table as the source of truth with the understanding that what pg_parcel primarily slices data by is tables, and so to the extent that we care to copy a slice of table data then we have to care about restoring the value of sequences that each table depends on, but not necessarily other sequences not owned by or depended on by the targeted tables, or the values of the sequences that aren't pertinent to the table.

i.e. what happens if we intentionally skip a table, but then don't explicitly also skip any sequences that the table depends on? If we always restore the sequence value, then the sequence value would be current but the table would be empty, which might be unexpected? Should you always have to explicitly skip both at the same time?

But, that was an assumption I made, would be happy to change both of these with the understanding that pg_parcel restores all sequence values regardless of what particular slice of data is being dumped/restored (if passing the keep flag), it's separately configurable, and that is a deliberate choice.

Copy link
Contributor

@jelder jelder Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be fair to say that if we are dumping a table, we should implicitly also dump any associated sequences?

I think that maybe the ability to dump sequences that aren't connected to any (requested) tables should be considered a separate feature.

Copy link
Contributor Author

@JQuezada0 JQuezada0 Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how do we feel about two options, one called --keep-table-sequences which does what this PR adds, and then another option called --keep-all-sequences which does all of them?

Or, just one option called --sequences=(none | tables | all) which defaults to tables?

(conceptually; naming can be whatever)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the README, pg_parcel is:

A very minimal subset of pg_dump --data-only ...

From the pg_dump man page (emphasis mine):

--data-only

Dump only the data, not the schema (data definitions). Table data, large objects, and sequence values are dumped.

Ergo: pg_parcel should dump sequence values (and large objects, but that's not in scope here).

To elaborate, sequences are first-class entities within a schema, on a level with tables. They can be associated with a table but they don't belong to a table. A sequence's value is its data, just as rows are a table's data. I think it's conceptually sound to dump values for all sequences in the schema. In fact I think it's a bug or at least a missing feature that we didn't do that before, by default, and pg_parcel should do it now without any more optional command-line flags.

What are the consequences if we just do that? What are we trying to prevent by not doing it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants