-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to preserve sequence number generators #34
base: main
Are you sure you want to change the base?
Conversation
src/main.rs
Outdated
fn current_sequence_value_query(&self, sequence: &Sequence) -> String { | ||
format!( | ||
r#"SELECT COALESCE(MAX("{column_name}"), 1)::text FROM {schema}."{table_name}""#, | ||
schema = &self.schema, | ||
column_name = sequence.column_name, | ||
table_name = &self.name, | ||
) | ||
} | ||
fn restore_sequences_query(&self, sequence: &Sequence, current_sequence_value: u64) -> String { | ||
format!( | ||
r#"SELECT SETVAL('{schema}."{sequence_name}"', {current_sequence_value}) FROM {schema}."{table_name}""#, | ||
schema = &self.schema, | ||
sequence_name = sequence.sequence_name, | ||
current_sequence_value = current_sequence_value, | ||
table_name = &self.name, | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the SqlString
trait here to escape, plus a couple of other things:
fn current_sequence_value_query(&self, sequence: &Sequence) -> String { | |
format!( | |
r#"SELECT COALESCE(MAX("{column_name}"), 1)::text FROM {schema}."{table_name}""#, | |
schema = &self.schema, | |
column_name = sequence.column_name, | |
table_name = &self.name, | |
) | |
} | |
fn restore_sequences_query(&self, sequence: &Sequence, current_sequence_value: u64) -> String { | |
format!( | |
r#"SELECT SETVAL('{schema}."{sequence_name}"', {current_sequence_value}) FROM {schema}."{table_name}""#, | |
schema = &self.schema, | |
sequence_name = sequence.sequence_name, | |
current_sequence_value = current_sequence_value, | |
table_name = &self.name, | |
) | |
} | |
fn current_sequence_value_query(&self, sequence: &Sequence) -> String { | |
format!( | |
r#"SELECT COALESCE(MAX({column_name}), 1)::text FROM {qualified_table_name}"#, | |
column_name = sequence.column_name.sql_identifier(), | |
qualified_table_name = self.sql_identifier(), | |
) | |
} | |
fn restore_sequences_query(&self, sequence: &Sequence, current_sequence_value: u64) -> String { | |
format!( | |
r#"SELECT SETVAL('{schema}.{sequence_name}', {current_sequence_value}) FROM {qualified_table_name}"#, | |
schema = &self.schema.sql_identifier(), | |
sequence_name = sequence.sequence_name.sql_identifier(), | |
qualified_table_name = self.sql_identifier(), | |
) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible to get a sequence's last value by querying it like a table. Also, we can use sql_value
to construct the argument to SETVAL
.
fn current_sequence_value_query(&self, sequence: &Sequence) -> String { | |
format!( | |
r#"SELECT COALESCE(MAX("{column_name}"), 1)::text FROM {schema}."{table_name}""#, | |
schema = &self.schema, | |
column_name = sequence.column_name, | |
table_name = &self.name, | |
) | |
} | |
fn restore_sequences_query(&self, sequence: &Sequence, current_sequence_value: u64) -> String { | |
format!( | |
r#"SELECT SETVAL('{schema}."{sequence_name}"', {current_sequence_value}) FROM {schema}."{table_name}""#, | |
schema = &self.schema, | |
sequence_name = sequence.sequence_name, | |
current_sequence_value = current_sequence_value, | |
table_name = &self.name, | |
) | |
} | |
fn current_sequence_value_query(&self, sequence: &Sequence) -> String { | |
let qualified_sequence_name = format!( | |
"{}.{}", | |
self.schema.sql_identifier(), | |
sequence.sequence_name.sql_identifier() | |
); | |
format!(r#"SELECT last_value::text FROM {qualified_sequence_name}"#,) | |
} | |
fn restore_sequences_query(&self, sequence: &Sequence, current_sequence_value: u64) -> String { | |
let qualified_sequence_name = format!( | |
"{}.{}", | |
self.schema.sql_identifier(), | |
sequence.sequence_name.sql_identifier() | |
); | |
format!( | |
r#"SELECT SETVAL({qualified_sequence_name_as_value}, {current_sequence_value}) FROM {qualified_table_name}"#, | |
qualified_sequence_name_as_value = qualified_sequence_name.sql_value(), | |
qualified_table_name = self.sql_identifier(), | |
) | |
} |
from pg_class sequences | ||
join pg_depend ON pg_depend.objid = sequences.oid | ||
where sequences.relkind = 'S' | ||
) seqs ON seqs.tbl_id = pg_class.oid AND seqs.col_id = pg_attribute.attnum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PostgreSQL sequences don't have to be associated with a column. They're all free-standing, but can be "owned" by a table, but I think that only means that they're dropped when the table is dropped (see OWNED BY
in the docs for CREATE SEQUENCE
).
I think sequences should be considered as a separate top-level object, and not grouped in with tables. They exist in the same namespace, so the same include/exclude rules can be applied to select which sequences to restore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kinda ties into why I didn't use last_value
. I originally ran it against a few sequences and found that the last_value doesn't always line up with the max(col)
; sequences can be shared, can be depended upon by columns, and as you mentioned they can be owned by tables or not, so their scope may or may not be tied to a single table.
For both figuring out what the current value of the sequence should be, and which sequences actually even get their current value preserved, I stuck to the table as the source of truth with the understanding that what pg_parcel
primarily slices data by is tables, and so to the extent that we care to copy a slice of table data then we have to care about restoring the value of sequences that each table depends on, but not necessarily other sequences not owned by or depended on by the targeted tables, or the values of the sequences that aren't pertinent to the table.
i.e. what happens if we intentionally skip a table, but then don't explicitly also skip any sequences that the table depends on? If we always restore the sequence value, then the sequence value would be current but the table would be empty, which might be unexpected? Should you always have to explicitly skip both at the same time?
But, that was an assumption I made, would be happy to change both of these with the understanding that pg_parcel
restores all sequence values regardless of what particular slice of data is being dumped/restored (if passing the keep flag), it's separately configurable, and that is a deliberate choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be fair to say that if we are dumping a table, we should implicitly also dump any associated sequences?
I think that maybe the ability to dump sequences that aren't connected to any (requested) tables should be considered a separate feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So how do we feel about two options, one called --keep-table-sequences
which does what this PR adds, and then another option called --keep-all-sequences
which does all of them?
Or, just one option called --sequences=(none | tables | all)
which defaults to tables
?
(conceptually; naming can be whatever)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the README, pg_parcel
is:
A very minimal subset of
pg_dump --data-only
...
From the pg_dump
man page (emphasis mine):
--data-only
Dump only the data, not the schema (data definitions). Table data, large objects, and sequence values are dumped.
Ergo: pg_parcel
should dump sequence values (and large objects, but that's not in scope here).
To elaborate, sequences are first-class entities within a schema, on a level with tables. They can be associated with a table but they don't belong to a table. A sequence's value is its data, just as rows are a table's data. I think it's conceptually sound to dump values for all sequences in the schema. In fact I think it's a bug or at least a missing feature that we didn't do that before, by default, and pg_parcel
should do it now without any more optional command-line flags.
What are the consequences if we just do that? What are we trying to prevent by not doing it?
No description provided.