diff --git a/src/current/molt/molt-fetch.md b/src/current/molt/molt-fetch.md index 6e31f9c3905..54441fff56b 100644 --- a/src/current/molt/molt-fetch.md +++ b/src/current/molt/molt-fetch.md @@ -48,8 +48,9 @@ Complete the following items before using MOLT Fetch: molt escape-password 'a$52&' ~~~ - ~~~ Substitute the following encoded password in your original connection url string: + + ~~~ a%2452%26 ~~~ @@ -91,7 +92,8 @@ Cockroach Labs **strongly** recommends the following: ### Secure connections - Use secure connections to the source and [target CockroachDB database]({% link {{site.current_cloud_version}}/connection-parameters.md %}#additional-connection-parameters) whenever possible. -- By default, insecure connections (i.e., `sslmode=disable` on PostgreSQL; `sslmode` not set on MySQL) are disallowed. When using an insecure connection, `molt fetch` returns an error. To override this check, you can enable the `--allow-tls-mode-disable` flag. Do this **only** for testing, or if a secure SSL/TLS connection to the source or target database is not possible. +- When performing [failback](#fail-back-to-source-database), use a secure changefeed connection by [overriding the default configuration](#changefeed-override-settings). +- By default, insecure connections (i.e., `sslmode=disable` on PostgreSQL; `sslmode` not set on MySQL) are disallowed. When using an insecure connection, `molt fetch` returns an error. To override this check, you can enable the `--allow-tls-mode-disable` flag. Do this **only** when testing, or if a secure SSL/TLS connection to the source or target database is not possible. ### Connection strings @@ -181,48 +183,49 @@ To verify that your connections and configuration work properly, run MOLT Fetch ### Global flags -| Flag | Description | -|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `--source` | (Required) Connection string for the source database. For details, see [Source and target databases](#source-and-target-databases). | -| `--target` | (Required) Connection string for the target database. For details, see [Source and target databases](#source-and-target-databases). | -| `--allow-tls-mode-disable` | Allow insecure connections to databases. Secure SSL/TLS connections should be used by default. This should be enabled **only** if secure SSL/TLS connections to the source or target database are not possible. | -| `--bucket-path` | The path within the [cloud storage](#cloud-storage) bucket where intermediate files are written (e.g., `'s3://bucket/path'` or `'gs://bucket/path'`). Only the path is used; query parameters (e.g., credentials) are ignored. | -| `--cleanup` | Whether to delete intermediate files after moving data using [cloud or local storage](#data-path). 
**Note:** Cleanup does not occur on [continuation](#fetch-continuation). | -| `--compression` | Compression method for data when using [`IMPORT INTO`](#data-movement) (`gzip`/`none`).

**Default:** `gzip` | -| `--continuation-file-name` | Restart fetch at the specified filename if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation](#fetch-continuation). | -| `--continuation-token` | Restart fetch at a specific table, using the specified continuation token, if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation](#fetch-continuation). | -| `--crdb-pts-duration` | The duration for which each timestamp used in data export from a CockroachDB source is protected from garbage collection. This ensures that the data snapshot remains consistent. For example, if set to `24h`, each timestamp is protected for 24 hours from the initiation of the export job. This duration is extended at regular intervals specified in `--crdb-pts-refresh-interval`.

**Default:** `24h0m0s` | -| `--crdb-pts-refresh-interval` | The frequency at which the protected timestamp's validity is extended. This interval maintains protection of the data snapshot until data export from a CockroachDB source is completed. For example, if set to `10m`, the protected timestamp's expiration will be extended by the duration specified in `--crdb-pts-duration` (e.g., `24h`) every 10 minutes while export is not complete.

**Default:** `10m0s` | -| `--direct-copy` | Enables [direct copy](#direct-copy), which copies data directly from source to target without using an intermediate store. | +| Flag | Description | +|-----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `--source` | (Required) Connection string for the source database. For details, see [Source and target databases](#source-and-target-databases). | +| `--target` | (Required) Connection string for the target database. For details, see [Source and target databases](#source-and-target-databases). | +| `--allow-tls-mode-disable` | Allow insecure connections to databases. Secure SSL/TLS connections should be used by default. This should be enabled **only** if secure SSL/TLS connections to the source or target database are not possible. | +| `--bucket-path` | The path within the [cloud storage](#cloud-storage) bucket where intermediate files are written (e.g., `'s3://bucket/path'` or `'gs://bucket/path'`). Only the path is used; query parameters (e.g., credentials) are ignored. | +| `--changefeeds-path` | Path to a JSON file that contains changefeed override settings for [failback](#fail-back-to-source-database), when enabled with `--mode failback`. If not specified, an insecure default configuration is used, and `--allow-tls-mode-disable` must be included. For details, see [Fail back to source database](#fail-back-to-source-database). | +| `--cleanup` | Whether to delete intermediate files after moving data using [cloud or local storage](#data-path). **Note:** Cleanup does not occur on [continuation](#fetch-continuation). | +| `--compression` | Compression method for data when using [`IMPORT INTO`](#data-movement) (`gzip`/`none`).

**Default:** `gzip` | +| `--continuation-file-name` | Restart fetch at the specified filename if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation](#fetch-continuation). | +| `--continuation-token` | Restart fetch at a specific table, using the specified continuation token, if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation](#fetch-continuation). | +| `--crdb-pts-duration` | The duration for which each timestamp used in data export from a CockroachDB source is protected from garbage collection. This ensures that the data snapshot remains consistent. For example, if set to `24h`, each timestamp is protected for 24 hours from the initiation of the export job. This duration is extended at regular intervals specified in `--crdb-pts-refresh-interval`.

**Default:** `24h0m0s` | +| `--crdb-pts-refresh-interval` | The frequency at which the protected timestamp's validity is extended. This interval maintains protection of the data snapshot until data export from a CockroachDB source is completed. For example, if set to `10m`, the protected timestamp's expiration will be extended by the duration specified in `--crdb-pts-duration` (e.g., `24h`) every 10 minutes while export is not complete.

**Default:** `10m0s` | +| `--direct-copy` | Enables [direct copy](#direct-copy), which copies data directly from source to target without using an intermediate store. | | `--export-concurrency` | Number of shards to export at a time, each on a dedicated thread. This only applies when exporting data from the source database, not when loading data into the target database. Only tables with [primary key]({% link {{ site.current_cloud_version }}/primary-key.md %}) types of [`INT`]({% link {{ site.current_cloud_version }}/int.md %}), [`FLOAT`]({% link {{ site.current_cloud_version }}/float.md %}), or [`UUID`]({% link {{ site.current_cloud_version }}/uuid.md %}) can be sharded. The number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`.

This value **cannot** be set higher than `1` when moving data from MySQL. Refer to [Best practices](#best-practices).

**Default:** `4` with a PostgreSQL source; `1` with a MySQL source | -| `--fetch-id` | Restart fetch task corresponding to the specified ID. If `--continuation-file-name` or `--continuation-token` are not specified, fetch restarts for all failed tables. | -| `--flush-rows` | Number of rows before the source data is flushed to intermediate files. **Note:** If `--flush-size` is also specified, the fetch behavior is based on the flag whose criterion is met first. | -| `--flush-size` | Size (in bytes) before the source data is flushed to intermediate files. **Note:** If `--flush-rows` is also specified, the fetch behavior is based on the flag whose criterion is met first. | -| `--import-batch-size` | The number of files to be imported at a time to the target database. This applies only when using [`IMPORT INTO`](#data-movement) to load data into the target. **Note:** Increasing this value can improve the performance of full-scan queries on the target database shortly after fetch completes, but very high values are not recommended. If any individual file in the import batch fails, you must [retry](#fetch-continuation) the entire batch.

**Default:** `1000` | -| `--local-path` | The path within the [local file server](#local-file-server) where intermediate files are written (e.g., `data/migration/cockroach`). `--local-path-listen-addr` must be specified. | -| `--local-path-crdb-access-addr` | Address of a [local file server](#local-file-server) that is **publicly accessible**. This flag is only necessary if CockroachDB cannot reach the local address specified with `--local-path-listen-addr` (e.g., when moving data to a CockroachDB {{ site.data.products.cloud }} deployment). `--local-path` and `--local-path-listen-addr` must be specified.

**Default:** Value of `--local-path-listen-addr`. | -| `--local-path-listen-addr` | Write intermediate files to a [local file server](#local-file-server) at the specified address (e.g., `'localhost:3000'`). `--local-path` must be specified. | -| `--log-file` | Write messages to the specified log filename. If no filename is provided, messages write to `fetch-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`. | -| `--logging` | Level at which to log messages (`trace`/`debug`/`info`/`warn`/`error`/`fatal`/`panic`).

**Default:** `info` | -| `--metrics-listen-addr` | Address of the Prometheus metrics endpoint, which has the path `{address}/metrics`. For details on important metrics to monitor, see [Metrics](#metrics).

**Default:** `'127.0.0.1:3030'` | -| `--mode` | Configure the MOLT Fetch behavior: `data-load`, `data-load-and-replication`, `replication-only`, `export-only`, or `import-only`. For details, refer to [Fetch mode](#fetch-mode).

**Default:** `data-load` | -| `--non-interactive` | Run the fetch task without interactive prompts. This is recommended **only** when running `molt fetch` in an automated process (i.e., a job or continuous integration). | -| `--pglogical-replication-slot-drop-if-exists` | Drop the replication slot, if specified with `--pglogical-replication-slot-name`. Otherwise, the default replication slot is not dropped. | -| `--pglogical-replication-slot-name` | The name of a replication slot to create before taking a snapshot of data (e.g., `'fetch'`). **Required** in order to perform continuous [replication](#load-data-and-replicate-changes) from a source PostgreSQL database. | -| `--pglogical-replication-slot-plugin` | The output plugin used for logical replication under `--pglogical-replication-slot-name`.

**Default:** `pgoutput` | -| `--pprof-listen-addr` | Address of the pprof endpoint.

**Default:** `'127.0.0.1:3031'` | -| `--replicator-flags` | If continuous [replication](#load-data-and-replicate-changes) is enabled with `--mode data-load-and-replication` or `--mode replication-only`, specify replication flags ([PostgreSQL](https://github.com/cockroachdb/replicator/wiki/PGLogical#postgresql-logical-replication) or [MySQL](https://github.com/cockroachdb/replicator/wiki/MYLogical#mysqlmariadb-replication)) to override. | -| `--row-batch-size` | Number of rows per shard to export at a time. See [Best practices](#best-practices).

**Default:** `100000` | -| `--schema-filter` | Move schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` | -| `--table-concurrency` | Number of tables to export at a time. The number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`.

This value **cannot** be set higher than `1` when moving data from MySQL. Refer to [Best practices](#best-practices).

**Default:** `4` with a PostgreSQL source; `1` with a MySQL source | -| `--table-exclusion-filter` | Exclude tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

This value **cannot** be set to `'.*'`, which would cause every table to be excluded.

**Default:** Empty string | -| `--table-filter` | Move tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` | -| `--table-handling` | How tables are initialized on the target database (`none`/`drop-on-target-and-recreate`/`truncate-if-exists`). For details, see [Target table handling](#target-table-handling).

**Default:** `none` | -| `--transformations-file` | Path to a JSON file that defines transformations to be performed on the target schema during the fetch task. Refer to [Transformations](#transformations). | -| `--type-map-file` | Path to a JSON file that contains explicit type mappings for automatic schema creation, when enabled with `--table-handling drop-on-target-and-recreate`. For details on the JSON format and valid type mappings, see [type mapping](#type-mapping). | -| `--use-console-writer` | Use the console writer, which has cleaner log output but introduces more latency.

**Default:** `false` (log as structured JSON) | -| `--use-copy` | Use [`COPY FROM`](#data-movement) to move data. This makes tables queryable during data load, but is slower than using `IMPORT INTO`. For details, refer to [Data movement](#data-movement). | -| `--use-implicit-auth` | Use [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) for [cloud storage](#cloud-storage) URIs. | +| `--fetch-id` | Restart fetch task corresponding to the specified ID. If `--continuation-file-name` or `--continuation-token` are not specified, fetch restarts for all failed tables. | +| `--flush-rows` | Number of rows before the source data is flushed to intermediate files. **Note:** If `--flush-size` is also specified, the fetch behavior is based on the flag whose criterion is met first. | +| `--flush-size` | Size (in bytes) before the source data is flushed to intermediate files. **Note:** If `--flush-rows` is also specified, the fetch behavior is based on the flag whose criterion is met first. | +| `--import-batch-size` | The number of files to be imported at a time to the target database. This applies only when using [`IMPORT INTO`](#data-movement) to load data into the target. **Note:** Increasing this value can improve the performance of full-scan queries on the target database shortly after fetch completes, but very high values are not recommended. If any individual file in the import batch fails, you must [retry](#fetch-continuation) the entire batch.

**Default:** `1000` | +| `--local-path` | The path within the [local file server](#local-file-server) where intermediate files are written (e.g., `data/migration/cockroach`). `--local-path-listen-addr` must be specified. | +| `--local-path-crdb-access-addr` | Address of a [local file server](#local-file-server) that is **publicly accessible**. This flag is only necessary if CockroachDB cannot reach the local address specified with `--local-path-listen-addr` (e.g., when moving data to a CockroachDB {{ site.data.products.cloud }} deployment). `--local-path` and `--local-path-listen-addr` must be specified.

**Default:** Value of `--local-path-listen-addr`. | +| `--local-path-listen-addr` | Write intermediate files to a [local file server](#local-file-server) at the specified address (e.g., `'localhost:3000'`). `--local-path` must be specified. | +| `--log-file` | Write messages to the specified log filename. If no filename is provided, messages write to `fetch-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`. | +| `--logging` | Level at which to log messages (`trace`/`debug`/`info`/`warn`/`error`/`fatal`/`panic`).

**Default:** `info` | +| `--metrics-listen-addr` | Address of the Prometheus metrics endpoint, which has the path `{address}/metrics`. For details on important metrics to monitor, see [Metrics](#metrics).

**Default:** `'127.0.0.1:3030'` | +| `--mode` | Configure the MOLT Fetch behavior: `data-load`, `data-load-and-replication`, `replication-only`, `export-only`, `import-only`, or `failback`. For details, refer to [Fetch mode](#fetch-mode).


**Default:** `data-load` | +| `--non-interactive` | Run the fetch task without interactive prompts. This is recommended **only** when running `molt fetch` in an automated process (e.g., a job or continuous integration pipeline). | +| `--pglogical-replication-slot-drop-if-exists` | Drop the replication slot, if specified with `--pglogical-replication-slot-name`. Otherwise, the default replication slot is not dropped. | +| `--pglogical-replication-slot-name` | The name of a replication slot to create before taking a snapshot of data (e.g., `'fetch'`). **Required** to perform continuous [replication](#load-data-and-replicate-changes) from a source PostgreSQL database. | +| `--pglogical-replication-slot-plugin` | The output plugin used for logical replication under `--pglogical-replication-slot-name`.


**Default:** `pgoutput` | +| `--pprof-listen-addr` | Address of the pprof endpoint.

**Default:** `'127.0.0.1:3031'` | +| `--replicator-flags` | If continuous [replication](#load-data-and-replicate-changes) is enabled with `--mode data-load-and-replication`, `--mode replication-only`, or `--mode failback`, specify replication flags ([PostgreSQL](https://github.com/cockroachdb/replicator/wiki/PGLogical#postgresql-logical-replication) or [MySQL](https://github.com/cockroachdb/replicator/wiki/MYLogical#mysqlmariadb-replication)) to override. | +| `--row-batch-size` | Number of rows per shard to export at a time. See [Best practices](#best-practices).

**Default:** `100000` | +| `--schema-filter` | Move schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` | +| `--table-concurrency` | Number of tables to export at a time. The number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`.

This value **cannot** be set higher than `1` when moving data from MySQL. Refer to [Best practices](#best-practices).

**Default:** `4` with a PostgreSQL source; `1` with a MySQL source | +| `--table-exclusion-filter` | Exclude tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

This value **cannot** be set to `'.*'`, which would cause every table to be excluded.

**Default:** Empty string | +| `--table-filter` | Move tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` | +| `--table-handling` | How tables are initialized on the target database (`none`/`drop-on-target-and-recreate`/`truncate-if-exists`). For details, see [Target table handling](#target-table-handling).

**Default:** `none` | +| `--transformations-file` | Path to a JSON file that defines transformations to be performed on the target schema during the fetch task. Refer to [Transformations](#transformations). | +| `--type-map-file` | Path to a JSON file that contains explicit type mappings for automatic schema creation, when enabled with `--table-handling drop-on-target-and-recreate`. For details on the JSON format and valid type mappings, see [type mapping](#type-mapping). | +| `--use-console-writer` | Use the console writer, which has cleaner log output but introduces more latency.

**Default:** `false` (log as structured JSON) | +| `--use-copy` | Use [`COPY FROM`](#data-movement) to move data. This makes tables queryable during data load, but is slower than using `IMPORT INTO`. For details, refer to [Data movement](#data-movement). | +| `--use-implicit-auth` | Use [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) for [cloud storage](#cloud-storage) URIs. | ### `tokens list` flags @@ -272,33 +275,34 @@ MySQL: - [Load data and replicate changes to CockroachDB](#load-data-and-replicate-changes) - [Replicate changes to CockroachDB](#replicate-changes) - [Export the data to storage](#export-data-to-storage) +- [Fail back to source database](#fail-back-to-source-database) #### Load data `data-load` (default) instructs MOLT Fetch to load the source data into CockroachDB. It does not replicate any subsequent changes on the source. {% include_cached copy-clipboard.html %} -~~~ +~~~ --mode data-load ~~~ #### Load data and replicate changes {{site.data.alerts.callout_info}} -Before using this option, the source PostgreSQL or MySQL database **must** be configured for continuous replication, as described in [Setup](#replication-setup). +Before using this option, the source PostgreSQL or MySQL database **must** be configured for continuous replication, as described in [Setup](#replication-setup). MySQL 8.0 and later are supported. {{site.data.alerts.end}} `data-load-and-replication` instructs MOLT Fetch to load the source data into CockroachDB, and replicate any subsequent changes on the source. {% include_cached copy-clipboard.html %} -~~~ +~~~ --mode data-load-and-replication ~~~ If the source is a PostgreSQL database, you must also specify a replication slot name. For example, the following snippet instructs MOLT Fetch to create a slot named `replication_slot` to use for replication: {% include_cached copy-clipboard.html %} -~~~ +~~~ --mode data-load-and-replication --pglogical-replication-slot-name 'replication_slot' ~~~ @@ -318,7 +322,7 @@ To customize the replication behavior (an advanced use case), use `--replicator- #### Replicate changes {{site.data.alerts.callout_info}} -Before using this option, the source PostgreSQL or MySQL database **must** be configured for continuous replication, as described in [Setup](#replication-setup). +Before using this option, the source PostgreSQL or MySQL database **must** be configured for continuous replication, as described in [Setup](#replication-setup). MySQL 8.0 and later are supported. {{site.data.alerts.end}} `replication-only` instructs MOLT Fetch to replicate ongoing changes on the source to CockroachDB, using the specified replication marker. @@ -333,7 +337,7 @@ Before using this option, the source PostgreSQL or MySQL database **must** be co In the `molt fetch` command, specify the replication slot name using `--pglogical-replication-slot-name`. For example: {% include_cached copy-clipboard.html %} - ~~~ + ~~~ --mode replication-only --pglogical-replication-slot-name 'replication_slot' ~~~ @@ -350,7 +354,7 @@ Before using this option, the source PostgreSQL or MySQL database **must** be co In the `molt fetch` command, specify a GTID set using the format `source_uuid:min(interval_start)-max(interval_end)`. For example: {% include_cached copy-clipboard.html %} - ~~~ + ~~~ --mode replication-only --replicator-flags "--defaultGTIDSet 'b7f9e0fa-2753-1e1f-5d9b-2402ac810003:3-21'" ~~~ @@ -362,7 +366,7 @@ To cancel replication, enter `ctrl-c` to issue a `SIGTERM` signal. 
This returns `export-only` instructs MOLT Fetch to export the source data to the specified [cloud storage](#cloud-storage) or [local file server](#local-file-server). It does not load the data into CockroachDB. {% include_cached copy-clipboard.html %} -~~~ +~~~ --mode export-only ~~~ @@ -371,10 +375,110 @@ To cancel replication, enter `ctrl-c` to issue a `SIGTERM` signal. This returns `import-only` instructs MOLT Fetch to load the source data in the specified [cloud storage](#cloud-storage) or [local file server](#local-file-server) into the CockroachDB target. {% include_cached copy-clipboard.html %} -~~~ +~~~ --mode import-only ~~~ +#### Fail back to source database + +{{site.data.alerts.callout_danger}} +Before using `failback` mode, refer to the [technical advisory]({% link advisories/a123371.md %}) about a bug that affects changefeeds on CockroachDB v22.2, v23.1.0 to v23.1.21, v23.2.0 to v23.2.5, and testing versions of v24.1 through v24.1.0-rc.1. +{{site.data.alerts.end}} + +If you encounter issues after moving data to CockroachDB, you can use `failback` mode to replicate changes on CockroachDB back to the initial source database. This keeps the data on the initial source database consistent in case you need to roll back the migration. + +`failback` mode creates a [CockroachDB changefeed]({% link {{ site.current_cloud_version }}/change-data-capture-overview.md %}) and sets up a [webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink) to pass change events from CockroachDB to the failback target. In production, you should **secure the connection** by specifying [changefeed override settings](#changefeed-override-settings) in a JSON file. These settings override the [default insecure changefeed](#default-insecure-changefeed) values, which are suited for testing only. Include the [`--changefeeds-path`](#global-flags) flag to specify the path to the JSON file. + +{% include_cached copy-clipboard.html %} +~~~ +--mode failback +--changefeeds-path 'changefeed-settings.json' +~~~ + +When running `molt fetch --mode failback`, `--source` is the CockroachDB connection string and `--target` is the connection string of the database you migrated from. `--table-filter` specifies the tables to watch for change events. For example: + +{% include_cached copy-clipboard.html %} +~~~ +--source 'postgresql://{username}:{password}@{host}:{port}/{database}' +--target 'mysql://{username}:{password}@{protocol}({host}:{port})/{database}' +--table-filter 'employees, payments' +~~~ + +{{site.data.alerts.callout_info}} +MySQL 8.0 and later are supported as MySQL failback targets. +{{site.data.alerts.end}} + +##### Changefeed override settings + +You can specify the following [`CREATE CHANGEFEED` parameters]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#parameters) in the override JSON. If any parameter is not specified, its [default value](#default-insecure-changefeed) is used. + +- The following [`CREATE CHANGEFEED` sink URI parameters]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#sink-uri): + - `host`: The hostname or IP address of the [webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink) where change events are sent. The applicable certificates of the failback target (i.e., the [source database](#source-and-target-databases) from which you migrated) **must** be located on this machine. 

+ - `port`: The port of the [webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink). + - `sink_query_parameters`: A comma-separated list of [`CREATE CHANGEFEED` query parameters]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#query-parameters). This includes the base64-encoded client certificate ([`client_cert`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#client-cert)), key ([`client_key`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#client-key)), and CA certificate ([`ca_cert`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#ca-cert)) for a secure webhook sink. +- The following [`CREATE CHANGEFEED` options]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options): + - [`resolved`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#resolved) + - [`min_checkpoint_frequency`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#min-checkpoint-frequency) + - [`initial_scan`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#initial-scan) + - [`webhook_sink_config`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#webhook-sink-config) + +{{site.data.alerts.callout_info}} +If there is already a running CockroachDB changefeed with the same webhook sink URL (excluding query parameters) and [watched tables]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}), the existing changefeed is used for `failback`. +{{site.data.alerts.end}} + +**Use a secure changefeed connection whenever possible.** The [default insecure configuration](#default-insecure-changefeed) is **not** recommended in production. To secure the changefeed connection, define `sink_query_parameters` in the JSON as follows: + +{% include_cached copy-clipboard.html %} +~~~ json +{ + "sink_query_parameters": "client_cert={base64 cert}&client_key={base64 key}&ca_cert={base64 CA cert}" +} +~~~ + +`client_cert`, `client_key`, and `ca_cert` are [webhook sink parameters]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-parameters) that must be base64- and URL-encoded (for example, use the command `base64 -i ./client.crt | jq -R -r '@uri'`). + +In the `molt fetch` command, also include [`--replicator-flags`](#global-flags) to specify the paths to the server certificate and key that correspond to the client certs defined in `sink_query_parameters`. For example: + +{% include_cached copy-clipboard.html %} +~~~ +--changefeeds-path 'changefeed-secure.json' +--replicator-flags "--tlsCertificate ./certs/server.crt --tlsPrivateKey ./certs/server.key" +~~~
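+ +Putting the pieces together, a complete secure override file might look like the following sketch. The `host` value here is an illustrative placeholder for your own webhook sink address, the certificate values are the encoded certs described above, and the remaining options mirror the [defaults](#default-insecure-changefeed): + +{% include_cached copy-clipboard.html %} +~~~ json +{ + "host": "failback-sink.example.com", + "port": 30004, + "sink_query_parameters": "client_cert={base64 cert}&client_key={base64 key}&ca_cert={base64 CA cert}", + "resolved": "1s", + "min_checkpoint_frequency": "1s", + "initial_scan": "no", + "webhook_sink_config": "{\"Flush\":{\"Bytes\":1048576}}" +} +~~~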
+{{site.data.alerts.end}} + +When `molt fetch --mode failback` is run without specifying `--changefeeds-path`, the following [`CREATE CHANGEFEED` parameters]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#parameters) are used for the changefeed: + +~~~ json +{ + "host": "localhost", + "port": 30004, + "sink_query_parameters": "insecure_tls_skip_verify=true", + "resolved": "1s", + "min_checkpoint_frequency": "1s", + "initial_scan": "no", + "webhook_sink_config": "{\"Flush\":{\"Bytes\":1048576}}" +} +~~~ + +The default parameters specify a local webhook sink (`"localhost"`) and an insecure sink connection (`"insecure_tls_skip_verify=true"`), which are suited for testing only. To run `failback` with the default insecure configuration, you must also include the following flags: + +{% include_cached copy-clipboard.html %} +~~~ +--allow-tls-mode-disable +--replicator-flags '--tlsSelfSigned --disableAuthentication' +~~~ + +{{site.data.alerts.callout_info}} +In `failback` mode, you must specify either `--changefeeds-path`, which overrides the default insecure configuration, or `--allow-tls-mode-disable`, which allows the default insecure configuration to be used. Otherwise, `molt fetch` errors. +{{site.data.alerts.end}} + ### Data movement MOLT Fetch can use either [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy-from.md %}) to load data into CockroachDB. @@ -810,7 +914,7 @@ After successfully running MOLT Fetch, you can run [`molt verify`]({% link molt/ ### Load PostgreSQL data via S3 with continuous replication -The following `molt fetch` command uses `IMPORT INTO` to load a subset of tables from a PostgreSQL database to CockroachDB. +The following `molt fetch` command uses [`IMPORT INTO`](#data-movement) to load a subset of tables from a PostgreSQL database to CockroachDB. {% include_cached copy-clipboard.html %} ~~~ shell @@ -853,7 +957,7 @@ To cancel replication, enter `ctrl-c` to issue a `SIGTERM` signal. ### Load MySQL data via GCP with continuous replication -The following `molt fetch` command uses `COPY FROM` to load a subset of tables from a MySQL database to CockroachDB. +The following `molt fetch` command uses [`COPY FROM`](#data-movement) to load a subset of tables from a MySQL database to CockroachDB. {% include_cached copy-clipboard.html %} ~~~ shell @@ -896,7 +1000,7 @@ To cancel replication, enter `ctrl-c` to issue a `SIGTERM` signal. ### Load CockroachDB data via direct copy -The following `molt fetch` command uses `COPY FROM` to load all tables directly from one CockroachDB database to another. +The following `molt fetch` command uses [`COPY FROM`](#data-movement) to load all tables directly from one CockroachDB database to another. {% include_cached copy-clipboard.html %} ~~~ shell @@ -947,6 +1051,65 @@ molt fetch \ --non-interactive ~~~ +### Fail back securely from CockroachDB + +{{site.data.alerts.callout_danger}} +Before using `failback` mode, refer to the [technical advisory]({% link advisories/a123371.md %}) about a bug that affects changefeeds on CockroachDB v22.2, v23.1.0 to v23.1.21, v23.2.0 to v23.2.5, and testing versions of v24.1 through v24.1.0-rc.1. +{{site.data.alerts.end}} + +The following `molt fetch` command uses [`failback` mode](#fail-back-to-source-database) to securely replicate changes from CockroachDB back to a MySQL database. 
This assumes that you migrated data from MySQL to CockroachDB, and want to keep the data consistent on MySQL in case you need to roll back the migration.
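+ +Before running the command, generate the encoded certificate values referenced by `changefeed-secure.json`. The following is a minimal sketch that assumes the client certificate, key, and CA certificate live at hypothetical paths under `./certs`; substitute your own: + +{% include_cached copy-clipboard.html %} +~~~ shell +# Base64-encode each cert, then URL-encode it for the sink query string. +client_cert=$(base64 -i ./certs/client.crt | jq -R -r '@uri') +client_key=$(base64 -i ./certs/client.key | jq -R -r '@uri') +ca_cert=$(base64 -i ./certs/ca.crt | jq -R -r '@uri') + +# Assemble the override file passed to --changefeeds-path. +cat > changefeed-secure.json <<EOF +{ + "sink_query_parameters": "client_cert=${client_cert}&client_key=${client_key}&ca_cert=${ca_cert}" +} +EOF +~~~ + +Then run the following command: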
+ +{% include_cached copy-clipboard.html %} +~~~ shell +molt fetch \ +--source 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full' \ +--target 'mysql://root:password@localhost/molt?sslcert=.%2fsource_certs%2fclient.root.crt&sslkey=.%2fsource_certs%2fclient.root.key&sslmode=verify-full&sslrootcert=.%2fsource_certs%2fca.crt' \ +--table-filter 'employees, payments' \ +--non-interactive \ +--logging debug \ +--replicator-flags "--tlsCertificate ./certs/server.crt --tlsPrivateKey ./certs/server.key" \ +--mode failback \ +--changefeeds-path 'changefeed-secure.json' +~~~ + +- `--source` specifies the connection string of the CockroachDB database to which you migrated. +- `--target` specifies the connection string of the MySQL database acting as the failback target. +- `--table-filter` specifies that the `employees` and `payments` tables should be watched for change events. +- `--replicator-flags` specifies the paths to the server certificate (`--tlsCertificate`) and key (`--tlsPrivateKey`) that correspond to the client certs defined by `sink_query_parameters` in the changefeed override JSON file. +- `--changefeeds-path` specifies the path to `changefeed-secure.json`, which contains the following setting override: + + {% include_cached copy-clipboard.html %} + ~~~ json + { + "sink_query_parameters": "client_cert={base64 cert}&client_key={base64 key}&ca_cert={base64 CA cert}" + } + ~~~ + + `client_cert`, `client_key`, and `ca_cert` are [webhook sink parameters]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-parameters) that must be base64- and URL-encoded (for example, use the command `base64 -i ./client.crt | jq -R -r '@uri'`). + + {{site.data.alerts.callout_success}} + For details on the default changefeed settings and how to override them, see [Changefeed override settings](#changefeed-override-settings). + {{site.data.alerts.end}} + +The preceding `molt fetch` command issues the equivalent [`CREATE CHANGEFEED`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}) command, using the default and explicitly overridden changefeed settings: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE CHANGEFEED FOR TABLE employees, payments + INTO 'webhook-https://localhost:30004/defaultdb/public?client_cert={base64 cert}&client_key={base64 key}&ca_cert={base64 CA cert}' + WITH updated, resolved = '1s', min_checkpoint_frequency = '1s', initial_scan = 'no', cursor = '2024-09-11T16:33:35Z', webhook_sink_config = '{\"Flush\":{\"Bytes\":1048576,\"Frequency\":\"1s\"}}' +~~~ + +The initial output looks like the following: + +~~~ +INFO [Sep 11 11:03:54] Replicator starting -buildmode=exe -compiler=gc CGO_CFLAGS= CGO_CPPFLAGS= CGO_CXXFLAGS= CGO_ENABLED=1 CGO_LDFLAGS= GOARCH=arm64 GOOS=darwin vcs=git vcs.modified=true vcs.revision=c948b78081a37aacf37a82eac213aa91a2828f92 vcs.time="2024-08-19T13:39:37Z" +INFO [Sep 11 11:03:54] Server listening address="[::]:30004" +DEBUG [Sep 11 11:04:00] httpRequest="&{0x14000156ea0 0 401 32 101.042µs false false}" +DEBUG [Sep 11 11:04:00] httpRequest="&{0x14000018b40 0 401 32 104.417µs false false}" +DEBUG [Sep 11 11:04:01] httpRequest="&{0x140000190e0 0 401 32 27.958µs false false}" +~~~ + ## See also - [MOLT Verify]({% link molt/molt-verify.md %})