pglogical crashes AWS Aurora during replication with Segmentation fault #458

JulienAndonov · 2024-01-25T14:34:47Z

Hi everyone, I have the following issue.
During normal operation, pglogical crashes on the destination side, which is RDS Aurora PostgreSQL 14.8 running pglogical 2.4.2:

#Destination side
2024-01-25 13:17:09 UTC::@:[537]:LOG: background worker "pglogical apply 131082:4047160452" (PID 6709) was terminated by signal 11: Segmentation fault
2024-01-25 13:17:09 UTC::@:[537]:LOG: terminating any other active server processes
2024-01-25 13:17:09 UTC::@:[537]:FATAL: Can't handle storage runtime process crash
2024-01-25 13:17:09 UTC::@:[537]:LOG: database system is shut down

After this initial error, the cluster enters a continuous cycle of rebooting and crashing, which causes significant CPU and resource usage.

On the source side, some queries run a couple of seconds before the crash, but they don't seem to be the cause: after re-creating the environment and re-executing those queries, the problem doesn't occur.

On the source cluster, we see these errors after the initial error on the destination:
2024-01-25 13:17:09 UTC:(63772):user@database_name:[26536]:LOG: could not receive data from client: Connection reset by peer
2024-01-25 13:17:09 UTC:(63772):user@database_name:[26536]:STATEMENT: START_REPLICATION SLOT "replication_slot_name" LOGICAL 12/28C9A430 (expected_encoding 'UTF8', min_proto_version '1', max_proto_version '1', startup_params_format '1', "binary.want_internal_basetypes" '1', "binary.want_binary_basetypes" '1', "binary.basetypes_major_version" '1400', "binary.sizeof_datum" '8', "binary.sizeof_int" '4', "binary.sizeof_long" '8', "binary.bigendian" '0', "binary.float4_byval" '0', "binary.float8_byval" '1', "binary.integer_datetimes" '0', "hooks.setup_function" 'pglogical.pglogical_hooks_setup', "pglogical.forward_origins" '"all"', "pglogical.replication_set_names" 'tenant_service', "relmeta_cache_size" '-1', pg_version '140008', pglogical_version '2.4.2', pglogical_version_num '20402', pglogical_apply_pid '6709')
2024-01-25 13:17:09 UTC:*(63772):user@database_name:[26536]:LOG: unexpected EOF on standby connection

Source and Destination:
RDS Aurora PostgreSQL 14.8
pglogical: 2.4.2

Source:
1 Writer
1 Reader

Destination:
1 Writer

Karthik-Colligence · 2024-09-13T11:41:34Z

Hey, any update on this issue? I ran into the same problem when trying pglogical in a similar scenario. Please let us know if there are any updates on this "Segmentation fault" issue.

andonovj · 2024-09-13T11:43:35Z

Yes, the problem was related to a virtual column. Check whether any of the tables you are trying to migrate has a virtual column. If so, you have to remove it from the replication and add it on the destination. That worked for me :-)
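
For anyone trying this workaround, here is a rough sketch of how it could be scripted. It assumes that "virtual column" means a PostgreSQL generated column (GENERATED ALWAYS AS ... STORED), which the thread does not confirm. The table name `public.orders` and its columns are hypothetical; only the replication set name `tenant_service` comes from the log above. The helper only builds SQL text: a catalog query to find generated columns, plus the pglogical calls that re-add a table with an explicit column list that leaves the generated columns out.

```python
# Sketch only: assumes "virtual column" == PostgreSQL generated column.
# Catalog query listing generated columns; pg_attribute.attgenerated is
# non-empty for them (PostgreSQL 12 and later).
FIND_GENERATED_COLUMNS_SQL = """
SELECT c.oid::regclass AS table_name, a.attname AS column_name
FROM pg_attribute a
JOIN pg_class c ON c.oid = a.attrelid
WHERE a.attgenerated <> ''
  AND a.attnum > 0
  AND NOT a.attisdropped
  AND c.relkind IN ('r', 'p')
ORDER BY 1, 2;
"""


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def readd_without_generated(set_name, table, all_columns, generated_columns):
    """Build the statements that drop a table from a replication set and
    re-add it with a column list that skips the generated columns."""
    skipped = set(generated_columns)
    kept = [col for col in all_columns if col not in skipped]
    if not kept:
        raise ValueError("no columns left to replicate")
    column_array = "ARRAY[" + ", ".join(sql_literal(c) for c in kept) + "]"
    relation = sql_literal(table) + "::regclass"
    return [
        "SELECT pglogical.replication_set_remove_table("
        f"{sql_literal(set_name)}, {relation});",
        "SELECT pglogical.replication_set_add_table("
        f"set_name := {sql_literal(set_name)}, relation := {relation}, "
        f"synchronize_data := false, columns := {column_array});",
    ]


if __name__ == "__main__":
    # Hypothetical table: "total" is the generated column to leave out.
    for stmt in readd_without_generated(
        "tenant_service", "public.orders",
        ["id", "price", "qty", "total"], ["total"],
    ):
        print(stmt)
```

Run the catalog query on the source database first, then run the generated statements there for each affected table. The generated column itself still has to be defined in the table on the destination, as described above.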
