The first lesson learned
oracdc was developed to solve a practical task related to the life cycle of support and development of Oracle E-Business Suite for two of our customers. Oracle E-Business Suite includes a restricted use license of the Oracle Database and other Oracle technology products. This implies the use of these technology products only for standard forms, reports and Oracle E-Business Suite programs. Adding not only a database table, but even a new form or report, requires licensing the Oracle Database and the other technology products (Ref.: Applications Licensing Table). We will not dwell on the pitfalls of licensing Oracle products (if you have questions about Oracle licensing, please write to us at [email protected]); the purpose of this article is to show how you can solve the problems of system extensibility, move extensions, custom forms and reports to open source systems and, consequently, reduce the total cost of ownership of Oracle by using the Apache Kafka ecosystem and oracdc to replicate Oracle transaction data without additional Oracle licensing.

With the advent of Oracle Database 12c all previous features related to data replication - Oracle Advanced Replication Option, Oracle Change Data Capture and Oracle Streams - were deprecated and desupported (Ref.: Deprecated and Desupported Features for Oracle Database 12c). Now, if you need replication or CDC, you must use Oracle GoldenGate, which has to be licensed. The cost of an Oracle GoldenGate license is comparable to the cost of an Oracle Database Enterprise Edition license (Ref.: Oracle Technology Global Price List), it is compatible only with the Enterprise Edition, and it must be licensed per core not only for the source system but also for the target system. Therefore, we developed oracdc and share here the experience of its use with one of our customers.
The task was to replicate information related to material and warehouse management. The only KPI was the transfer of up to three million new and/or changed rows of the `INV.MTL_MATERIAL_TRANSACTIONS` table within an hour; the number of changes in the other tables is negligible compared to `INV.MTL_MATERIAL_TRANSACTIONS`. A bit about the `INV.MTL_MATERIAL_TRANSACTIONS` table:
There are 174 columns in total, mainly of type `NUMBER` without `PRECISION` and `SCALE`. The recommended approach for Kafka Connect is an explicit `PRECISION` and `SCALE` declaration (Ref.: https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/#bytes-decimals-numerics), but what can you do with an existing system and existing multi-terabyte data? One more data migration with type changes and full scale testing? And an unconstrained `NUMBER` holds not only integer values, but also decimal and floating point ones.
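To make the ambiguity concrete, here is a minimal sketch (the table and column names are hypothetical, not part of Oracle E-Business Suite): a `NUMBER` column declared without `PRECISION` and `SCALE` accepts all of the values below, so the data dictionary alone does not tell a connector whether to emit an integer, a decimal or a floating point Kafka Connect type.

```sql
-- Hypothetical demo table: all three inserts are valid for the same column
CREATE TABLE xx_number_demo (
  some_value NUMBER   -- no PRECISION/SCALE: integer? decimal? floating point?
);

INSERT INTO xx_number_demo VALUES (42);                -- integer
INSERT INTO xx_number_demo VALUES (12.345);            -- fixed point decimal
INSERT INTO xx_number_demo VALUES (1.6605390666E-27);  -- floating point magnitude
```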
To solve this challenge we created an editor for mapping the numeric data types of an Oracle table to Java/Kafka Connect data types. The issue of data migration is no longer on the agenda, and type redefinition increased replication speed: 534 rows per second! However, customer expectations were higher … Data research was carried out: even a column that is NULL for all rows of the table consumes resources both on the Oracle Database side (it must be read) and in the Source Connector (it must be parsed). The customer uses PostgreSQL 12.1 as the target system, and the analysis was carried out both on the Oracle side and on the data already migrated:
```
oebs=> SELECT schemaname, tablename, attname
oebs-> FROM pg_stats
oebs-> WHERE most_common_vals is null
oebs-> AND most_common_freqs is null
oebs-> AND histogram_bounds is null
oebs-> AND correlation is null
oebs-> AND null_frac = 1
oebs-> AND tablename = 'mtl_material_transactions';
 schemaname |         tablename         |            attname
------------+---------------------------+--------------------------------
 public     | mtl_material_transactions | revision
 public     | mtl_material_transactions | lpn_id
 …          | …                         | …
 public     | mtl_material_transactions | common_routing_seq_id
 public     | mtl_material_transactions | task_group_id
(56 rows)
```
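For the check on the Oracle side, a similar list can be obtained from the optimizer statistics. The query below is only a sketch and assumes that statistics on `INV.MTL_MATERIAL_TRANSACTIONS` have been gathered recently, since `NUM_NULLS` and `NUM_ROWS` are statistics values rather than live counts:

```sql
-- Sketch: columns whose gathered statistics report NULL for every row
SELECT s.column_name
  FROM dba_tab_col_statistics s
  JOIN dba_tables t
    ON t.owner = s.owner
   AND t.table_name = s.table_name
 WHERE s.owner = 'INV'
   AND s.table_name = 'MTL_MATERIAL_TRANSACTIONS'
   AND t.num_rows > 0
   AND s.num_nulls = t.num_rows;
```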
These 56 “empty” columns were excluded from the oracdc Source Connector table definition with the help of our GUI schema editor, and the test results became quite different:
Nearly 9 million `INSERT`/`UPDATE`/`DELETE` operations in 2 hours 3 minutes, that is roughly 1,200 rows per second, well above the ~833 rows per second implied by the three-million-rows-per-hour KPI. Processing speed exceeded customer expectations!
And the lesson is that for practical tasks related to processing millions of records you need:
- as accurate a conversion of data as possible, not only eliminating distortion during transformation (`DECIMAL` -> `FLOAT`/`DOUBLE`), but also not loading the CPU with unnecessary serialization and deserialization operations. oracdc provides these capabilities by supporting the Oracle Database internal format for `NUMBER` and `TIMESTAMP WITH [LOCAL] TIME ZONE` data and by providing an editor for mapping Oracle and Java/Kafka Connect data types
- size matters: it makes no sense to transfer empty columns between systems. oracdc’s built-in GUI mapping editor eliminates this kind of information from replication
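As a quick illustration of the distortion mentioned above, here is a sketch on the PostgreSQL side (the target system in this project); the values are arbitrary, and the results in the comments assume PostgreSQL 12's default shortest round-trip float output. Exact decimal arithmetic survives intact, while the same expression in binary floating point picks up rounding noise:

```sql
-- Exact decimal arithmetic: no distortion
SELECT 0.1::numeric + 0.2::numeric;                     -- 0.3

-- Binary floating point: the classic rounding artifact
SELECT 0.1::double precision + 0.2::double precision;   -- 0.30000000000000004
```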
Last but not least, this hands-on experience and the lessons of the first projects demonstrate our readiness to solve problems, lower the total cost of ownership and increase IT efficiency using oracdc and the Apache Kafka ecosystem.
If you have additional questions, please write to us at [email protected]