
The first lesson learned


oracdc was developed to solve a practical task that arose from the support and development life cycle of Oracle E-Business Suite at two of our customers. Oracle E-Business Suite includes a restricted-use license of the Oracle Database and other Oracle technology products, which permits using those products only for the standard forms, reports, and programs of Oracle E-Business Suite. Adding not just a database table, but also a new form or report, requires licensing the Oracle Database and the other technology products (Ref.: Applications Licensing Table). We will not dwell on the pitfalls of Oracle product licensing (if you have questions about Oracle licensing, please write to us at [email protected]); the purpose of this article is to show how you can solve the problem of system extensibility, move extensions and custom forms and reports to open source systems, and consequently reduce the total cost of ownership of Oracle by using the Apache Kafka ecosystem and oracdc to replicate Oracle transaction data without additional Oracle licensing.

With Oracle Database 12c, all previous features related to data replication - Oracle Advanced Replication Option, Oracle Change Data Capture, and Oracle Streams - were deprecated and desupported (Ref.: Deprecated and Desupported Features for Oracle Database 12c). If you need replication or CDC now, the remaining Oracle option is Oracle GoldenGate, which must be licensed. The cost of an Oracle GoldenGate license is comparable to that of an Oracle Database Enterprise Edition license (Ref.: Oracle Technology Global Price List), it is compatible only with Enterprise Edition, and it must be licensed per processor core not only for the source system but also for the receiving system. Therefore we developed oracdc, and here we share the experience of using it with one of our customers.

The task was to replicate information related to material and warehouse management. The only KPI was the transfer of up to three million new and/or changed rows of the INV.MTL_MATERIAL_TRANSACTIONS table within an hour; the number of changes in other tables is negligible compared to INV.MTL_MATERIAL_TRANSACTIONS. A bit about the INV.MTL_MATERIAL_TRANSACTIONS table: it has 174 columns in total, mainly of type NUMBER without explicit PRECISION and SCALE. The recommended approach for Kafka Connect is to declare PRECISION and SCALE explicitly (Ref.: https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/#bytes-decimals-numerics), but what do you do with an existing system and existing multi-terabyte data? One more data migration with type changes and full-scale testing? And NUMBER itself can hold not only integers, but also decimal and floating-point values. To solve this challenge, we created an editor for mapping the numeric data types of an Oracle table to Java/Kafka Connect data types. The issue of data migration is no longer on the agenda, and type redefinition increased the replication speed.
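For reference, one way to see which columns are declared as plain NUMBER without precision and scale is to query the Oracle data dictionary. This is a minimal sketch, not part of oracdc, and assumes a session with access to the INV schema's metadata:

-- List NUMBER columns of INV.MTL_MATERIAL_TRANSACTIONS that have neither
-- an explicit precision nor an explicit scale declared
SELECT column_name, data_precision, data_scale
  FROM all_tab_columns
 WHERE owner = 'INV'
   AND table_name = 'MTL_MATERIAL_TRANSACTIONS'
   AND data_type = 'NUMBER'
   AND data_precision IS NULL
   AND data_scale IS NULL;

With the default JDBC source behaviour described in the referenced Confluent article, columns like these are the ones most likely to need an explicit type decision in the mapping editor.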

[Image: INV.MTL_MATERIAL_TRANSACTIONS]

534 rows per second! However, customer expectations were higher … We researched the data: even a column that is NULL in every row of the table consumes resources both on the Oracle Database side (to read it) and in the Source Connector (to parse it). The customer uses PostgreSQL 12.1 as the target system, so the analysis was carried out both on the Oracle side and on the data already migrated:


oebs=> SELECT schemaname, tablename, attname
oebs-> FROM pg_stats
oebs-> WHERE most_common_vals is null
oebs->   AND most_common_freqs is null
oebs->   AND histogram_bounds is null
oebs->   AND correlation is null
oebs->   AND null_frac = 1
oebs->   AND tablename = 'mtl_material_transactions';
 schemaname |         tablename         |            attname             
------------+---------------------------+--------------------------------
 public     | mtl_material_transactions | revision
 public     | mtl_material_transactions | lpn_id
… … …
… … …
… … …
 public     | mtl_material_transactions | common_routing_seq_id
 public     | mtl_material_transactions | task_group_id
(56 rows)
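On the Oracle side, a comparable list can be obtained from optimizer statistics. This is a sketch, assuming statistics on INV.MTL_MATERIAL_TRANSACTIONS are up to date, and is not necessarily the exact query used in the project:

-- Columns whose null count from statistics equals the table row count,
-- i.e. columns that are NULL in every row
SELECT c.column_name
  FROM all_tab_col_statistics c
  JOIN all_tables t
    ON t.owner = c.owner
   AND t.table_name = c.table_name
 WHERE c.owner = 'INV'
   AND c.table_name = 'MTL_MATERIAL_TRANSACTIONS'
   AND t.num_rows > 0
   AND c.num_nulls = t.num_rows;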

These 56 “empty” columns were excluded from the oracdc Source Connector table definition with the help of our GUI schema editor, and the test results changed noticeably:

[Image: INV.MTL_MATERIAL_TRANSACTIONS]

Nearly 9 million INSERT/UPDATE/DELETE operations in 2 hours 3 minutes - a processing speed that exceeded customer expectations! The lesson is that for practical tasks involving the processing of millions of records it is necessary to

  1. convert data as accurately as possible, not only avoiding distortion during transformation (DECIMAL -> FLOAT / DOUBLE; see the sketch after this list), but also avoiding unnecessary CPU load from serialization and deserialization operations. oracdc provides these capabilities by supporting the Oracle Database internal format for NUMBER and TIMESTAMP WITH [LOCAL] TIME ZONE data and by providing an editor for mapping Oracle to Java/Kafka Connect data types;
  2. remember that size matters - it makes no sense to transfer empty columns between systems. oracdc’s built-in GUI mapping editor excludes this kind of information from replication.
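To illustrate the distortion mentioned in item 1, here is a small, hypothetical example: a NUMBER value with 19 significant digits no longer round-trips once it is forced through a binary double, because BINARY_DOUBLE carries only about 15-17 significant decimal digits:

-- The exact NUMBER literal versus the same value cast to BINARY_DOUBLE;
-- the double representation loses the trailing digits
SELECT TO_CHAR(123456789012345678.9) AS as_number,
       TO_CHAR(CAST(123456789012345678.9 AS BINARY_DOUBLE)) AS as_binary_double
  FROM dual;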

Last but not least, this hands-on experience and the lessons of the first projects demonstrate the readiness of oracdc and the Apache Kafka ecosystem to solve real problems, lower the total cost of ownership, and increase IT efficiency.

If you have additional questions, please write to us at [email protected].