The OracleReader plugin enables data reading from Oracle databases. Under the hood, OracleReader connects to remote Oracle databases using github.com/godror/godror and database/sql, executing corresponding SQL statements to query data from Oracle. Unlike other databases, Oracle's interaction protocol is not publicly available, so the Golang driver for Oracle is based on ODPI-C, which requires the use of Oracle Instant Client for connectivity. For example, connecting to Oracle 11g requires version 12.x.
OracleReader connects to remote Oracle databases using Oracle Instant Client via github.com/godror/godror. It generates SQL queries based on user-provided configurations, sends them to the remote Oracle database, and assembles the returned results into an abstract dataset using go-etl's custom data types, which are then passed to downstream Writer processes.
OracleReader implements specific queries by invoking the query process defined in dbmsreader, using go-etl's custom storage/database DBWrapper. DBWrapper encapsulates many interfaces of database/sql and abstracts the database dialect, Dialect. In this case, Oracle utilizes the Dialect implemented in storage/database/oracle.
Configuring a job to synchronize data from an Oracle database to a local destination:
{
"job":{
"content":[
{
"reader":{
"name": "oraclereader",
"parameter": {
"connection": {
"url": "connectString=\"192.168.15.130:1521/xe\" heterogeneousPool=false standaloneConnection=true",
"table": {
"schema":"TEST",
"name":"SRC"
}
},
"username": "system",
"password": "oracle",
"column": ["*"],
"split" : {
"key":"id"
},
"where": "",
"querySql":["select a,b from table_a join table_b on table_a.id = table_b.id"]
}
}
}
]
}
}
- Description: Primarily used to configure the connection information for the remote Oracle database. The basic configuration format for connecting to an Oracle database is:
connectString="192.168.15.130:1521/xe" heterogeneousPool=false standaloneConnection=true
. TheconnectString
represents the connection information for the Oracle database. If using a server name for the connection, useip:port/servername
or(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip)(PORT=port))(CONNECT_DATA=(SERVICE_NAME=servername)))
. If using a SID for the connection, use(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip)(PORT=port))(CONNECT_DATA=(SID=sid)))
. This is similar to the connection configuration information in the Godror User Guide, except that the username and password are extracted from the connection configuration information for easier encryption. - Required: Yes
- Default: None
- Description: Primarily used to configure the username for the Oracle database.
- Required: Yes
- Default: None
- Description: Primarily used to configure the password for the Oracle database.
- Required: Yes
- Default: None
Describes the Oracle table information.
- Description: Primarily used to configure the schema name for the Oracle table.
- Required: Yes
- Default: None
- Description: Primarily used to configure the table name for the Oracle table.
- Required: Yes
- Default: None
- Description: An array of column names to be synchronized from the configured table. Users can use the asterisk () to indicate that all columns should be used by default, for example, [""]. Column pruning is supported, meaning that only a subset of columns can be exported. Column reordering is also supported, meaning that columns do not need to be exported in the same order as the table schema. Constant configuration is supported, where users need to follow the Oracle SQL syntax format: ["id", "
table
", "1", "'bazhen.csy'", "null", "left(a,10)", "2.3", "true"]. In this example, "id" is a regular column name, "table
" is a column name that contains reserved words, "1" is an integer constant, "'bazhen.csy'" is a string constant, "null" is a null pointer, "left(a,10)" is an expression, "2.3" is a floating-point number, and "true" is a boolean value. - Required: Yes
- Default: None
- Description: Primarily used to configure the splitting key for the Oracle table. The splitting key must be of type bigInt/string/time, assuming that the data is evenly distributed based on the splitting key.
- Required: No
- Default: None
- Description: Primarily used to configure the time splitting key for the Oracle table. It is mainly used to describe the smallest unit of time, such as day, minute, second, millisecond, microsecond, nanosecond.
- Required: No
- Default: None
- Description: Primarily used to configure the default type for the splitting key of the Oracle table. The value can be bigInt/string/time. The system does not check the type in the table splitting key, but it is important to ensure the correct type.
- Required: No
- Default: None
- Description: Primarily used to configure the default minimum value for the splitting key of the Oracle table.
- Required: No
- Default: None
- Description: Primarily used to configure the default maximum value for the splitting key of the Oracle table.
- Required: No
- Default: None
- Description: Primarily used to configure the WHERE condition for the SELECT statement.
- Required: No
- Default: None
- Description: In some business scenarios, the
where
configuration item may not be sufficient to describe the filtering conditions. Users can use this configuration item to define custom SQL queries for filtering. When users configure this item, the DataX system will ignore thetable
,column
, and other related configurations and directly use the content of this configuration item for data filtering. For example, it can be used for data synchronization after performing a join operation on multiple tables, such asselect a,b from table_a join table_b on table_a.id = table_b.id
. WhenquerySql
is configured in OracleReader, it directly ignores the configuration oftable
,column
, andwhere
conditions, and the priority ofquerySql
is higher than that oftable
,column
, andwhere
options. - Required: No
- Default: None
- Description: Specifies whether to remove leading and trailing spaces for Oracle's char and nchar types.
- Required: No
- Default: false
Currently, OracleReader supports most Oracle types, but there are some individual types that are not supported. Please check your data types carefully.
Below is a conversion table for OracleReader regarding Oracle types:
go-etl Type | Oracle Data Type |
---|---|
bool | BOOLEAN |
bigInt | NUMBER, INTEGER, SMALLINT |
decimal | BINARY_FLOAT, FLOAT, BINARY_DOUBLE, REAL, DECIMAL, NUMERIC |
string | VARCHAR, CHAR, NCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB |
time | DATE, TIMESTAMP |
bytes | BLOB, RAW, LONG RAW, LONG |
To be tested.
Currently, only the UTF-8 character set is supported.
- How to configure Oracle Instant Client for Oracle?
Example configurations:
- Note that on Linux, you may need to set an environment variable like
export LD_LIBRARY_PATH=/opt/oracle/instantclient_21_1:$LD_LIBRARY_PATH
. Additionally, you may need to installlibaio
. - On Windows, you may need to set a path variable like
set path=%path%;%GOPATH%\oracle\instantclient_21_1
. Please note that Oracle Instant Client 19 no longer supports Windows 7.
- How to eliminate
godor WARNING: discrepancy between SESSIONTIMEZONE and SYSTIMESTAMP
Either speak with your DBA to synchronize the DB's time zone (DBTIMEZONE) with the underlying OS' time zone, or use
ALTER SESSION SET TIME_ZONE='Europe/Berlin'
or set one chosen timezone in the [./connection.md](connection string):
timezone="Europe/Berlin"
(it is parsed with time.LoadLocation, so such names can be used, or local, or a numeric +0500 fixed zone).
WARNING: time zone altered with ALTER SESSION may not be read each and every time, so either always ALTER SESSION consistently to the same timezone, or use the
perSessionTimezone=1
connection parameter, to force checking the time zone for each session (and not cache it per DB).