The CsvReader plugin extracts data from CSV files. Under the hood, it reads files with the standard libraries os and encoding/csv.
Each row is assembled into an abstract dataset using go-etl's custom data types and passed downstream to a Writer for further processing.
The reading itself is implemented by go-etl's custom file.InStreamer, which is invoked from the reading flow defined in file.Task.
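The row-by-row flow described above can be sketched with encoding/csv alone. This is a minimal illustration, not go-etl's actual file.InStreamer: the function names readRows and collectRows and the emit callback (which stands in for the downstream Writer) are hypothetical, and an in-memory string replaces the file that os.Open would normally supply.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// readRows reads CSV rows from r one at a time and hands each to emit.
// emit is a hypothetical stand-in for the downstream Writer.
func readRows(r io.Reader, emit func(row []string) error) error {
	cr := csv.NewReader(r)
	for {
		row, err := cr.Read()
		if err == io.EOF {
			return nil // end of input: every row was delivered
		}
		if err != nil {
			return err
		}
		if err := emit(row); err != nil {
			return err
		}
	}
}

// collectRows is a small helper that gathers all rows of an in-memory CSV.
func collectRows(data string) ([][]string, error) {
	var rows [][]string
	err := readRows(strings.NewReader(data), func(row []string) error {
		rows = append(rows, row)
		return nil
	})
	return rows, err
}

func main() {
	rows, err := collectRows("1,alice\n2,bob\n")
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	for _, row := range rows {
		fmt.Println(row)
	}
}
```

Streaming through a callback keeps memory use flat for large files, because only one row is held at a time.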
Configure a job that synchronizes data from CSV files to a local destination:
{
    "job":{
        "content":[
            {
                "reader":{
                    "name": "csvreader",
                    "parameter": {
                        "path":["a.txt","b.txt"],
                        "column":[
                            {
                                "index":"1",
                                "type":"time",
                                "format":"yyyy-MM-dd"
                            }
                        ],
                        "encoding":"utf-8",
                        "delimiter":","
                    }
                }
            }
        ]
    }
}
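To show how the "parameter" object above maps onto typed fields, here is a minimal sketch that decodes it with encoding/json. The struct and function names (Column, ReaderParameter, parseParameter) are hypothetical and are not go-etl's own types; only the JSON keys come from the configuration above. The defaults for encoding and delimiter follow the values documented for those parameters.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Column mirrors one entry of the "column" array (hypothetical type name).
type Column struct {
	Index  string `json:"index"`
	Type   string `json:"type"`
	Format string `json:"format"`
}

// ReaderParameter mirrors the "parameter" object (hypothetical type name).
type ReaderParameter struct {
	Path      []string `json:"path"`
	Column    []Column `json:"column"`
	Encoding  string   `json:"encoding"`
	Delimiter string   `json:"delimiter"`
}

// parseParameter decodes raw JSON. Fields missing from the JSON keep the
// pre-filled defaults, because json.Unmarshal only overwrites keys it finds.
func parseParameter(raw string) (ReaderParameter, error) {
	p := ReaderParameter{Encoding: "utf-8", Delimiter: ","}
	err := json.Unmarshal([]byte(raw), &p)
	return p, err
}

func main() {
	raw := `{
		"path": ["a.txt", "b.txt"],
		"column": [{"index": "1", "type": "time", "format": "yyyy-MM-dd"}]
	}`
	p, err := parseParameter(raw)
	if err != nil {
		fmt.Println("invalid parameter:", err)
		return
	}
	fmt.Printf("%+v\n", p)
}
```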
- Description: The path parameter specifies the absolute path(s) of the CSV file(s). Multiple files can be configured.
- Required: Yes
- Default: None
- Description: The column parameter configures the column information array for the CSV file. Any column without an entry is treated as type string.
- Required: Yes
- Default: None
- Description: The index parameter specifies the column number in the CSV file. Numbering starts at 1.
- Required: Yes
- Default: None
- Description: The type parameter sets the data type of the CSV column, for example boolean, bigInt, decimal, string, or time.
- Required: Yes
- Default: None
- Description: The format parameter specifies the format for the column type and is mainly used with the time type. It uses the Java Joda time format, e.g., yyyy-MM-dd.
- Required: Yes, for time type
- Default: None
- Description: The encoding parameter sets the character encoding of the CSV file. utf-8 and gbk are currently supported.
- Required: No
- Default: utf-8
- Description: The delimiter parameter specifies the field delimiter used in the CSV file. Besides visible symbols such as commas or semicolons, it accepts invisible characters such as 0x10 (configured as "\u0010").
- Required: No
- Default: , (comma)
- Description: CSV files have no standard way to represent null (a null pointer) as a string. The nullFormat parameter defines which string is interpreted as null. For example, if nullFormat is set to "\N", CsvReader treats a source field containing "\N" as null.
- Required: No
- Default: Empty string
- Description: Specifies the row number at which reading of the CSV file starts. Row numbers start at 1.
- Required: No
- Default: 1
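The delimiter, nullFormat, and start-row settings described above can be combined in a short encoding/csv sketch. The Options struct, its field names (including StartRow), and readWithOptions are hypothetical names chosen for illustration; representing null as a nil *string is likewise an assumption, not how go-etl models nulls internally.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// Options groups the three settings (hypothetical type and field names).
type Options struct {
	Delimiter  string // exactly one character, e.g. "," or "\u0010"
	NullFormat string // fields equal to this string become null
	StartRow   int    // 1-based number of the first row to keep
}

// readWithOptions parses data and returns rows whose null fields are nil.
func readWithOptions(data string, opt Options) ([][]*string, error) {
	delim := []rune(opt.Delimiter)
	if len(delim) != 1 {
		return nil, fmt.Errorf("delimiter must be exactly one character, got %q", opt.Delimiter)
	}
	cr := csv.NewReader(strings.NewReader(data))
	cr.Comma = delim[0]

	var out [][]*string
	for rowNum := 1; ; rowNum++ {
		row, err := cr.Read()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		if rowNum < opt.StartRow {
			continue // skip rows before the configured start row
		}
		rec := make([]*string, len(row))
		for i := range row {
			if row[i] != opt.NullFormat {
				rec[i] = &row[i]
			}
		}
		out = append(out, rec)
	}
}

func main() {
	data := "id;name\n1;\\N\n2;bob\n"
	rows, err := readWithOptions(data, Options{Delimiter: ";", NullFormat: `\N`, StartRow: 2})
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	for _, rec := range rows {
		for _, f := range rec {
			if f == nil {
				fmt.Print("<null> ")
			} else {
				fmt.Print(*f, " ")
			}
		}
		fmt.Println()
	}
}
```

With an empty NullFormat, this sketch treats empty fields as null.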
- Description: Provides a comment for the CSV file.
- Required: No
- Default: None
- Description: Specifies the compression method used for the CSV file, currently supporting gz (gzip compression) and zip (zip compression).
- Required: No
- Default: No compression
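For the gz case, decompression can be layered in front of the CSV reader with the standard compress/gzip package. The sketch below round-trips an in-memory payload so that it runs without any files; the function names gzipString and readGzipCSV are hypothetical, and this is not go-etl's own implementation. The zip case is left out because archive/zip needs random access to the archive and follows a different pattern.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/csv"
	"fmt"
)

// gzipString compresses s in memory; it exists only to build test input.
func gzipString(s string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(s)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the gzip trailer
		return nil, err
	}
	return buf.Bytes(), nil
}

// readGzipCSV decompresses the payload and parses it as CSV.
func readGzipCSV(compressed []byte) ([][]string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return csv.NewReader(zr).ReadAll()
}

func main() {
	payload, err := gzipString("1,alice\n2,bob\n")
	if err != nil {
		fmt.Println("compress failed:", err)
		return
	}
	rows, err := readGzipCSV(payload)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(rows)
}
```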
CSV files carry no type information, so the data types CsvReader reads must be declared in the "column" setting. Check that each declared type matches your data.
Below is a list of type conversions supported by CsvReader for CSV data:
go-etl Type | CSV Data Type |
---|---|
bigInt | bigInt |
decimal | decimal |
string | string |
time | time |
bool | bool |
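As an illustration of the conversions in the table, the sketch below turns a raw CSV field into a Go value according to its declared type. Everything here is an assumption made for the example: the function name convert is hypothetical, math/big stands in for whatever numeric types go-etl actually uses, and the time layout is passed as a Go layout string rather than a Joda pattern.

```go
package main

import (
	"fmt"
	"math/big"
	"strconv"
	"time"
)

// convert parses value according to typ. layout is only used for "time"
// and must be a Go reference layout such as "2006-01-02".
func convert(value, typ, layout string) (interface{}, error) {
	switch typ {
	case "bigInt":
		n, ok := new(big.Int).SetString(value, 10)
		if !ok {
			return nil, fmt.Errorf("%q is not a valid bigInt", value)
		}
		return n, nil
	case "decimal":
		// big.Rat keeps decimal input exact; a stand-in for a decimal type.
		r, ok := new(big.Rat).SetString(value)
		if !ok {
			return nil, fmt.Errorf("%q is not a valid decimal", value)
		}
		return r, nil
	case "string":
		return value, nil
	case "bool":
		b, err := strconv.ParseBool(value)
		if err != nil {
			return nil, err
		}
		return b, nil
	case "time":
		t, err := time.Parse(layout, value)
		if err != nil {
			return nil, err
		}
		return t, nil
	default:
		return nil, fmt.Errorf("unsupported type %q", typ)
	}
}

func main() {
	for _, c := range [][3]string{
		{"42", "bigInt", ""},
		{"3.14", "decimal", ""},
		{"true", "bool", ""},
		{"2024-03-15", "time", "2006-01-02"},
	} {
		v, err := convert(c[0], c[1], c[2])
		fmt.Println(c[1], v, err)
	}
}
```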
Pending testing.