[SPARK-48854][DOCS] Add missing options in CSV documentation
### What changes were proposed in this pull request?

This PR adds documentation for missing CSV options, including `delimiter` as an alternative to `sep`, `charset` as an alternative to `encoding`, `codec` as an alternative to `compression`, and `timeZone`. It excludes `columnPruning`, which falls back to an internal SQL config.
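As a minimal illustration of how the alternative names relate to the canonical options, here is a hypothetical pure-Python helper (not Spark's internal code) that resolves each alias documented above:

```python
# Hypothetical alias table for illustration only -- each alternative
# CSV option name documented in this PR resolves to a canonical name.
ALIASES = {
    "delimiter": "sep",
    "charset": "encoding",
    "codec": "compression",
}

def canonical(option_name: str) -> str:
    """Map an alias to its canonical option name; pass others through."""
    return ALIASES.get(option_name, option_name)

print(canonical("delimiter"))  # sep
print(canonical("timeZone"))   # timeZone
```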

### Why are the changes needed?

Documentation improvement for the user guide.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

doc build

![image](https://github.com/apache/spark/assets/8326978/d8ff888b-cafa-44e6-ab74-7bf69702a267)

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#47278 from yaooqinn/SPARK-48854.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
yaooqinn committed Jul 10, 2024
1 parent f2dd0b3 commit 35fbedb
Showing 2 changed files with 15 additions and 4 deletions.
18 changes: 15 additions & 3 deletions docs/sql-data-sources-csv.md
@@ -55,13 +55,13 @@ Data source options of CSV can be set via:
 <table>
 <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead>
 <tr>
-<td><code>sep</code></td>
+<td><code>sep</code><br><code>delimiter</code></td>
 <td>,</td>
 <td>Sets a separator for each field and value. This separator can be one or more characters.</td>
 <td>read/write</td>
 </tr>
 <tr>
-<td><code>encoding</code></td>
+<td><code>encoding</code><br><code>charset</code></td>
 <td>UTF-8</td>
 <td>For reading, decodes the CSV files by the given encoding type. For writing, specifies encoding (charset) of saved CSV files. CSV built-in functions ignore this option.</td>
 <td>read/write</td>
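What `sep`/`delimiter` and `encoding`/`charset` control can be sketched with Python's standard `csv` module (a pure-Python illustration with hypothetical sample data, not Spark code):

```python
import csv
import io

# Hypothetical sample data: ';' as the field separator, Latin-1 encoded.
raw = "name;città\nAda;Torino\n".encode("latin-1")

# 'sep'/'delimiter' corresponds to csv's delimiter argument;
# 'encoding'/'charset' corresponds to the codec used to decode the bytes.
reader = csv.reader(
    io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1"),
    delimiter=";",
)
rows = list(reader)
print(rows)  # [['name', 'città'], ['Ada', 'Torino']]
```

Reading the same bytes with the wrong charset (e.g. UTF-8) or the default `,` separator would mangle the accented character or leave each line as a single field, which is why both options matter for reads.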
@@ -261,10 +261,22 @@ Data source options of CSV can be set via:
 <td>read</td>
 </tr>
 <tr>
-<td><code>compression</code></td>
+<td><code>compression</code><br><code>codec</code></td>
 <td>(none)</td>
 <td>Compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (<code>none</code>, <code>bzip2</code>, <code>gzip</code>, <code>lz4</code>, <code>snappy</code> and <code>deflate</code>). CSV built-in functions ignore this option.</td>
 <td>write</td>
 </tr>
+<tr>
+<td><code>timeZone</code></td>
+<td>(value of <code>spark.sql.session.timeZone</code> configuration)</td>
+<td>Sets the string that indicates a time zone ID to be used to format timestamps in the JSON datasources or partition values. The following formats of <code>timeZone</code> are supported:<br>
+<ul>
+<li>Region-based zone ID: It should have the form 'area/city', such as 'America/Los_Angeles'.</li>
+<li>Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+</ul>
+Other short names like 'CST' are not recommended to use because they can be ambiguous.
+</td>
+<td>read/write</td>
+</tr>
 </table>
 Other generic options can be found in <a href="https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html">Generic File Source Options</a>.
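The round trip that `compression`/`codec` selects on write can be sketched with the standard library's gzip codec (a pure-Python illustration with hypothetical data, not Spark code):

```python
import csv
import gzip
import io

# Write a tiny CSV with gzip compression, then read it back.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(b"id,name\n1,Ada\n")

# Decompress and parse -- the CSV content survives the round trip.
with gzip.GzipFile(fileobj=io.BytesIO(buf.getvalue()), mode="rb") as gz:
    rows = list(csv.reader(io.TextIOWrapper(gz, encoding="utf-8")))
print(rows)  # [['id', 'name'], ['1', 'Ada']]
```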
1 change: 0 additions & 1 deletion docs/sql-data-sources-json.md
@@ -112,7 +112,6 @@ Data source options of JSON can be set via:
 <table>
 <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead>
 <tr>
-<!-- TODO(SPARK-35433): Add timeZone to Data Source Option for CSV, too. -->
 <td><code>timeZone</code></td>
 <td>(value of <code>spark.sql.session.timeZone</code> configuration)</td>
 <td>Sets the string that indicates a time zone ID to be used to format timestamps in the JSON datasources or partition values. The following formats of <code>timeZone</code> are supported:<br>
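The two supported `timeZone` formats (region-based zone ID and fixed offset) can be illustrated with plain Python's `zoneinfo` and `datetime`, independent of Spark:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# A sample timestamp in UTC (hypothetical data for illustration).
ts = datetime(2024, 7, 10, 12, 0, tzinfo=timezone.utc)

# Region-based zone ID in 'area/city' form, e.g. 'America/Los_Angeles'.
la = ts.astimezone(ZoneInfo("America/Los_Angeles"))

# Fixed zone offset in '(+|-)HH:mm' form, e.g. '-08:00'.
minus8 = ts.astimezone(timezone(timedelta(hours=-8)))

print(la.strftime("%Y-%m-%d %H:%M %Z"))     # 2024-07-10 05:00 PDT
print(minus8.strftime("%Y-%m-%d %H:%M %z")) # 2024-07-10 04:00 -0800
```

Note the difference: the region-based ID tracks daylight-saving rules (UTC-7 in July), while the fixed offset stays at UTC-8 year-round, which is one reason short ambiguous names like 'CST' are discouraged.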
