Skip to content

Commit

Permalink
Merge pull request #11218 from NVIDIA/merge-branch-24.06-to-main
Browse files Browse the repository at this point in the history
Merge branch-24.06 into main [skip ci]
  • Loading branch information
NvTimLiu authored Jul 18, 2024
2 parents f932a78 + 54c9d96 commit fa8a691
Show file tree
Hide file tree
Showing 73 changed files with 240 additions and 148 deletions.
8 changes: 7 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Change log
Generated on 2024-06-13
Generated on 2024-07-18

## Release 24.06

Expand Down Expand Up @@ -48,6 +48,12 @@ Generated on 2024-06-13
### PRs
|||
|:---|:---|
|[#11221](https://github.com/NVIDIA/spark-rapids/pull/11221)|Change cudf version back to 24.06.0-SNAPSHOT [skip ci]|
|[#11217](https://github.com/NVIDIA/spark-rapids/pull/11217)|Update latest changelog [skip ci]|
|[#11211](https://github.com/NVIDIA/spark-rapids/pull/11211)|Use fixed seed for test_from_json_struct_decimal|
|[#11203](https://github.com/NVIDIA/spark-rapids/pull/11203)|Update version to 24.06.1-SNAPSHOT|
|[#11205](https://github.com/NVIDIA/spark-rapids/pull/11205)|Update docs for 24.06.1 release [skip ci]|
|[#11056](https://github.com/NVIDIA/spark-rapids/pull/11056)|Update latest changelog [skip ci]|
|[#11052](https://github.com/NVIDIA/spark-rapids/pull/11052)|Add spark343 shim for scala2.13 dist jar|
|[#10981](https://github.com/NVIDIA/spark-rapids/pull/10981)|Update latest changelog [skip ci]|
|[#10984](https://github.com/NVIDIA/spark-rapids/pull/10984)|[DOC] Update docs for 24.06.0 release [skip ci]|
Expand Down
8 changes: 4 additions & 4 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -130,15 +130,15 @@ mvn -pl dist -PnoSnapshots package -DskipTests
Verify that shim-specific classes are hidden from a conventional classloader.

```bash
$ javap -cp dist/target/rapids-4-spark_2.12-24.06.0-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl
$ javap -cp dist/target/rapids-4-spark_2.12-24.06.1-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl
Error: class not found: com.nvidia.spark.rapids.shims.SparkShimImpl
```

However, its bytecode can be loaded if prefixed with `spark3XY` not contained in the package name

```bash
$ javap -cp dist/target/rapids-4-spark_2.12-24.06.0-cuda11.jar spark320.com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
Warning: File dist/target/rapids-4-spark_2.12-24.06.0-cuda11.jar(/spark320/com/nvidia/spark/rapids/shims/SparkShimImpl.class) does not contain class spark320.com.nvidia.spark.rapids.shims.SparkShimImpl
$ javap -cp dist/target/rapids-4-spark_2.12-24.06.1-cuda11.jar spark320.com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
Warning: File dist/target/rapids-4-spark_2.12-24.06.1-cuda11.jar(/spark320/com/nvidia/spark/rapids/shims/SparkShimImpl.class) does not contain class spark320.com.nvidia.spark.rapids.shims.SparkShimImpl
Compiled from "SparkShims.scala"
public final class com.nvidia.spark.rapids.shims.SparkShimImpl {
```
Expand Down Expand Up @@ -181,7 +181,7 @@ mvn package -pl dist -am -Dbuildver=340 -DallowConventionalDistJar=true
Verify `com.nvidia.spark.rapids.shims.SparkShimImpl` is conventionally loadable:
```bash
$ javap -cp dist/target/rapids-4-spark_2.12-24.06.0-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
$ javap -cp dist/target/rapids-4-spark_2.12-24.06.1-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
Compiled from "SparkShims.scala"
public final class com.nvidia.spark.rapids.shims.SparkShimImpl {
```
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ as a `provided` dependency.
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<scope>provided</scope>
</dependency>
```
4 changes: 2 additions & 2 deletions aggregator/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,13 +22,13 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-jdk-profiles_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../jdk-profiles/pom.xml</relativePath>
</parent>
<artifactId>rapids-4-spark-aggregator_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark Aggregator</name>
<description>Creates an aggregated shaded package of the RAPIDS plugin for Apache Spark</description>
<version>24.06.0</version>
<version>24.06.1</version>

<properties>
<rapids.module>aggregator</rapids.module>
Expand Down
4 changes: 2 additions & 2 deletions api_validation/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,11 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shim-deps-parent_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../shim-deps/pom.xml</relativePath>
</parent>
<artifactId>rapids-4-spark-api-validation_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>

<properties>
<rapids.module>api_validation</rapids.module>
Expand Down
6 changes: 3 additions & 3 deletions datagen/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,12 +24,12 @@ Where `$SPARK_VERSION` is a compressed version number, like 330 for Spark 3.3.0.

After this the jar should be at
`target/datagen_2.12-$PLUGIN_VERSION-spark$SPARK_VERSION.jar`
for example a Spark 3.3.0 jar for the 24.06.0 release would be
`target/datagen_2.12-24.06.0-spark330.jar`
for example a Spark 3.3.0 jar for the 24.06.1 release would be
`target/datagen_2.12-24.06.1-spark330.jar`

To get a spark shell with this you can run
```shell
spark-shell --jars target/datagen_2.12-24.06.0-spark330.jar
spark-shell --jars target/datagen_2.12-24.06.1-spark330.jar
```

After that you should be good to go.
Expand Down
2 changes: 1 addition & 1 deletion datagen/ScaleTest.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ $SPARK_HOME/bin/spark-submit \
--conf spark.sql.parquet.datetimeRebaseModeInWrite=CORRECTED \
--class com.nvidia.rapids.tests.scaletest.ScaleTestDataGen \ # the main class
--jars $SPARK_HOME/examples/jars/scopt_2.12-3.7.1.jar \ # one dependency jar just shipped with Spark under $SPARK_HOME
./target/datagen_2.12-24.06.0-spark332.jar \
./target/datagen_2.12-24.06.1-spark332.jar \
1 \
10 \
parquet \
Expand Down
4 changes: 2 additions & 2 deletions datagen/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -21,13 +21,13 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shim-deps-parent_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../shim-deps/pom.xml</relativePath>
</parent>
<artifactId>datagen_2.12</artifactId>
<name>Data Generator</name>
<description>Tools for generating large amounts of data</description>
<version>24.06.0</version>
<version>24.06.1</version>
<properties>
<rapids.module>datagen</rapids.module>
<target.classifier/>
Expand Down
4 changes: 2 additions & 2 deletions delta-lake/delta-20x/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,14 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-jdk-profiles_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../../jdk-profiles/pom.xml</relativePath>
</parent>

<artifactId>rapids-4-spark-delta-20x_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark Delta Lake 2.0.x Support</name>
<description>Delta Lake 2.0.x support for the RAPIDS Accelerator for Apache Spark</description>
<version>24.06.0</version>
<version>24.06.1</version>

<properties>
<rapids.module>../delta-lake/delta-20x</rapids.module>
Expand Down
4 changes: 2 additions & 2 deletions delta-lake/delta-21x/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,14 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-jdk-profiles_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../../jdk-profiles/pom.xml</relativePath>
</parent>

<artifactId>rapids-4-spark-delta-21x_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark Delta Lake 2.1.x Support</name>
<description>Delta Lake 2.1.x support for the RAPIDS Accelerator for Apache Spark</description>
<version>24.06.0</version>
<version>24.06.1</version>

<properties>
<rapids.module>../delta-lake/delta-21x</rapids.module>
Expand Down
4 changes: 2 additions & 2 deletions delta-lake/delta-22x/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,14 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-jdk-profiles_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../../jdk-profiles/pom.xml</relativePath>
</parent>

<artifactId>rapids-4-spark-delta-22x_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark Delta Lake 2.2.x Support</name>
<description>Delta Lake 2.2.x support for the RAPIDS Accelerator for Apache Spark</description>
<version>24.06.0</version>
<version>24.06.1</version>

<properties>
<rapids.module>../delta-lake/delta-22x</rapids.module>
Expand Down
4 changes: 2 additions & 2 deletions delta-lake/delta-23x/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,14 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-parent_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../../pom.xml</relativePath>
</parent>

<artifactId>rapids-4-spark-delta-23x_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark Delta Lake 2.3.x Support</name>
<description>Delta Lake 2.3.x support for the RAPIDS Accelerator for Apache Spark</description>
<version>24.06.0</version>
<version>24.06.1</version>

<properties>
<rapids.module>../delta-lake/delta-23x</rapids.module>
Expand Down
4 changes: 2 additions & 2 deletions delta-lake/delta-24x/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,14 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-jdk-profiles_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../../jdk-profiles/pom.xml</relativePath>
</parent>

<artifactId>rapids-4-spark-delta-24x_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark Delta Lake 2.4.x Support</name>
<description>Delta Lake 2.4.x support for the RAPIDS Accelerator for Apache Spark</description>
<version>24.06.0</version>
<version>24.06.1</version>

<properties>
<rapids.module>../delta-lake/delta-24x</rapids.module>
Expand Down
4 changes: 2 additions & 2 deletions delta-lake/delta-spark330db/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,14 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-jdk-profiles_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../../jdk-profiles/pom.xml</relativePath>
</parent>

<artifactId>rapids-4-spark-delta-spark330db_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark Databricks 11.3 Delta Lake Support</name>
<description>Databricks 11.3 Delta Lake support for the RAPIDS Accelerator for Apache Spark</description>
<version>24.06.0</version>
<version>24.06.1</version>

<properties>
<rapids.module>../delta-lake/delta-spark330db</rapids.module>
Expand Down
4 changes: 2 additions & 2 deletions delta-lake/delta-spark332db/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,14 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-jdk-profiles_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../../jdk-profiles/pom.xml</relativePath>
</parent>

<artifactId>rapids-4-spark-delta-spark332db_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark Databricks 12.2 Delta Lake Support</name>
<description>Databricks 12.2 Delta Lake support for the RAPIDS Accelerator for Apache Spark</description>
<version>24.06.0</version>
<version>24.06.1</version>

<properties>
<rapids.module>../delta-lake/delta-spark332db</rapids.module>
Expand Down
4 changes: 2 additions & 2 deletions delta-lake/delta-spark341db/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,14 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-jdk-profiles_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../../jdk-profiles/pom.xml</relativePath>
</parent>

<artifactId>rapids-4-spark-delta-spark341db_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark Databricks 13.3 Delta Lake Support</name>
<description>Databricks 13.3 Delta Lake support for the RAPIDS Accelerator for Apache Spark</description>
<version>24.06.0</version>
<version>24.06.1</version>

<properties>
<rapids.compressed.artifact>false</rapids.compressed.artifact>
Expand Down
4 changes: 2 additions & 2 deletions delta-lake/delta-stub/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,14 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-jdk-profiles_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../../jdk-profiles/pom.xml</relativePath>
</parent>

<artifactId>rapids-4-spark-delta-stub_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark Delta Lake Stub</name>
<description>Delta Lake stub for the RAPIDS Accelerator for Apache Spark</description>
<version>24.06.0</version>
<version>24.06.1</version>

<properties>
<rapids.module>../delta-lake/delta-stub</rapids.module>
Expand Down
4 changes: 2 additions & 2 deletions dist/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,13 +22,13 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-jdk-profiles_2.12</artifactId>
<version>24.06.0</version>
<version>24.06.1</version>
<relativePath>../jdk-profiles/pom.xml</relativePath>
</parent>
<artifactId>rapids-4-spark_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark Distribution</name>
<description>Creates the distribution package of the RAPIDS plugin for Apache Spark</description>
<version>24.06.0</version>
<version>24.06.1</version>
<dependencies>
<dependency>
<groupId>com.nvidia</groupId>
Expand Down
84 changes: 84 additions & 0 deletions docs/archive.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,90 @@ nav_order: 15
---
Below are archived releases for RAPIDS Accelerator for Apache Spark.

## Release v24.06.0
### Hardware Requirements:

The plugin is tested on the following architectures:

GPU Models: NVIDIA V100, T4, A10/A100, L4 and H100 GPUs

### Software Requirements:

OS: Ubuntu 20.04, Ubuntu 22.04, CentOS 7, or Rocky Linux 8

NVIDIA Driver*: R470+

Runtime:
Scala 2.12, 2.13
Python, Java Virtual Machine (JVM) compatible with your spark-version.

* Check the Spark documentation for Python and Java version compatibility with your specific
Spark version. For instance, visit `https://spark.apache.org/docs/3.4.1` for Spark 3.4.1.

Supported Spark versions:
Apache Spark 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4
Apache Spark 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.3.4
Apache Spark 3.4.0, 3.4.1, 3.4.2, 3.4.3
Apache Spark 3.5.0, 3.5.1

Supported Databricks runtime versions for Azure and AWS:
Databricks 11.3 ML LTS (GPU, Scala 2.12, Spark 3.3.0)
Databricks 12.2 ML LTS (GPU, Scala 2.12, Spark 3.3.2)
Databricks 13.3 ML LTS (GPU, Scala 2.12, Spark 3.4.1)

Supported Dataproc versions (Debian/Ubuntu):
GCP Dataproc 2.0
GCP Dataproc 2.1

Supported Dataproc Serverless versions:
Spark runtime 1.1 LTS
Spark runtime 2.0
Spark runtime 2.1
Spark runtime 2.2

*Some hardware may have a minimum driver version greater than R470. Check the GPU spec sheet
for your hardware's minimum driver version.

*For Cloudera and EMR support, please refer to the
[Distributions](https://docs.nvidia.com/spark-rapids/user-guide/latest/faq.html#which-distributions-are-supported) section of the FAQ.

### RAPIDS Accelerator's Support Policy for Apache Spark
The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html)

### Download RAPIDS Accelerator for Apache Spark v24.06.0

| Processor | Scala Version | Download Jar | Download Signature |
|-----------|---------------|--------------|--------------------|
| x86_64 | Scala 2.12 | [RAPIDS Accelerator v24.06.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.06.0/rapids-4-spark_2.12-24.06.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.06.0/rapids-4-spark_2.12-24.06.0.jar.asc) |
| x86_64 | Scala 2.13 | [RAPIDS Accelerator v24.06.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.06.0/rapids-4-spark_2.13-24.06.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.06.0/rapids-4-spark_2.13-24.06.0.jar.asc) |
| arm64 | Scala 2.12 | [RAPIDS Accelerator v24.06.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.06.0/rapids-4-spark_2.12-24.06.0-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.06.0/rapids-4-spark_2.12-24.06.0-cuda11-arm64.jar.asc) |
| arm64 | Scala 2.13 | [RAPIDS Accelerator v24.06.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.06.0/rapids-4-spark_2.13-24.06.0-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.06.0/rapids-4-spark_2.13-24.06.0-cuda11-arm64.jar.asc) |

This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with
CUDA 11.8 through CUDA 12.0.

### Verify signature
* Download the [PUB_KEY](https://keys.openpgp.org/[email protected]).
* Import the public key: `gpg --import PUB_KEY`
* Verify the signature for Scala 2.12 jar:
`gpg --verify rapids-4-spark_2.12-24.06.0.jar.asc rapids-4-spark_2.12-24.06.0.jar`
* Verify the signature for Scala 2.13 jar:
`gpg --verify rapids-4-spark_2.13-24.06.0.jar.asc rapids-4-spark_2.13-24.06.0.jar`

The output of signature verify:

gpg: Good signature from "NVIDIA Spark (For the signature of spark-rapids release jars) <[email protected]>"

### Release Notes
* Improve support for Unity Catalog on Databricks
* Added support for parse_url PATH
* Added support for array_filter
* Added support for Spark 3.4.3
* For updates on RAPIDS Accelerator Tools, please visit [this link](https://github.com/NVIDIA/spark-rapids-tools/releases)

For a detailed list of changes, please refer to the
[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).

## Release v24.04.1
### Hardware Requirements:

Expand Down
2 changes: 1 addition & 1 deletion docs/configs.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ The following is the list of options that `rapids-plugin-4-spark` supports.
On startup use: `--conf [conf key]=[conf value]`. For example:

```
${SPARK_HOME}/bin/spark-shell --jars rapids-4-spark_2.12-24.06.0-cuda11.jar \
${SPARK_HOME}/bin/spark-shell --jars rapids-4-spark_2.12-24.06.1-cuda11.jar \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.concurrentGpuTasks=2
```
Expand Down
Loading

0 comments on commit fa8a691

Please sign in to comment.