Rebasing branch with latest 1.0 upstream
kunaljubce committed Sep 7, 2024
1 parent 9d034d4 commit 086a5c3
Showing 3 changed files with 73 additions and 13 deletions.
9 changes: 2 additions & 7 deletions .github/workflows/ci.yml
@@ -64,11 +64,6 @@ jobs:
PIP_PACKAGES: ${{ matrix.pip-packages }}
run: poetry run pip install $PIP_PACKAGES # Using pip shouldn't mess up poetry cache

- name: Run pre-commit hooks
run: |
pre-commit install
pre-commit run -a
- name: Run tests with pytest against PySpark ${{ matrix.pyspark-version }}
run: make test

@@ -79,9 +74,9 @@ jobs:
run: |
if [[ "${SPARK_VERSION}" > "3.4" ]]; then
sh scripts/run_spark_connect_server.sh
# The tests should be called from here.
make test
else
echo "Skipping Spark-Connect tests for Spark version ${SPARK_VERSION}, which is <= 3.4"
fi
check-license-headers:
10 changes: 6 additions & 4 deletions .github/workflows/lint.yaml
@@ -18,7 +18,9 @@ jobs:
- uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Run Ruff
uses: chartboost/ruff-action@v1
with:
version: 0.5.2

- name: Run pre-commit hooks
run: |
pip install pre-commit
pre-commit install
pre-commit run -a
67 changes: 65 additions & 2 deletions CONTRIBUTING.md
@@ -73,25 +73,88 @@ You can run tests as follows:
make test
```

#### GitHub Actions local setup using 'act'


You can run GitHub Actions locally using the `act` tool. The configuration for GitHub Actions is in the `.github/workflows/ci.yml` file. To install `act`, follow the instructions [here](https://github.com/nektos/act#installation). To run a specific job, use:
```shell
act -j <job-name>
```

For example, to run the `test` job, use:

```shell
act -j test
```

If you need help with `act`, use:

```shell
act --help
```

For MacBooks with M1 processors, you might have to add the `--container-architecture` tag:

```shell
act -j <job-name> --container-architecture linux/arm64
```
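Putting the pieces together, a small helper can build the right `act` invocation for the host (a sketch, not part of the repository; it assumes `act` is installed and only prints the command rather than running it):

```shell
#!/bin/sh
# Build the act command for this host; Apple Silicon needs the ARM flag.
JOB="${1:-test}"
CMD="act -j ${JOB}"
if [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
  CMD="${CMD} --container-architecture linux/arm64"
fi
echo "${CMD}"  # copy-paste this, or replace echo with eval to execute it
```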

#### Running Spark-Connect tests locally

To run the Spark-Connect tests locally, follow the steps below. Note that this only works on Mac/UNIX-based systems.

1. **Set up the required environment variables:** The following variables need to be set so that the shell script used to install the Spark-Connect binary and start the server picks up the correct version.

The version can be either `3.5.1` or `3.4.3`, as those are the ones used in our CI.

```shell
export SPARK_VERSION=3.5.1
export SPARK_CONNECT_MODE_ENABLED=1
```

2. **Check that the required environment variables are set:** Run the commands below to verify that the variables are set.

```shell
echo $SPARK_VERSION
echo $SPARK_CONNECT_MODE_ENABLED
```

3. **Install required system packages:** Run the command below to install `wget`.

For Mac users:
```shell
brew install wget
```

For Ubuntu users:
```shell
sudo apt-get install wget
```

4. **Execute the shell script:** Run the command below to execute the shell script that installs Spark-Connect and starts the server.

```shell
sh scripts/run_spark_connect_server.sh
```

5. **Run the tests:** Run the command below to execute the tests against Spark-Connect.

```shell
make test
```

6. **Clean up:** After running the tests, stop the Spark-Connect server and unset the environment variables.

```shell
unset SPARK_VERSION
unset SPARK_CONNECT_MODE_ENABLED
```

### Code style

This project follows the [PySpark style guide](https://github.com/MrPowers/spark-style-guide/blob/main/PYSPARK_STYLE_GUIDE.md). All public functions and methods should be documented in `README.md` and should also have docstrings in Sphinx format:
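As an illustration, a hypothetical function (not from this codebase) documented with a Sphinx-style docstring:

```python
def scale_value(value: int, factor: int) -> int:
    """Multiply a value by a scaling factor.

    :param value: the value to scale
    :param factor: the multiplier to apply
    :return: the scaled value
    :rtype: int
    """
    return value * factor


print(scale_value(3, 4))  # → 12
```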
