Commit

Merge pull request #407 from IBM/Readme-Changes
Readme changes
daw3rd authored Jul 12, 2024
2 parents 240bfae + 28d2310 commit c334038
Showing 9 changed files with 7 additions and 131 deletions.
10 changes: 7 additions & 3 deletions doc/quick-start/new-transform-outside.md
@@ -71,7 +71,7 @@ class HelloTransformConfiguration(TransformConfiguration):
     '''
     def __init__(self):
         super().__init__(
-            name="add_column",
+            name="hello",
             transform_class=HelloTransform,
         )

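The hunk above renames the configuration from `add_column` to `hello`, so the configured name matches the Hello transform. As a rough, stdlib-only sketch of that configuration pattern, the snippet below uses hypothetical stand-ins: `TransformConfiguration` and `HelloTransform` here only store what they are given and are not data-prep-kit's implementation.

```python
# Hypothetical stand-ins that only mimic the shape of the classes in the diff;
# they are NOT data-prep-kit's real TransformConfiguration or HelloTransform.


class TransformConfiguration:
    """Stand-in base class that records a name and a transform class."""

    def __init__(self, name: str, transform_class: type):
        self.name = name
        self.transform_class = transform_class


class HelloTransform:
    """Placeholder for the tutorial's transform class."""


class HelloTransformConfiguration(TransformConfiguration):
    """
    Configures the HelloTransform (same constructor call as in the diff).
    """

    def __init__(self):
        super().__init__(
            name="hello",  # the value this commit changes from "add_column"
            transform_class=HelloTransform,
        )


if __name__ == "__main__":
    config = HelloTransformConfiguration()
    print(config.name, config.transform_class.__name__)
```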
@@ -110,6 +110,7 @@ To run the transform in the pure python runtime, we create
```python
from data_processing.runtime.pure_python import PythonTransformRuntimeConfiguration, PythonTransformLauncher
from hello_transform import HelloTransformConfiguration
+
 class HelloPythonConfiguration(PythonTransformRuntimeConfiguration):
     '''
     Configures the python runtime to use the Hello transform
@@ -125,12 +126,15 @@ if __name__ == "__main__":
```

### Running
-In the following `parquet-tools` will be helpful here. Install with
+In the following, `parquet-tools` will be helpful. Install with
```shell
% source venv/bin/activate
(venv) % pip install parquet-tools
```
-We will the transform on a single parquet file in a directory named `input`.
+We will the transform a single parquet file in a directory named
+`input`.
+The directory may contain more than one parquet file,
+in which case they will all be processed.
We can examine the input as follows:
```shell
% source venv/bin/activate
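The updated tutorial text says every parquet file in the `input` directory is processed. To make "every parquet file" concrete, the sketch below lists the `.parquet` files at the top level of a directory. `parquet_files` is a hypothetical helper, and it assumes (the tutorial does not say) that subdirectories and files with other suffixes are skipped.

```python
from pathlib import Path


def parquet_files(input_dir: str) -> list[Path]:
    """Return the .parquet files at the top level of input_dir, sorted by path.

    Hypothetical helper: assumes subdirectories and files with other
    suffixes are skipped, which the tutorial text does not confirm.
    """
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix == ".parquet"
    )


if __name__ == "__main__":
    # Only list files if the tutorial's `input` directory exists here.
    if Path("input").is_dir():
        for path in parquet_files("input"):
            print(path.name)
```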
16 changes: 0 additions & 16 deletions transforms/code/code2parquet/python/README.md
@@ -123,22 +123,6 @@ To see results of the transform.
---------------------------------


-### Transforming local data
-
-Beginning with version 0.2.1, most/all python transform images are built with directories for mounting local data for processing.
-Those directories are `/home/dpk/input` and `/home/dpk/output`.
-
-After using `make image` to build the transform image, you can process the data
-in the `/home/me/input` directory and place it in the `/home/me/output` directory, for example, using the 0.2.1 tagged image as follows:
-
-```shell
-docker run --rm -v /home/me/input:/home/dpk/input -v /home/me/output:/home/dpk/output code2parquet-python:0.2.1 \
-python code2parquet_transform_python.py --data_local_config "{ 'input_folder' : '/home/dpk/input', 'output_folder' : '/home/dpk/output'}"
-```
-
-You may also use the pre-built images on quay.io using `quay.io/dataprep1/data-prep-kit//code2parquet-python:0.2.1` as the image name.
-
-
### Transforming data using the transform image

To use the transform image to transform your data, please refer to the
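The section removed here (identical, apart from the transform name, to the one removed from the seven other transform READMEs in this commit) mounts two host directories onto the image's fixed `/home/dpk/input` and `/home/dpk/output` directories and points `--data_local_config` at them. The sketch below assembles that `docker run` command for any transform name. `docker_run_args` is a hypothetical helper that only encodes the naming pattern visible in the removed lines (image `<transform>-python:<version>`, script `<transform>_transform_python.py`); it is not part of data-prep-kit.

```python
# Container-side directories named in the removed README section.
CONTAINER_INPUT = "/home/dpk/input"
CONTAINER_OUTPUT = "/home/dpk/output"


def docker_run_args(transform: str, version: str,
                    host_input: str, host_output: str) -> list[str]:
    """Build the `docker run` argv shown in the removed section.

    Hypothetical helper: reproduces only the naming pattern in the diff.
    """
    local_config = "{ 'input_folder' : '%s', 'output_folder' : '%s'}" % (
        CONTAINER_INPUT, CONTAINER_OUTPUT)
    return [
        "docker", "run", "--rm",
        # Bind-mount the host directories onto the fixed container directories.
        "-v", f"{host_input}:{CONTAINER_INPUT}",
        "-v", f"{host_output}:{CONTAINER_OUTPUT}",
        f"{transform}-python:{version}",
        "python", f"{transform}_transform_python.py",
        "--data_local_config", local_config,
    ]


if __name__ == "__main__":
    print(docker_run_args("code2parquet", "0.2.1",
                          "/home/me/input", "/home/me/output"))
```

Returning an argv list rather than a single string means it could be passed straight to `subprocess.run(args)`, which sidesteps the shell quoting of the single-quoted config value.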
16 changes: 0 additions & 16 deletions transforms/code/code_quality/python/README.md
@@ -66,22 +66,6 @@ ls output
```
To see results of the transform.

-### Transforming local data
-
-Beginning with version 0.2.1, most/all python transform images are built with directories for mounting local data for processing.
-Those directories are `/home/dpk/input` and `/home/dpk/output`.
-
-After using `make image` to build the transform image, you can process the data
-in the `/home/me/input` directory and place it in the `/home/me/output` directory, for example, using the 0.2.1 tagged image as follows:
-
-```shell
-docker run --rm -v /home/me/input:/home/dpk/input -v /home/me/output:/home/dpk/output code_quality-python:0.2.1 \
-python code_quality_transform_python.py --data_local_config "{ 'input_folder' : '/home/dpk/input', 'output_folder' : '/home/dpk/output'}"
-```
-
-You may also use the pre-built images on quay.io using `quay.io/dataprep1/data-prep-kit//code_quality-python:0.2.1` as the image name.
-
-
### Transforming data using the transform image

To use the transform image to transform your data, please refer to the
16 changes: 0 additions & 16 deletions transforms/code/malware/python/README.md
@@ -102,22 +102,6 @@ ls output
```
To see results of the transform.

-### Transforming local data
-
-Beginning with version 0.2.1, most/all python transform images are built with directories for mounting local data for processing.
-Those directories are `/home/dpk/input` and `/home/dpk/output`.
-
-After using `make image` to build the transform image, you can process the data
-in the `/home/me/input` directory and place it in the `/home/me/output` directory, for example, using the 0.2.1 tagged image as follows:
-
-```shell
-docker run --rm -v /home/me/input:/home/dpk/input -v /home/me/output:/home/dpk/output malware-python:0.2.1 \
-python malware_transform_python.py --data_local_config "{ 'input_folder' : '/home/dpk/input', 'output_folder' : '/home/dpk/output'}"
-```
-
-You may also use the pre-built images on quay.io using `quay.io/dataprep1/data-prep-kit//malware-python:0.2.1` as the image name.
-
-
### Transforming data using the transform image

To use the transform image to transform your data, please refer to the
16 changes: 0 additions & 16 deletions transforms/code/proglang_select/python/README.md
@@ -73,22 +73,6 @@ ls output
```
To see results of the transform.

-### Transforming local data
-
-Beginning with version 0.2.1, most/all python transform images are built with directories for mounting local data for processing.
-Those directories are `/home/dpk/input` and `/home/dpk/output`.
-
-After using `make image` to build the transform image, you can process the data
-in the `/home/me/input` directory and place it in the `/home/me/output` directory, for example, using the 0.2.1 tagged image as follows:
-
-```shell
-docker run --rm -v /home/me/input:/home/dpk/input -v /home/me/output:/home/dpk/output proglang_select-python:0.2.1 \
-python proglang_select_transform_python.py --data_local_config "{ 'input_folder' : '/home/dpk/input', 'output_folder' : '/home/dpk/output'}"
-```
-
-You may also use the pre-built images on quay.io using `quay.io/dataprep1/data-prep-kit//proglang_select-python:0.2.1` as the image name.
-
-
### Transforming data using the transform image

To use the transform image to transform your data, please refer to the
16 changes: 0 additions & 16 deletions transforms/language/lang_id/python/README.md
@@ -56,22 +56,6 @@ To see results of the transform.
For M1 Mac user, if you see following error during make command, `error: command '/usr/bin/clang' failed with exit code 1`, you may better follow [this step](https://freeman.vc/notes/installing-fasttext-on-an-m1-mac)


-### Transforming local data
-
-Beginning with version 0.2.1, most/all python transform images are built with directories for mounting local data for processing.
-Those directories are `/home/dpk/input` and `/home/dpk/output`.
-
-After using `make image` to build the transform image, you can process the data
-in the `/home/me/input` directory and place it in the `/home/me/output` directory, for example, using the 0.2.1 tagged image as follows:
-
-```shell
-docker run --rm -v /home/me/input:/home/dpk/input -v /home/me/output:/home/dpk/output lang_id-python:0.2.1 \
-python lang_id_transform_python.py --data_local_config "{ 'input_folder' : '/home/dpk/input', 'output_folder' : '/home/dpk/output'}"
-```
-
-You may also use the pre-built images on quay.io using `quay.io/dataprep1/data-prep-kit//lang_id-python:0.2.1` as the image name.
-
-
### Transforming data using the transform image

To use the transform image to transform your data, please refer to the
16 changes: 0 additions & 16 deletions transforms/universal/noop/python/README.md
@@ -56,22 +56,6 @@ ls output
```
To see results of the transform.

-### Transforming local data
-
-Beginning with version 0.2.1, most/all python transform images are built with directories for mounting local data for processing.
-Those directories are `/home/dpk/input` and `/home/dpk/output`.
-
-After using `make image` to build the transform image, you can process the data
-in the `/home/me/input` directory and place it in the `/home/me/output` directory, for example, using the 0.2.1 tagged image as follows:
-
-```shell
-docker run --rm -v /home/me/input:/home/dpk/input -v /home/me/output:/home/dpk/output noop-python:0.2.1 \
-python noop_transform_python.py --data_local_config "{ 'input_folder' : '/home/dpk/input', 'output_folder' : '/home/dpk/output'}"
-```
-
-You may also use the pre-built images on quay.io using `quay.io/dataprep1/data-prep-kit//noop-python:0.2.1` as the image name.
-
-
### Transforming data using the transform image

To use the transform image to transform your data, please refer to the
16 changes: 0 additions & 16 deletions transforms/universal/resize/python/README.md
@@ -56,22 +56,6 @@ the following command line arguments are available in addition to
'disk' makes an estimate of the resulting parquet file size.
```

-### Transforming local data
-
-Beginning with version 0.2.1, most/all python transform images are built with directories for mounting local data for processing.
-Those directories are `/home/dpk/input` and `/home/dpk/output`.
-
-After using `make image` to build the transform image, you can process the data
-in the `/home/me/input` directory and place it in the `/home/me/output` directory, for example, using the 0.2.1 tagged image as follows:
-
-```shell
-docker run --rm -v /home/me/input:/home/dpk/input -v /home/me/output:/home/dpk/output resize-python:0.2.1 \
-python resize_transform_python.py --data_local_config "{ 'input_folder' : '/home/dpk/input', 'output_folder' : '/home/dpk/output'}"
-```
-
-You may also use the pre-built images on quay.io using `quay.io/dataprep1/data-prep-kit//resize-python:0.2.1` as the image name.
-
-
### Transforming data using the transform image

To use the transform image to transform your data, please refer to the
16 changes: 0 additions & 16 deletions transforms/universal/tokenization/python/README.md
@@ -93,22 +93,6 @@ ls output
To see results of the transform.


-### Transforming local data
-
-Beginning with version 0.2.1, most/all python transform images are built with directories for mounting local data for processing.
-Those directories are `/home/dpk/input` and `/home/dpk/output`.
-
-After using `make image` to build the transform image, you can process the data
-in the `/home/me/input` directory and place it in the `/home/me/output` directory, for example, using the 0.2.1 tagged image as follows:
-
-```shell
-docker run --rm -v /home/me/input:/home/dpk/input -v /home/me/output:/home/dpk/output tokenization-python:0.2.1 \
-python tokenization_transform_python.py --data_local_config "{ 'input_folder' : '/home/dpk/input', 'output_folder' : '/home/dpk/output'}"
-```
-
-You may also use the pre-built images on quay.io using `quay.io/dataprep1/data-prep-kit//tokenization-python:0.2.1` as the image name.
-
-
### Transforming data using the transform image

To use the transform image to transform your data, please refer to the
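The `--data_local_config` value in each of the eight removed README sections is written with single quotes, so it is a Python dict literal rather than JSON. Assuming the launcher parses it as a Python literal (the diff does not show how it is parsed), the stdlib snippet below illustrates the difference using `json.loads` and `ast.literal_eval`.

```python
import ast
import json

# The --data_local_config value exactly as written in the removed sections.
raw = "{ 'input_folder' : '/home/dpk/input', 'output_folder' : '/home/dpk/output'}"

# Single-quoted keys and strings are not valid JSON...
try:
    json.loads(raw)
    is_json = True
except json.JSONDecodeError:
    is_json = False

# ...but they form a valid Python dict literal, which ast.literal_eval accepts.
config = ast.literal_eval(raw)

if __name__ == "__main__":
    print(is_json)                 # False
    print(config["input_folder"])  # /home/dpk/input
```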
