-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add license check transform #257
add license check transform #257
Conversation
378a170
to
b539226
Compare
transforms/code/license_check/python/src/dpk_license_check_python/internal/__init__.py
Outdated
Show resolved
Hide resolved
transforms/code/license_check/python/src/license_check_local_pure.py
Outdated
Show resolved
Hide resolved
transforms/code/license_check/python/src/license_check_local_python.py
Outdated
Show resolved
Hide resolved
transforms/code/license_check/python/src/dpk_license_check_python/transform_config.py
Outdated
Show resolved
Hide resolved
3b48683
to
2066cb8
Compare
transforms/code/license_check/python/src/license_check_local_python.py
Outdated
Show resolved
Hide resolved
transforms/code/license_check/python/src/license_check_transform.py
Outdated
Show resolved
Hide resolved
transforms/code/license_check/python/src/license_check_transform.py
Outdated
Show resolved
Hide resolved
transforms/code/license_check/python/test-data/expected/metadata.json
Outdated
Show resolved
Hide resolved
@shivdeep-singh-ibm This module (folder) should be called license filtering |
@shivdeep-singh-ibm it seems that the make[4]: Leaving directory '/home/runner/work/data-prep-kit/data-prep-kit/transforms/code/ingest_2_parquet'
Using recursive workflow-build rule in license_check/
make[4]: Entering directory '/home/runner/work/data-prep-kit/data-prep-kit/transforms/code/license_check'
make -C kfp_ray/v1 workflow-build
make[5]: *** kfp_ray/v1: No such file or directory. Stop. |
2758803
to
7b7597c
Compare
41bcc4f
to
d3d6b41
Compare
transforms/code/license_check/python/src/license_check_transform.py
Outdated
Show resolved
Hide resolved
transforms/code/license_check/python/src/license_check_transform.py
Outdated
Show resolved
Hide resolved
transforms/code/license_check/python/src/license_check_transform.py
Outdated
Show resolved
Hide resolved
transforms/code/license_check/ray/src/license_check_local_ray.py
Outdated
Show resolved
Hide resolved
transforms/code/license_check/ray/test/test_license_check_ray.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor doc/readme comment. otherwise changes good to me.
@@ -0,0 +1,13 @@ | |||
# License Check | |||
|
|||
The License Check transform checks if the '`license` of input data is in approved/denied list as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence looks incomplete. The transformer verifies and marks the sample accordingly in the output right? should we convey that in this description.
c39d8ea
to
09221dd
Compare
@shivdeep-singh-ibm suggest you merge dev into this branch given some recent issues with versioning. See #355 |
bdf4f98
to
cff9a6d
Compare
transforms/code/license_check/python/src/license_check_transform.py
Outdated
Show resolved
Hide resolved
transforms/code/license_check/python/src/license_check_transform.py
Outdated
Show resolved
Hide resolved
`--lc_license_column_name` - set the name of the column holds license to process | ||
`--lc_allow_no_license` - allow entries with no associated license (default: false) | ||
`--lc_licenses_file` - S3 or local path to allowed/denied licenses JSON file | ||
`--lc_deny_licenses` - allow all licences except those in licenses_file (default: false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you have your own DataAccessFactory, aren't there some lc_data params that would allow reading the license file from S3, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A hidden feature, to see the --help text for a given transform
make .transforms.test-image-help
This should show the -lc_data_* options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no additional data options used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something, the code below from line 134 of this file will add the --lc_data_* parameters (specifically the last line below).
# Create the DataAccessFactor to use CLI args
self.daf = DataAccessFactory(CLI_PREFIX, False)
# Add the DataAccessFactory parameters to the transform's configuration parameters.
self.daf.add_input_params(parser)
3ffc512
to
c43993e
Compare
c43993e
to
24121c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addtion to my specific comments. a more global comment is to rename this to license_select which then matches proglang_select naming.
@@ -0,0 +1,11 @@ | |||
# License Check | |||
|
|||
The License Check transform checks if the `license` of input data is in approved/denied list. It is implemented as per the set of [transform project conventions](../../README.md#transform-project-conventions) the following runtimes are available: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more specific and say it selects only the specified licenses and filters out the other licenses..
task_image = "quay.io/dataprep1/data-prep-kit/license_check-ray:0.4.0.dev6" | ||
|
||
# components | ||
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.0.dev6" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think you need to run make set-versions
as these are old. Probably should be 0.2.1.dev0 everywhere
|
||
# distribution versions is the same as image version. | ||
set-versions: | ||
$(MAKE) TRANSFORM_PYTHON_VERSION=${LICENSE_CHECK_PYTHON_VERSION} .transforms.set-versions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now back to requiring TOML_VERSION. For example, from noop
$(MAKE) TRANSFORM_PYTHON_VERSION=$(NOOP_PYTHON_VERSION) TOML_VERSION=$(NOOP_PYTHON_VERSION) .transforms.set-versions ```
set-versions: | ||
$(MAKE) TRANSFORM_PYTHON_VERSION=${LICENSE_CHECK_PYTHON_VERSION} .transforms.set-versions | ||
|
||
build-dist:: set-versions .defaults.build-dist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove set-versions
here. The convention now is that this generally only to be called from the top of the repo outside of the makefiles (unless you get out of sync, like you are in the the _wf.py file and maybe others)
|
||
## Summary | ||
|
||
This filter scans the license column of an input dataset and appends the `license_status` column to the dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more specific and say it selects only the specified licenses and filters out the other licenses. (same comment as on the top level readme)
Then | ||
|
||
ls output | ||
To see results of the transform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add the same last paragraph that all transforms now have...https://github.com/IBM/data-prep-kit/tree/dev/transforms/universal/noop/python#transforming-data-using-the-transform-image
set-versions: | ||
$(MAKE) TRANSFORM_PYTHON_VERSION=${LICENSE_CHECK_RAY_VERSION} .transforms.set-versions | ||
|
||
build-dist:: set-versions .defaults.build-dist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove set-versions
here
Also, not sure how far behind dev this is. It's probably a good idea to merge with dev |
fed1461
to
e4d6efa
Compare
EXEC_SCRIPT_NAME: str = "license_select_transform_ray.py" | ||
PREFIX: str = "" | ||
|
||
task_image = "quay.io/dataprep1/data-prep-kit/license_select-ray:0.4.0.dev6" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are now using latest
tag on images
RUN pip install --no-cache-dir -e . | ||
|
||
# Added to enable running test | ||
RUN cp src/license_select_transform_python.py . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can/should be removed.
|
||
# Set environment | ||
ENV PYTHONPATH /home/dpk | ||
ENV PATH $PATH:/home/dpk/.local/bin/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why is this added?
|
||
TRANSFORM_NAME=license_select | ||
# $(REPOROOT)/.make.versions file contains the versions | ||
DOCKER_IMAGE_VERSION=${LICENSE_SELECT_PYTHON_VERSION} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DOCKER_IMAGE_VERSION can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shivdeep-singh-ibm can you please check this input & address it
|
||
TRANSFORM_NAME=license_select | ||
# $(REPOROOT)/.make.versions file contains the versions | ||
DOCKER_IMAGE_VERSION=${LICENSE_SELECT_RAY_VERSION} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DOCKER_IMAGE_VERSION can be removed.
# copy test | ||
COPY test/ test/ | ||
COPY test-data/ test-data/ | ||
ENV PATH $PATH:/home/dpk/.local/bin/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why you are setting PATH?
@@ -0,0 +1,39 @@ | |||
FROM docker.io/rayproject/ray:2.9.3-py310 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first lines of this file need to be updated. From code2parquet, for example
ARG BASE_IMAGE=docker.io/rayproject/ray:2.24.0-py310
FROM ${BASE_IMAGE}
RUN pip install --upgrade --no-cache-dir pip
# install pytest
RUN pip install --no-cache-dir pytest
RUN cd python-transform && pip install --no-cache-dir -e . | ||
|
||
COPY --chown=ray:users python-transform lc_python | ||
RUN cd lc_python && pip install . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this is a duplication of lines 12 and 13 above (the previous 2 statements)?
task_image = "quay.io/dataprep1/data-prep-kit/license_select-ray:0.4.0.dev6" | ||
|
||
# components | ||
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.0.dev6" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this should be 0.2.1.dev0. you might try running make set-versions
from ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops this should also be latest
} | ||
} | ||
``` | ||
license_column_name - The name of the column with licenses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be a list of bullets. as is, it is wrapping into a single paragraph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
1293669
to
0f2c5b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
d9357c8
to
9e4ac28
Compare
Signed-off-by: Shivdeep Singh <[email protected]>
9e4ac28
to
6cb888b
Compare
Why are these changes needed?
Add license check filter
Related issue number (if any).