Update README docs for language transforms #800
base: dev
Signed-off-by: Michele Dolfi <[email protected]>
@dolfim-ibm @shahrokhDaijavad What do you guys think of adding a section like the one below, showing how a user can invoke the transform after a pip install (as an alternative to cloning the repo)?
import ast
import sys

from data_processing.runtime.pure_python import PythonTransformLauncher
from data_processing.utils import ParamsUtils
from pdf2parquet_transform_python import Pdf2ParquetPythonTransformConfiguration

# Local input and output folders
local_conf = {
    "input_folder": "input",
    "output_folder": "output",
}
params = {
    "data_local_config": ParamsUtils.convert_to_ast(local_conf),
    "data_files_to_use": ast.literal_eval("['.pdf','.docx','.pptx','.zip']"),
}

# Pass the parameters to the launcher as command-line arguments
sys.argv = ParamsUtils.dict_to_req(d=params)

# Create the launcher and run the transform
launcher = PythonTransformLauncher(runtime_config=Pdf2ParquetPythonTransformConfiguration())
launcher.launch()
Nice job, @dolfim-ibm! Great job with the README files for all three transforms. They follow the template.
What @touma-I is suggesting is to add these lines of code to the section titled "Code example", which currently links to the upcoming Notebook example. These lines, together with the pip install, will be used in the Notebook, but they could also be used in a plain Python example outside a Notebook. I am OK either way: 1) wait for the Notebook, or 2) add the lines now.
@dolfim-ibm Please don't pick option 1 just because it is easier for you! Maroun's question is how useful it is to have these lines.
@touma-I @shahrokhDaijavad I was actually adding the code block already, but then I realized it was exactly the content of the example script, one-to-one. Instead of maintaining multiple copies of it (with a high risk of them becoming outdated), I think that linking to the example is still OK.
Honestly, I think the best approach is to plan for a documentation engine that can embed working code examples, and to ensure in CI that those code examples are actually executed.
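A minimal sketch of what such a CI check could look like, assuming the examples live in Markdown files as fenced Python blocks. The helper names `extract_python_blocks` and `check_blocks` are hypothetical and not part of this repository or of any documentation engine; a real setup would more likely rely on an existing tool.

```python
import re

# Build the fence marker programmatically so this example can itself be embedded in Markdown.
FENCE = "`" * 3

# Matches a fenced block whose info string is "python" and captures its body.
FENCE_RE = re.compile(
    rf"^{FENCE}python[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_python_blocks(markdown: str) -> list[str]:
    """Return the body of every fenced Python block in a Markdown string."""
    return [match.group(1) for match in FENCE_RE.finditer(markdown)]


def check_blocks(markdown: str) -> list[str]:
    """Execute each block in a fresh namespace and collect error messages."""
    errors = []
    for index, source in enumerate(extract_python_blocks(markdown)):
        try:
            exec(compile(source, f"<readme block {index}>", "exec"), {})
        except Exception as exc:  # report every failing block, not only the first
            errors.append(f"block {index}: {type(exc).__name__}: {exc}")
    return errors
```

A CI job could read each README, call `check_blocks` on its text, and fail the build if the returned list is not empty. Blocks that need sample data or long runtimes would require extra handling that this sketch does not cover.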
Why are these changes needed?
Updates for the pdf2parquet, doc_chunk and text_encoder transforms.
Related issue number (if any).
#753