v0.2.0
What's Changed
- Restructure the repository to distinguish/separate runtime libraries by @daw3rd in #140
- Move transform code into ray subdirectory - towards splitting transform runtimes. by @daw3rd in #143
- restore lost transforms/universal/noop/ray content by @daw3rd in #144
- New Readme file created for memory and endurance tests by @shahrokhDaijavad in #145
- LAB to Kit by @shahrokhDaijavad in #147
- Update ray/README.md by @eltociear in #148
- kfp multi jobs by @blublinsky in #142
- small fix in the init file by @blublinsky in #150
- rename make targets to be ray-specific by @daw3rd in #146
- Naming, docs and fix for recent binary file processing changes by @daw3rd in #153
- bug fixes by @blublinsky in #155
- Binary by @blublinsky in #141
- update kfp image version by @roytman in #159
- Update README.md for Broken links by @shahrokhDaijavad in #160
- adding multi_launcher tests by @blublinsky in #164
- Enable kfp in GH action for testing workflows by @revit13 in #149
- Fix paths in examples scripts. by @revit13 in #180
- Fail workflow if input size is empty. by @revit13 in #181
- library versions update by @blublinsky in #186
- Handle empty input parameter. by @Mohammad-nassar10 in #158
- Moving kfp workflows transform_workflows to transform directory. by @revit13 in #151
- update KFP docs by @roytman in #189
- Dev2 by @roytman in #191
- Modified ingress config (#130) by @D-Sai-Venkatesh in #156
- fixed flush in transform_file_processor.py by @blublinsky in #190
- added PLI related language extensions by @jitendrasinghibm in #177
- more fixes to the transform file processor by @blublinsky in #195
- Spark runtime by @cmadam in #183
- Fix white check marks in top readme. by @daw3rd in #199
- Minor fixes to kind/README.md. by @revit13 in #208
- Add utils functions to kfp support lib. by @Mohammad-nassar10 in #209
- Add Super pipeline for code transforms. by @revit13 in #172
- Tutorial README files fixes by @shahrokhDaijavad in #214
- Added copyright to the Spark files by @cmadam in #207
- Fix dependabot alert on tqdm in fdedup. by @daw3rd in #218
- Update filter_local.py by @shahrokhDaijavad in #217
- Split data-processing-lib/ray into python and ray. by @daw3rd in #213
- Enhanced the default 'make clean' rule to delete python leftovers and… by @daw3rd in #219
- small fixes by @roytman in #220
- Fixes after testing. by @revit13 in #223
- Change kfp_v1_workflow_support. by @revit13 in #227
- Split noop ray transform into ray and python runtimes. by @daw3rd in #221
- Fix tqdm security issue in ededup by @daw3rd in #224
- Transform project conventions doc and makefile fix… by @daw3rd in #229
- Fixes after testing. by @revit13 in #232
- Runtime reorg by @daw3rd in #230
- Auto generate kfp pipelines. by @Mohammad-nassar10 in #193
- ingest to parquet rewrite by @blublinsky in #231
- KFPv2 support step 1 by @roytman in #226
- Rename of ingest_2_parquet file. by @daw3rd in #241
- Make all top level make targets pass w/o error by @daw3rd in #247
- Readme, pyproject metadata and makefile fixes in noop and filter. by @daw3rd in #240
- add retries counter to data processing by @blublinsky in #245
- Initial split of tokenization transform into ray and python by @daw3rd in #243
- add language identification transform module by @dtsuzuku-ibm in #256
- small changes to get ready for pdf by @blublinsky in #261
- Combine the common KFP support code in a shared library by @roytman in #253
- Fix tasks tags in kfp workflows. by @revit13 in #236
- Adjust ingest_2_parquet workflow. by @revit13 in #248
- Repo Root README and CONTRIBUTING clarifications by @shahrokhDaijavad in #264
- add build-language job to build-images workflow by @dtsuzuku-ibm in #268
- remove the artifactory settings by @roytman in #280
- update docs for KFPv2 by @roytman in #279
- Enhancing some README files by @shahrokhDaijavad in #278
- extended logging to print % and number processed files by @blublinsky in #272
- Updated transform readmes to reference correct runtime when describing cli params. by @daw3rd in #284
- Update advanced-transform-tutorial.md by @shahrokhDaijavad in #287
- add test-language job by @dtsuzuku-ibm in #286
- Change execution log file name. by @Mohammad-nassar10 in #251
- Update tests for KFP v2. by @revit13 in #255
- remove entire pipeline timeouts by @roytman in #270
- Randomly choose workflow to run in GH action. by @revit13 in #281
- Change the docker user as root by @takuyagt in #291
- Initial version of profiler by @blublinsky in #269
- Minimum explanation for VS Code by @shahrokhDaijavad in #290
- move logger to ensure Ray logging is correct by @blublinsky in #301
- Use dpk user for malware python image by @takuyagt in #304
- Move hack dirs to scripts dir by @revit13 in #295
- Fix issue #274 for venv corruption via make -n venv by @daw3rd in #302
- Installation of minio added to the transform README files by @shahrokhDaijavad in #303
- Minor fixes to profiler workflow by @revit13 in #308
- Ray version update by @blublinsky in #305
- update notebook by @shivdeep-singh-ibm in #310
- Split code quality, malware and proglang select transforms into python and ray. by @daw3rd in #288
- renaming of ingest_2_parquet by @blublinsky in #316
- move transform exceptions doc out of ray runtime to overview by @daw3rd in #319
- Inputcode2parquet rename by @daw3rd in #320
- fault tolerance by @blublinsky in #321
- Makefile rules updates by @revit13 in #323
- updated pyarrow version by @blublinsky in #325
- Fix make run-cli-sample for code2parquet by @daw3rd in #328
- Updated generate (simple pipeline) pipeline by @D-Sai-Venkatesh in #311
- Some new thoughts on cutting a release, especially scripts/release.sh by @daw3rd in #309
- Corrected Readme to update file path, added more detailed signoff steps by @santoshborse in #330
- improve doc on transform design/expectations by @daw3rd in #331
- fix a typo by @roytman in #333
- Improvements to code2parquet transform by @daw3rd in #329
- implementing missing pyproject on transforms by @blublinsky in #327
- add new params to lang_id to store the results of language identification by @dtsuzuku-ibm in #322
- change content column name used in wf script by @dtsuzuku-ibm in #340
- small bug fixes by @blublinsky in #342
- fix typos by @roytman in #341
- update top readme table of transforms by @daw3rd in #344
- update the kfp release process by @roytman in #338
- remove globals in ray transforms that should instead be references to the python transform globals by @daw3rd in #336
- Update K8s cluster deployment by @revit13 in #334
- Fix Instruction to create NOOP transformer by @santoshborse in #346
- Add workflow-build target by @revit13 in #348
- Update readme to point to new code2parquet transform by @Bytes-Explorer in #349
- add new release docs and stop publishing in script for 0.2.0 by @daw3rd in #337
- Add ingest2parquet step to superpipeline. by @Mohammad-nassar10 in #273
- code2parquet fixes on domain/snapshot and document_id by @daw3rd in #347
- add kfp_ray README files by @roytman in #351
- Changes in code2parquet, ingest2parquet, and advance tutorial readmes. by @daw3rd in #352
- disable debug flag, by default, in release-branch.sh by @daw3rd in #353
- Update release-branch script to not verify commits to avoid failures by @daw3rd in #354
New Contributors
- @eltociear made their first contribution in #148
- @Mohammad-nassar10 made their first contribution in #158
- @D-Sai-Venkatesh made their first contribution in #156
- @jitendrasinghibm made their first contribution in #177
- @cmadam made their first contribution in #183
- @dtsuzuku-ibm made their first contribution in #256
- @takuyagt made their first contribution in #291
Full Changelog: v0.1.0-dpk...v0.2.0