CDAP 6.1.1
ajainarayanan released this 22 Jan 19:46
Summary
This release introduces a number of new features, improvements, and bug fixes to CDAP. Some of the main highlights of the release are:
- Pipeline improvements
  - Validation checks for plugins for early error detection and prevention
  - New widgets for better pipeline configurability
  - Wrangler ADLS connection
- Field Level Lineage
  - New, intuitive UI for field level lineage
  - Field level lineage support for more plugins
- Platform enhancements
  - Performance improvements across the platform
  - Migration of more UI components from Angular to React
New Features
- Added field level lineage support for Error Transform. (CDAP-16102)
- Added region support for Google Cloud plugins. (CDAP-16037)
- Added a new UI landing page. (CDAP-15795)
- Allowed plugin developers to define filters that show or hide properties based on custom plugin configuration logic. (CDAP-15789)
- Introduced new FailureCollector APIs for a better user experience via contextual error messages. (CDAP-15787)
- Added support for reading INT96 types in Parquet file sources. (CDAP-15767)
- Added a new ConfigurationGroup component in the UI. (CDAP-15728)
- Added support for running pipelines in a shared VPC network. (CDAP-15723)
- Added stage-level validation for plugin properties. (CDAP-15619)
- Added a new REST endpoint that retrieves all field lineage information about a dataset. (CDAP-15482)
- Added support for bytes types in the BigQuery sink. (CDAP-15342)
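
The FailureCollector and stage-level validation entries above concern reporting configuration problems with context, such as which property caused them. One plausible reading of that pattern is sketched below in Python for brevity. `ValidationFailure`, `SketchFailureCollector`, `validate_config`, and the `path`/`format` properties are hypothetical stand-ins invented for illustration; they do not reproduce CDAP's actual Java API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ValidationFailure:
    """One problem found while validating a plugin configuration (hypothetical type)."""
    message: str
    corrective_action: str
    config_property: Optional[str] = None  # property the failure is attached to


@dataclass
class SketchFailureCollector:
    """Hypothetical collector: gathers failures instead of raising on the first one."""
    failures: List[ValidationFailure] = field(default_factory=list)

    def add_failure(self, message: str, corrective_action: str,
                    config_property: Optional[str] = None) -> None:
        self.failures.append(
            ValidationFailure(message, corrective_action, config_property))

    def raise_if_any(self) -> None:
        # Report every collected failure in a single error.
        if self.failures:
            details = "; ".join(
                f"[{f.config_property}] {f.message} {f.corrective_action}"
                for f in self.failures)
            raise ValueError(details)


def validate_config(config: Dict[str, str], collector: SketchFailureCollector) -> None:
    """Check each property independently so all problems surface together."""
    if not config.get("path"):
        collector.add_failure("Path is missing.", "Provide a source path.", "path")
    if config.get("format") not in {"csv", "tsv", "parquet"}:
        collector.add_failure("Format is not supported.",
                              "Use csv, tsv, or parquet.", "format")
```

Calling `validate_config({"format": "xml"}, collector)` on a fresh collector records two failures, one for `path` and one for `format`, and `collector.raise_if_any()` then raises a single `ValueError` that lists both.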
Deprecation
- Removed the outdated Validator plugin. (CDAP-15917)
Bug Fixes
- Fixed the preview run state after a JVM restart. (CDAP-16193)
- Content type detection now uses case-insensitive file extensions. (CDAP-16146)
- Fixed a bug that prevented users from navigating to the pipeline studio (the UI indicated that system artifacts were being loaded for a long time). (CDAP-16137)
- Fixed the Dataproc provisioner to log the error message if the Dataproc creation operation fails. (CDAP-15973)
- Fixed a bug that caused pipeline startup to take longer than needed for cloud runs. (CDAP-15899)
- Fixed regex usage in GCS and S3 source plugins. (CDAP-15879)
- Fixed a bug in the Datastore source that made validation of the user-provided schema overly restrictive. (CDAP-15878)
- Fixed a bug that could cause a thread to spin in an infinite while loop when multiple threads consumed from a queue that allows only a single consumer. (CDAP-15809)
- Fixed a bug that caused pipeline failures when writing nullable byte fields as JSON. (CDAP-15770)
- Fixed a bug that caused MapReduce and Spark logs to be missing for remote pipeline runs. (CDAP-15757)
- Fixed a race condition that could cause a program to get stuck in the pending state when it was stopped while pending. (CDAP-15747)
- Added some safeguards to prevent cloud pipeline runs from getting stuck in certain edge cases. (CDAP-15742)
- Fixed a bug where secure macros were not evaluated in preview mode. (CDAP-15726)
- Fixed a bug in the BigQuery source that caused automatic bucket creation to fail if the dataset is in a different project. (CDAP-15617)
- Fixed a bug in the new user tour on lower resolution screens. (CDAP-15583)
- Fixed a bug where the wrong resolution was used when a time range was specified for a metrics query. (CDAP-15554)
- Fixed an issue where the BigQuery multi sink did not work when using an Oracle database as a source. (CDAP-15535)
- Fixed the Dataproc provisioner to disable YARN pre-emptive container killing and to disable Conscrypt. (CDAP-15498)
- Fixed a bug in the MLPredictor plugin that caused an error when using a classification model. (CDAP-15445)
- Fixed a bug that prevented users from pasting a schema as a runtime argument. (CDAP-15423)
- Spark pipelines no longer try to run sinks in parallel unless the runtime argument 'pipeline.spark.parallel.sinks.enabled' is set to 'true'. This prevents pipeline sections from being re-processed in the majority of situations. (CDAP-15388)
- Fixed the Dataproc provisioner to handle networks that do not use automatic subnet creation. (CDAP-15373)
- Fixed a Wrangler bug where the wrong JDBC driver would be used in some situations and where required classes could be unavailable. (CDAP-15353)
- Fixed a bug in artifact version comparison. (CDAP-15221)
- Fixed a bug where the rollup of the workflow lineage did not remove the local datasets. (CDAP-15206)
- Expanded the filename formats that the UI accepts when uploading artifacts. (CDAP-15097)
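
The Spark sink entry in this list (CDAP-15388) names a runtime argument that turns parallel sink execution back on. As a small sketch, assuming runtime arguments are supplied as a flat map of string keys to string values, such a map could be built and serialized as follows. Only the key `pipeline.spark.parallel.sinks.enabled` and the value `true` come from the release note; the helper `build_runtime_args` is hypothetical.

```python
import json

# Key and value taken from the release note for CDAP-15388.
PARALLEL_SINKS_KEY = "pipeline.spark.parallel.sinks.enabled"


def build_runtime_args(parallel_sinks: bool) -> dict:
    """Return a runtime-argument map (hypothetical helper).

    Leaving the key out keeps the default described in the note:
    sinks are not run in parallel.
    """
    args = {}
    if parallel_sinks:
        args[PARALLEL_SINKS_KEY] = "true"  # assumed to be passed as a string
    return args


runtime_args_json = json.dumps(build_runtime_args(parallel_sinks=True))
print(runtime_args_json)
# prints {"pipeline.spark.parallel.sinks.enabled": "true"}
```

Omitting the argument is the safer choice when re-processing of pipeline sections is a concern, since the note says the new default prevents it in the majority of situations.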
Improvements
- Fixed batch pipeline preview to read only the preview records instead of the full input. (CDAP-16110)
- Greatly improved the time it takes to calculate field level lineage. (CDAP-16069)
- Set Spark as the default execution engine for batch pipelines. (CDAP-15983)
- Improved the error message for CSV, TSV, and delimited formats when the schema has fewer fields than the data. (CDAP-15794)
- Added support to automatically fill in field level lineage for plugins that do not emit any. (CDAP-15782)
- Upgraded the Node.js version from 8.x to 10.16.2. (CDAP-15738)
- Added support to restore preview status after restart. (CDAP-15677)
- Users are now routed directly to the pipeline's detail page from the pipeline card in Control Center. (CDAP-15659)
- Added a new user experience for log level selection. (CDAP-15489)
- Added image version as a configuration setting to the Dataproc provisioner. (CDAP-15265)
- Improved how pipelines run when their macros are provided by intermediate stages. (CDAP-16076)