CDAP 6.1.1

@ajainarayanan ajainarayanan released this 22 Jan 19:46
· 557 commits to release/6.1 since this release
4c903b7

Summary

This release introduces a number of new features, improvements, and bug fixes to CDAP. Some of the main highlights of the release are:

  1. Pipeline improvements

    • Validation checks for plugins for early error detection and prevention
    • New widgets for better pipeline configurability
    • Wrangler ADLS connection
  2. Field Level Lineage

    • New, intuitive UI for field level lineage
    • Field level lineage support for more plugins
  3. Platform enhancements

    • Performance improvements across the platform
    • Migration of more UI components from Angular to React

New Features

  • Added field level lineage support for Error Transform. (CDAP-16102)
  • Added region support for Google Cloud plugins. (CDAP-16037)
  • New UI landing page. (CDAP-15795)
  • Allowed plugin developers to define filters that show or hide properties based on custom plugin configuration logic. (CDAP-15789)
  • Introduced new FailureCollector APIs for a better user experience via contextual error messages. (CDAP-15787)
  • Added support for reading INT96 types in Parquet file sources. (CDAP-15767)
  • New ConfigurationGroup component in UI. (CDAP-15728)
  • Added support for pipelines to run in a shared VPC network. (CDAP-15723)
  • Stage level validation for plugin properties. (CDAP-15619)
  • Added a new REST endpoint that retrieves all field lineage information about a dataset. (CDAP-15482)
  • Added support for bytes types in the BigQuery sink. (CDAP-15342)

Deprecation

  • Removed the outdated Validator plugin. (CDAP-15917)

Bug Fixes

  • Fixed the preview run state after a JVM restart. (CDAP-16193)
  • Content type detection now uses case-insensitive file extensions. (CDAP-16146)
  • Fixed a bug that prevented users from navigating to the pipeline studio (it indicated that system artifacts were being loaded for a long time). (CDAP-16137)
  • Fixed the Dataproc provisioner to log the error message if the Dataproc creation operation fails. (CDAP-15973)
  • Fixed a bug that caused pipeline startup to take longer than needed for cloud runs. (CDAP-15899)
  • Fixed regex usage in GCS and S3 source plugins. (CDAP-15879)
  • Fixed a bug with the Datastore source that was overly restrictive when validating the user-provided schema. (CDAP-15878)
  • Fixed a bug that could cause a thread to spin in an infinite while loop when multiple threads consumed from a queue that allows only a single consumer. (CDAP-15809)
  • Fixed a bug that caused pipeline failures when writing nullable byte fields as JSON. (CDAP-15770)
  • Fixed a bug that caused MapReduce and Spark logs to be missing for remote pipeline runs. (CDAP-15757)
  • Fixed a race condition that could cause a program to get stuck in the pending state when stopped in the pending state. (CDAP-15747)
  • Added some safeguards to prevent cloud pipeline runs from getting stuck in certain edge cases. (CDAP-15742)
  • Fixed a bug where secure macros were not evaluated in preview mode. (CDAP-15726)
  • Fixed a bug in the BigQuery source that caused automatic bucket creation to fail if the dataset is in a different project. (CDAP-15617)
  • Fixed a bug in the new user tour on lower resolution screens. (CDAP-15583)
  • Fixed a bug where the wrong resolution was used if a time range was specified for a metrics query. (CDAP-15554)
  • Fixed an issue where the BigQuery multi sink did not work when using an Oracle database as a source. (CDAP-15535)
  • Fixed the Dataproc provisioner to disable YARN pre-emptive container killing and to disable Conscrypt. (CDAP-15498)
  • Fixed a bug in the MLPredictor plugin that caused an error when using a classification model. (CDAP-15445)
  • Fixed a bug that prevented users from pasting a schema as a runtime argument. (CDAP-15423)
  • Spark pipelines no longer try to run sinks in parallel unless the runtime argument 'pipeline.spark.parallel.sinks.enabled' is set to 'true'. This prevents pipeline sections from being re-processed in the majority of situations. (CDAP-15388)
  • Fixed the Dataproc provisioner to handle networks that do not use automatic subnet creation. (CDAP-15373)
  • Fixed a Wrangler bug where the wrong JDBC driver would be used in some situations and where required classes could be unavailable. (CDAP-15353)
  • Fixed a bug in artifact version comparison. (CDAP-15221)
  • Fixed a bug where the rollup of the workflow lineage did not remove the local datasets. (CDAP-15206)
  • Expanded the filename formats that the UI accepts when uploading artifacts. (CDAP-15097)
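
For the Spark sink change above (CDAP-15388), parallel sinks have to be requested explicitly through a runtime argument. The following is a minimal sketch of how such an argument might be passed when starting a batch pipeline over REST. It assumes the program lifecycle start endpoint and the DataPipelineWorkflow program name that batch pipelines normally use; the host, port, namespace, and pipeline name are placeholders, and the helper build_start_request is hypothetical. The request is only built, not sent.

```python
import json
from urllib.request import Request


def build_start_request(base_url, namespace, pipeline, parallel_sinks=True):
    """Build (but do not send) a POST request that starts a batch pipeline.

    Assumption: the pipeline runs as the workflow 'DataPipelineWorkflow' and
    the start endpoint accepts runtime arguments as a JSON map in the body.
    """
    url = (
        f"{base_url}/v3/namespaces/{namespace}/apps/{pipeline}"
        "/workflows/DataPipelineWorkflow/start"
    )
    # Runtime argument named in the release note; values are strings.
    runtime_args = {
        "pipeline.spark.parallel.sinks.enabled": "true" if parallel_sinks else "false",
    }
    return Request(
        url,
        data=json.dumps(runtime_args).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host, namespace, and pipeline name.
req = build_start_request("http://localhost:11015", "default", "myPipeline")
```

To actually start the run, the request could be passed to urllib.request.urlopen against a reachable CDAP instance; authentication, if enabled, would need additional headers.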

Improvements

  • Fixed batch pipeline preview to read only the preview records instead of the full input. (CDAP-16110)
  • Greatly improved the time it takes to calculate field level lineage. (CDAP-16069)
  • Set Spark as the default execution engine for batch pipelines. (CDAP-15983)
  • Improved the error message for CSV, TSV, and delimited formats when the schema has fewer fields than the data. (CDAP-15794)
  • Added support to automatically fill field level lineage for plugins that do not emit any. (CDAP-15782)
  • Upgraded Node.js from 8.x to 10.16.2. (CDAP-15738)
  • Added support to restore preview status after restart. (CDAP-15677)
  • Routed users directly to the pipeline's detail page from the pipeline card in Control Center. (CDAP-15659)
  • New user experience for log level selection. (CDAP-15489)
  • Added image version as a configuration setting to the Dataproc provisioner. (CDAP-15265)
  • Improved the way pipelines with macros that are provided by intermediate stages run. (CDAP-16076)