CDAP 6.0.0

@CuriousVini released this 22 May 20:05 · 65 commits to release/6.0 since this release · 56e284e

Summary

This release introduces a number of new features, improvements, bug fixes, and feature removals to CDAP. The main highlights of the release are:

  1. Portable CDAP Runtime

    • Provide a runtime architecture that lets CDAP run in both Hadoop and Hadoop-free environments, such as Kubernetes, in a distributed and secure fashion.
  2. Storage SPIs

    • Provide an abstraction for all CDAP system storage so that CDAP is more portable across runtime environments, whether Hadoop or Hadoop-free.
  3. Pipeline Enhancements

    • Improve the experience of building pipelines with features such as copy and paste and a minimap of the pipeline.

Please note that upgrading CDAP is not supported in this release. Please see the list of incompatible changes.

New Features

  • Added Google Cloud Storage copy and move action plugins.(CDAP-14330)
  • New pipeline list user interface.(CDAP-14533)
  • Added minimap to pipeline canvas.(CDAP-14613)
  • Added support for running CDAP system services in Kubernetes environment.(CDAP-14645)
  • Added the ability to copy and paste a node in pipeline studio.(CDAP-14657)
  • Added the ability to limit the number of concurrent pipeline runs.(CDAP-15058)
  • Added support for toggling Stackdriver integration in Google Cloud Dataproc cluster.(CDAP-15095)
  • Added support for Numeric and Array types in Google BigQuery plugins.(CDAP-15256)
  • Added support for showing decimal field types in plugin schemas in pipeline view.(CDAP-15339)

Improvements

  • Added support for CDH 5.15.(CDAP-13632)
  • Revamped the top navbar of the CDAP UI based on Material Design.(CDAP-14653)
  • Secure store now supports integration with other key management systems, such as Google Cloud KMS, through the new Secure Store SPIs.(CDAP-14667)
  • Improved CDAP Master logging of events related to programs that it launches.(CDAP-7208)
  • Switched to a shared thread pool for provisioning tasks to increase thread utilization.(CDAP-14343)
  • Improved the performance of the LevelDB-backed Table implementation.(CDAP-14569)
  • Wrangler now supports secure macros in connections.(CDAP-14571)
  • Significantly improved the performance of the Transactional Messaging System.(CDAP-14617)
  • Added early validation for the properties of the Google BigQuery sink to fail during pipeline deployment instead of at runtime.(CDAP-14821)
  • Improved the error message when a null value is read for a non-nullable field in Avro file sources.(CDAP-14823)
  • Improved loading of system artifacts to load in parallel instead of sequentially.(CDAP-15047)
  • Improved Google Cloud Dataproc provisioner to allow configuring default projectID from CDAP configuration.(CDAP-15059)
  • Added support for using runtime arguments to pass in extra configurations for the Google Cloud Dataproc provisioner.(CDAP-15318)
  • Added support for spaces in file path for Google Cloud Storage plugin.(CDAP-14579)
  • Google BigQuery source now validates schema when the pipeline is deployed.(CDAP-14897)

Bug Fixes

  • Fixed a casting bug in the DB source where unsigned integer columns were incorrectly treated as integers instead of longs.(CDAP-12211)
  • Removed the need for ZooKeeper for service discovery in remote runtime environment.(CDAP-13410)
  • Fixed an issue with recording lineage for realtime sources.(CDAP-7230)
  • Fixed dynamic Spark plugin to use appropriate context classloader for loading dynamic Spark code.(CDAP-12941)
  • Fixed a bug that caused MapReduce pipelines to fail when using too many macros.(CDAP-13554)
  • Fixed an issue that caused pipelines with too many macros to fail when running in MapReduce.(CDAP-13982)
  • Fixed an issue with publishing metadata changes for profile assignments.(CDAP-14666)
  • Fixed a bug that would cause workspace ids to clash when wrangling items of the same name.(CDAP-14691)
  • Fixed a bug in secure store caused by breaking changes in Java 8 update 171. Users should now be able to get secure keys on Java 8u171.(CDAP-14702)
  • Fixed a bug that caused Google Cloud Dataproc clusters to fail provisioning if a firewall rule that denies ingress traffic existed in the project.(CDAP-14708)
  • Fixed a bug that would cause data preparation to fail when preparing a large file in Google Cloud Storage.(CDAP-14709)
  • Fixed a bug that caused action-only pipelines to fail when running using a cloud profile.(CDAP-14724)
  • Fixed an issue with adding business tags to an entity.(CDAP-14744)
  • Fixed an issue in handling metadata search parameters.(CDAP-14778)
  • Fixed a bug that would cause pipelines to fail on remote clusters if the very first pipeline run was an action-only pipeline.(CDAP-14779)
  • Fixed the standard deviation aggregate functions to work even when a group contains only one element.(CDAP-14857)
  • Fixed a bug in the Google BigQuery sink that would cause pipelines to fail when writing to a dataset in a different region.(CDAP-14951)
  • Fixed a race condition in processing profile assignments.(CDAP-15001)
  • Fixed an issue that could cause inconsistencies in metadata.(CDAP-15013)
  • Fixed an issue with displaying workspace metadata in the UI.(CDAP-15069)
  • Fixed a race condition in the remote runtime SCP implementation that could cause the process to hang.(CDAP-15127)
  • Fixed an issue with metadata search result pagination.(CDAP-15196)
  • Fixed an issue in the Wrangler DB connection where a bad JDBC driver could stay in the cache for 60 minutes, making the DB connection unusable.(CDAP-15223)
  • Fixed a NullPointerException in the Google Cloud Dataproc provisioner that occurred when no network was configured.(CDAP-15249)
  • Fixed a bug that caused some aggregator and joiner keys to be dropped if they hashed to the same value as another key.(CDAP-15299)
  • Fixed a bug in the RuntimeMonitor where it did not reconnect through SSH correctly, causing failures in monitoring the program state.(CDAP-15332)
  • Fixed Google Cloud Dataproc runtime for Google Cloud Platform projects where OS Login is enabled.(CDAP-15369)
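
As background for the unsigned integer fix (CDAP-12211) listed above, the following is a minimal sketch of why a 32-bit unsigned column value needs a Java long. It is not the DB source's actual code; the class and the helper `toUnsignedLong` are hypothetical names used only for illustration.

```java
// Illustration only: not taken from the CDAP DB source plugin.
class UnsignedIntDemo {
    // Hypothetical helper: widen the raw 32 bits of an unsigned value to a long.
    static long toUnsignedLong(int rawBits) {
        return Integer.toUnsignedLong(rawBits);
    }

    public static void main(String[] args) {
        // Largest value a 32-bit unsigned integer column can hold.
        long unsignedMax = 4294967295L;

        // Forcing that value into a signed 32-bit int keeps the bits but changes the meaning.
        int narrowed = (int) unsignedMax;
        System.out.println(narrowed);                 // prints -1

        // Reading the same bits as unsigned and storing them in a long recovers the value.
        System.out.println(toUnsignedLong(narrowed)); // prints 4294967295
    }
}
```

Any unsigned value above 2147483647 (`Integer.MAX_VALUE`) shows the same wraparound, which is why such columns are mapped to longs.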

Deprecated and Removed Features

  • Deprecated HDFSMove and HDFSDelete plugins from core plugins.(CDAP-15241)
  • Removed Streams and Stream Views, which were deprecated in CDAP 5.0.(CDAP-14591)
  • Removed Flow, which was deprecated in CDAP 5.0.(CDAP-14592)
  • Removed deprecated HDFSSink Plugin.(CDAP-14529)
  • Removed the plugin endpoints feature to prevent execution of plugin code in the CDAP Master. Endpoints were only used for schema propagation, which has moved to the pipeline system service.(CDAP-14772)
  • Removed support for custom routing for user services.(CDAP-14886)