CDAP 6.0.0
CuriousVini released this 22 May 20:05
Summary
This release introduces a number of new features, improvements, bug fixes, and feature removals to CDAP. The main highlights of the release are:
- Portable CDAP Runtime
  - Provide a runtime architecture for CDAP to support both Hadoop and Hadoop-free environments, such as Kubernetes, in a distributed and secure fashion.
- Storage SPIs
  - Provide an abstraction for all CDAP system storage so that CDAP is more portable across runtime environments, whether Hadoop or Hadoop-free.
- Pipeline Enhancements
  - Improve the experience of building pipelines with features such as copy and paste and a minimap of the pipeline.
Please note that upgrading CDAP is not supported in this release. Please see the list of incompatible changes.
New Features
- Added Google Cloud Storage copy and move action plugins.(CDAP-14330)
- Added a new pipeline list user interface.(CDAP-14533)
- Added minimap to pipeline canvas.(CDAP-14613)
- Added support for running CDAP system services in a Kubernetes environment.(CDAP-14645)
- Added the ability to copy and paste a node in pipeline studio.(CDAP-14657)
- Added the ability to limit the number of concurrent pipeline runs.(CDAP-15058)
- Added support for toggling Stackdriver integration in Google Cloud Dataproc clusters.(CDAP-15095)
- Added support for Numeric and Array types in Google BigQuery plugins.(CDAP-15256)
- Added support for showing decimal field types in plugin schemas in pipeline view.(CDAP-15339)
Improvements
- Added support for CDH 5.15.(CDAP-13632)
- Revamped the top navbar of the CDAP UI based on Material Design.(CDAP-14653)
- Secure store supports integration with other KMS systems such as Google Cloud KMS using new Secure Store SPIs.(CDAP-14667)
- Improved CDAP Master logging of events related to programs that it launches.(CDAP-7208)
- Provisioning tasks now use a shared thread pool to increase thread utilization.(CDAP-14343)
- Improved performance of the LevelDB-backed Table implementation.(CDAP-14569)
- Wrangler now supports secure macros in connections.(CDAP-14571)
- Significantly improved performance of the Transactional Messaging System.(CDAP-14617)
- Added early validation for the properties of the Google BigQuery sink to fail during pipeline deployment instead of at runtime.(CDAP-14821)
- Improved the error message when a null value is read for a non-nullable field in Avro file sources.(CDAP-14823)
- Improved loading of system artifacts so that they load in parallel instead of sequentially.(CDAP-15047)
- Improved the Google Cloud Dataproc provisioner to allow configuring a default projectID from the CDAP configuration.(CDAP-15059)
- Added support for using runtime arguments to pass extra configurations to the Google Cloud Dataproc provisioner.(CDAP-15318)
- Added support for spaces in file paths for the Google Cloud Storage plugin.(CDAP-14579)
- Google BigQuery source now validates schema when the pipeline is deployed.(CDAP-14897)
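To illustrate the parallel artifact loading item (CDAP-15047) above, here is a minimal Python sketch of the general technique. It is not CDAP's implementation: `load_artifact` and the artifact names are hypothetical placeholders. The sketch only shows that independent loads can be submitted to a thread pool while the results are still collected in input order.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def load_artifact(name):
    """Hypothetical stand-in for loading one system artifact."""
    time.sleep(0.05)  # simulate I/O latency
    return f"{name}:loaded"


def load_sequentially(names):
    # One artifact after another; total time grows with len(names).
    return [load_artifact(n) for n in names]


def load_in_parallel(names, max_workers=4):
    # Independent loads run concurrently; map() preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_artifact, names))


# Hypothetical artifact names, for illustration only.
ARTIFACTS = ["artifact-a", "artifact-b", "artifact-c", "artifact-d"]
```

Both functions return the same list, so callers that depend on ordering are unaffected by the switch; only the wall-clock time differs.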
Bug Fixes
- Fixed a casting bug for the DB source where unsigned integer columns were incorrectly being treated as integers instead of longs.(CDAP-12211)
- Removed the need for ZooKeeper for service discovery in the remote runtime environment.(CDAP-13410)
- Fixed an issue with recording lineage for realtime sources.(CDAP-7230)
- Fixed dynamic Spark plugin to use appropriate context classloader for loading dynamic Spark code.(CDAP-12941)
- Fixed a bug that caused MapReduce pipelines to fail when using too many macros.(CDAP-13554)
- Fixed an issue that caused pipelines with too many macros to fail when running in MapReduce.(CDAP-13982)
- Fixed an issue with publishing metadata changes for profile assignments.(CDAP-14666)
- Fixed a bug that would cause workspace ids to clash when wrangling items of the same name.(CDAP-14691)
- Fixed a bug in secure store caused by breaking changes in Java 8 update 171. Users should now be able to get secure keys on Java 8u171.(CDAP-14702)
- Fixed a bug that caused Google Cloud Dataproc clusters to fail provisioning if a firewall rule that denies ingress traffic existed in the project.(CDAP-14708)
- Fixed a bug that would cause data preparation to fail when preparing a large file in Google Cloud Storage.(CDAP-14709)
- Fixed a bug that caused action-only pipelines to fail when running using a cloud profile.(CDAP-14724)
- Fixed an issue with adding business tags to an entity.(CDAP-14744)
- Fixed an issue in handling metadata search parameters.(CDAP-14778)
- Fixed a bug that would cause pipelines to fail on remote clusters if the very first pipeline run was an action-only pipeline.(CDAP-14779)
- Fixed the standard deviation aggregate functions to work even if there is only one element in a group.(CDAP-14857)
- Fixed a bug in the Google BigQuery sink that would cause pipelines to fail when writing to a dataset in a different region.(CDAP-14951)
- Fixed a race condition in processing profile assignments.(CDAP-15001)
- Fixed an issue that could cause inconsistencies in metadata.(CDAP-15013)
- Fixed an issue with displaying workspace metadata in the UI.(CDAP-15069)
- Fixed a race condition in the remote runtime scp implementation that could cause the process to hang.(CDAP-15127)
- Fixed an issue with metadata search result pagination.(CDAP-15196)
- Fixed the Wrangler DB connection, where a bad JDBC driver could stay in the cache for 60 minutes, making the DB connection unusable.(CDAP-15223)
- Fixed a NullPointerException in the Google Cloud Dataproc provisioner when no network was configured.(CDAP-15249)
- Fixed a bug that caused some aggregator and joiner keys to be dropped if they hashed to the same value as another key.(CDAP-15299)
- Fixed a bug where the RuntimeMonitor did not reconnect through SSH correctly, causing failures in monitoring the correct program state.(CDAP-15332)
- Fixed Google Cloud Dataproc runtime for Google Cloud Platform projects where OS Login is enabled.(CDAP-15369)
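The aggregator and joiner key fix (CDAP-15299) above concerns a classic pitfall: distinct keys that share a hash value must still be kept apart. The Python sketch below is illustrative only and does not reproduce CDAP's code; `bucket_hash`, `group_by_hash_only`, and `group_by_key` are hypothetical names. It contrasts a grouping that identifies keys by hash alone, which silently drops a colliding key, with one that also compares the keys themselves.

```python
def bucket_hash(key):
    """Deliberately weak hash (hypothetical): the sum of character codes,
    so anagrams such as "ab" and "ba" collide."""
    return sum(ord(c) for c in key)


def group_by_hash_only(records):
    # Buggy pattern: the hash is treated as the key's identity, so a
    # colliding key is dropped and its values land under the first key.
    groups = {}
    for key, value in records:
        h = bucket_hash(key)
        if h not in groups:
            groups[h] = (key, [])
        groups[h][1].append(value)
    return dict(groups.values())


def group_by_key(records):
    # Correct pattern: the hash only selects a bucket; keys inside the
    # bucket are compared for equality before values are merged.
    buckets = {}
    for key, value in records:
        bucket = buckets.setdefault(bucket_hash(key), [])
        for existing_key, values in bucket:
            if existing_key == key:
                values.append(value)
                break
        else:
            bucket.append((key, [value]))
    return {k: v for bucket in buckets.values() for k, v in bucket}


# "ab" and "ba" hash to the same bucket under bucket_hash.
RECORDS = [("ab", 1), ("ba", 2), ("ab", 3), ("cd", 4)]
```

With these records, `group_by_hash_only` returns two groups and loses the key `"ba"`, while `group_by_key` returns all three keys with their own values.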
Deprecated and Removed Features
- Deprecated HDFSMove and HDFSDelete plugins from core plugins.(CDAP-15241)
- Removed Streams and Stream Views, which were deprecated in CDAP 5.0.(CDAP-14591)
- Removed Flow, which was deprecated in CDAP 5.0.(CDAP-14592)
- Removed the deprecated HDFSSink plugin.(CDAP-14529)
- Removed the plugin endpoints feature to prevent execution of plugin code in the CDAP Master. Endpoints were only used for schema propagation, which has moved to the pipeline system service.(CDAP-14772)
- Removed support for custom routing for user services.(CDAP-14886)