Releases: cdapio/cdap

Cask Data Application Platform v4.2.0

07 Jun 08:19

Summary

  1. Spark Enhancements: Added support for Apache Spark 2.x. Users can configure CDAP to use either Spark 1.x or Spark 2.x on their cluster. Also added the capability to run interactive Spark code within CDAP.

  2. Enhanced Data Preparation: Added capabilities in data preparation to connect to the File System (Local and HDFS) and relational databases, browse and select their existing data, and import into Data Preparation for cleansing, preparing and transforming.

  3. Event Driven Schedules: Added capabilities to start CDAP programs based on the availability of data in HDFS partitions and to impose run constraints to intelligently orchestrate CDAP Workflows.

New Features

Spark Enhancements

  • Added support for Spark 2.x. In environments where multiple Spark versions exist, CDAP must be configured to use one or the other (CDAP-7875)

  • Enable capabilities to run interactive Spark code within CDAP (CDAP-11409)

  • Added capabilities to run arbitrary Spark code in CDAP Pipelines (CDAP-11410)

  • Enhancements to speed up launching Spark programs (CDAP-11411)

Enhanced Data Preparation

  • Adds File System Browser Component to browse Local and HDFS File System from Data Preparation (CDAP-9290)

  • Adds Data Quality information to Data Preparation table. Currently, it shows the completeness of each column (CDAP-9517)

  • Added point-and-click interactions for applying directives such as parsing, splitting, find and replace, filling null or empty rows, copying and deleting columns in Data Preparation. They can be invoked by using the dropdown menu for each column (CDAP-9524)

  • Added point-and-click interaction for cleansing column names (CDAP-11333)

  • Added a point-and-click interaction to set all column names in Data Preparation (CDAP-11334)

  • Added the ability to ingest data one-time from Data Preparation to a CDAP Dataset (CDAP-11424)

  • Added macro support for Data Preparation directives (CDAP-9556)

Event Driven Schedules

  • Introduces a new, event-driven scheduling system that can start programs based on data availability in HDFS partitions (CDAP-7593)

  • Allow users to configure constraints for schedules, such as duration since last run and allowed time range for program execution (CDAP-11338)

Other New Features

  • Added capability for CDAP Services to dynamically list available artifacts and dynamically load artifacts (CDAP-11498)

  • Added support for EMR 5.0 - 5.3 (CDAP-7873)

  • Added the ability for Data Preparation to handle byte arrays of data for processing binary data (CDAP-11486)

  • Added an API to Spark Streaming sources to provide number of streams being used by a streaming source (CDAP-11422)

  • Users can now upload, view, and use plugins of type 'sparksink' in Studio. (CDAP-11681)

  • Modified the log viewer to only show ERROR, WARN, and INFO levels of logs by default, instead of all logs as previously (CDAP-8668)

Bug fixes

  • Fixed a bug where the log level was always set to INFO at the root logger (CDAP-8289)

  • Fixed a bug where extra characters after an artifact version range were being ignored instead of being recognized as invalid (CDAP-7727)

  • Fixed a bug where users could not read from real Datasets while previewing CDAP Pipelines (CDAP-7884)

  • Fixed a bug that prevented users from adding extra classpath to Apache Spark drivers and executors (CDAP-9422)

  • Fixed a bug where impersonated workflow was not creating local datasets with the correct impersonated user (CDAP-9456)

  • Fixed a bug in Parquet and Avro File sinks that would cause them to fail if they received ByteBuffers instead of byte arrays. (CDAP-11417)

  • Fixed a bug where writes could only succeed in one MongoDB sink even when multiple MongoDB sinks were present in a pipeline (CDAP-11558)

  • Fixed a thread leakage bug in Spark (SPARK-20935) after Spark Streaming program completed (CDAP-11577)

  • Fixed a bug in TMS where fetching from the payload table raised an exception if the fetch had an empty result (CDAP-11588)

  • Fixed a bug in the Purchase example that could cause purchases to overwrite each other (CDAP-11643)

  • Fixed a bug that prevented the use of logback.xml in Apache Spark Streaming programs. (CDAP-11651)

  • Fixed an issue where pipeline metrics were not showing up in pipelines with a large number of nodes (CDAP-9284)

  • Fixed an issue with retrieving workflow state if it contained an exception without a message (CDAP-11795)

  • Fixed an issue with the CDAP Ambari service definition where the "cdap" headless user was not unique to the cluster (CDAP-11445)

  • Fixed the CDAP Upgrade tool to not fail when encountering a non-CDAP table that follows the CDAP naming convention (CDAP-4887)

  • Fixed an issue where the driver process of a CDAP Workflow was getting restarted when it ran out of memory, causing the Workflow to be executed again from the start node (CDAP-5067)

  • Fixed an issue with the detection of Apache Spark on HDP 2.5 and above, which caused excess noise on the console (CDAP-7429)

  • Fixed an issue with the YARN container allocation logic so that the correct container size is used. (CDAP-8888)

  • Fixed the stream container to terminate cleanly and cleaned up the CDAP Master's Apache Twill JAR files after master shutdown (CDAP-8911)

  • Fixed an issue where redeployment of an application with a deleted schedule would fail (CDAP-8918)

  • Fixed warnings about /opt/cdap/master/artifacts not being a directory in unit tests (CDAP-8961)

  • Fixed an issue where CDAP entity roles were not cleaned up when the entity was deleted (CDAP-9026)

  • Fixed an issue where cdap-security.xml was not written under Ambari unless security.enabled in cdap-site.xml was set to true (CDAP-9378)

  • Fixed the Azure Blob Store source to work with Avro and Parquet formats (CDAP-10475)

  • Fixed the Azure Blob Store source to work with CDAP FileSets (CDAP-11384)

  • Fixed the "value is" filter in the Data Preparation UI (CDAP-11557)

  • Fixed impersonation while upgrading datasets in the Upgrade tool (CDAP-11815)

Deprecations

  • Added the property "metrics.processor.queue.size" (default 20000) to limit the maximum size of the in-memory queue where the metrics processor temporarily stores newly fetched metrics before persisting them. Added the property "metrics.processor.max.delay.ms" (default 3000 milliseconds) to specify the maximum delay allowed between the latest metrics timestamp and the time when it is processed. The larger this value, the more often the metrics processor can sleep between fetching batches of metrics, but the longer the delay between metrics emission and processing. Deprecated the property "metrics.messaging.fetcher.limit". (CDAP-8327)
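
As a minimal sketch, the two new metrics processor properties could be set as shown below. This assumes they belong in cdap-site.xml, which this note does not state. The values are simply the defaults quoted above.

```xml
<!-- Sketch only: assumes these properties are set in cdap-site.xml. -->
<!-- Values are the defaults quoted in the release note. -->
<property>
  <name>metrics.processor.queue.size</name>
  <value>20000</value>
</property>
<property>
  <name>metrics.processor.max.delay.ms</name>
  <value>3000</value>
</property>
```

Raising metrics.processor.max.delay.ms trades fresher metrics for fewer fetch cycles, as described above.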

Cask Data Application Platform 4.1.1

17 Apr 04:03

Summary

  1. Data Preparation: Point-and-click interactions and integration with the rest of CDAP
    including, but not limited to, namespaces, security, and pipelines.

  2. Upgrade: Significant reduction in downtime during CDAP upgrades, by removing some data
    migration and doing required migration in the background after CDAP starts up.

  3. Pipeline Previews: Added logs, better error messaging, ability to read from existing
    datasets, and a better stop experience.

  4. Logs: Added a condensed view of logs for CDAP pipelines and programs that does not
    include logs emitted by the CDAP platform and libraries. The condensed view only contains lifecycle logs, logs emitted by the program or pipeline, and errors.

  5. Schedules: Added the ability to update schedules without redeploying the application.

New Features

Data Preparation
................................

  • Users can now interact with and manage multiple workspaces in Data Preparation. (CDAP-9235)

  • Added point-and-click interactions for applying directives such as parsing, splitting, find and replace, filling null or empty rows, copying and deleting columns in Data Preparation. They can be invoked by using the dropdown menu for each column. (WRANGLER-77)

Logs
................................

  • Added option to the log viewer to only show "user" condensed logs. (CDAP-9117)

  • Logs for previews of CDAP pipelines are now available in the CDAP UI via the Logs button in Preview mode. (HYDRATOR-1316)

Schedules
................................

  • Added support for adding, deleting, updating, and retrieving workflow schedules. (CDAP-8902)

Other New Features
................................

  • Upgraded Apache Tephra dependency to the 0.11.0-incubating version. (CDAP-8872)

  • Users can now deploy CDAP pipelines with a single action plugin. This feature can be used to run external Apache Spark programs as CDAP pipelines. (CDAP-9141)

    Added a sparkprogram plugin type that can be used to run arbitrary Spark code at the beginning or end of a pipeline. An external Spark program can be added by clicking the "plus" ("+") button in the CDAP UI, choosing Library, and specifying sparkprogram as the type. It is then available as an Action plugin in the CDAP Studio.

  • Added support for HDP 2.6. (CDAP-9250)

  • Added support for CDH 5.11.0. (CDAP-9281)

  • Added support that allows plugin developers to integrate with CDAP services by exposing CDAP service discovery capabilities in the plugin context. (CDAP-9311)

Improvements

Upgrade
................................

  • HBase coprocessor upgrades on CDAP Datasets now run concurrently. (CDAP-9278)

  • Improved the CDAP upgrade process to minimize the downtime needed to upgrade, by performing data migration in the background. (CDAP-9282)

Pipeline Previews
................................

  • Simplified the status, next runtime of pipelines, total number of running pipelines, and drafts in the pipeline list view UI. (CDAP-9017)

Schedules
................................

  • Allow administrators to enable or disable updating schedules using the property "app.deploy.update.schedules" in cdap-site.xml. Users can override this to enable or disable updating schedules during deployment of an application using the same property specified in the configuration of the application. (CDAP-8942)
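
A minimal sketch of the cluster-wide setting in cdap-site.xml follows. The property name comes from the note above. The boolean value format is an assumption based on the "enable or disable" wording.

```xml
<!-- Sketch only: "false" as the value format is an assumption. -->
<!-- It would disable schedule updates on application redeployment. -->
<property>
  <name>app.deploy.update.schedules</name>
  <value>false</value>
</property>
```

Per the note, an application can override this by specifying the same property in its own configuration at deployment time.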

Other Improvements
................................

  • Added fetch size and transaction flush interval configurations to the Kafka Consumer Flowlet. (CDAP-7731)

  • Users can now see a contextual message with appropriate call(s) to action when no entities are found on the Overview page. (CDAP-8430)

  • Added new configurations to control the YARN application master container memory size, maximum heap memory size, and maximum non-heap memory size: twill.java.heap.memory.ratio, twill.yarn.am.memory.mb, and twill.yarn.am.reserved.memory.mb. (CDAP-8990)

  • Increased the default memory allocation for the CDAP Explore service container to 2048MB. (CDAP-9003)

  • Users can now grant and revoke privileges for UNIX groups and users when using Apache Sentry as the authorization extension for CDAP. (CDAP-9027)

  • Added a "cdap apply-pack [pack]" command to the "cdap" script that allows for upgrading of individual CDAP components. (CDAP-9077)

Bug Fixes

Upgrade
................................

  • Fixed an issue with the pipeline upgrade tool that caused it to skip CDAP 4.0.x pipelines. (CDAP-9185)

Pipeline Previews
................................

  • Fixed a bug where previews could not read from datasets in real space. (CDAP-7884)

  • When previewing a pipeline in the CDAP Studio, disabled all writes to sinks. Incoming data to sinks can be viewed in the preview tab of the sink, but is not written to the sink. (CDAP-8013)

  • Fixed an issue where preview of CDAP pipelines did not show data for successful stages if a particular stage failed. (CDAP-9333)

Logs
................................

  • Fixed a problem that caused duplicate logs to show up for a running pipeline. (CDAP-7138)

  • Fixed a bug where the "Total Messages/Errors/Warnings" at the top of the log viewer showed incorrect values. (CDAP-9248)

Schedules
................................

  • Fixed an issue where redeployment of an application with a deleted schedule would fail. (CDAP-8918)

Other Bug Fixes
................................

  • Removed the requirement of being an admin to run the CDAP startup script for Windows. (CDAP-4213)

  • Made Plugin Endpoint invocation more robust. If a plugin's parent cannot instantiate the plugin needed for the invocation, CDAP now tries the plugin's other parents before returning an error. (CDAP-5715)

  • Fixed an issue with namespace deletion which caused CDAP Application test cases to fail in a Windows environment. (CDAP-6348)

  • Fixed an issue where a few metrics were lost when a container was shut down. (CDAP-8862)

  • Fixed an issue with the YARN container allocation logic so that the correct container size is used. (CDAP-8888)

  • Improved the serializability of Tables and IndexedTables when used in Spark programs. (CDAP-8913)

  • Moved the "add plugin" behavior from a plugin's left panel to an "Add Entity" button in the CDAP Studio UI. (CDAP-8945)

  • Fixed an issue in the CDAP UI where navigating from a stream card to an overview and then to a detail page made the detail page show a spinner icon indefinitely. (CDAP-8950)

  • Fixed an issue with the Spark program runtime so that the Kryo serializer can be used. (CDAP-8980)

  • Fixed an issue where the HBase Queue Debugging Tool failed when authorization was enabled. (CDAP-9005)

  • Fixed an issue where users could not grant and revoke privileges for UNIX groups and users when using Apache Sentry as the authorization extension for CDAP. (CDAP-9029)

  • Fixed an issue where revoking privileges from a role caused the privilege to be revoked from all roles. (CDAP-9046)

  • Fixed an issue with the Window plugin so that it propagates schema properly. (CDAP-9086)

  • Fixed the Overview panel on the home page of the CDAP UI to handle unknown entities appropriately. (CDAP-9087)

  • Local dataset operations are now retried when a failure occurs. (CDAP-9114)

  • Fixed an issue with the binary format in the Kafka streaming source that prevented pipeline deployment. (CDAP-9142)

  • Fixed an issue that caused YARN containers to be killed due to excessive memory usage when impersonation is enabled. (CDAP-9160)

  • Fixed a bug where navigation links referenced the default namespace instead of the current namespace. (CDAP-9216)

  • Improved error messages for the 'Get S...


Cask Data Application Platform 3.5.5

10 Apr 23:10

New Features

  • The Authentication Server announce address is now configurable with the property security.auth.server.announce.urls, which takes comma-separated URLs in the form protocol://host:port. The property security.auth.server.announce.address is now deprecated; it is used only if it is set and security.auth.server.announce.urls is not. (CDAP-4535)
    security.auth.server.announce.address now takes a single address in the form of either host:port or host.
    If neither property is set, the Authentication Server generates a default URL.
  • New configurations have been added to control the YARN application master container memory size, maximum heap memory size, and maximum non-heap memory size: twill.java.heap.memory.ratio, twill.yarn.am.memory.mb, and twill.yarn.am.reserved.memory.mb. (CDAP-8990)
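
For the announce URL property in the first item above, a fragment along these lines illustrates the comma-separated protocol://host:port form. It assumes the property is set in cdap-site.xml. The host names and port are hypothetical placeholders, not values from this release.

```xml
<!-- Sketch only: assumes cdap-site.xml. -->
<!-- Host names and port are hypothetical placeholders. -->
<property>
  <name>security.auth.server.announce.urls</name>
  <value>https://auth1.example.com:10010,https://auth2.example.com:10010</value>
</property>
```

With this property set, the deprecated security.auth.server.announce.address is ignored, per the note above.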

Improvements

  • LogHandler endpoints now return a 404 status code if the entity (the run id) for which logs are requested does not exist. (CDAP-9084)

Bug Fixes

  • Fixed an issue where HBaseQueueDebugger failed when authorization was enabled. (CDAP-9005)
  • Fixed a memory leak issue with the Hadoop FileSystem object. (CDAP-9160)
  • Fixed an out-of-memory issue for the log saver by adding a limit on the maximum number of events in-memory. (CDAP-9085)
  • Fixed an issue with uncaught exceptions so that they are logged through the logger, allowing log collections for those exceptions. (CDAP-8997)

Deprecated and Removed Features

  • The property security.auth.server.announce.address is now deprecated. (CDAP-4535)

Cask Data Application Platform 3.5.4

03 Mar 02:43

New Features

  • Added fetch size and transaction flush interval configurations to the Kafka Consumer Flowlet. (CDAP-7731)

  • Fixed an issue to make artifact, datasets, logs, and coprocessor JAR locations resilient to an HDFS Namenode HA upgrade. (CDAP-8343)

Improvements

  • Reduced non-informative stacktrace information in the log when a connection to the CDAP Router is closed prematurely. (CDAP-8250)

  • Improved the master process stop procedure to support fast failover when running with HA. Added a new kill command to force-kill CDAP processes. (CDAP-8565)

Bug Fixes

  • Fixed an issue where DefaultNamespaceEnsurer sometimes prevented CDAP Master shutdown. (CDAP-7090)

  • Fixed an issue with CDAP Standalone starting in a Microsoft Windows environment. (CDAP-7829)

  • Fixed the CDAP UpgradeTool to not rely on the existence of a 'default' namespace. (CDAP-8229)

  • Added back the CDAP UI health-check end point to determine the status of the CDAP UI service. (CDAP-8260)

  • Fixed an issue where a major compaction was not evicting invalid queue entries. (CDAP-8798)

  • Fixed an issue with transactions started after a snapshot restore having an incorrect invalid transaction list. (CDAP-8855)

Cask Data Application Platform 4.1.0

27 Feb 07:15

New Features

Secure Impersonation

  • Added support for fine-grained impersonation at the CDAP application, dataset, and stream level. (CDAP-8110)
  • Impersonated namespaces can be configured to disallow the impersonation of the namespace owner when running CDAP Explore queries. (CDAP-8355)

Replication and Resiliency

  • Provided SPI hooks that users can implement for performing HBase DDL operations. (CDAP-7685)
  • Added a tool to check a cluster's replication status. (CDAP-8025)
  • CDAP context methods will now be retried according to a program's retry policy. Retries are governed by the following properties: (CDAP-8032)
    • custom.action.retry.policy.base.delay.ms
    • custom.action.retry.policy.max.delay.ms
    • custom.action.retry.policy.max.retries
    • custom.action.retry.policy.max.time.secs
    • custom.action.retry.policy.type
    • flow.retry.policy.base.delay.ms
    • flow.retry.policy.max.delay.ms
    • flow.retry.policy.max.retries
    • flow.retry.policy.max.time.secs
    • flow.retry.policy.type
    • mapreduce.retry.policy.base.delay.ms
    • mapreduce.retry.policy.max.delay.ms
    • mapreduce.retry.policy.max.retries
    • mapreduce.retry.policy.max.time.secs
    • mapreduce.retry.policy.type
    • service.retry.policy.base.delay.ms
    • service.retry.policy.max.delay.ms
    • service.retry.policy.max.retries
    • service.retry.policy.max.time.secs
    • service.retry.policy.type
    • spark.retry.policy.base.delay.ms
    • spark.retry.policy.max.delay.ms
    • spark.retry.policy.max.retries
    • spark.retry.policy.max.time.secs
    • spark.retry.policy.type
    • system.log.process.retry.policy.base.delay.ms
    • system.log.process.retry.policy.max.retries
    • system.log.process.retry.policy.max.time.secs
    • system.log.process.retry.policy.type
    • system.metrics.retry.policy.base.delay.ms
    • system.metrics.retry.policy.max.retries
    • system.metrics.retry.policy.max.time.secs
    • system.metrics.retry.policy.type
    • worker.retry.policy.base.delay.ms
    • worker.retry.policy.max.delay.ms
    • worker.retry.policy.max.retries
    • worker.retry.policy.max.time.secs
    • worker.retry.policy.type
    • workflow.retry.policy.base.delay.ms
    • workflow.retry.policy.max.delay.ms
    • workflow.retry.policy.max.retries
    • workflow.retry.policy.max.time.secs
    • workflow.retry.policy.type
  • Added a master.manage.hbase.coprocessors setting that can be set to false on clusters where the CDAP coprocessors are deployed on every HBase node. (CDAP-8037)
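
As an illustration of the retry policy properties listed above, the fragment below sets a policy for workflows. The property names are taken from the list. Everything else is an assumption: that the properties are set in cdap-site.xml, that "exponential.backoff" is an accepted policy type, and all of the numeric values.

```xml
<!-- Sketch only: assumes cdap-site.xml. -->
<!-- The type value and all numbers are assumptions, not defaults from this release. -->
<property>
  <name>workflow.retry.policy.type</name>
  <value>exponential.backoff</value>
</property>
<property>
  <name>workflow.retry.policy.base.delay.ms</name>
  <value>100</value>
</property>
<property>
  <name>workflow.retry.policy.max.delay.ms</name>
  <value>2000</value>
</property>
<property>
  <name>workflow.retry.policy.max.retries</name>
  <value>10</value>
</property>
<property>
  <name>workflow.retry.policy.max.time.secs</name>
  <value>600</value>
</property>
```

The other program types in the list follow the same naming pattern with a different prefix, such as spark.retry.policy.* or service.retry.policy.*.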

Enhancements to the New CDAP UI

  • Added the management of preferences at the application and program levels. (CDAP-8021)

  • The CDAP UI added dataset and stream detail views and overviews. (CDAP-8217)

  • The CDAP UI added a "call-to-action" dialog after entity creation, so users can easily perform actions on the newly-created entities. (CDAP-8203)

  • Users can now view events and logs of programs in the new CDAP UI using the events and log view "fast-action" dialogs. (CDAP-8282, CDAP-8376)

  • Users now see on the CDAP UI homepage a "Just Added" section, listing and highlighting any entities added in the last five minutes. (CDAP-8398)

  • The CDAP UI added a duration timer to CDAP pipelines. (HYDRATOR-208)

Logs

  • Added a prototype implementation for a rolling HDFS log appender. (CDAP-7676, CDAP-9999)
  • Program context information, including namespace, program name, and program type, is now available in the MDC property of each ILoggingEvent emitted from a program container. (CDAP-7962)
  • Revised the CDAP Log Appender to use Logback's Appender interface. (CDAP-8108)
  • The log file cleaner thread will remove metadata and, for successfully deleted metadata entries, it will delete the corresponding log files. The log file cleaner thread will only remove the metadata entries for the old (pre-4.1.0) log format. (CDAP-8231)
  • Logs collected by the CDAP Log Appender will be stored at a common <cdap>/logs path, owned by the cdap user. For security, it is readable only by the cdap user. (CDAP-8261)
  • Added additional metrics about the status of the log framework: log.process.min.delay and log.process.max.delay. (CDAP-8428)

New CDAP Pipeline Plugins

Dataset Improvements

  • Added the ability to reuse an existing file system location and Hive table when creating a partitioned file set. (CDAP-7596)
  • Added configuring the CDAP Explore database and table name for a dataset using dataset properties. (CDAP-7597)
  • Added a tool that pre-builds and loads the HBase coprocessors required by CDAP onto HDFS. (CDAP-7683)
  • Added control of group ownership and permissions through dataset properties. (CDAP-8070)

Other New Features

  • CDAP now uses environment variables in the spark-env.sh and properties in the `spark-d...

Cask Data Application Platform 4.0.1

25 Jan 03:20

Improvement

  • Added a step in the CDAP Upgrade Tool to disable TMS (Transaction Messaging Service) message and payload tables. The TMS TwillRunnable will update the coprocessors of those tables if required and enable the tables. (CDAP-8047)

Bug Fixes

  • Fixed an issue where the CDAP service scripts could cause a terminal session to not echo characters. (CDAP-7694)
  • The CDAP Security service under CDAP Standalone is no longer forced to bind to localhost. (CDAP-7992)
  • To avoid transaction timeouts, log cleanup is now done in configurable batches (controlled by the property log.cleanup.max.num.files) instead of a single short transaction. (CDAP-8000)
  • Fixed a bug in the TMS (Transaction Messaging Service) message and payload table coprocessors by changing the accessing of CDAP configuration and TMS metadata tables from reading them inline to reading them in a separate thread. (CDAP-8007)
  • Changed the default CDAP UI port to 11011 to match the CDAP 4.0.0 release. (CDAP-8023)
  • Removed an obsolete Update Dataset Specifications step in the CDAP Upgrade tool. This step was required only for upgrading from CDAP versions lower than 3.2 to CDAP Version 3.2. (CDAP-8086)
  • Provided a workaround for Scala bug SI-6240 (https://issues.scala-lang.org/browse/SI-6240) to allow concurrent execution of Spark programs in CDAP Workflows. (CDAP-8087)
  • Fixed the CDAP Hydrator detail view so that it can be rendered in older browsers. (CDAP-8088)
  • Fixed an issue where the number of records processed during a preview run of the realtime data pipeline was being incremented incorrectly. (CDAP-8094)
  • Fixed an issue with the flag used by the Node proxy to enable SSL between the CDAP UI and CDAP Router. (CDAP-8126)
  • Fixed an issue with the CDAP CLI where execute commands may be interpreted incorrectly. (CDAP-8137)
  • Fixed an issue in the template path used with the original CDAP UI when rendering a dataset detailed view. (CDAP-8148)
  • Fixed issues with the Ambari UI "Quick Links" and alerts definitions for SSL and non-default ports and the writing of the cdap-security.xml file when configured under the CDAP Ambari Service. (CDAP-8158)
  • Fixed an issue where runtime arguments were not being passed for the preview run correctly in the CDAP UI. (HYDRATOR-1212)
  • Fixed an issue where previews would not run in a non-default namespace. (HYDRATOR-1226)

Cask Data Application Platform 3.5.3

21 Jan 01:09

Improvements

  • Now allows usage of a custom Kryo serializer in Spark programs. (CDAP-7647)

Bug Fixes

  • Fixed an issue where the CDAP service scripts could cause a terminal session to not echo characters. (CDAP-7694)
  • Removed an obsolete Update Dataset Specifications step in the CDAP Upgrade tool. This step was required only for upgrading from CDAP versions lower than 3.2 to CDAP Version 3.2. (CDAP-8086)
  • Provided a workaround for Scala bug SI-6240 (https://issues.scala-lang.org/browse/SI-6240) to allow concurrent execution of Spark programs in CDAP Workflows. (CDAP-8087)

Cask Data Application Platform 3.5.2

23 Dec 22:40

Known Issues

  • In CDAP 3.5.0, new kafka.server.* properties replace older properties such as kafka.log.dir, as described in the Administration Manual: Appendices: cdap-site.xml. (CDAP-7179)

    If you are upgrading from CDAP 3.4.x to 3.5.x and you have set a value for kafka.log.dir by using Cloudera Manager's safety-valve mechanism, you need to change to the new property kafka.server.log.dirs, as the deprecated kafka.log.dir is being ignored in favor of the new property. If you don't, your custom value will be replaced with the default value.

  • When running in CDAP Standalone, the Cask Hydrator plugin NaiveBayesTrainer has a permgen memory leak that leads to an out-of-memory error after repeated use, in as few as six runs. The only workaround is to reset the memory by restarting CDAP Standalone. (CDAP-7608)
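
For the kafka.log.dir change in the first known issue above, the migration amounts to moving the custom value to the new property name. The sketch below shows this in cdap-site.xml form; on Cloudera Manager it would go through the safety-valve mechanism. The directory path is a hypothetical placeholder.

```xml
<!-- Sketch only: the path is a hypothetical placeholder. -->
<!-- Replaces a custom value previously set under the deprecated kafka.log.dir. -->
<property>
  <name>kafka.server.log.dirs</name>
  <value>/data/cdap/kafka-logs</value>
</property>
```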

Improvements

  • Fixed an issue with the CDAP scripts under Windows not handling a JAVA_HOME path with spaces in it correctly. CDAP SDK home directories with spaces in the path are not supported (due to issues with the product) and the scripts now exit if such a path is detected. (CDAP-3262)
  • For MapReduce programs using a PartitionedFileSet as input, expose the partition key corresponding to the input split to the mapper. (CDAP-4322)
  • Added the property program.container.dist.jars to set extra jars to be localized to every program container and to be added to classpaths of CDAP programs. (CDAP-6183)
  • The namespace that integration test cases run against by default has been made configurable. (CDAP-6572)
  • Improved the UpgradeTool to upgrade tables in namespaces with impersonation configured. (CDAP-6577)
  • Added support for concurrent runs of a Spark program. (CDAP-6885)
  • Added support for impersonation with CDAP Explore (Hive) operations, such as enabling exploring of a dataset or running queries against it. (CDAP-6587)
  • Added support for CDH 5.9. (CDAP-7291)
  • The Log HTTP Handler and Router have been fixed to allow the streaming of larger log files. (CDAP-7385)
  • Added support to LogSaver for impersonation. (CDAP-7387)
  • Added authorization for schedules in CDAP. (CDAP-7404)
  • Improved error handling upon failures in namespace creation. (CDAP-7529)
  • DynamicPartitioner can now limit the number of open RecordWriters to one, if the output partition keys are grouped. (CDAP-7557)
  • Added a property kafka.zookeeper.quorum to be used across all internal clients using Kafka. (CDAP-7682)
  • Added cluster.name as a property that identifies a cluster; this property can be set in cdap-site.xml. (CDAP-7761)
  • Added the Windows Share Copy plugin to the Hydrator plugins. (HYDRATOR-979)
  • The SSH hostname and the command to be executed are now macro-enabled for the SSH action plugin. (HYDRATOR-997)

Bug Fixes

  • Fixed an issue that prevented macros from being used with a secure KMS store. (CDAP-6981)
  • Fixed an issue so as to significantly reduce the chance of a schedule misfire in the case where the CPU cannot trigger a schedule within a certain time threshold. (CDAP-7116)
  • Fixed an issue where macros were not being substituted for postaction plugins. (CDAP-7177)
  • Fixed an issue where dataset usage was not being recorded after an application was deleted. (CDAP-7250)
  • Fixed an issue that would cause MapReduce and Spark programs to fail if too many macros were being used. (CDAP-7318)
  • Fixed a problem with upgrading CDAP using the CDAP Upgrade Tool. (CDAP-7321)
  • Fixed a problem with the upgrade tool while upgrading HBase coprocessors. (CDAP-7324)
  • Fixed a problem with using "Download All" logs in the browser log viewer by having it fetch and stream the response to the client. (CDAP-7353)
  • Fixed a problem with NodeJS buffering a response before sending it to a client. (CDAP-7359)
  • Fixed a problem with log file corruption if the log saver container crashes due to being killed by YARN. (CDAP-7361)
  • Fixed a problem with the CDAP UI not handling "5xx" error codes correctly. (CDAP-7364)
  • Fixed Hydrator Studio in the Windows version of Chrome to allow users to open and edit a node configuration. (CDAP-7374)
  • Fixed an error in the "CDAP Introduction" tutorial's "Transforming Your Data" example of an application configuration. (CDAP-7386)
  • Fixed TestFramework classloading to support classes that depend on org.hamcrest. (CDAP-7391)
  • Fixed an issue where the Java process corresponding to the MapReduce application master kept running even if the application was moved to the FINISHED state. (CDAP-7392)
  • Fixed an issue with impersonation in flows not working by not re-using HBaseAdmin across different UGI. (CDAP-7394)
  • Fixed an issue which prevented scheduled jobs from running on a namespace with impersonation. (CDAP-7396)
  • Fixed an issue which prevented an app in a namespace from being deleted if a program for the same app is running in a different namespace. (CDAP-7398)
  • Fixed an issue that prevented the CDAP UI from starting if the logback.xml was configured to log at the INFO or lower level. (CDAP-7403)
  • Added authorization for schedules in CDAP. (CDAP-7404)
  • Avoid the caching of YarnClient in order to fix a problem that occurred in namespaces with impersonation configured. (CDAP-7420)
  • Fixed an issue that prevented HBaseQueueDebugger from running in an impersonated namespace. (CDAP-7433)
  • Fixed an error which prevented the downloading of large logs using the CDAP UI. (CDAP-7435)
  • Removed the requirement of running "kinit" prior to running either the Upgrade or Transaction Debugger tools of CDAP on a secure Hadoop cluster. (CDAP-7438, CDAP-7439)
  • Fixed an issue that prevented the CDAP Upgrade Tool from being run for a namespace with authorization turned on. (CDAP-7458)
  • Fixed logback-container.xml to work on clusters with multiple log directories configured for YARN. (CDAP-7473)
  • Fixed a problem in CDAP logging that caused system logs from Kafka to not be saved after an upgrade and for previously-saved logs to become inaccessible. (CDAP-7482)
  • Fixed cases where the MapReduce classloader was being closed prematurely. (CDAP-7500)
  • Fixed a problem that prevented the use of a logback.xml from an application jar. (CDAP-7527)
  • Fixed a problem in integration tests to allow JDBC connections against authorization-enabled and SSL-enabled CDAP instances. (CDAP-7548)
  • Improved the usability of ServiceManager in integration tests. The getServiceURL method now waits for the service to be discoverable before returning the service's URL. (CDAP-7566)
  • Fixed cases where Spark programs could not be started after a master failover or restart. (CDAP-7612)
  • The CDAP Ambari service was updated to use scripts for Auth Server/Router alerts in Ambari due to Ambari not supporting CDAP's /status endpoint with WEB check. (CDAP-7660)
  • Fixed a problem with Hydrator pipelines using a DBSource not working in an HDP cluster. (HYDRATOR-791)
  • Fixed a problem with Spark data pipelines not supporting argument values in excess of 64K characters. (HYDRATOR-948)
  • Fixed a problem that prevented the adding of a schema with hyphens in the Hydrator UI.

Cask Data Application Platform 4.0.0

21 Dec 11:19

New Features

  • Added a transactional messaging system that is used for reliable communication of messages between components. In CDAP 4.0.0, the transactional messaging system replaces Kafka for publishing and subscribing to the audit logs that are used within CDAP for computing data lineage. (CDAP-7211)
  • Added a pluggable extension to retrieve operational statistics in CDAP. Provided extensions for operational stats from YARN, HDFS, HBase, and CDAP. (CDAP-7670) (CDAP-7703) (CDAP-7704)
  • Added the ability to dynamically update or reset log levels for the worker, flow, and service program types using REST endpoints. (CDAP-5479) (CDAP-7214)

Improvements

  • New menu option in Cloudera Manager when running the CDAP CSD enables running utilities such as the HBaseQueueDebugger. (CDAP-5632)
  • Added support for impersonation with CDAP Explore (Hive) operations, including enabling exploring of a dataset or running queries against it. (CDAP-6587)
  • Added support for enabling client certificate-based authentication to the CDAP Authentication server. (CDAP-7287)
  • Merged various shell scripts into a single script to interface with CDAP, called cdap, shipped with both the SDK and Distributed CDAP. (CDAP-1280)
  • Updated the default CDAP Router port to 11015 to avoid conflicting with HiveServer2's default port. (CDAP-1696)
  • Fixed an issue with the CDAP scripts under Windows not handling a JAVA_HOME path with spaces in it correctly. CDAP SDK home directories with spaces in the path are not supported (due to issues with the product) and the scripts now exit if such a path is detected. (CDAP-3262)
  • For MapReduce programs using a PartitionedFileSet as input, the partition key corresponding to the input split is now exposed to the mapper. (CDAP-4322)
  • Fixed an issue where an exception from an HttpContentConsumer was being silently ignored. (CDAP-4901)
  • Added pagination for the search RESTful API. Pagination is achieved via the offset, limit, numCursors, and cursor parameters in the RESTful API. (CDAP-5068)
  • Added the property program.container.dist.jars to set extra jars to be localized to every program container and to be added to classpaths of CDAP programs. (CDAP-6183)
  • Fixed an issue that allowed a FileSet to be created if its corresponding directory already existed. (CDAP-6425)
  • The namespace that integration test cases run against by default has been made configurable. (CDAP-6572)
  • Added a feature that implements caching of user credentials in CDAP system services. (CDAP-6635)
  • Fixed an issue in WorkerContext that did not properly implement the contract of the Transactional interface. Note that this fix may cause incompatibilities with previous releases in certain cases. See API Changes, CDAP-6837 for more details. (CDAP-6837)
  • Updated more system services to respect the cdap-site parameter "master.service.memory.mb". (CDAP-6862)
  • Added support for concurrent runs of a Spark program. (CDAP-6885)
  • Added support for running CDAP on Apache HBase 1.2. (CDAP-6937)
  • Added support for Amazon EMR 4.6.0+ installation of CDAP via a bootstrap action script. (CDAP-6938)
  • Added support for enabling SSL between the CDAP Router and CDAP Master. (CDAP-6984)
  • Added the capability to clean up log files which do not have corresponding metadata. (CDAP-6995)
  • Added support for checkpointing in Spark Streaming programs to persist checkpoints transactionally. (CDAP-7117)
  • Updated the Windows start scripts to match the new shell script functionality. (CDAP-7181)
  • Added the ability to specify an announce address and port for the CDAP AppFabric and Dataset services. Deprecated the properties app.bind.address and dataset.service.bind.address, replacing them with master.services.bind.address as the bind address for master services. Added the properties master.services.announce.address, app.announce.port, and dataset.service.announce.port for use as announce addresses that are different from the bind address. (CDAP-7192)
  • Improved CDAP Master logging of events related to programs that it launches. (CDAP-7208)
  • Fixed a NullPointerException being logged on closing network connection. (CDAP-7240)
  • Upgraded the Apache Tephra version to 0.10-incubating. (CDAP-7284)
  • Added support for CDH 5.9. (CDAP-7291)
  • Provided programs more control over when and how transactions are executed. (CDAP-7319)
  • The Log HTTP Handler and Router have been fixed to allow the streaming of larger log files. (CDAP-7385)
  • Revised the documentation on the recommended setting for yarn.nodemanager.delete.debug-delay-sec. (CDAP-7393)
  • Removed the requirement in the documentation of running kinit prior to running the CDAP Upgrade Tool when upgrading a package installation of CDAP on a secure Hadoop cluster. (CDAP-7439)
  • Improved how MapReduce configures its inputs, so that failures surface immediately. (CDAP-7476)
  • Fixed an issue in MapReduce that caused skipping the destroy() method if the committing of any of the dataset outputs failed. (CDAP-7477)
  • DynamicPartitioner can now limit the number of open RecordWriters to one, if the output partition keys are grouped. (CDAP-7557)
  • Added support for specifying the Hive execution engine at runtime (dynamically). (CDAP-7659)
  • Added the cluster.name property that identifies a cluster; this property can be set in the cdap-site.xml file. (CDAP-7761)
  • Added a step in the CDAP Upgrade Tool to upgrade the specification of the MetadataDataset. (CDAP-7797)
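
As an illustration of the search pagination parameters listed above (CDAP-5068), the following is a minimal sketch of how a client might assemble a paginated search request. Only the parameter names offset, limit, numCursors, and cursor come from these notes; the helper name build_search_url, the endpoint path, the example query, and the default values are assumptions made for illustration and should be checked against the CDAP RESTful API documentation.

```python
from urllib.parse import urlencode


def build_search_url(base_url, namespace, query,
                     offset=0, limit=20, num_cursors=0, cursor=None):
    """Build a paginated search URL (hypothetical helper, not part of CDAP).

    The endpoint path below is an assumption; only the four pagination
    parameter names are taken from the release notes.
    """
    params = {
        "query": query,
        "offset": offset,            # number of results to skip
        "limit": limit,              # maximum number of results to return
        "numCursors": num_cursors,   # number of cursors requested in the response
    }
    if cursor is not None:
        # A cursor from an earlier response, if the caller has one.
        params["cursor"] = cursor
    path = f"/v3/namespaces/{namespace}/metadata/search"  # assumed path
    return f"{base_url}{path}?{urlencode(params)}"


if __name__ == "__main__":
    # 11015 is the default Router port mentioned in these notes.
    print(build_search_url("http://localhost:11015", "default", "tags:finance",
                           offset=20, limit=10))
```

The sketch only builds the URL; sending the request and handling authentication are left out.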

Bug Fixes

  • A MapReduce job using either a FileSet or PartitionedFileSet as input no longer fails if there are no input partitions. (CDAP-2945)
  • The Authentication server announce address is now configurable. (CDAP-4535)
  • Fixed a problem with downloading of large (multiple gigabyte) CDAP Explore queries. (CDAP-5012)
  • Fixed an issue where the metadata of streams was not being updated when the stream's schema was altered. (CDAP-5061)
  • Fixed an issue where a warning was logged instead of an error when a MapReduce job failed in the CDAP SDK. (CDAP-5372)
  • Updated the default CDAP UI port to 11011 to avoid conflicting with Accumulo and Cloudera Manager's Activity Monitor. (CDAP-5897)
  • Authentication handler APIs have been updated to restrict which cdap-site.xml and cdap-security.xml properties are available to the handler. (CDAP-6398)
  • Fixed an issue with searching for an entity in Cask Tracker by metadata after a tag with the same prefix has been removed. (CDAP-6404)
  • Fixed an issue with misleading log messages from the RunRecord corrector. (CDAP-7031)
  • Significantly reduced the chance of a schedule misfire in the case where the CPU cannot trigger a schedule within a certain time threshold. (CDAP-7116)
  • Fixed a problem with duplicate logs showing for a running program. (CDAP-7138)
  • On an incorrect ZooKeeper quorum configuration, the CDAP Upgrade Tool and other services such as Master, Router, and Kafka will time out with an error instead of hanging indefinitely. (CDAP-7154)
  • Fixed an issue in the CDAP Upgrade Tool to allow it to run on a CDAP instance with authorization enabled. (CDAP-7175)
  • Fixed an issue where macros were not being substituted for postaction plugins. (CDAP-7177)
  • Lineage information is now returned for deleted datasets. (CDAP-7204)
  • Fixed an issue with the FileBatchSource not working with Azure Blob Storage. (CDAP-7248)
  • Fixed an issue with CDAP Explore using Tez on Azure HDInsight. (CDAP-7249)
  • Fixed an issue where dataset usage was n...

Cask Data Application Platform 3.6.0

06 Oct 01:07

Improvements

  • Added support for concurrent runs of different versions of a service. A RouteConfig can be uploaded to configure the percentage of requests that are sent to each version. (CDAP-5771)
  • Improved the PartitionedFileSet to validate the schema of a partition key. Note that this will break code that uses incorrect partition keys, which were previously silently ignored. (CDAP-7281)
  • All non-versioned endpoints are now directed to applications with a default version. Added test cases with a mixed usage of the new versioned endpoints and the corresponding non-versioned endpoints. (CDAP-7343)
  • Added an upgrade step that adds a default version ID to jobs and triggers in the Schedule Store. (CDAP-7366)
  • The Log HTTP Handler and Router have been fixed to allow the streaming of larger log files. (CDAP-7385)
  • Added an HTTP RESTful API to create applications with a version. (CDAP-7264)
  • Added an HTTP RESTful API to start or stop programs of a specific application version. (CDAP-7265)
  • Added an upgrade step that adds a default application version to existing applications. (CDAP-7266)
  • Added an HTTP RESTful API to store, fetch, and delete RouteConfigs for user service endpoint routing control. (CDAP-7268)
  • User services now include their application version in the payload when they announce themselves in Apache Twill. (CDAP-7272)
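
To illustrate the idea behind percentage-based routing with a RouteConfig (CDAP-5771), here is a minimal sketch of how a request could be mapped to a service version from a set of percentages. This is not CDAP's implementation: the function name pick_version, the dictionary representation of the route configuration, and the use of a number in the range [0, 100) to select a version are all assumptions made for illustration.

```python
def pick_version(route_config, roll):
    """Map a roll in [0, 100) to a version using cumulative percentages.

    route_config is a hypothetical dict of version -> percentage whose
    values must sum to 100. This only sketches the general technique.
    """
    if sum(route_config.values()) != 100:
        raise ValueError("percentages must sum to 100")
    if not 0 <= roll < 100:
        raise ValueError("roll must be in [0, 100)")
    cumulative = 0
    for version, percent in route_config.items():
        cumulative += percent
        if roll < cumulative:
            return version
    # Unreachable when the two checks above pass.
    raise AssertionError("no version selected")


if __name__ == "__main__":
    import random

    config = {"1.0": 30, "2.0": 70}  # hypothetical version labels
    print(pick_version(config, random.random() * 100))
```

Passing the roll in explicitly keeps the selection deterministic and easy to test; a caller would normally supply a fresh random number per request, as in the usage lines at the bottom.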

Bug Fixes

  • The unit test framework can now exclude Scala, so users can depend on their own version of the library. (CDAP-3822)
  • Fixed an issue where dataset usage was not being recorded after an application was deleted. (CDAP-7250)
  • Fixed a problem with the documentation example links to the CDAP ETL Guide. (CDAP-7314)
  • Fixed a problem with upgrading CDAP using the CDAP Upgrade Tool. (CDAP-7321)
  • Fixed a problem with the upgrade tool while upgrading HBase coprocessors. (CDAP-7324)
  • Fixed a problem with the listing of applications not returning the application version correctly. (CDAP-7334)
  • Fixed a problem with using "Download All" logs in the browser log viewer by having it fetch and stream the response to the client. (CDAP-7353)
  • Fixed a problem with NodeJS buffering a response before sending it to a client. (CDAP-7359)
  • Fixed a problem with log file corruption if the log saver container crashes due to being killed by YARN. (CDAP-7361)
  • Fixed a problem with the CDAP UI not handling "5xx" error codes correctly. (CDAP-7364)
  • Fixed Hydrator Studio in the Windows version of Chrome to allow users to open and edit a node configuration. (CDAP-7374)
  • Fixed an error in the "CDAP Introduction" tutorial's "Transforming Your Data" example of an application configuration. (CDAP-7386)
  • Fixed an issue that caused unit test failures when using org.hamcrest classes. (CDAP-7391)
  • Fixed an issue where the Java process corresponding to the MapReduce application master kept running even if the application was moved to the FINISHED state. (CDAP-7392)
  • Fixed a problem with Hydrator pipelines using a DBSource not working in an HDP cluster. (HYDRATOR-791)
  • Fixed a problem with Spark data pipelines not supporting argument values in excess of 64K characters. (HYDRATOR-948)