Cloudera Deploy is a toolset for deploying the Cloudera Data Platform (CDP). Its scope includes Public Cloud and Private Cloud products, Private Cloud Base clusters, and application setup, execution, and other post-deployment functions.
You can use Cloudera Deploy as your entrypoint for getting started with CDP. The toolset uses straightforward configuration definitions to instruct the automation functions, yet is extensible and highly configurable. The toolset can be a great foundation for custom entrypoints, CI/CD pipelines, and development environments.
Get the latest version of Docker.
- Linux users, use your package manager.
NOTE: Git is required only if you intend to clone the software for local editing. If you only intend to run the automation tools, you may skip this step.
The Git website has excellent instructions for installing Git on all operating systems.
Get the latest version of the AWS CLI.
- Linux users, use your package manager.
NOTE: The Quickstart image prepackages the AWS CLI.
If this is the first time you are installing the AWS CLI, configure the program by providing your credentials.
aws configure
Visit the AWS CLI User Guide for further details regarding credential management.
Get the latest version of the CDP CLI.
NOTE: The Quickstart image prepackages the CDP CLI.
If this is the first time you are installing the CDP CLI, configure the program by providing your credentials.
cdp configure
Visit the CDP CLI User Guide for further details regarding credential management.
Ensure that you have generated an SSH keypair for your local user profile. Visit the SSH Keygen How-To for details.
NOTE: The Quickstart generates an SSH keypair if none is provided.
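You may want to confirm that a keypair already exists before relying on the Quickstart to generate one. A small shell check can do this. The sketch below is a hypothetical helper, not part of Cloudera Deploy. It assumes your keys use the OpenSSH default file names (id_ed25519, id_ecdsa, id_rsa) and looks for a private key that has a matching .pub file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cloudera Deploy): print the first private
# key in an SSH directory that has a matching .pub file. Only the OpenSSH
# default key file names are checked, in order of preference.
find_ssh_keypair() {
  local dir="${1:-$HOME/.ssh}" name
  for name in id_ed25519 id_ecdsa id_rsa; do
    if [ -f "$dir/$name" ] && [ -f "$dir/$name.pub" ]; then
      echo "$dir/$name"
      return 0
    fi
  done
  echo "no SSH keypair found in $dir; generate one with ssh-keygen" >&2
  return 1
}
```

For example, `find_ssh_keypair || ssh-keygen -t ed25519` generates an Ed25519 keypair only when no existing pair is found.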
Ensure that you have a properly configured SSH Agent. Visit the SSH Agent How-To for details.
TIP: Put the AWS CLI and CDP CLI programs in your $PATH so that they are easily accessible.
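To confirm that the programs above are reachable, you can check your $PATH from a shell. This is a minimal sketch, and missing_prereqs is a hypothetical helper name. Adjust the program list to your situation: Git is optional, and the Quickstart image already provides the AWS and CDP CLIs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named program that is not on $PATH.
# Returns 1 if any program is missing, 0 otherwise.
missing_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    # command -v succeeds only if the name resolves to something runnable.
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_prereqs docker aws cdp` prints the name of each program that is not on your $PATH. It returns a non-zero status if any are missing.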
The quickstart.sh script sets up the Docker container with the software dependencies you need for deployment. You can download the script on its own:
curl https://raw.githubusercontent.com/cloudera-labs/cloudera-deploy/main/quickstart.sh -o quickstart.sh
Alternatively, clone the cloudera-deploy repository, which contains the quickstart.sh script.
git clone https://github.com/cloudera-labs/cloudera-deploy.git
cd cloudera-deploy
CAUTION: As a user of the software, do not modify any of the files in the project. The vast majority of changes are made through the configuration you provide to the project, not by editing its files.
Check that Docker is running.
- Linux users, use systemd (or another init system) to check the Docker service.
Run the quickstart.sh entrypoint script. This script prepares and executes the Ansible Runner container.
chmod +x quickstart.sh
./quickstart.sh
Confirm that you have the orange cldr (build)-(version) #> prompt.
This is your interactive Ansible Runner environment. It provides built-in access to the relevant dependencies for CDP.
IMPORTANT: Do NOT run the example definition until you have made the changes below.
Modify your local cloudera-deploy user profile. Your profile is in your $HOME directory at ~/.config/cloudera-deploy/profiles/default.
vim ~/.config/cloudera-deploy/profiles/default
- Recommended
  - admin_password: Note the password requirements (see the profile template comments).
  - name_prefix: Note the namespace requirements (see the profile template comments).
  - infra_type: The valid values are aws, gcp, and azure.
  - infra_region: The valid regions depend on the value of infra_type.
- Optional
  - tags (see the profile template comments)
NOTE: Ensure that you provide a valid infra_region for the chosen infra_type.
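Before running anything, you can sanity-check the infra_type value in your profile from the shell. This is a rough sketch. It assumes infra_type appears as a simple top-level "key: value" line in the profile, and it uses sed rather than a YAML parser. check_infra_type is a hypothetical helper name. The sketch does not validate infra_region, whose valid values depend on the provider.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cloudera Deploy): read the top-level
# infra_type key from a profile with plain text tools and check it against
# the documented values aws, gcp, and azure.
check_infra_type() {
  local profile="${1:-$HOME/.config/cloudera-deploy/profiles/default}"
  local value
  # Take the first "infra_type:" line, drop the key, then strip any trailing
  # comment, surrounding quotes, and trailing whitespace.
  value=$(sed -n 's/^infra_type:[[:space:]]*//p' "$profile" 2>/dev/null | head -n 1 \
    | sed -e 's/[[:space:]]*#.*$//' -e "s/[\"']//g" -e 's/[[:space:]]*$//')
  case "$value" in
    aws|gcp|azure)
      echo "infra_type=$value"
      ;;
    *)
      echo "invalid or missing infra_type: '$value'" >&2
      return 1
      ;;
  esac
}
```

Run it as `check_infra_type ~/.config/cloudera-deploy/profiles/default`. It prints the detected value, or returns a non-zero status if the value is missing or invalid.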
Before running a deployment, check that the credentials available to the automation software work correctly and belong to the expected accounts. A good practice is to confirm that the user and account IDs shown in the terminal match those shown in the browser UI.
If you are deploying CDP Public Cloud, check that your credential is available in your profile:
cdp iam get-user
TIP: If you do not yet have a CDP Public Cloud credential, follow the Cloudera documentation.
If you are using AWS cloud infrastructure, check that your credential is available in your profile:
aws iam get-user
If you are using Azure cloud infrastructure, check that you are logged in to your account and that your credentials are available:
az account list
TIP: If you cannot list your Azure accounts, consider using az login to refresh your credential.
If you are using GCP cloud infrastructure, check that your service account credential is being picked up.
NOTE: For GCP, you need a provisioning service account, referenced by the gcloud_credential_file entry in your cloudera-deploy user profile. If you do not yet have a provisioning service account, follow the process in the CDP documentation to generate one.
gcloud auth list
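The checks above can be grouped by infra_type. The following sketch maps each value to the command shown for that provider. By default, it also runs the CDP Public Cloud check first. The function names and the CHECK_CDP_PUBLIC switch are hypothetical. Set CHECK_CDP_PUBLIC=no if you are not deploying CDP Public Cloud.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Cloudera Deploy): print the credential
# check command for an infra_type, using the commands documented above.
credential_check_cmd() {
  case "$1" in
    aws)   echo "aws iam get-user" ;;
    azure) echo "az account list" ;;
    gcp)   echo "gcloud auth list" ;;
    *)     echo "unknown infra_type: $1" >&2; return 1 ;;
  esac
}

# Run the CDP Public Cloud check (unless CHECK_CDP_PUBLIC=no), then the
# infrastructure check for the given infra_type.
verify_credentials() {
  local cmd
  cmd=$(credential_check_cmd "$1") || return 1
  if [ "${CHECK_CDP_PUBLIC:-yes}" = "yes" ]; then
    cdp iam get-user || return 1
  fi
  # Word splitting is intended: the mapped commands contain no quoting.
  $cmd
}
```

For example, `verify_credentials aws` runs `cdp iam get-user` followed by `aws iam get-user`. Compare the IDs they print against the browser UI.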
Run the main playbook with the defaults and your configuration at the orange cldr prompt.
NOTE: This creates a 'CDP sandbox', which is both a CDP Public Cloud Environment and a CDP Private Cloud Base cluster, using your default cloud infrastructure provider credentials. Many other deployments are possible and are explained elsewhere.
ansible-playbook /opt/cloudera-deploy/main.yml -e "definition_path=examples/sandbox" \
-t run,default_cluster -vvv
The logs are written to $HOME/.config/cloudera-deploy/log/latest-<currentdate>. For example:
tail -100f $HOME/.config/cloudera-deploy/log/latest-2021-05-08_150448
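Because the log file name includes a timestamp, you can locate the newest log without typing the date. This sketch assumes the latest-YYYY-MM-DD_HHMMSS naming shown above, so that the alphabetical order of the glob matches chronological order. latest_log is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest Cloudera Deploy log file.
# Assumes names like latest-2021-05-08_150448, whose sorted (glob) order
# is also chronological order.
latest_log() {
  local dir="${1:-$HOME/.config/cloudera-deploy/log}"
  local newest="" f
  for f in "$dir"/latest-*; do
    # Globs expand in sorted order, so the last existing match is newest.
    if [ -e "$f" ]; then
      newest="$f"
    fi
  done
  if [ -z "$newest" ]; then
    echo "no logs found in $dir" >&2
    return 1
  fi
  echo "$newest"
}
```

For example, `tail -100f "$(latest_log)"` follows the most recent log.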
IMPORTANT: The total deployment time varies from 90 to 150 minutes, depending on CDN performance, network connectivity, and other factors. Keep checking the logs. If there are no errors, the scripts are still working in the background.
Cloudera Deploy is regularly updated by the maintainers with new features and fixes.
The quickstart.sh script checks for an updated container image if no container is currently running. You can use the following process to trigger this behavior.
NOTE: This closes any active cldr sessions you may have running.
Stop the cloudera-deploy Docker container:
docker stop cloudera-deploy
NOTE: If you have made uncommitted local changes to cloudera-deploy, you must resolve them before updating.
In the cloudera-deploy directory, pull the latest changes with git
git pull
Finally, rerun the quickstart to download the latest image:
./quickstart.sh
TIP: You can stop the Docker container and rerun the quickstart at any time to download the latest image.
CAUTION: Don't change the project configuration until you have run the quickstart a few times and are comfortable with it.
Cloudera Deploy is powered by Ansible and provides a standard configuration and execution model for CDP deployments and their applications. It can be run within a container, or directly on a host.
Specifically, Cloudera Deploy is an Ansible project that uses a set of playbooks, roles, and tags to construct a runlevel-like management experience for cloud and cluster deployments. It leverages several collections, both Cloudera and third-party.
Cloudera Deploy requires a number of host applications, services, and Python libraries for its execution. These dependencies are already packaged for ease of use in Cloudera Labs Ansible-Runner, another project within Cloudera Labs, and are made readily accessible through the quickstart.sh script.
Alternatively, and especially if you plan on running Cloudera Deploy in your own environment, you may install the dependencies yourself.
Cloudera Deploy relies directly on a number of Ansible collections, as well as the following roles:
- geerlingguy.postgresql
- ansible-role-mysql
These collection and role dependencies are listed in the ansible.yml file in the cldr-runner project.
Cloudera Deploy itself has a single dependency for its own execution, the community.crypto collection. To install all of these dependencies, run the following:
# Get the cldr-runner dependency file first
curl https://raw.githubusercontent.com/cloudera-labs/cldr-runner/main/payload/deps/ansible.yml \
--output requirements.yml
# Install the collections (and their dependencies)
ansible-galaxy collection install -r requirements.yml
# Install the roles
ansible-galaxy role install -r requirements.yml
# Install the crypto collection
ansible-galaxy collection install community.crypto
The supporting Python libraries and other clients can be installed directly using the various dependency files in the cldr-runner project. You might find it easier to follow the installation instructions for cloudera.exe and cloudera.cluster, the two collections that drive this set of dependencies.
For the community.crypto collection dependency, you need to ensure that the ssh-keygen executable is on your Ansible controller.
The dependencies cover the full range of the automation tooling, from infrastructure on public or private cloud to the relevant Cloudera platform assets. If you are only working with a limited part of the tooling, you may not need the full list of dependencies. For example, if you are only working with AWS infrastructure, it is safe to install only those dependencies or to use the corresponding tagged cldr-runner version.
Cloudera Deploy requires a small set of user-supplied information for a successful deployment. A minimum set of user inputs is defined in a profile file (see the profile.yml template for details). For example, the profile should define the password for the administrator account of the deployed services, and you should set a unique name_prefix to avoid clashing with other deployments.
The default location for profiles is ~/.config/cloudera-deploy/profiles/. Cloudera Deploy looks for the default file in this directory unless the Ansible runtime variable profile is set, e.g. -e profile=my_custom_profile. Creating additional profiles is simple, and you can use the profile.yml template as your starting point.
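After the first Quickstart run, the default profile already exists. One way to create an additional profile is to copy it and then edit the copy. In this sketch, new_profile and the CLOUDERA_DEPLOY_PROFILES override are hypothetical names. The helper refuses to overwrite an existing profile.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cloudera Deploy): create a named profile
# by copying an existing one (the default profile unless a base is given).
# CLOUDERA_DEPLOY_PROFILES is a made-up override for the profiles directory.
new_profile() {
  local name="$1" base="${2:-default}"
  local dir="${CLOUDERA_DEPLOY_PROFILES:-$HOME/.config/cloudera-deploy/profiles}"
  if [ -z "$name" ]; then
    echo "usage: new_profile <name> [base]" >&2
    return 2
  fi
  if [ -e "$dir/$name" ]; then
    echo "profile '$name' already exists in $dir" >&2
    return 1
  fi
  cp "$dir/$base" "$dir/$name" || return 1
  echo "$dir/$name"
}
```

For example, run `new_profile my_custom_profile` and edit the new file. Then select it at run time with `ansible-playbook /opt/cloudera-deploy/main.yml -e "definition_path=examples/sandbox" -e profile=my_custom_profile`.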
For CDP Public Cloud, you will need an Access Key and Secret set in your user profile. The tooling uses your default profile unless you instruct it otherwise. (See Configuring CDP client with the API access key.)
For Azure and AWS infrastructure, the process is similar, and these parameters may likewise be overridden.
For Google Cloud, we suggest you issue a credentials file, store it securely in your profile, and then provide the path to that file in profile.yml. This works best with both CLI and Ansible gcloud interactions.
We suggest you set your default infra_type in profile.yml to match your preferred default public cloud infrastructure credentials.
For CDP Private Cloud, you need a valid Cloudera license file to download the software from the Cloudera repositories. We suggest storing it in your user profile in ~/.cdp/ and setting its location in the profile.yml config file.
If you are also using Public Cloud infrastructure to host your CDP Private Cloud clusters, then you will need those credentials as well.
For CDP Private Cloud clusters and other direct inventory scenarios, you will need to manage SSH host key validation appropriate to your specific environment.
IMPORTANT: By default, the quickstart.sh script explicitly sets the ANSIBLE_HOST_KEY_CHECKING variable to False for ease of use with an introductory deployment. This setting is not recommended for any other deployment type. For all other deployment types, you should manage your SSH host key checking directly.
A common approach is to create your own "startup" script, using quickstart.sh as a template, and to set the appropriate Ansible SSH configuration variables in it.
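Inside such a startup script, the relevant settings might look like the following sketch. ANSIBLE_HOST_KEY_CHECKING and ANSIBLE_SSH_ARGS are standard Ansible environment variables. configure_ssh_policy is a hypothetical name. How the variables reach the container depends on your copy of quickstart.sh; for example, docker run -e ANSIBLE_HOST_KEY_CHECKING passes the host value through.

```shell
#!/usr/bin/env bash
# Sketch for a custom startup script derived from quickstart.sh.
# configure_ssh_policy is a hypothetical helper name; the two variables are
# standard Ansible settings read from the environment.
configure_ssh_policy() {
  # Keep host key checking on (quickstart.sh sets it to False).
  export ANSIBLE_HOST_KEY_CHECKING=True
  # Optionally pin a known_hosts file. The path must be valid wherever
  # Ansible runs (inside the container when using the Runner image).
  # ANSIBLE_SSH_ARGS replaces Ansible's default ssh_args, so the default
  # options are repeated before UserKnownHostsFile is added.
  if [ -n "${1:-}" ]; then
    export ANSIBLE_SSH_ARGS="-C -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=$1"
  fi
}
```

Setting ANSIBLE_SSH_ARGS replaces Ansible's default ssh_args entirely. That is why the sketch repeats the default options rather than only appending the known_hosts option.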
In some scenarios, for example a reused pool of dynamic hosts within a development OpenStack environment, you might wish to manage host key checking from your host machine’s SSH config file. For example:
# ~/.ssh/config
# Disable host key checking only for your specific environment
Host *.your.development.domain
StrictHostKeyChecking no
These settings flow from your host to the Docker container’s environment if you use the quickstart.sh script.
Cloudera Deploy uses a single entrypoint playbook, main.yml. It examines the user-provided profile details, a deployment definition, and any optional Ansible tags, and then runs the appropriate actions. At minimum, you execute a deployment like so:
ansible-playbook <location of cloudera-deploy>/main.yml \
-e "definition_path=<absolute path, or path relative to main.yml>"
NOTE: definition_path can be an absolute path or a path relative to the location of the main.yml playbook.
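If you run deployments often, a small shell wrapper can save typing. This is a hypothetical convenience function, not part of Cloudera Deploy. It assumes the project is at /opt/cloudera-deploy, as in the Quickstart container, and allows an override through a made-up CLOUDERA_DEPLOY_HOME variable. Any extra arguments, such as tags, are passed straight to ansible-playbook.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented main.yml invocation.
# CLOUDERA_DEPLOY_HOME is a made-up override; /opt/cloudera-deploy is the
# project location inside the Quickstart container.
deploy() {
  if [ $# -lt 1 ]; then
    echo "usage: deploy <definition_path> [ansible-playbook options...]" >&2
    return 2
  fi
  local definition="$1"
  shift
  # Remaining arguments (for example -t run --skip-tags infra) pass through.
  ansible-playbook "${CLOUDERA_DEPLOY_HOME:-/opt/cloudera-deploy}/main.yml" \
    -e "definition_path=${definition}" "$@"
}
```

For example, `deploy examples/sandbox -t run,default_cluster -vvv` is equivalent to the sandbox command shown earlier.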
Cloudera Deploy exposes a set of Ansible tags that allow fine-grained inclusion and exclusion of functions, in particular a runlevel-like management process. The tags cover the following levels:
- Infrastructure (infra): cloud provider assets.
- Platform (plat): CDP Public Cloud Datalakes. Assumes the infrastructure level.
- Runtime (run): CDP Public Cloud experiences, e.g. Cloudera Machine Learning (CML). Assumes the platform level.
- CDP Private Cloud Base clusters.
Current Tags: verify_inventory, verify, full_cluster, default_cluster, verify_definition, custom_repo, verify_parcels, database, security, kerberos, tls, ha, os, users, jdk, mysql_connector, oracle_connector, fetch_ca, cm, license, autotls, prereqs, restart_agents, heartbeat, mgmt, preload_parcels, kts, kms, restart_stale, teardown_ca, teardown_all, teardown_tls, teardown_cluster, infra, init, plat, run, validate
With these tags, you can set your deployment to a given "runlevel" state:
# Ensure only the infrastructure layer is available
ansible-playbook main.yml -e "definition_path=my_example" -t infra
or select or skip a level or function:
# Ensure the platform and runtimes are available, but skip any infrastructure
ansible-playbook main.yml -e "definition_path=my_example" -t run --skip-tags infra
NOTE: Setting a deployment to a lower runlevel, e.g. from run to infra, tears down the deployed components in the higher runlevels.
For further details on the various runlevel-like tags for CDP Public Cloud, see the Runlevel Guide in the cloudera.exe project.
Cloudera Deploy uses a set of configuration files within a directory to define and coordinate a deployment. This directory also stores any artifacts created during the deployment, such as Ansible inventory files, CDP environment readouts, etc.
The main.yml entrypoint playbook expects the runtime variable definition_path. This variable should point to the directory hosting these configuration files, given as an absolute path or a path relative to the playbook.
Within the directory, you must supply the following files:
- definition.yml
- application.yml
Optionally, if you are deploying a CDP Private Cloud cluster or need to set up ad hoc IaaS infrastructure, you can supply the following files:
- inventory_static.ini
- inventory_template.ini
The definition directory can host any other files or assets, such as data files, additional configuration details, or additional playbooks. However, Cloudera Deploy will not operate unless the definition.yml and application.yml files are present.
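A quick pre-flight check can catch a missing file before a long run. This sketch is a hypothetical helper. It resolves the directory relative to your current working directory, whereas a non-absolute definition_path is resolved relative to main.yml.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Cloudera Deploy): confirm a
# definition directory has both required files and report optional ones.
check_definition() {
  local dir="$1" missing=0 f
  for f in definition.yml application.yml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing required file: $dir/$f" >&2
      missing=1
    fi
  done
  # The inventory files are optional; only report the ones that exist.
  for f in inventory_static.ini inventory_template.ini; do
    if [ -f "$dir/$f" ]; then
      echo "found optional file: $f"
    fi
  done
  return "$missing"
}
```

It returns a non-zero status if definition.yml or application.yml is missing, and it lists any optional inventory files it finds.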
The required definition.yml file contains top-level configuration keys that define and direct the deployment.
These keys cover:
- Hosting infrastructure to manage
- CDP Public Cloud Environment deployment (on the infrastructure)
- CDP Private Cloud Cluster deployment (on the infrastructure)
Within each top-level key, you may override the defaults for that section.
You may also add other top-level configuration keys if your automation requires them, e.g. if your application.yml playbook needs its own configuration details.
More detailed documentation of all the options is beyond the scope of this introductory readme; further documentation is forthcoming.
The required application.yml file is not a configuration file; it is an Ansible playbook. At minimum, this playbook requires a single Ansible play. A basic no-op task works well if you wish to take no additional actions beyond the core deployment.
For more sophisticated post-deployment activities, you can expand this playbook as much as needed. For example, the playbook can interact with hosts and inventory, execute computing jobs on deployment environments, and include additional playbooks and configuration files.
NOTE: This file is a standard Ansible playbook. When the main.yml entrypoint executes it (via import_playbook), the working directory of the Ansible executable is changed to the directory of the application.yml playbook.
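A minimal application.yml that satisfies the single-play requirement can be as small as one play with a debug task. The sketch below writes such a file from the shell. The play and task names are placeholders, and write_noop_application is a hypothetical helper that refuses to overwrite an existing file. The play targets localhost, so it needs no extra inventory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal no-op application.yml into a
# definition directory, without overwriting an existing one.
write_noop_application() {
  local dir="$1"
  if [ -e "$dir/application.yml" ]; then
    echo "$dir/application.yml already exists; not overwriting" >&2
    return 1
  fi
  # Quoted heredoc delimiter: the playbook text is written verbatim.
  cat > "$dir/application.yml" <<'EOF'
---
- name: Post-deployment application (no-op)
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Nothing to do beyond the core deployment
      ansible.builtin.debug:
        msg: "No post-deployment actions defined"
EOF
}
```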
You may also include an inventory_static.ini file that describes your static Ansible inventory. This file is automatically loaded and added to the Ansible inventory. You can also use the standard Ansible -i switch to include other static inventory.
If present, a definition’s inventory_template.ini file describes a set of dynamic host inventory. Cloudera Deploy provisions these hosts as infrastructure for the deployment, typically for a CDP Private Cloud cluster.
NOTE: This currently works only on AWS.
Copyright 2021, Cloudera, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.