-
How to Build the AMI
-
Start with a CentOS 7 AMI without a license (e.g. ami-00846a67 in eu-west-2), using at least an m4.4xlarge instance with a 100GB disk.
-
Copy CDF parcels to /root per instructions here.
-
Run the script below.
curl -sSL https://gist.github.com/abajwa-hw/563f86ef80cc21ba4980767eb02225f3/raw | sudo -E sh
-
Changes the hostname to a logical name, e.g. cdp.cloudera.com
-
Sets up KDC
-
Updates /etc/rc.local to regen hosts file and start services on boot
-
Runs the SingleNodeCDP script to install CDP-DC (including Kerberos) and deploy the demo
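The one-liner above pipes the script straight into a root shell. As a hedged alternative, the sketch below downloads it to a file first so it can be reviewed before running. The helper name `download_script` and the local path `/tmp/setup_ami.sh` are illustrative and do not come from the original instructions.

```shell
# Hypothetical helper: download a script to a local file and
# report failure if the download fails or the file is empty.
download_script() {
  url="$1"
  dest="$2"
  curl -sSL "$url" -o "$dest" || return 1
  [ -s "$dest" ]   # succeeds only when the file exists and is non-empty
}

# Usage (same URL as the one-liner; the destination path is arbitrary):
#   download_script https://gist.github.com/abajwa-hw/563f86ef80cc21ba4980767eb02225f3/raw /tmp/setup_ami.sh
#   less /tmp/setup_ami.sh         # review the script before running it
#   sudo -E sh /tmp/setup_ami.sh   # same shell and environment flags as the one-liner
```

Running the saved file should behave like the piped form, and it leaves a copy of exactly what was executed on the instance.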
-
-
Attach SSH key pair
SSH key pair to attach to the AMI instance.
-
-
What's included
-
Single Node CDP 7.x
-
Kerberos, for authentication (via local MIT KDC)
-
Ranger, for authorization (via both resource/tag based policies for access and masking)
-
Atlas, for governance (classification/lineage/search)
-
Zeppelin, for running/visualizing Hive queries
-
Hive 3, for SQL access and ACID capabilities
-
HiveWarehouseConnector, for running secure SparkSQL and Kafka queries
-
Hue, for query execution and visualisation
-
DAS, for query execution and visualisation
-
-
Worldwide Bank Demo/Workshop artifacts
-
Demo hive tables
-
Demo tags/attributes and lineage in Atlas
-
Demo Zeppelin notebooks to walk through demo scenario
-
Ranger policies across HDFS, Hive, HBase, Kafka, and Atlas to showcase:
-
Tag based policies across CDP components
-
Row-level security in Hive tables
-
Dynamic tag based masking in Hive columns
-
Hive UDF execution authorization
-
Atlas capabilities such as:
-
Classifications (tags) and attributes
-
Tag propagation
-
Data lineage
-
Business glossary: categories and terms
-
GDPR Scenarios around consent and data erasure via Hive ACID
-
-
Hive ACID / MERGE labs
-
-
-
Change /etc/hosts on your desktop
-
Map the given cluster IP address to cdp.cloudera.com
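The mapping above can also be added from a terminal on Linux or macOS. This is a minimal sketch: the helper name `add_host_entry` is hypothetical, and 203.0.113.10 is a documentation placeholder to be replaced with the cluster IP you were given.

```shell
# Hypothetical helper: append "IP hostname" to a hosts file,
# unless the hostname is already mapped there.
add_host_entry() {
  ip="$1"
  name="$2"
  hosts_file="${3:-/etc/hosts}"   # defaults to the system hosts file
  if grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
    echo "${name} is already present in ${hosts_file}"
  else
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
  fi
}

# Usage (try it on a scratch file first; 203.0.113.10 is a placeholder IP):
#   add_host_entry 203.0.113.10 cdp.cloudera.com ./hosts.test
# Writing to the real /etc/hosts requires root privileges.
```

The existence check keeps repeated runs from adding duplicate entries.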
-
Ensure you can connect to the following services using your browser (Firefox):
Service | URL | Credentials
Cloudera Manager | | admin/admin
Ranger | | admin/BadPass#1
Atlas | | admin/BadPass#1
Zeppelin | | joe_analyst/BadPass#1, ivanna_eu_hr/BadPass#1, etl_user/BadPass#1
Note: In some cases you might not be able to log in via cdp.cloudera.com (due to browser restrictions); in such cases, use the same port with the public DNS name.
-
Sanity Check
-
Log in to Cloudera Manager and check that all services are green.
username: admin, password: admin
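Before opening the browser, a quick command-line probe can confirm that Cloudera Manager is reachable and accepts the login. This is a sketch under stated assumptions: it assumes Cloudera Manager serves plain HTTP on its default port 7180, which this guide does not state, and the helper name `cm_api_version` is hypothetical. It does not replace checking that all services are green.

```shell
# Hypothetical helper: ask Cloudera Manager for its highest supported API version.
# Assumes plain HTTP on port 7180 (the CM default); adjust host/port if TLS is enabled.
cm_api_version() {
  host="${1:-cdp.cloudera.com}"
  port="${2:-7180}"
  # -f makes curl exit non-zero on HTTP errors such as a rejected login
  curl -sS -f -u admin:admin "http://${host}:${port}/api/version"
}

# Usage:
#   cm_api_version                 # prints a short version string when CM is up
#   cm_api_version <public-dns>    # same check via the instance's public DNS name
```

A non-zero exit status suggests that Cloudera Manager is not up yet, the port differs, or the credentials were rejected.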
-
-
Open Zeppelin and log in as joe_analyst. Find his notebook by searching for worldwide using the text field under the ‘Notebook’ section. Select the notebook called: Worldwide Bank - Joe Analyst
-
On the first launch of the notebook, you will be prompted to choose interpreters. You can keep the defaults, but make sure you click the Save button.