vsellappa/cdpdcsdx

CDP SDX Workshop

Colophon

Version: 0.1 : Jan 25, 2020 : Draft

What

A Cloudera SDX workshop focused on security and governance, using CDP Data Center 7.x.

Pre-Requisites (For Instructors)

  1. How to Build the AMI

    1. Start with a CentOS 7 AMI without a license (e.g. ami-00846a67 in eu-west-2) on an instance of at least m4.4xlarge with a 100 GB disk.

    2. Copy CDF parcels to /root per instructions here.

    3. Run the script below. It performs these steps:

      curl -sSL https://gist.github.com/abajwa-hw/563f86ef80cc21ba4980767eb02225f3/raw | sudo -E sh

      1. Changes the hostname to a logical name, e.g. cdp.cloudera.com

      2. Sets up the KDC

      3. Updates /etc/rc.local to regenerate the hosts file and start services on boot

      4. Runs the SingleNodeCDP script to install CDP-DC (including Kerberos) and deploy the demo

    4. Attach an SSH key pair

      The SSH key pair to attach to the AMI instance.

  2. What's included

    1. Single Node CDP 7.x

      1. Kerberos, for authentication (via local MIT KDC)

      2. Ranger, for authorization (via both resource/tag based policies for access and masking)

      3. Atlas, for governance (classification/lineage/search)

      4. Zeppelin, for running/visualizing Hive queries

      5. Hive 3, for SQL access and ACID capabilities

      6. HiveWarehouseConnector, for running secure SparkSQL and Kafka queries

      7. Hue, for query execution and visualisation

      8. DAS, for query execution and visualisation

    2. Worldwide Bank Demo/Workshop artifacts

      1. Demo Hive tables

      2. Demo tags/attributes and lineage in Atlas

      3. Demo Zeppelin notebooks to walk through demo scenario

      4. Ranger policies across HDFS, Hive, HBase, Kafka, Atlas to showcase:

        1. Tag based policies across CDP components

        2. Row level security in Hive columns

        3. Dynamic tag based masking in Hive columns

        4. Hive UDF execution authorization

      5. Atlas capabilities like

        1. Classifications (tags) and attributes

        2. Tag propagation

        3. Data lineage

        4. Business glossary: categories and terms

        5. GDPR Scenarios around consent and data erasure via Hive ACID

      6. Hive ACID / MERGE labs

  3. Agenda To Share

  4. Slidedeck
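
The instance launch in step 1.1 of the AMI build could be sketched with the AWS CLI as follows. This is a minimal sketch, not the author's procedure: the function name `ami_launch_command`, the key pair name passed to it, and the root device name /dev/sda1 are assumptions, while the AMI ID, region, instance type, and disk size come from the step itself. The function only prints the command so it can be reviewed before anything is launched.

```shell
#!/bin/sh
# Values taken from step 1.1; adjust for your own account and region.
AMI_ID="ami-00846a67"
AMI_REGION="eu-west-2"
INSTANCE_TYPE="m4.4xlarge"
DISK_GB="100"
# Assumption: the root device of the CentOS 7 AMI is /dev/sda1.
ROOT_DEVICE="/dev/sda1"

# Print (do not run) an AWS CLI command that launches the base instance.
# $1 is the name of an existing EC2 key pair (hypothetical, supplied by you).
ami_launch_command() {
    key_name="$1"
    printf 'aws ec2 run-instances --region %s --image-id %s --instance-type %s --key-name %s --block-device-mappings %s\n' \
        "$AMI_REGION" "$AMI_ID" "$INSTANCE_TYPE" "$key_name" \
        "'DeviceName=$ROOT_DEVICE,Ebs={VolumeSize=$DISK_GB}'"
}
```

Calling `ami_launch_command my-workshop-key` prints the full command line; copy it into a shell with AWS credentials configured once the values look right.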

Participants

  1. SSH Access

  2. Edit the /etc/hosts file on your desktop

    1. Windows How-To

    2. Map the given cluster IP Address to cdp.cloudera.com

  3. Ensure you can connect to the following services using your browser (Firefox):

    Service           URL                             Credentials
    Cloudera Manager  http://cdp.cloudera.com:7180/   admin/admin
    Ranger            http://cdp.cloudera.com:6080/   admin/BadPass#1
    Atlas             http://cdp.cloudera.com:31000/  admin/BadPass#1
    Zeppelin          http://cdp.cloudera.com:8885/   joe_analyst/BadPass#1, ivanna_eu_hr/BadPass#1, etl_user/BadPass#1

    Note
    In some cases you may not be able to log in via cdp.cloudera.com because of browser restrictions. If so, use the instance's public DNS name with the same port.
  4. Sanity Check

    1. Log in to Cloudera Manager and check that all services are green.

      username: admin
      password: admin
  5. Presentation Deck
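
On Linux or macOS, the hosts mapping in step 2 and the connectivity check in step 3 could be scripted roughly as below. This is a sketch under assumptions: the function names are hypothetical, the cluster IP is whatever address you were given, and writing to /etc/hosts requires root. The ports are the ones listed in the service table.

```shell
#!/bin/sh
WORKSHOP_HOST="cdp.cloudera.com"

# Print the hosts-file line that maps an IP address to the workshop hostname.
hosts_entry() {
    printf '%s %s\n' "$1" "$WORKSHOP_HOST"
}

# Append the mapping to a hosts file (default /etc/hosts) unless the
# hostname is already present in it.
add_hosts_entry() {
    ip="$1"
    file="${2:-/etc/hosts}"
    if grep -qwF "$WORKSHOP_HOST" "$file" 2>/dev/null; then
        echo "$WORKSHOP_HOST is already mapped in $file"
    else
        hosts_entry "$ip" >> "$file"
    fi
}

# Print the URLs of Cloudera Manager, Ranger, Atlas, and Zeppelin.
service_urls() {
    for port in 7180 6080 31000 8885; do
        printf 'http://%s:%s/\n' "$WORKSHOP_HOST" "$port"
    done
}

# Request each URL and print the HTTP status code curl received.
check_services() {
    service_urls | while read -r url; do
        code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url")
        echo "$url -> $code"
    done
}
```

Run `add_hosts_entry` as root with your cluster IP, then `check_services`. A code of 000 means curl received no HTTP response from that port, which usually points to the hosts mapping or to network access rather than to the service itself.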

Lab 101

  1. Open Zeppelin and log in as joe_analyst. Find his notebook by searching for "worldwide" in the text field under the "Notebook" section. Select the notebook called: Worldwide Bank - Joe Analyst

  2. On the first launch of the notebook, you will be prompted to choose interpreters. You can keep the defaults, but make sure you click the Save button.

  3. Walk through the workshop using the Zeppelin notebooks.
