Data Engineering on GCP

Getting Started with Data Engineering on GCP

  • Prerequisite Skills for the Course
  • Overview of Google Cloud Platform
  • Signing up for GCP
  • Set Up Google Cloud SDK
  • Overview of Analytics Services on GCP

Setting up Data Lake using GCS

  • Set Up GCS Bucket
  • Overview of GCS Web UI
  • Overview of gsutil
  • Overview of Data Sets
  • Manage Files in GCS using gsutil commands
  • Copy Retail Data Set to GCS using gsutil commands
  • Manage Files in GCS using Python
  • Overview of processing data in GCS using Pandas

Getting Started with GCP Secret Manager

  • Overview of Secret Manager
  • Create Secret using UI
  • Managing Secrets using Google Cloud CLI
  • Reading Secret Details using Python
  • Use Cases of Secrets

Set Up Postgres Database using Cloud SQL

  • Overview of Cloud SQL
  • Set Up Postgres Database Server using Cloud SQL
  • Configure Network to Connect to Database
  • Install PostgreSQL Database Server
  • Configure psql CLI
  • Connect to Postgres Database using pgAdmin
  • Create Database for Retail using Postgres
  • Set Up Tables and Load Data using Postgres

Data Warehouse using Google BigQuery

  • Getting Started with Google BigQuery
  • Overview of Running Queries using Public Data Sets
  • Set Up Database for Retail in Google BigQuery
  • Set Up Tables and Load Data using Google BigQuery
  • Compute Daily Product Revenue using Google BigQuery
  • Create Dimensions and Facts using Google BigQuery
  • Cumulative Aggregations and Ranking using Google BigQuery

Data Processing using Google Cloud Functions

  • Overview of Google Cloud Functions
  • Validate GCS Bucket
  • Create Google Cloud Function for Cloud Storage
  • Update Requirements for the Project
  • Review File Converter Logic
  • Deploy File Converter Logic

Big Data Processing using Google Dataproc

  • Overview of GCP Dataproc
  • Set Up Single Node Hadoop and Spark Cluster using Dataproc
  • Set Up Remote Development Environment using VS Code
  • Review Data Sets
  • Run Spark SQL Commands or Scripts using Dataproc
  • Run PySpark Applications using Dataproc
  • Review Spark Jobs on Dataproc using Spark UI

Big Data Processing using Databricks on GCP

  • Overview of Databricks on GCP
  • Overview of DBFS
  • Mount GCS Buckets on DBFS
  • Run Spark SQL Commands or Scripts using Databricks Workflows
  • Run PySpark Applications using Databricks
  • Review Spark Jobs on Databricks using Spark UI

Orchestration using Cloud Composer