paalvibe/databricks-workshops

Databricks Workshops

Description of some Databricks workshops and learning material we have developed at Knowit.

Workshops (Knowit toppturer)

Each workshop is a 2.5-hour hands-on session covering an important aspect of Databricks.

At Knowit we call these workshops Toppturer. They give quick but meaningful experience with a technology, tool, or framework.

Available workshops:

  • Workshop: Data engineering on Databricks
  • Workshop: Using LangChain and open LLM-models on Databricks
  • Workshop: LLM Adaptation on Databricks
  • Workshop: DataOps on Databricks, using git and versioning of tables, jobs and code

Workshop: Data engineering on Databricks

Link: https://github.com/knowit/AWS-Databricks-NYC-Taxi-Workshop

For: Developers, analysts, data scientists, data engineers.

Pre-requisites: Some Python knowledge

Topics:

  • Basic understanding of components and tools in Databricks
  • Perform data transformations in Spark SQL and PySpark
  • Use Databricks Repos for git-versioned data engineering
  • Deploy a Spark job with Databricks Workflows
  • Write ETL code and data quality checks in Delta Live Tables
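To make the data quality topic concrete, here is a minimal plain-Python sketch of an expectation-style check: each rule is a named predicate, rows that violate any rule are dropped, and violations are counted per rule. It is not Delta Live Tables code and not taken from the workshop; the `Expectation` class, the `apply_expectations` helper, and the taxi-trip field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Expectation:
    """A named rule that every row is expected to satisfy (hypothetical helper)."""
    name: str
    predicate: Callable[[dict], bool]


def apply_expectations(rows, expectations):
    """Return (kept_rows, failures), where failures counts violations per rule.

    A row is kept only if it satisfies every expectation.
    """
    kept = []
    failures = {e.name: 0 for e in expectations}
    for row in rows:
        row_ok = True
        for e in expectations:
            if not e.predicate(row):
                failures[e.name] += 1
                row_ok = False
        if row_ok:
            kept.append(row)
    return kept, failures


# Made-up sample rows; the field names are assumptions for illustration only.
trips = [
    {"trip_id": 1, "fare": 12.5, "passengers": 2},
    {"trip_id": 2, "fare": -3.0, "passengers": 1},
    {"trip_id": 3, "fare": 8.0, "passengers": 0},
]
expectations = [
    Expectation("non_negative_fare", lambda r: r["fare"] >= 0),
    Expectation("has_passengers", lambda r: r["passengers"] > 0),
]

kept, failures = apply_expectations(trips, expectations)
print([r["trip_id"] for r in kept], failures)
```

In a real pipeline the same idea would be expressed with the platform's own quality-check mechanism rather than a hand-written loop.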


Workshop: Using LangChain and open LLM-models on Databricks

Link: https://github.com/paalvibe/llm-langchain-course

For: Anybody

Topics:

  • Setup and use of LLMs in Databricks
  • Use of the LangChain framework for:
    • LLM-wrapping
    • LLM-serving
    • Summarizing
  • Context embedding with chromadb
  • Reformatting
  • Multi-query retrieval
  • Prompt engineering
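As an illustration of the multi-query retrieval topic, the sketch below runs several phrasings of one question against a small in-memory corpus and merges the results without duplicates. It is a toy under stated assumptions: it does not use LangChain or chromadb, ranking is plain word overlap rather than embeddings, and the query variants are written by hand where a real setup would typically ask an LLM to generate them. All function names and documents are hypothetical.

```python
def tokenize(text):
    """Lowercase, whitespace-split bag of words."""
    return set(text.lower().split())


def retrieve(query, docs, k=2):
    """Return up to k document ids ranked by word overlap with the query."""
    query_words = tokenize(query)
    scored = []
    for doc_id, text in docs.items():
        score = len(query_words & tokenize(text))
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def multi_query_retrieve(queries, docs, k=2):
    """Retrieve for each query variant and merge, keeping first-seen order."""
    seen = set()
    merged = []
    for query in queries:
        for doc_id in retrieve(query, docs, k):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Made-up corpus and hand-written query variants (assumptions for illustration).
docs = {
    "a": "delta tables store data in parquet files",
    "b": "langchain wraps llm calls in chains",
    "c": "vector stores hold embeddings for retrieval",
}
queries = [
    "how are embeddings stored",
    "which store is used for retrieval",
]

print(multi_query_retrieve(queries, docs))
```

The second phrasing surfaces a document the first one misses, which is the point of issuing more than one query.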

Workshop: LLM Adaptation on Databricks

Link: https://github.com/paalvibe/llm-tune-course

For: Anybody

Topics:

  • What is an LLM (Large Language Model)?
  • Tuning LLMs on Databricks
  • Different modes of adapting LLMs
  • When and when not to train your own LLM?

Workshop: DataOps on Databricks, using git and versioning of tables, jobs and code

Link: https://github.com/paalvibe/databricks-dataops-course

For: Data Engineers, Full stack data scientists, ML Engineers, Data Platform Engineers

Topics:

  • Opinionated git-based approach to DataOps
  • Structure your environments to allow for dev runs of data pipelines
  • Move data pipelines from dev to prod
  • Using git branches and commits to name and manage data and jobs responsibly
  • GitHub Actions is not set up here, but the processes needed for it are used
  • Does not cover data quality or pipeline management

Pre-requisites: Some Python knowledge
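One way to picture the idea of naming data after git branches is a helper that maps a branch name to an isolated dev schema, so a dev run of a pipeline never writes to production tables. This is a sketch under stated assumptions, not the convention used in the course: the `dev` and `prod` schema names, the `main` production branch, and both function names are hypothetical.

```python
import re


def branch_to_schema(branch, prefix="dev", max_len=64):
    """Turn a git branch name into a safe schema name, e.g. for dev runs."""
    # Replace every run of non-alphanumeric characters with one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_")
    if not slug:
        raise ValueError(f"branch name {branch!r} has no usable characters")
    return f"{prefix}_{slug}"[:max_len].rstrip("_")


def table_name(catalog, branch, table, prod_branch="main"):
    """Build a three-part table name; only the production branch maps to prod."""
    schema = "prod" if branch == prod_branch else branch_to_schema(branch)
    return f"{catalog}.{schema}.{table}"


print(table_name("analytics", "feature/Add-Revenue", "revenue"))
print(table_name("analytics", "main", "revenue"))
```

Deriving the name from the branch instead of typing it by hand keeps dev and prod data apart by construction, and makes it obvious which branch produced a given table.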

FUTURE Workshop: DataOps on Databricks part 2

For: Data Engineers, Full stack data scientists, ML Engineers, Data Platform Engineers

  • How to enable data contracts and data quality checks in pipelines
  • Difference between Delta Live Tables and regular Databricks notebooks

Pre-requisites: Some Python knowledge
