This page describes Databricks workshops and learning material developed at Knowit.
Each workshop is a 2.5-hour hands-on session covering an important aspect of Databricks.
At Knowit we call these workshops Toppturer: short sessions that give quick but meaningful experience with a technology, tool, or framework.
Available workshops:
- Workshop: Data engineering on Databricks
- Workshop: Using LangChain and open LLM-models on Databricks
- Workshop: LLM Adaptation on Databricks
- Workshop: DataOps on Databricks, using git and versioning of tables, jobs and code
Link: https://github.com/knowit/AWS-Databricks-NYC-Taxi-Workshop
For: Developers, analysts, data scientists, data engineers.
Pre-requisites: Some Python knowledge
Topics:
- Basic understanding of components and tools in Databricks
- Perform data transformation in Spark SQL and PySpark
- Use Databricks Repos for git-versioned data engineering
- Deploy a Spark job with Databricks Workflows
- Write ETL code and data quality checks in Delta Live Tables
Link: https://github.com/paalvibe/llm-langchain-course
For: Anybody
Topics:
- Setup and use of LLMs in Databricks
- Use of the LangChain framework for:
  - LLM wrapping
  - LLM serving
  - Summarizing
  - Context embedding with chromadb
  - Reformatting
  - Multi-query retrieval
  - Prompt engineering
Link: https://github.com/paalvibe/llm-tune-course
For: Anybody
Topics:
- What is an LLM (Large Language Model)?
- Tuning LLMs on Databricks
- Different modes of adapting LLMs
- When and when not to train your own LLM?
Link: https://github.com/paalvibe/databricks-dataops-course
For: Data Engineers, Full stack data scientists, ML Engineers, Data Platform Engineers
Topics:
- Opinionated git-based approach to DataOps
- Structure your environments to allow for dev runs of data pipelines
- Move data pipelines from dev to prod
- Using git branches and commits to name and manage data and jobs responsibly
- Does not set up GitHub Actions, but uses the processes that such automation would need
- Does not cover data quality or pipeline management
Pre-requisites: Some Python knowledge
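
To give a feel for the idea of naming data after git branches and commits, here is a minimal plain-Python sketch. It is not taken from the course material: the function names `dev_schema_name` and `target_schema`, the `dev` prefix, the `prod` schema, and `main` as the production branch are all assumptions made for illustration.

```python
import re


def dev_schema_name(branch: str, commit_sha: str, prefix: str = "dev") -> str:
    """Build an isolated schema name from a git branch and commit (hypothetical scheme)."""
    # Lowercase the branch and collapse every run of characters that is not
    # a letter, digit or underscore into a single underscore.
    slug = re.sub(r"[^a-z0-9_]+", "_", branch.lower()).strip("_")
    # Keep only the first 8 characters of the commit hash to keep names short.
    short_sha = commit_sha[:8].lower()
    return f"{prefix}_{slug}_{short_sha}"


def target_schema(branch: str, commit_sha: str, prod_branch: str = "main") -> str:
    """Pick where a pipeline run writes its tables (assumed convention)."""
    # Runs from the production branch write to the shared schema;
    # every other branch gets its own schema so dev runs cannot clash.
    if branch == prod_branch:
        return "prod"
    return dev_schema_name(branch, commit_sha)


if __name__ == "__main__":
    print(target_schema("feature/new-etl", "1A2B3C4D5E6F"))
    print(target_schema("main", "1A2B3C4D5E6F"))
```

With a scheme like this, a dev run of a pipeline can be traced back to the exact branch and commit that produced its tables, and moving to prod is a matter of merging to the production branch.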
For: Data Engineers, Full stack data scientists, ML Engineers, Data Platform Engineers
- How to enable data contracts and data quality checks in pipelines
- Difference between Delta Live Tables and regular Databricks notebooks
Pre-requisites: Some Python knowledge
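
As a rough illustration of what a data contract with quality checks can look like, the sketch below validates rows against a small contract in plain Python. It does not use Delta Live Tables or any Databricks API, and it is not the workshop's implementation: `ColumnRule`, `violations`, and the example column names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRule:
    """One column in a (hypothetical) data contract."""
    name: str
    dtype: type
    nullable: bool = False


def violations(rows, contract):
    """Return a list of human-readable contract violations for the given rows."""
    problems = []
    for i, row in enumerate(rows):
        for rule in contract:
            value = row.get(rule.name)  # a missing key is treated like a null
            if value is None:
                if not rule.nullable:
                    problems.append(f"row {i}: {rule.name} is missing or null")
            elif not isinstance(value, rule.dtype):
                problems.append(
                    f"row {i}: {rule.name} expected {rule.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
    return problems


if __name__ == "__main__":
    contract = [
        ColumnRule("trip_id", int),
        ColumnRule("fare", float),
        ColumnRule("tip", float, nullable=True),
    ]
    rows = [
        {"trip_id": 1, "fare": 12.5, "tip": None},
        {"trip_id": "2", "fare": None},
    ]
    for problem in violations(rows, contract):
        print(problem)
```

In a pipeline, a check like this would typically run before data is published, so that rows breaking the contract are rejected or quarantined rather than passed downstream.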