Workshop Material: Near Real-Time Predictive Analytics with Apache Spark Structured Streaming
Open Data Science Conference (ODSC) West 2019
Find me on Twitter: @newfront
Find me on Medium: @newfrontcreative
About Twilio: Twilio
Requirements:
- Docker (with at least 2 CPU cores and 8 GB of RAM)
- A terminal application (iTerm, Terminal, etc.)
- A working web browser (Chrome or Firefox)
Docker:
Install Docker Desktop (https://www.docker.com/products/docker-desktop).
Docker resources (allocate these in the Docker Desktop settings):
- 2 or more CPU cores
- 8 GB of RAM or more
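Before running anything, you can ask the Docker daemon what it actually has available. The sketch below is not part of the workshop repo: check_resources and docker_preflight are hypothetical names, and it assumes docker info --format exposes the NCPU and MemTotal (bytes) fields. A VM given 8 GB reports slightly less than that as usable memory, so the sketch accepts 7 GiB or more.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of the workshop repo).

MIN_CPUS=2
# Target is 8gb, but the Linux VM reports a bit less than it was given,
# so accept anything from 7 GiB (in bytes) upward.
MIN_MEM_BYTES=$((7 * 1024 * 1024 * 1024))

# check_resources CPUS MEM_BYTES
# Prints "ok" when both values meet the minimums; otherwise prints what is
# short and returns 1. Returns 2 if the inputs are not integers.
check_resources() {
  local cpus="$1" mem="$2" problems=""
  if ! [[ "$cpus" =~ ^[0-9]+$ && "$mem" =~ ^[0-9]+$ ]]; then
    echo "error: expected two integers, got '$cpus' '$mem'" >&2
    return 2
  fi
  if [ "$cpus" -lt "$MIN_CPUS" ]; then
    problems="${problems}cpus=${cpus} (need ${MIN_CPUS}) "
  fi
  if [ "$mem" -lt "$MIN_MEM_BYTES" ]; then
    problems="${problems}mem=${mem} (need ${MIN_MEM_BYTES}) "
  fi
  if [ -z "$problems" ]; then
    echo "ok"
  else
    # Strip the trailing space before printing.
    echo "insufficient: ${problems% }"
    return 1
  fi
}

# docker_preflight: reads what the Docker daemon reports and checks it.
docker_preflight() {
  local cpus mem
  read -r cpus mem < <(docker info --format '{{.NCPU}} {{.MemTotal}}')
  check_resources "$cpus" "$mem"
}
```

Saved as, say, preflight.sh (a hypothetical file name), running source preflight.sh && docker_preflight prints ok, or names the resource that falls short.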
Installation:
- Install Docker (see Docker above).
- Once Docker is installed, open your terminal application and run:
  cd /path/to/odsc-west-2019-realtime-analytics/docker
  ./run.sh install
  ./run.sh start
The initial download can take some time depending on your Wi-Fi connection. Expect around 5-10 minutes, and fingers crossed it goes faster!
The ./run.sh init process will 1) download Apache Spark and untar it into docker/spark-2.4.4, and 2) unzip the wine reviews data set from docker/data.
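If a later step complains that Spark is missing, a quick check is whether the init phase left an unpacked distribution behind. A minimal sketch, assuming the docker/spark-2.4.4 layout described above and the standard bin/spark-submit launcher that ships in Spark binary distributions; check_spark_download is a hypothetical helper, not part of run.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of run.sh).

# check_spark_download DOCKER_DIR
# Succeeds if DOCKER_DIR/spark-2.4.4/bin/spark-submit exists and is
# executable; otherwise prints a hint to stderr and returns 1.
check_spark_download() {
  local docker_dir="${1:-.}"
  local submit="$docker_dir/spark-2.4.4/bin/spark-submit"
  if [ -x "$submit" ]; then
    echo "found $submit"
  else
    echo "missing $submit (did the init step finish?)" >&2
    return 1
  fi
}
```

From the repository root, check_spark_download docker should print the launcher path it found.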
The ./run.sh start step will 1) download the official Apache Zeppelin Docker image and 2) download the official Redis Docker image. It then runs docker compose to bring up Redis, followed by Zeppelin. Zeppelin uses the Spark version (2.4.4) that you downloaded in the init phase, so we are running on the latest and greatest Spark (as of this workshop).
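The containers come up asynchronously, so a small retry loop saves refreshing the browser while Zeppelin starts. A sketch under assumptions: retry_until is a hypothetical helper (not part of run.sh), it needs bash, and the example calls in the comments assume curl is installed and that the Redis container is named redis5, as in the Zeppelin settings below.

```shell
#!/usr/bin/env bash
# retry_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits 0 or until
# TIMEOUT_SECONDS have passed. Returns 0 on success, 1 on timeout.
retry_until() {
  local timeout="$1" interval="$2"
  shift 2
  local start=$SECONDS   # bash counts seconds since shell start in SECONDS
  until "$@"; do
    if (( SECONDS - start >= timeout )); then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# Example calls (assumptions: curl installed, container named redis5):
#   retry_until 120 2 docker exec redis5 redis-cli ping
#   retry_until 600 5 curl -fsS -o /dev/null http://localhost:8080/
```

The curl flags make a non-2xx response or a refused connection count as failure, so the loop only stops once Zeppelin actually serves its page.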
- The main application should now be running at http://localhost:8080/
- Go to http://localhost:8080/#/interpreter in your web browser.
- Search for spark in the Search Interpreters input field.
- Click the edit button to enter editing mode.
Add the following key/values:
- spark.redis.host redis5
- spark.redis.port 6379
Update the following key/values:
- spark.cores.max 2
- spark.executor.memory 8g
Then:
- Add com.redislabs:spark-redis:2.4.0 as a dependency artifact.
- Click Save to apply these settings to the Zeppelin runtime.
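The interpreter settings above are essentially Spark configuration, so the same setup can be reproduced with spark-shell flags: each key/value becomes --conf key=value and the artifact becomes a --packages coordinate. This is only a sketch: build_spark_args is a hypothetical helper, and it assumes you launch Spark somewhere the hostname redis5 resolves (for example a container on the same Docker network), which is not the case on the host itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the Zeppelin interpreter settings from this section,
# expressed as spark-shell / spark-submit flags.

# Values copied from the interpreter settings above.
SPARK_PACKAGES="com.redislabs:spark-redis:2.4.0"
SPARK_CONF=(
  "spark.redis.host=redis5"
  "spark.redis.port=6379"
  "spark.cores.max=2"
  "spark.executor.memory=8g"
)

# build_spark_args: fills the global SPARK_ARGS array with one --packages
# flag followed by a --conf flag per key/value.
build_spark_args() {
  local kv
  SPARK_ARGS=(--packages "$SPARK_PACKAGES")
  for kv in "${SPARK_CONF[@]}"; do
    SPARK_ARGS+=(--conf "$kv")
  done
}

# Usage (from the docker directory, where redis5 resolves):
#   build_spark_args
#   ./spark-2.4.4/bin/spark-shell "${SPARK_ARGS[@]}"
```

An array is used instead of a flat string so each flag and value stays a separate argument even if a value ever contains spaces.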
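To send events, open redis-cli inside the redis5 container:
  docker exec -it redis5 redis-cli
and append an entry to the books-liked stream (the * tells Redis to generate the entry ID):
  xadd books-liked * userId 1 bookId 3
These events will now be processed by foreachBatch in the Spark 2.4.4 streaming query.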
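Typing xadd by hand works for one event; to watch foreachBatch handle several micro-batches it helps to send a burst. A sketch with hypothetical helper names (redis_cli, publish_likes): the stream name books-liked and the container name redis5 come from the step above, while the cycling userId/bookId values are made-up test data.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the workshop repo).

# redis_cli: runs redis-cli inside the redis5 container. No -it here: a TTY
# is only needed for the interactive session, and requesting one fails
# when stdin is not a terminal.
redis_cli() {
  docker exec redis5 redis-cli "$@"
}

# publish_likes COUNT
# Appends COUNT entries to books-liked, cycling userId through 1-5 and
# bookId through 1-10.
publish_likes() {
  local count="$1" i user book
  for ((i = 1; i <= count; i++)); do
    user=$(( (i - 1) % 5 + 1 ))
    book=$(( (i - 1) % 10 + 1 ))
    # Quote '*' so the shell does not glob-expand it; Redis then
    # auto-generates the entry ID, as in the interactive xadd above.
    redis_cli XADD books-liked '*' userId "$user" bookId "$book"
  done
}
```

For example, publish_likes 20 followed by redis_cli XLEN books-liked should show the stream length grow by 20. In an interactive redis-cli session the * needs no quoting; it only matters when the command goes through the shell.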