This is a fully dockerized Spark streaming example using a Kafka queue to explore the new back-pressure feature introduced in Spark 1.5.

- Install docker-machine & docker-compose (>= 1.5.1 required)

- Create / start the virtual machine

      docker-machine create rspark --driver virtualbox

  or

      docker-machine start rspark

- Update your environment

      eval "$(docker-machine env rspark)"
      export DOCKER_HOST_IP=`docker-machine ip rspark`

- Run zookeeper, kafka, kafkamanager, spark master and worker

      docker-compose up zookeeper kafka kafkamanager sparkmaster sparkworker
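
  A quick sanity check that all five services actually came up (the service names are the ones declared in docker-compose.yml):

      docker-compose ps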

- Configure kafkamanager and add the required topic (a command-line alternative is sketched below)
  - Zookeeper: zk:2181
  - Kafka version: 0.8.2.1
  - Topic: numbers
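
  If you prefer the command line to the kafka-manager UI, the topic can also be created with Kafka's own admin script from inside the kafka container (the partition and replication values here are only an example, and where the script lives depends on the image):

      kafka-topics.sh --create --zookeeper zk:2181 --partitions 1 --replication-factor 1 --topic numbers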

- Build and submit the spark streaming consumer. Unfortunately Spark expects jobs to be submitted to spark://master:7077 (the docker internal hostname), so first map the hostname master in your /etc/hosts (one way to do this is sketched below).

      sbt consumer/assembly consumer/sparkSubmit
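
  One way to add that mapping, assuming the Spark master's port is published on the docker-machine VM so that master should resolve to the VM's IP:

      echo "$(docker-machine ip rspark) master" | sudo tee -a /etc/hosts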

- Build and start the containerized producer

      sbt producer/docker:publishLocal
      docker-compose up producer

- Tweak the producer settings in docker-compose.yml and observe how your processing time / processing delay changes. The streaming UI of the running job can always be found via the Spark master UI (see the note below).
  - PRODUCER_RATE: the event rate
  - PRODUCER_MIN_DELAY: the min. processing weight of the produced messages
  - PRODUCER_MAX_DELAY: the max. processing weight of the produced messages

  After changing the settings, restart the producer:

      docker-compose up producer
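
  If the Spark master web UI port is published by this docker-compose.yml (8080 is the Spark default, so this is an assumption about the compose file), it can be opened on the VM address exported earlier:

      open http://$DOCKER_HOST_IP:8080   # on OS X; otherwise paste the URL into any browser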

- Tweak the Spark streaming settings in SparkConsumer.scala (for example the batch interval, or spark.streaming.backpressure.enabled, the Spark 1.5 property that switches the new back-pressure mechanism on and off) and resubmit the job.

      sbt consumer/assembly consumer/sparkSubmit