- Maven
- Apache Spark
- Scala
Use the following commands:
-
sudo yum install git
-
cd streamsql
Use the following command:
mvn clean install
This post demonstrates how to set up Apache Kafka on Amazon EC2, use Spark Streaming on Amazon EMR to process data coming in to Apache Kafka topics, and query streaming data using Spark SQL on Amazon EMR.
This repo provides:
- An AWS CloudFormation stack to set up Apache Kafka on Amazon EC2
- Scripts/code to create the Apache Kafka topic and producer
- Spark Streaming and Spark SQL code to run on Amazon EMR
For more information about how to set everything up, see the post.