
Apache Spark: Consuming Kafka, Processing PostgreSQL to Redshift


mdmamunhasan/streamsql

Requirements

  1. Maven
  2. Apache Spark
  3. Scala
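Before building, it can help to confirm that these tools are installed. The following is a minimal Bash sketch. It assumes the Maven, Spark, and Scala installs put `mvn`, `spark-submit`, and `scala` on `PATH`. Those command names are an assumption about your installation, not something this repo specifies:

```shell
#!/usr/bin/env bash
# Report which of the required tools are missing from PATH.
# Assumption: Maven, Spark, and Scala expose `mvn`, `spark-submit`, and `scala`;
# adjust the list below if your installation uses different command names.
check_requirements() {
  local missing=()
  local tool
  for tool in mvn spark-submit scala; do
    # `command -v` succeeds only if the name resolves to something runnable
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing+=("$tool")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "all requirements found"
}
```

Calling `check_requirements || exit 1` at the top of a setup script stops it early when a tool is missing.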

Clone the repo

Use the following commands:

  1. sudo yum install git

  2. git clone https://github.com/mdmamunhasan/streamsql.git

  3. cd streamsql

Build and install the code

From the repository root, run: mvn clean install
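`mvn clean install` leaves the packaged jar under `target/`. The sketch below shows one way to hand that jar to `spark-submit` on an EMR cluster, where YARN is the resource manager. The main class `com.example.StreamSqlApp` is hypothetical, so check this repo's sources for the real entry point. The script also assumes `spark-submit` is on `PATH`, as it typically is on the EMR master node:

```shell
#!/usr/bin/env bash
# Hypothetical entry point: replace with the actual main class defined in this repo.
MAIN_CLASS="${MAIN_CLASS:-com.example.StreamSqlApp}"

# Print the first application jar Maven produced in a build directory, skipping
# -sources/-javadoc jars and the pre-shading original-*.jar kept by maven-shade-plugin.
find_app_jar() {
  local dir="${1:-target}" jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # the glob matched nothing
    case "$jar" in
      *-sources.jar|*-javadoc.jar|*/original-*.jar) continue ;;
    esac
    echo "$jar"
    return 0
  done
  return 1
}

# Submit the jar to YARN in cluster mode.
# With DRY_RUN=1 the command is printed instead of executed.
submit_app() {
  local jar
  jar="$(find_app_jar "${1:-target}")" || {
    echo "no jar found; run mvn clean install first" >&2
    return 1
  }
  local cmd=(spark-submit --class "$MAIN_CLASS" --master yarn --deploy-mode cluster "$jar")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running `DRY_RUN=1 submit_app` prints the command without launching anything. This is a quick way to check which jar was picked up before submitting for real.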

Reference

The referenced AWS Big Data Blog post demonstrates how to set up Apache Kafka on Amazon EC2, use Spark Streaming on Amazon EMR to process data arriving in Apache Kafka topics, and query the streaming data using Spark SQL on Amazon EMR.

This repo provides:

  • An AWS CloudFormation stack to set up Apache Kafka on Amazon EC2
  • Scripts and code to create the Apache Kafka topic and producer
  • Spark Streaming and Spark SQL code to run on Amazon EMR
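The topic-creation scripts themselves are not reproduced here. As an illustration only, a Bash sketch that creates a topic with Kafka's bundled `kafka-topics.sh` might look like the following. It makes three assumptions. First, it requires Kafka 2.2 or later, where `kafka-topics.sh` accepts `--bootstrap-server`; older releases used `--zookeeper` instead. Second, Kafka's `bin/` directory must be on `PATH`. Third, the broker defaults to `localhost:9092`:

```shell
#!/usr/bin/env bash
# Assumptions (not taken from this repo): Kafka 2.2+, Kafka's bin/ on PATH,
# and a broker reachable at KAFKA_BOOTSTRAP (default localhost:9092).
KAFKA_BOOTSTRAP="${KAFKA_BOOTSTRAP:-localhost:9092}"

# create_topic TOPIC [PARTITIONS] [REPLICATION_FACTOR]
# With DRY_RUN=1 the command is printed instead of executed.
create_topic() {
  local topic="$1" partitions="${2:-1}" replication="${3:-1}"
  if [ -z "$topic" ]; then
    echo "usage: create_topic TOPIC [PARTITIONS] [REPLICATION_FACTOR]" >&2
    return 2
  fi
  local cmd=(kafka-topics.sh --create
    --bootstrap-server "$KAFKA_BOOTSTRAP"
    --topic "$topic"
    --partitions "$partitions"
    --replication-factor "$replication")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `create_topic streamsql-demo 3 2` creates a hypothetical `streamsql-demo` topic with three partitions and a replication factor of 2, which needs at least two brokers. Set `KAFKA_BOOTSTRAP` to the address of the EC2-hosted broker. You can then send test messages to the topic with Kafka's `kafka-console-producer.sh`.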

For more information about how to set everything up, see the post.

https://github.com/awslabs/aws-big-data-blog.git
