.
├── api # REST API associated with the data (ExpressJS)
├── cassandra # Cassandra Dockerfile and init script (Cassandra)
├── connect-cassandra # Kafka-Cassandra sink Dockerfile and configurations (Kafka Connect + Cassandra)
├── connect-elastic # Kafka-Elasticsearch sink Dockerfile and configurations (Kafka Connect + Elasticsearch)
├── docs # Documentation files and notebooks
├── ingestion # Data ingestion (python script + Kafka)
├── kubernetes # Kubernetes configuration files (Kubernetes)
├── spark # Spark Dockerfile and python scripts (Spark + python script)
├── stream # Kafka Streams application to filter and enrich the input data (Kafka Streams)
├── .gitattributes # .gitattributes file
├── .gitignore # .gitignore file
├── docker-compose.yaml # Base docker-compose file. Starts all the applications
├── LICENSE # License of the project
└── README.md # This file
- This project was created for the Technologies for Advanced Programming (TAP) course at the University of Catania (UniCT).
- The idea is to showcase a simple ETL pipeline using some of the most widely known technologies in the big data field.
- The main inspiration for this project was the OpenDota project, more specifically its open-source "core" component.
- Raw data comes from the Web API provided by Steam (Valve).
Step | Technology used |
---|---|
Data source | Steam API |
Data transport | Apache Kafka |
Data processing | Apache Kafka Streams - Apache Spark |
Data storage | Apache Cassandra - Elasticsearch |
Data visualization | Kibana |
Programming language | Python - Java |
Index | Service | Consumes (Kafka topic) | Produces (Kafka topic) |
---|---|---|---|
1 | Steam Web API | / | dota_raw |
2 | Kafka Streams | dota_raw | dota_single - dota_lineup |
3 | Cassandra | dota_single | / |
4 | Dotingestio2 API | dota_request | dota_response |
5 | Spark | dota_lineup - dota_request | dota_response |
6 | Elasticsearch | dota_single | / |
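
Row 1 of the table (Steam Web API → dota_raw) can be pictured with a short Python sketch. This is only a rough illustration of the flow, not the project's ingestion script: the kafka-python client, the `kafka:9092` broker address, and the placeholder API key are assumptions.

```python
import json
import time

import requests
from kafka import KafkaProducer  # assumption: the kafka-python client library

# Placeholder values; the real ones live in ingestion/settings.yaml
API_KEY = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
API_ENDPOINT = ("http://api.steampowered.com/IDOTA2Match_570/"
                "GetMatchHistoryBySequenceNum/V001/"
                "?key={}&start_at_match_seq_num={}")
TOPIC = "dota_raw"
INTERVAL = 10
match_seq_num = 4976549000

# assumption: 'kafka:9092' is the broker address inside the Docker network
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # Fetch a batch of matches starting from the current sequence number
    response = requests.get(API_ENDPOINT.format(API_KEY, match_seq_num)).json()
    for match in response.get("result", {}).get("matches", []):
        producer.send(TOPIC, match)                 # one raw match per record on dota_raw
        match_seq_num = match["match_seq_num"] + 1  # advance past the last match seen
    producer.flush()
    time.sleep(INTERVAL)
```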
- To run the Elasticsearch container you may need to tweak the vm.max_map_count variable. See here
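  On most Linux hosts this means raising the limit before starting the stack, for example with `sudo sysctl -w vm.max_map_count=262144` (the minimum value Elasticsearch expects).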
- Download the DataStax Apache Kafka® Connector and place it in the connect-cassandra directory
- Make sure you are in the root directory, with the docker-compose.yaml file
- Create an ingestion/settings.yaml file with the following values (see ingestion/settings.yaml.example)
All the values present in the settings file can be overridden by an environment variable with the same name in all caps.
```yaml
# You need this to access the Steam Web API, which is used to fetch basic match data.
# You can safely use your main account to obtain the API key.
# You can request an API key here: https://steamcommunity.com/dev/apikey
api_key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

# Steam Web API endpoint. You should not modify this unless you know what you are doing
api_endpoint: http://api.steampowered.com/IDOTA2Match_570/GetMatchHistoryBySequenceNum/V001/?key={}&start_at_match_seq_num={}

# Kafka topic the producer will send the data to. The Kafka Streams consumer expects this topic
topic: dota_raw

# Interval between each data fetch by the Python script
interval: 10

# 3 possible settings can be placed here:
# - The sequential match id of the first match you want to fetch, as a string
# - 'cassandra', will fetch the last sequential match id in the cassandra database
# - 'steam', will fetch the most recent sequential match id from the "history_endpoint"
match_seq_num: 4976549000 | 'steam' | 'cassandra'

# Steam Web API endpoint used when the 'steam' value is placed in "match_seq_num"
history_endpoint: https://api.steampowered.com/IDOTA2Match_570/GetMatchHistory/V001/key={}&matches_requested=1
```
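
As a rough illustration of the override rule above, a loader along the lines of the sketch below would give ALL-CAPS environment variables precedence over the file. The `load_settings` name and the use of PyYAML are assumptions, not the project's actual code.

```python
import os

import yaml  # assumption: PyYAML is available in the ingestion image


def load_settings(path="settings.yaml"):
    """Read settings.yaml, then let ALL-CAPS environment variables override each key."""
    with open(path) as f:
        settings = yaml.safe_load(f)
    for key in settings:
        # e.g. the API_KEY environment variable overrides the api_key entry
        # (values coming from the environment are plain strings)
        settings[key] = os.environ.get(key.upper(), settings[key])
    return settings
```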
- Start:
docker-compose up
- Stop:
docker-compose down
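- The stack can also be started in the background with `docker-compose up -d`; the logs of a single service can then be followed with `docker-compose logs -f <service-name>`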
- To run the Elasticsearch container you may need to tweak the vm.max_map_count variable. See here
- Make sure you are in the root directory, with the all-in-one-deploy.yaml file
- Make sure to edit the kubernetes/kafkaproducer-key.yaml file to add your Steam Web API key. All the settings shown above are taken from environment variables with the same name in all caps
- Start:
kubectl apply -f all-in-one-deploy.yaml
- Stop:
kubectl delete -f all-in-one-deploy.yaml
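- Once applied, pod status can be checked with `kubectl get pods`, and the logs of a single pod followed with `kubectl logs <pod-name>`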
Command | Description |
---|---|
`docker exec -it <container-name> bash` | Get a terminal into the running container |
`docker system prune` | Cleans your system of any stopped containers, images, and volumes |
`docker-compose build` | Rebuilds your containers (e.g. for database schema updates) |
`kubectl -n default rollout restart deploy` | Restart all Kubernetes pods |
- Add the much-needed replay parsing to gather much more information about each match.
- Make a usable user interface to fetch the data.
- Use clusters with more than one node for each of the distributed services.
- Improve performance.
- Use Kubernetes to its fullest.
- Use the recommended security layers, such as passwords and encryption.
- OpenDota
- TeamFortress wiki
- DataStax Apache Kafka Connector
- Structured Streaming Programming Guide
- Databricks: Deploying MLlib for Scoring in Structured Streaming
- I used deep learning to predict DotA 2
- Elasticsearch: Using Docker images in production
- Elasticsearch Service Sink Connector for Confluent Platform
- How to start multiple streaming queries in a single Spark application?