
Cloud-based data visualization and analysis tool for telemetry data

A naive data visualization and analysis tool for F1 on-board telemetry data.




View Demo

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contacts
  8. Acknowledgments

About The Project

In both minor motorsport categories and racing e-sports, there seems to be no easily accessible tool to collect, visualize, and analyze live telemetry data. Users often have to perform complex installation tasks to run these tools on their own machines, which might not be powerful enough to handle real-time data stream analysis.

This work proposes a possible baseline architecture to implement a data visualization and analysis tool for on-board telemetry data, completely based on cloud technologies and distributed systems. The proposed system falls under the Software-as-a-Service (SaaS) paradigm and relies on Infrastructure-as-a-Service (IaaS) cloud solutions to provide hardware support to its software components.

For more info, please refer to the Project report.

Built With

This section lists all major frameworks/libraries used in this project.

Data source and front-end:

  • Python (data stream producer)
  • Streamlit (web client)

Back-end Apache services:

  • Apache ZooKeeper
  • Apache Kafka
  • Apache Spark (Structured Streaming)

(back to top)

Getting Started

To get your system up and running, follow these simple steps.

Prerequisites

First, you need to have an account on any cloud platform from which you can access cluster services. We used Google Cloud Dataproc clusters, but any other cloud provider should do.
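
If you go with Dataproc, clusters can be created from the gcloud CLI. For example, assuming the CLI is installed and a project is already configured (the cluster name, region, and worker count below are placeholders):

    gcloud dataproc clusters create telemetry-cluster --region=europe-west1 --num-workers=2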

By following the next section, you will end up with the architecture illustrated in the Project report.

Installation

Make sure to have two clusters on which you can deploy the following technologies:

  1. Apache ZooKeeper (v. 3.7.1) and Apache Kafka (v. 3.1.0) on one cluster.
  2. Apache Spark (v. 3.1.2) on the other cluster.
  • ZooKeeper is required in order to run Kafka. The following example shows how to properly set up the zoo.cfg file, found in the conf directory under the ZooKeeper home, on each cluster node to run a ZooKeeper ensemble over a three-node cluster:

    # Basic time unit (ms) used by ZooKeeper for heartbeats and timeouts
    tickTime=2000
    # Directory where ZooKeeper stores its snapshots and myid file
    dataDir=/var/lib/zookeeper
    # Port on which clients (e.g. the Kafka brokers) connect
    clientPort=2181
    # Max ticks a follower may take to connect and sync with the leader
    initLimit=20
    # Max ticks a follower may lag behind the leader before being dropped
    syncLimit=5
    # Ensemble members: server.<id>=<host>:<peer port>:<leader election port>
    server.1=hostnameA:2888:3888
    server.2=hostnameB:2888:3888
    server.3=hostnameC:2888:3888
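
    Each ensemble member also needs a myid file inside its dataDir, containing that node's own ID (the number after server. in zoo.cfg). For example, on the first node:

    echo 1 > /var/lib/zookeeper/myid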
    
  • On each cluster node, the following key properties must be specified in the server.properties file, located in the config directory under the Kafka home (a complete single-broker example follows this list).

    • broker.id=UID (where UID is a unique ID for this broker).
    • listeners=PLAINTEXT://internalIP:9092
    • advertised.listeners=PLAINTEXT://externalIP:9092
    • zookeeper.connect=hostnameA:2181,hostnameB:2181,hostnameC:2181/kafka_root_znode
  • If you're using Google Cloud Dataproc clusters, you don't need to manually install and configure Spark as it is already included in the cluster's VM image.
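
As a reference, here is a minimal sketch of what server.properties could look like on one broker, assuming a three-node setup co-located with the ZooKeeper ensemble above; the broker ID, IP addresses, and znode path below are placeholders:

    # Unique ID of this broker within the Kafka cluster
    broker.id=1
    # Internal address the broker binds to
    listeners=PLAINTEXT://10.0.0.1:9092
    # External address advertised to clients outside the cluster
    advertised.listeners=PLAINTEXT://203.0.113.1:9092
    # ZooKeeper ensemble connection string, with a dedicated root znode for Kafka
    zookeeper.connect=hostnameA:2181,hostnameB:2181,hostnameC:2181/kafka_root_znode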

(back to top)

Usage

Before launching the Streamlit client, make sure that:

  • Both the Kafka and Spark clusters are up and running.
  • The correct broker IPs and topic names are specified in configuration.ini (a hypothetical example of its layout follows the run commands below).
  • The data source is active and publishing on the correct Kafka topic. For test purposes, you could run the data stream producer process provided in this repo (a minimal sketch of such a producer is shown after this list):
    python ./datastream_producer.py
  • The Spark streaming analysis script is running on the Spark cluster:
    spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 ./structured_stream_process.py --broker <IP:port> --intopic <topicName> --outtopic <topicName>
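
For illustration only, here is a minimal sketch of what such a test producer could look like. It is not the repo's datastream_producer.py: it assumes the kafka-python package, and the broker address, topic name, and telemetry channels are made-up placeholders.

    import json
    import random
    import time

    from kafka import KafkaProducer  # pip install kafka-python

    # Placeholder values: take the real ones from configuration.ini
    BROKER = "203.0.113.1:9092"
    TOPIC = "telemetry-in"

    # Serialize each telemetry sample as a JSON-encoded UTF-8 payload
    producer = KafkaProducer(
        bootstrap_servers=BROKER,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Publish one synthetic on-board telemetry sample per second
    while True:
        sample = {
            "timestamp": time.time(),
            "speed_kmh": random.uniform(0.0, 340.0),
            "rpm": random.randint(4000, 12000),
            "throttle_pct": random.uniform(0.0, 100.0),
        }
        producer.send(TOPIC, sample)
        time.sleep(1)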

Finally, you are ready to run the client:

streamlit run ./main.py
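
The broker IPs and topic names the client uses come from configuration.ini. As a purely hypothetical sketch (the actual section and key names are defined by the repo's code, so check the file itself), it could be structured like this:

    [kafka]
    broker = 203.0.113.1:9092
    in_topic = telemetry-in
    out_topic = telemetry-out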

(back to top)

Roadmap

These are some of the features we would like to add to this project.

  • Add real-time selection of the anomaly threshold
  • Multi-driver support (this involves reorganizing the Kafka topics)
  • Add statefulness to Streamlit
    • Counter variables
    • Data dict
  • Use MLlib in the Spark Structured Streaming data analysis module

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

(back to top)

Contacts

(back to top)

Acknowledgments

Thanks to the O'Reilly books on the technologies used in this project.

Infrastructure-as-a-Service used for this project: Google Cloud Dataproc.

(back to top)