Use traces to describe a microservices architecture.
Microservices architectures are complex, and writing and maintaining documentation for them is never easy. Add the asynchronous messages heavily used in reactive applications, and describing how such a system works becomes almost unrealistic. It is a challenge even in a monolithic system. So how can we help a developer improve or fix a system when there is nothing to guide them except the system itself?
With initiatives such as OpenTracing or OpenCensus, traces are easy to acquire in the microservices world (provided each component is compatible with those frameworks). So what about using traces in a bottom-up approach to infer the architecture? However, unless we know exactly what we are looking for in a large amount of traces, it is quite hard to extract the right information: too many traces kill the traces. We need something on top to make sense of them.
This project is a PoC that goes toward that goal. It builds a sequence diagram from Jaeger traces and displays it in a Grafana diagram panel, using mermaid to render the diagram. It's not much, but it can be useful. Disclaimer: the initial idea comes from https://danlebrero.com/2017/04/06/documenting-your-architecture-wireshark-plantuml-and-a-repl/
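
To give a feel for the trace-to-diagram step, here is a minimal Python sketch. It is not the project's Clojure code. It assumes a trace shaped like the JSON returned by Jaeger's query service (`spans`, `processes`, `references`), and it simply draws one arrow from the service of a span's first referenced span to the service of the span itself. The function name `trace_to_mermaid`, the sample trace and the service names are all made up for illustration.

```python
def trace_to_mermaid(trace):
    """Render one Jaeger-style trace dict as Mermaid sequenceDiagram text."""
    processes = trace["processes"]
    # Draw arrows in chronological order of span start.
    spans = sorted(trace["spans"], key=lambda s: s["startTime"])
    by_id = {s["spanID"]: s for s in spans}

    def service_of(span):
        return processes[span["processID"]]["serviceName"]

    lines = ["sequenceDiagram"]
    for span in spans:
        refs = span.get("references", [])
        # Assumption: the first reference points at the caller (or sender).
        parent = by_id.get(refs[0]["spanID"]) if refs else None
        if parent is None:
            continue  # root span: there is no caller to draw an arrow from
        lines.append(
            f"    {service_of(parent)}->>{service_of(span)}: {span['operationName']}"
        )
    return "\n".join(lines)


# Made-up sample trace: api -> jobs -> storage.
SAMPLE_TRACE = {
    "processes": {
        "p1": {"serviceName": "api"},
        "p2": {"serviceName": "jobs"},
        "p3": {"serviceName": "storage"},
    },
    "spans": [
        {"spanID": "a", "processID": "p1", "operationName": "Request",
         "startTime": 1, "references": []},
        {"spanID": "b", "processID": "p2", "operationName": "StartJob",
         "startTime": 2, "references": [{"refType": "CHILD_OF", "spanID": "a"}]},
        {"spanID": "c", "processID": "p3", "operationName": "Save",
         "startTime": 3, "references": [{"refType": "CHILD_OF", "spanID": "b"}]},
    ],
}

print(trace_to_mermaid(SAMPLE_TRACE))
```

Service names containing spaces or punctuation would probably need escaping or aliasing before mermaid accepts them as participants; the sketch ignores that.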
The transformation is done in Clojure: a single file, handler.clj, of a few hundred lines, Swagger UI and Grafana datasource API included. For Grafana to get access to trace metadata such as service names, an API compatible with the Grafana Simple JSON datasource is needed.
The traces are generated by akka.opentracing, an extension of akka.net (an actor framework for .NET derived from Akka on the JVM) that makes it easy to trace all messages exchanged by actors. The example generates fake requests every 5 seconds that trigger jobs, sub-processes, storage operations... There is no need to know the details: the idea is to discover and understand them from the traces.
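
For readers unfamiliar with the Simple JSON datasource, the following Python sketch shows roughly what such an API has to answer. It is an illustration, not a transcription of handler.clj. It covers only two of the endpoints the datasource calls: `/`, which must return 200 for the connection test, and `/search`, which returns the options Grafana can offer (service names here). The `/query` and `/annotations` endpoints are left out. The `handle` function, the hard-coded `SERVICES` list and the substring filtering on `target` are all assumptions made for the example.

```python
import json

# Hypothetical service names; a real backend would ask Jaeger for them.
SERVICES = ["api", "jobs", "storage"]


def handle(method, path, body=b""):
    """Return (status, payload) for a Simple JSON datasource request."""
    if path == "/":
        # Used by Grafana's "test connection" button: any 200 will do.
        return 200, "ok"
    if method == "POST" and path == "/search":
        request = json.loads(body or b"{}")
        needle = request.get("target") or ""
        # Assumption: filter the options by substring match on the target.
        return 200, [name for name in SERVICES if needle in name]
    return 404, {"error": "not found"}
```

Keeping the dispatch as a plain function makes it easy to plug into any HTTP server and to test without starting one.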
git clone https://github.com/alexvaut/OpenTracingDiagram.git
cd OpenTracingDiagram
docker-compose up
Wait a bit for all the containers to be up and then:
- Browse to http://localhost:16686 to see traces in Jaeger.
- Browse to http://localhost:3000 to see the service transformation API in a Swagger UI (allow a couple of seconds for the Clojure Ring server to start).
- Browse to http://localhost:3001/d/mddcLWmWk/sequence-diagram-from-traces to see the Grafana dashboard where sequence diagrams are displayed. Click on refresh (top-right of the Grafana screen) to display a new sequence diagram: Jaeger returns random traces when filtering them. By default only one trace is displayed; you can increase this limit on the dashboard through a Grafana variable.
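
The trace limit mentioned above can be pictured as a query parameter on the request sent to Jaeger. The Python sketch below builds such a URL for the `/api/traces` endpoint that the Jaeger UI itself uses; that endpoint is not a documented stable API, and whether this project calls it in exactly this way is an assumption. The function name `build_traces_url` and the service name `api` are hypothetical.

```python
from urllib.parse import urlencode


def build_traces_url(service, limit=1, base="http://localhost:16686"):
    """Build a Jaeger query URL asking for at most `limit` traces of a service."""
    query = urlencode({"service": service, "limit": limit})
    return f"{base}/api/traces?{query}"


print(build_traces_url("api", limit=3))
```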
A sequence diagram rendered in Grafana
The information recorded for each span
One trace made of several spans in Jaeger
Same trace rendered as a sequence diagram
From docker-compose to a sequence diagram in Grafana
Many ideas can emerge from this work. Some of them:
- Provide more inputs (as the Jaeger UI does) and link the two UIs.
- Improve diagrams (mermaidjs is quite limited).
- Cluster traces to extract the most common sequence diagrams.
- Build dependency diagrams between components (kind of available in Jaeger already).
- Use metrics (from Prometheus, for instance) on components (this is where the merge of OpenCensus and OpenTracing should help) to:
  - Focus the architecture description on heavily used components, long-running processing...
  - Color/format messages and components.