diff --git a/Alaz-Architecture.md b/ARCHITECTURE.md similarity index 60% rename from Alaz-Architecture.md rename to ARCHITECTURE.md index e340aa9..a5abeb2 100644 --- a/Alaz-Architecture.md +++ b/ARCHITECTURE.md @@ -1,53 +1,71 @@ # Alaz Architecture + + + +- [1. Kubernetes Client](#1-kubernetes-client) +- [2. Container Runtimes (`containerd`)](#2-container-runtimes-containerd) +- [3. eBPF Programs](#3-ebpf-programs) + - [Note](#note) +- [How to Build](#how-to-build) +- [How to Deploy](#how-to-deploy) + + + Alaz is designed to run in a kubernetes cluster as an agent, deployed as Daemonset (runs on each cluster node separately). What it does is to watch and pull data from cluster to gain visibility onto the cluster. It gathers information from 3 different sources: - -## 1- Kubernetes Client + +## 1. Kubernetes Client + Using kubernetes client, it polls different type of events related to kubernetes resources. Like **ADD, UPDATE, DELETE** events for any kind of K8s resources like **Pods,Deployments,Services** etc. - Packages used: -- `k8s.io/api/core/v1` -- `k8s.io/apimachinery/pkg/util/runtime` -- `k8s.io/client-go` +We use the following packages: + +- `k8s.io/api/core/v1` +- `k8s.io/apimachinery/pkg/util/runtime` +- `k8s.io/client-go` + +## 2. Container Runtimes (`containerd`) -## 2- Container Runtimes (containerd) There are different types of container runtimes available for K8s clusters like containerd, crio, docker etc. By connecting to chosen container runtimes socket, Alaz is able to gather more detailed information on containers running on nodes. + - log directory of the container, - information related to its sandbox, - pid, - cgroups - environment variables -- ... +- etc. > We do not take into consideration container runtimes data, we do not need it for todays objectives. Will be used later on for collecting more detailed data. -## 3- eBPF Programs +## 3. 
eBPF Programs -In Alaz's eBPF directory there are a couple of **eBPF programs written in C using libbpf**. +In Alaz's eBPF directory there are a couple of eBPF programs written in C using libbpf. -In order to compile these programs, we have a **eBPF-builder image** that contains necessary dependencies installed like **clang, llvm, libbpf and go**. +In order to compile these programs, we have a **eBPF-builder image** that contains necessary dependencies installed like clang, llvm, libbpf and go. -eBPF programs are compiled in mentioned container, leveraging [Cilium bpf2go package](https://github.com/cilium/ebpf/tree/main/cmd/bpf2go). +> eBPF programs are compiled in mentioned container, leveraging [Cilium bpf2go package](https://github.com/cilium/ebpf/tree/main/cmd/bpf2go). -Using go generate directive with `bpf2go`, it compiles the eBPF program and generated necessary helper files in go in order us to interact with eBPF programs. +Using go generate directive with `bpf2go`, it compiles the eBPF program and generated necessary helper files in go in order us to interact with eBPF programs. -- Link the program to a tracepoint or a kprobe. +- Link the program to a tracepoint or a kprobe. - Read bpf maps from user space and pass them for sense-making of data. -Used packages from cilium are : - - `github.com/cilium/eBPF/link` - - `github.com/cilium/eBPF/perf` - - `github.com/cilium/eBPF/rlimit` +Used packages from cilium are: - eBPF programs: - - `tcp_state` : Detects newly established, closed, and listened TCP connections. The number of sockets associated with the program's PID depends on the remote IP address. Keeping this data together with the file descriptor is useful. - - `l7_req` : Monitors both incoming and outgoing payloads by tracking the write,read syscalls and uprobes. Then use `tcp_state` to aggregate the data we receive, allowing us to determine who sent which request to where. 
- -Current programs are generally attached to kernel tracepoints like: +- `github.com/cilium/eBPF/link` +- `github.com/cilium/eBPF/perf` +- `github.com/cilium/eBPF/rlimit` + +eBPF programs: + +- `tcp_state` : Detects newly established, closed, and listened TCP connections. The number of sockets associated with the program's PID depends on the remote IP address. Keeping this data together with the file descriptor is useful. +- `l7_req` : Monitors both incoming and outgoing payloads by tracking the write,read syscalls and uprobes. Then use `tcp_state` to aggregate the data we receive, allowing us to determine who sent which request to where. + +Current programs are generally attached to kernel tracepoints like: ``` tracepoint/syscalls/sys_enter_write (l7_req) @@ -64,6 +82,7 @@ tracepoint/syscalls/sys_exit_connect (tcp_state) ``` uprobes: + ``` SSL_write SSL_read @@ -71,18 +90,22 @@ crypto/tls.(*Conn).Write crypto/tls.(*Conn).Read ``` -#### Note: -Uretprobes crashes go applications. (https://github.com/iovisor/bcc/issues/1320) +### Note + +Uretprobes crashes go applications. See + That's why we disassemble the executable and find return instructions addresses and attach classic uprobes on them as a workaround. -## How to Build Alaz +## How to Build + Alaz embeds compiled eBPF programs in it. After compilation process on eBPF-builder is done, compiled programs are located in project structure. -Using **//go:embed** directive of golang. We embed *.o* files and load them into kernel using [Cilium eBPF package](https://github.com/cilium/eBPF). +Using **//go:embed** directive of golang. We embed _.o_ files and load them into kernel using [Cilium eBPF package](https://github.com/cilium/eBPF). Then we build Alaz like a ordinary golang app more or less since compiled codes are embedded. -#### How to Deploy Alaz +## How to Deploy + Deployed as a privileged DaemonSet resource on the cluster. 
Alaz is required to run as a privileged container since it needs read access to `/proc` directory of the host machine. And Alaz's `serviceAccount` must be should be associated with `ClusterRole` and `ClusterRoleBinding` resources in order to be able to talk with K8s server. diff --git a/README.md b/README.md index 3fcdee2..92a6986 100644 --- a/README.md +++ b/README.md @@ -1,15 +1,13 @@ - -

Alaz - Anteon (Formerly Ddosify) eBPF Agent for Kubernetes Monitoring

- -

- alaz license - Anteon discord server - alaz docker image -

+

Alaz - Anteon eBPF Agent for Kubernetes Monitoring

Anteon Kubernetes Monitoring Service Map -Anteon automatically generates Service Map of your K8s cluster without code instrumentation or sidecars with eBPF Agent Alaz. So you can easily find the bottlenecks in your system. Red lines indicate the high latency between services. +alaz license +Anteon discord server +alaz docker image + +Anteon (formerly Ddosify) automatically generates Service Map of your K8s cluster without code instrumentation or sidecars with eBPF Agent Alaz. So you can easily find the bottlenecks in your system. Red lines indicate the high latency between services. +

@@ -19,62 +17,89 @@ Discord

+
+ Table of Contents + + + +- [What is Alaz?](#what-is-alaz) +- [Features](#features) +- [🚀 Getting Started](#-getting-started) + - [☁️ For Anteon Cloud](#-for-anteon-cloud) + - [Using the kubectl](#using-the-kubectl) + - [Using the Helm](#using-the-helm) + - [🏠 For Anteon Self-Hosted](#-for-anteon-self-hosted) + - [Using the kubectl](#using-the-kubectl-1) + - [Using the Helm](#using-the-helm-1) +- [🧹 Cleanup](#-cleanup) +- [Supported Protocols](#supported-protocols) +- [Limitations](#limitations) + - [Encryption Libraries](#encryption-libraries) +- [Contributing](#contributing) +- [Communication](#communication) +- [License](#license) + + ## What is Alaz? -[Alaz](https://github.com/getanteon/alaz) is an open-source Anteon eBPF agent that can inspect and collect Kubernetes (K8s) service traffic without the need for code instrumentation, sidecars, or service restarts. This is possible due to its use of eBPF technology. +[**Alaz**](https://github.com/getanteon/alaz) is an open-source Anteon eBPF agent that can inspect and collect Kubernetes (K8s) service traffic without the need for code instrumentation, sidecars, or service restarts. This is possible due to its use of eBPF technology. Alaz can create a **Service Map** that helps identify golden signals and problems like: + - High latencies between K8s services -- Detect 5xx HTTP status codes +- Detect 5xx HTTP status codes - Detect Idle / Zombie services - Detect slow SQL queries -Additionally, Anteon tracks and displays live data on your cluster instances CPU, memory, disk, and network usage. All of the dashboards are generated out-of-box and you can create alerts based on these metrics values. Check out the [docs](https://getanteon.com/docs/) for more. +Additionally, Anteon tracks and displays live data on your cluster instances CPU, memory, disk, and network usage. All of the dashboards are generated out-of-box and you can create alerts based on these metrics values.
Check out the [documentation](https://getanteon.com/docs/) for more.

Anteon Kubernetes Monitoring Metrics Anteon tracks and displays live data on your cluster instances CPU, memory, disk, and network usage.

- -➡️ For more information about Anteon, see [Anteon](https://github.com/getanteon/anteon). +➡️ See [Anteon repository](https://github.com/getanteon/anteon) for more information. ## Features -✅ **Low-Overhead:** +✅ **Low-Overhead** Inspect and collect K8s service traffic without the need for code instrumentation, sidecars, or service restarts. -✅ **Effortless:** +✅ **Effortless** Anteon will create the Service Map & Metrics Dashboard that helps identify golden signals and issues such as high latencies, 5xx errors, zombie services. -✅ **Prometheus Compatible:** +✅ **Prometheus Compatible** Gather system information and resources via the Prometheus Node Exporter, which is readily available on the agent. -✅ **Cloud or On-premise:** +✅ **Cloud or On-premise** + +Export metrics to [Anteon Cloud](https://getanteon.com), or install the [Anteon Self-Hosted](https://getanteon.com/docs/self-hosted/) in your infrastructure and manage everything according to your needs. + +✅ **Test & Observe** -Export metrics to [Anteon Cloud](https://getanteon.com), or install the [Anteon Self-Hosted](https://github.com/getanteon/anteon/tree/master/selfhosted) in your infrastructure and manage everything according to your needs. +Anteon Performance Testing and Alaz can work collaboratively. You can start a load test and monitor your system simultaneously. This will help you spot performance issues instantly. Check out the [Anteon documentation](https://getanteon.com/docs) for more information about Anteon Stack. -✅ **Test & Observe:** +✅ **Alerts for Anomalies** -Anteon Performance Testing and Alaz can work collaboratively. You can start a load test and monitor your system simultaneously. This will help you spot performance issues instantly. Check out the [Anteon GitHub Repository](https://github.com/getanteon/anteon) for more information about Anteon Stack.
+If something unusual, like a sudden increase in CPU usage, happens in your Kubernetes (K8s) cluster, Anteon immediately sends alerts to your Slack. -✅ **Alerts for Anomalies:** If something unusual, like a sudden increase in CPU usage, happens in your Kubernetes (K8s) cluster, Anteon immediately sends alerts to your Slack. +✅ **Platform Support** -✅ Works on both Arm64 and x86_64 architectures. +Works on both Arm64 and x86_64 architectures. -## Getting Started +## 🚀 Getting Started -To use Alaz, you need to have a [Anteon Cloud](https://app.getanteon.com/register) account or [Anteon Self-Hosted](https://github.com/getanteon/anteon/tree/master/selfhosted) installed. +To use Alaz, you need to have a [Anteon Cloud](https://app.getanteon.com/register) account or [Anteon Self-Hosted](https://github.com/getanteon/anteon) installed. ### ☁️ For Anteon Cloud 1. Register for a [Anteon Cloud account](https://app.getanteon.com/register). 2. Add a cluster on the [Observability page](https://app.getanteon.com/clusters). You will receive a Monitoring ID and instructions. -3. Run the agent on your Kubernetes cluster using the instructions you received. There are two options for Kubernetes deployment: +3. Run the agent on your Kubernetes cluster using the instructions you received. There are two options for Kubernetes deployment: #### Using the kubectl @@ -102,11 +127,11 @@ Then you can view the metrics and Kubernetes Service Map on the [Anteon Observab ### 🏠 For Anteon Self-Hosted -1. Install [Anteon Self-Hosted](https://github.com/getanteon/anteon/tree/master/selfhosted) +1. Install [Anteon Self-Hosted](https://getanteon.com/docs/self-hosted) 2. Add a cluster on the Observability page of your Self-Hosted frontend. You will receive a Monitoring ID and instructions. -3. Run the agent on your Kubernetes cluster using the instructions you received. +3. Run the agent on your Kubernetes cluster using the instructions you received.
-Note: After you install Anteon Self-Hosted, you will have a Anteon Self-Hosted endpoint of nginx reverse proxy. The base URL of the Anteon Self-Hosted endpoint forwards traffic to the frontend. The base URL of the Anteon Self-Hosted endpoint with `/api` suffix forwards traffic to the backend. So you need to set the backend host variable as `http:///api`. +Note: After you install Anteon Self-Hosted, you will have a Anteon Self-Hosted endpoint of Nginx reverse proxy. The base URL of the Anteon Self-Hosted endpoint forwards traffic to the frontend. The base URL of the Anteon Self-Hosted endpoint with `/api` suffix forwards traffic to the backend. So you need to set the backend host variable as `http:///api`. There are two options for Kubernetes deployment: @@ -139,19 +164,19 @@ helm upgrade --install --namespace anteon alaz anteon/alaz --set monitoringID=$M Then you can view the metrics and Kubernetes Service Map on the Anteon Self-Hosted Observability dashboard. For more information, see [Anteon Monitoring Docs](https://getanteon.com/docs/kubernetes-monitoring/). -Alaz runs as a DaemonSet on your Kubernetes cluster. It collects metrics and sends them to Anteon Cloud or Anteon Self-Hosted. You can view the metrics on the Anteon Observability dashboard. For the detailed Alaz architecture, see [Alaz Architecture](https://github.com/getanteon/alaz/blob/master/Alaz-Architecture.md). +Alaz runs as a DaemonSet on your Kubernetes cluster. It collects metrics and sends them to Anteon Cloud or Anteon Self-Hosted. You can view the metrics on the Anteon Observability dashboard. For the detailed Alaz architecture, see [Alaz Architecture](https://github.com/getanteon/alaz/blob/master/ARCHITECTURE.md). 
-## Cleanup +## 🧹 Cleanup To remove Alaz from your Kubernetes cluster, run the following command: -- For Kubectl +- For Kubectl: ```bash kubectl delete -f https://raw.githubusercontent.com/getanteon/alaz/master/resources/alaz.yaml ``` -- For Helm +- For Helm: ```bash helm delete alaz --namespace anteon @@ -172,7 +197,7 @@ Alaz supports the following protocols: - MySQL - MongoDB -Other protocols will be supported soon. If you have a specific protocol you would like to see supported, please open an issue. +Other protocols will be supported soon. If you have a specific protocol you would like to see supported, please [open an issue](https://github.com/getanteon/alaz/issues/new). ## Limitations @@ -182,19 +207,21 @@ In the future, we plan to support Docker containers. Alaz is an eBPF application that uses [CO-RE](https://github.com/libbpf/libbpf#bpf-co-re-compile-once--run-everywhere). Most of the latest linux distributions support CO-RE. In order to CO-RE to work, the kernel has to be built with BTF(bpf type format) information. -You can check your kernel version with `uname -r` +You can check your kernel version with `uname -r` command and whether btf is enabled by default or not at the [btfhub](https://github.com/aquasecurity/btfhub/blob/main/docs/supported-distros.md). -For the time being, we expect that btf information is readily available on your system. We'll support all kernels in the upcoming weeks leveraging [btfhub](https://github.com/aquasecurity/btfhub). +For the time being, we expect that btf information is readily available on your system. We will support all kernels in the upcoming weeks leveraging [btfhub](https://github.com/aquasecurity/btfhub). ### Encryption Libraries + These are the libraries that alaz hooks into for capturing encrypted traffic. + - [crypto/tls](https://pkg.go.dev/crypto/tls): -In order to Alaz to capture tls requests in your Go applications, your go version must be **1.17+** and your executable must include debug info.
+ In order to Alaz to capture tls requests in your Go applications, your go version must be **1.17+** and your executable must include debug info. - [OpenSSL](https://www.openssl.org/): -OpenSSL shared objects that is dynamically linked into your executable is supported. -Supported versions : **1.0.2**, **1.1.1** and **3.*** + OpenSSL shared objects that is dynamically linked into your executable is supported. + Supported versions : **1.0.2**, **1.1.1** and **3.\*** ## Contributing @@ -202,15 +229,14 @@ Contributions to Alaz are welcome! To contribute, please follow these steps: 1. Fork the repository 2. Create a new branch: `git checkout -b my-branch` -3. Make your changes and commit them: `git commit -am 'Add some feature'` +3. Make your changes and commit them: `git commit -am "Add some feature"` 4. Push to the branch: `git push origin my-branch` -5. Submit a pull request +5. Submit a pull request. ## Communication -You can join our [Discord Server](https://discord.com/invite/9KdnrSUZQg) for issues, feature requests, feedbacks or anything else. +You can join our [Discord Server](https://discord.com/invite/9KdnrSUZQg) for issues, feature requests, feedbacks or anything else. 
## License -Alaz is licensed under the AGPLv3: https://www.gnu.org/licenses/agpl-3.0.html - +Alaz is licensed under the [AGPLv3](LICENSE) diff --git a/aggregator/data.go b/aggregator/data.go index 7191b53..a1ab5af 100644 --- a/aggregator/data.go +++ b/aggregator/data.go @@ -1251,7 +1251,6 @@ func (a *Aggregator) processHttpEvent(ctx context.Context, d *l7_req.L7Event) { func (a *Aggregator) processMongoEvent(ctx context.Context, d *l7_req.L7Event) { query, err := a.parseMongoEvent(d) if err != nil { - log.Logger.Error().AnErr("err", err) return } addrPair := extractAddressPair(d) @@ -1278,6 +1277,7 @@ func (a *Aggregator) processMongoEvent(ctx context.Context, d *l7_req.L7Event) { return } + log.Logger.Debug().Str("path", reqDto.Path).Msg("processmongoEvent persisting") err = a.ds.PersistRequest(reqDto) if err != nil { log.Logger.Error().Err(err).Msg("error persisting request") @@ -1555,6 +1555,9 @@ func (a *Aggregator) parsePostgresCommand(d *l7_req.L7Event) (string, error) { return sqlCommand, nil } +var MongoOpCompressed uint32 = 2012 +var MongoOpMsg uint32 = 2013 + func (a *Aggregator) parseMongoEvent(d *l7_req.L7Event) (string, error) { defer func() { if r := recover(); r != nil { @@ -1565,41 +1568,49 @@ func (a *Aggregator) parseMongoEvent(d *l7_req.L7Event) (string, error) { payload := d.Payload[:d.PayloadSize] - // cut mongo header, 4 bytes MessageLength, 4 bytes RequestID, 4 bytes ResponseTo, 4 bytes Opcode, 4 bytes MessageFlags - payload = payload[20:] - - kind := payload[0] - payload = payload[1:] // cut kind - if kind == 0 { // body - docLenBytes := payload[:4] // document length - docLen := binary.LittleEndian.Uint32(docLenBytes) - payload = payload[4:docLen] // cut docLen - // parse Element - type_ := payload[0] // 2 means string - if type_ != 2 { - return "", fmt.Errorf("document element not a string") - } - payload = payload[1:] // cut type + // cut mongo header, 4 bytes MessageLength, 4 bytes RequestID, 4 bytes ResponseTo + payload = payload[12:] + 
// cut 4 bytes Opcode, 4 bytes MessageFlags + opcode := payload[:4] + payload = payload[8:] + + opcodeInt := binary.LittleEndian.Uint32(opcode) + + if opcodeInt == MongoOpCompressed { + return "compressed mongo event", nil + } else if opcodeInt == MongoOpMsg { + kind := payload[0] + payload = payload[1:] // cut kind + if kind == 0 { // body + docLenBytes := payload[:4] // document length + docLen := binary.LittleEndian.Uint32(docLenBytes) + payload = payload[4:docLen] // cut docLen + // parse Element + type_ := payload[0] // 2 means string + if type_ != 2 { + return "", fmt.Errorf("document element not a string") + } + payload = payload[1:] // cut type - // read until NULL - element := []uint8{} - for _, p := range payload { - if p == 0 { - break + // read until NULL + element := []uint8{} + for _, p := range payload { + if p == 0 { + break + } + element = append(element, p) } - element = append(element, p) - } - // 1 byte NULL, 4 bytes len - elementLenBytes := payload[len(element)+1 : len(element)+1+4] - elementLength := binary.LittleEndian.Uint32(elementLenBytes) + // 1 byte NULL, 4 bytes len + elementLenBytes := payload[len(element)+1 : len(element)+1+4] + elementLength := binary.LittleEndian.Uint32(elementLenBytes) - payload = payload[len(element)+5:] // cut element + null + len - elementValue := payload[:elementLength-1] // myCollection, last byte is null + payload = payload[len(element)+5:] // cut element + null + len + elementValue := payload[:elementLength-1] // myCollection, last byte is null - result := fmt.Sprintf("%s %s", string(element), string(elementValue)) - log.Logger.Debug().Str("result", result).Msg("mongo-elem-result") - return result, nil + result := fmt.Sprintf("%s %s", string(element), string(elementValue)) + return result, nil + } } return "", fmt.Errorf("could not parse mongo event") diff --git a/ebpf/collector.go b/ebpf/collector.go index 58913ee..636759d 100644 --- a/ebpf/collector.go +++ b/ebpf/collector.go @@ -243,18 +243,23 @@ func (e 
*EbpfCollector) close() { for pid := range e.sslWriteUprobes { e.sslWriteUprobes[pid].Close() } + log.Logger.Info().Msg("closed sslWriteUprobes") for pid := range e.sslReadEnterUprobes { e.sslReadEnterUprobes[pid].Close() } + log.Logger.Info().Msg("closed sslReadEnterUprobes") for pid := range e.sslReadURetprobes { e.sslReadURetprobes[pid].Close() } + log.Logger.Info().Msg("closed sslReadURetprobes") for pid := range e.goTlsWriteUprobes { e.goTlsWriteUprobes[pid].Close() } + log.Logger.Info().Msg("closed goTlsWriteUprobes") for pid := range e.goTlsReadUprobes { e.goTlsReadUprobes[pid].Close() } + log.Logger.Info().Msg("closed goTlsReadUprobes") for pid := range e.goTlsReadUretprobes { for _, l := range e.goTlsReadUretprobes[pid] { l.Close()