Amazon Managed Service for Prometheus (AMP) is a Prometheus-compatible monitoring and alerting service that makes it easy to monitor containerized applications and infrastructure at scale. Amazon Managed Grafana (AMG) is a fully managed service for open source Grafana developed in collaboration with Grafana Labs. Grafana is a popular open source analytics platform that enables you to query, visualize, alert on and understand your metrics no matter where they are stored. AWS Distro for OpenTelemetry (OTEL) is a secure, production-ready, AWS-supported distribution of the OpenTelemetry project. Part of the Cloud Native Computing Foundation, OpenTelemetry provides open source APIs, libraries, and agents to collect distributed traces and metrics for application monitoring.
In this ECS Solution Blueprint, we use the AWS OpenTelemetry agent to collect both custom application metrics and infrastructure metrics (CPU, memory, etc.), send them to Amazon Managed Service for Prometheus, and visualize them using Amazon Managed Grafana. This solution is based on the Getting Started Guide for OTEL and AMP for ECS.
Follow the AMP and AMG documentation to set up the Prometheus and Grafana workspaces, respectively. Creating the AMP workspace is a simple one-step process. The AMG workspace requires a mechanism to authenticate users who access the Grafana dashboard; you can set this up using AWS SSO or SAML-based federated authentication.
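As a rough sketch of the one-step AMP workspace creation, the AWS CLI can create the workspace and print its ID. The function name `create_amp_workspace` and the default alias `ecs-blueprint-amp` are placeholders chosen for illustration; the blueprint itself does not define them.

```shell
# Create an AMP workspace and print its workspace ID.
# The default alias below is a hypothetical placeholder; pass your own as $1.
create_amp_workspace() {
  aws amp create-workspace \
    --alias "${1:-ecs-blueprint-amp}" \
    --query 'workspaceId' \
    --output text
}

# Usage (requires AWS credentials and a configured default region):
#   WORKSPACE_ID="$(create_amp_workspace my-alias)"
```

The AMG workspace is not covered here because it also needs the user authentication setup described above.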
- Deploy the core-infra. If you have already deployed the infra, you can reuse it.
- In this folder, copy the `terraform.tfvars.example` file to `terraform.tfvars` and update the variables.
- NOTE: CodeStar notification rules require a one-time creation of a service-linked role. Please verify one exists or create the codestar-notification service-linked role.
  - Check whether the role exists: `aws iam get-role --role-name AWSServiceRoleForCodeStarNotifications`
  - If the role is missing, the command returns: `An error occurred (NoSuchEntity) when calling the GetRole operation: The role with name AWSServiceRoleForCodeStarNotifications cannot be found.`
  - If you receive the error above, create the service-linked role with the AWS CLI: `aws iam create-service-linked-role --aws-service-name codestar-notifications.amazonaws.com`
  - Once this role is created, you will not have to complete these steps for the other examples.
- From the previously created AMP workspace, copy the remote-write endpoint. It has the form `https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write`.
- Open `ecs-adot-config.yaml` in this folder, replace `AMP_REMOTE_WRITE_ENDPOINT` with the URL above, and change `REGION` to the region of the AMP workspace.
- Now you can deploy this blueprint:
terraform init
terraform plan
terraform apply -auto-approve
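The manual steps that precede `terraform apply` (checking for the service-linked role and filling in the two placeholders) could be scripted roughly as follows. This is a minimal sketch, not part of the blueprint: the function names are made up for illustration, and it assumes the placeholders appear literally as `AMP_REMOTE_WRITE_ENDPOINT` and `REGION` in `ecs-adot-config.yaml`.

```shell
# Create the CodeStar Notifications service-linked role only if it is missing.
# Note: any get-role failure (not only NoSuchEntity) falls through to create.
ensure_codestar_slr() {
  if aws iam get-role --role-name AWSServiceRoleForCodeStarNotifications >/dev/null 2>&1; then
    echo "service-linked role already exists"
  else
    aws iam create-service-linked-role \
      --aws-service-name codestar-notifications.amazonaws.com
  fi
}

# Build the remote-write URL from a region ($1) and a workspace ID ($2).
amp_remote_write_url() {
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/remote_write\n' "$1" "$2"
}

# Print the config file ($3) to stdout with both placeholders filled in.
# Assumption: the strings AMP_REMOTE_WRITE_ENDPOINT and REGION occur only as placeholders.
render_adot_config() {
  sed -e "s|AMP_REMOTE_WRITE_ENDPOINT|$(amp_remote_write_url "$1" "$2")|g" \
      -e "s|REGION|$1|g" \
      "$3"
}

# Usage (region and workspace ID are example values):
#   ensure_codestar_slr
#   render_adot_config us-east-1 ws-example ecs-adot-config.yaml > rendered.yaml
```

Writing the result to a separate file rather than editing in place makes it easy to review the substitution before deploying, which matters if `REGION` happens to occur elsewhere in the file.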
Access the AMG workspace that was set up in the prerequisites. In the left panel, find the AWS icon, select Data sources, select Amazon Managed Service for Prometheus, and find the AMP workspace created above by choosing the appropriate region and account.
In the left panel, find the + icon and create a Dashboard. In the new dashboard, select Add a new panel. From the metrics browser, select `ecs_task_cpu_utilized`; you should see the metrics from the task created above.
The following are important aspects to note in the solution:
- The OpenTelemetry agent runs as a sidecar alongside the application container. Both are part of the same Fargate task.
- The OpenTelemetry agent configuration has two aspects:
  - The configuration YAML, which has all the details for scraping and sending the metrics, including the AMP workspace and region that were set above. This YAML is stored in the AWS Systems Manager Parameter Store (`adot_config_ssm_parameter = "otel-collector-config"`). When the OpenTelemetry container is started, ECS fetches the value from the Parameter Store and assigns it to the environment variable `AOT_CONFIG_CONTENT` (defined in the `map_secrets` input variable in `terraform.tfvars`).
  - The application container name and port to scrape for custom metrics are provided in the `map_environment={"PROMETHEUS_SAMPLE_APP":"prometheus-sample-app:8080"}` input variable, instructing the OpenTelemetry agent to scrape from port 8080 for the `prometheus-sample-app` container.
- The OpenTelemetry agent gets the infrastructure metrics using the built-in AWS ECS Container Metrics Receiver.
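To confirm which collector configuration the sidecar will receive, the stored YAML can be read back from Parameter Store. The sketch below assumes the parameter name equals the `adot_config_ssm_parameter` value shown above (`otel-collector-config`); the function name `show_adot_config` is hypothetical.

```shell
# Print the collector configuration stored in Parameter Store.
# The default name is taken from adot_config_ssm_parameter; pass another name as $1
# if you changed that variable. Add --with-decryption if the parameter is a SecureString.
show_adot_config() {
  aws ssm get-parameter \
    --name "${1:-otel-collector-config}" \
    --query 'Parameter.Value' \
    --output text
}

# Usage (requires AWS credentials for the account and region of the deployment):
#   show_adot_config | grep -n 'remote_write'
```

If the output still shows the literal `AMP_REMOTE_WRITE_ENDPOINT` placeholder, the config file was not updated before the deployment.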