Golden metrics are the most important metrics for a specific entity type. We allow a maximum of 10 metrics, although we recommend no more than 3. Golden metrics should be defined in a file named `golden_metrics.yml`.

They're defined in a map with a unique key, which defines the intention of the metric:

- The key may only contain `[a-zA-Z0-9_]` characters, with a maximum of 100 characters.
- Provide a `title` with a brief explanation of the query.
- Provide a `unit`, which helps the UI make unit conversions when required. For example, a query result of 0.003 seconds will most probably be converted into 3 milliseconds.
```yaml
memoryUsage:
  title: "A title explaining what the user is seeing"
  unit: COUNT
  queries:
    newRelic:
      select: average(host.memoryUsagePercent)
      from: Metric
      where: ""
      facet: ""
      eventId: entity.guid
      eventName: entity.name
  displayAsValue: false
```
All the fields, except `title`, `unit`, and `query.select`, are optional. The previous example shows the default values for each configuration option, so it's equivalent to this:
```yaml
memoryUsage:
  title: "A title explaining what the user is seeing"
  unit: COUNT
  queries:
    newRelic:
      select: average(host.memoryUsagePercent)
```
| Name | Mandatory | Default | Description |
|---|---|---|---|
| `title` | Yes | | Provide a meaningful title to the graph or value you are displaying. |
| `displayAsValue` | No | `false` | Use this option if you want to display a value instead of a line of data (`TIMESERIES`) when viewing the information of one entity. |
| `unit` | Yes | | The unit of the metric, used to provide more context to the user. |
| `queries` | Yes | | A map of queries where the key is the provider. When multiple sources of data exist, provide a query for each source. Otherwise use `newRelic` as the key. |
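As an illustration of `displayAsValue`, here's a sketch of a golden metric rendered as a single value rather than a time series (the metric name and queried field are hypothetical):

```yaml
# Hypothetical metric: shows a single aggregated value instead of a TIMESERIES chart
hostCount:
  title: "Host count"
  unit: COUNT
  displayAsValue: true
  queries:
    newRelic:
      select: uniqueCount(host.id) AS 'Hosts'   # host.id is a placeholder attribute
```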
| Name | Mandatory | Default | Description |
|---|---|---|---|
| `select` | Yes | | Provide the field and function you want to display in the metric. You must only provide one field, but you can do aggregations, sums, etc. Always name the fields to make them easier to read: for example, `sum((provider.httpCodeElb4XXCount.Sum OR 0) + (provider.httpCodeElb5XXCount.Sum OR 0)) AS 'Errors'`. |
| `from` | No | `Metric` | Choose where your metric gathers the information from. |
| `where` | No | empty string | In the event you need a more granular `WHERE` clause added to the query, use this field. For example, `provider = 'Alb'`. |
| `facet` | No | empty string | An extra facet by a specific field, added to the default facet by `entityName`. |
| `eventId` | No | `entity.guid` | The event attribute used to filter the entity. We recommend using the default `entity.guid`, which is generated automatically as part of entity synthesis. |
| `eventName` | No | `entity.name` | The name of the field in the event that references the entity name. By default, `entity.name`, which is generated automatically as part of entity synthesis. |
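Putting several of these query fields together, a sketch of a metric that overrides the defaults might look like this (the event type, attributes, and identifier fields below are all hypothetical):

```yaml
errorCount:
  title: "Error count"
  unit: COUNT
  queries:
    newRelic:
      select: count(error.count) AS 'Errors'  # placeholder attribute
      from: CustomSample                      # hypothetical event type
      where: "environment = 'production'"     # extra WHERE clause
      facet: "error.class"                    # added to the default facet
      eventId: custom.entityId                # hypothetical identifier attribute
      eventName: custom.entityName            # hypothetical name attribute
```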
The unit of the metric must be a string with one of the following values:
- REQUESTS_PER_SECOND
- REQUESTS_PER_MINUTE
- PAGES_PER_SECOND
- MESSAGES_PER_SECOND
- OPERATIONS_PER_SECOND
- COUNT
- SECONDS
- MS
- PERCENTAGE
- BITS
- BYTES
- BITS_PER_SECOND
- BYTES_PER_SECOND
- HERTZ
- APDEX
- TIMESTAMP
- CELSIUS
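For instance, pairing a duration query with the `MS` unit lets the UI convert between milliseconds and seconds as needed (the metric name and queried field here are hypothetical):

```yaml
responseTime:
  title: "Average response time"
  unit: MS
  queries:
    newRelic:
      select: average(duration.ms) AS 'Response time'  # placeholder attribute
```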
When the entity type can be ingested from multiple sources, you'll be required to provide a different query implementation for each source. In this example, the entity must have `prometheus` or `newRelic` in the `instrumentation.provider` tag. The first tag value that matches the entity will be used to build the queries.
```yaml
memoryUsage:
  title: "A title explaining what the user is seeing (unit displayed in the dashboard)"
  queries:
    prometheus:
      select: average(field)
      from: PrometheusSample
    newRelic:
      select: average(nrField)
      from: NewRelicSample
```
There's also the possibility to specify both the provider and the name in the form `{provider}/{name}`:

- Add the provider as a value of the `instrumentation.provider` tag. For example, `provider:kentik`.
- Add the name of the provider in the `instrumentation.name` tag. For example, `name:netflow-events`.

Note that query semantics (such as averages vs. counts, units, etc.) should match in each implementation. If no rule matches, the first one on the list will be used. In the example above, `prometheus` would be used.
```yaml
destinations:
  title: Unique Destinations
  queries:
    kentik/netflow-events:
      select: uniqueCount(dst_addr)
      from: KFlow
      where: "provider = 'kentik-flow-device'"
```
Telemetry for Golden Metrics is evaluated while it is streaming through our ingest pipeline (before it is written to disk in NRDB). As such, it is not possible to support every NRQL expression. The following is a breakdown of the expressions currently supported when creating queries for Golden Metrics:
| Expression | Notes |
|---|---|
| `sum(x)`, `min(x)`, `max(x)`, `average(x)`, `count(x)` | Basic operations on a value. |
| `C * operation(x)`, `C / operation(x)` | Constant `C != 0`. |
| `C * sum(x) / count(y)`, `C * count(x) / count(y)`, `C * sum(x) / sum(y)`, `filter(C * count(x), WHERE ...) / count(x)`, `filter(count(x), WHERE ...) * C / count(x)` | Useful to calculate averages or percentages. `x` and `y` can be equal. |
| `op(x) + op(y)`, `op(x) OR op(y)` | Only some operations are addable or 'or-able': `sum`, `min`, `max`, `average`. |
| `sum(x) - sum(y)` | |
| `uniqueCount(x[, y...])`, `uniqueCount(tuple(x, y, ...))` | Tuples with more than one value are supported. Note: `uniqueCount(x, y) == uniqueCount(tuple(x, y))`. |
| `latest(x) ± C` | |
| `rate(op(x), 1 minute)` | |
| `(sum(x) ± sum(y)) / sum(z)` | |
| `percentile(x, 90)` | Although the `percentile` function supports more than one argument, the Golden Metrics pipeline only allows one. |
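To ground these shapes in a concrete configuration, here's a sketch of a golden metric built on one of the supported percentage forms, `filter(count(x), WHERE ...) * C / count(x)` (the metric name and attribute are hypothetical):

```yaml
errorRate:
  title: "Error rate"
  unit: PERCENTAGE
  queries:
    newRelic:
      # filter(count(x), WHERE ...) * C / count(x) — a supported shape;
      # http.statusCode is a placeholder attribute
      select: filter(count(http.statusCode), WHERE http.statusCode >= 500) * 100 / count(http.statusCode) AS 'Error rate'
```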
To provide more context around this concept, take this example:

`latest(x) + latest(y) + latest(z)`

This does not make sense in the context of analyzing streaming telemetry for two primary reasons:

- The processing pipeline is distributed, meaning that a specific metric will almost certainly be processed by different processors.
- To keep the pipeline simple, it lacks a central or distributed state. There's no way of maintaining information during a time window just for the sake of processing it at the end of the window.

In order to calculate `latest`, a pipeline would have to:

- Keep all the observed data points for the different metrics in a central state (it needs state).
- At the end of a time window, aggregate them in a common place (not be distributed).
- Once aggregated, synthesize the metric.

Another way to think about this is that a streaming pipeline cannot know the `latest` of anything, because it simply has no context about any other data point. Without an `earliest`, there can be no `latest`.