Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Muted alerts are supressed and add "mutedBy" to APIv2 #3520

Conversation

grobinson-grafana
Copy link
Contributor

This commit updates TimeMuteStage to mark muted alerts as suppressed, and adds the names of all mutes that mute the alert to a field called mutedBy. This field is also returned in APIv2.

Fixes #3513

@grobinson-grafana grobinson-grafana force-pushed the grobinson/add-muted-by-and-surppressed branch from 641df46 to dc56210 Compare September 13, 2023 08:26
@grobinson-grafana
Copy link
Contributor Author

Should I also update apiv1? I'm not sure if that's in maintenance mode or still receiving features?

Copy link
Member

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation-wise, this LGTM - I skimmed through the API changes and they look fine but I decided to put an emphasis in what we're doing as part of the notifier package.

Let's try to address that first.

types/types.go Outdated Show resolved Hide resolved
notify/notify.go Outdated Show resolved Hide resolved
Copy link
Member

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're almost there!

types/types.go Outdated Show resolved Hide resolved
api/v2/api.go Show resolved Hide resolved
notify/notify.go Outdated Show resolved Hide resolved
@gotjosh
Copy link
Member

gotjosh commented Oct 13, 2023

I chatted with George offline for a bit, and it turns out this doesn't do what he thought it did. It suffers from a "race" where an alert from one route can override the status from the same alert going through another way.

Because time intervals are at a route level, the filtering must show on which route was the time interval active so that underlying data structure needs to change to accommodate that requirement.

This commit updates TimeMuteStage to mark muted alerts as suppressed,
and adds the names of all mutes that mute the alert to a field called
mutedBy. This field is also returned in APIv2.

Signed-off-by: George Robinson <[email protected]>
Signed-off-by: George Robinson <[email protected]>
@grobinson-grafana
Copy link
Contributor Author

I've decided to make this a draft for the time being as there is a lot of work still to do and there are still some decisions to make about how to best add this feature.

@grobinson-grafana grobinson-grafana force-pushed the grobinson/add-muted-by-and-surppressed branch from a31b20c to cbeca0c Compare October 20, 2023 09:36
Signed-off-by: George Robinson <[email protected]>
State AlertState `json:"state"`
SilencedBy []string `json:"silencedBy"`
InhibitedBy []string `json:"inhibitedBy"`
MutedBy map[string][]string `json:"mutedBy"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, changed from []string to map[string][]string. The key I have used is the group key. This unfortunately means we are storing the names of the active and mute time intervals redundantly for each alert group, instead of just once per route. However, since routes do not have names I'm not sure what we can do to fix this.

// intervals are provided and both InhibitedBy and SilencedBy are empty,
// it sets the provided alert to AlertStateActive. Otherwise, it sets the
// provided alert to AlertStateSuppressed.
SetMuted(groupKey string, alert model.Fingerprint, intervalNames ...string)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I have added groupKey to SetMuted as active and mute time intervals are marked for each aggregation group individually, unlike inhibitions and silences.

@@ -381,9 +424,18 @@ type Muter interface {
Mutes(model.LabelSet) bool
}

// TimeMuter determines if alerts should be muted based on the specified current time and active time interval on the route.
// TimeMuter determines if alerts should be muted or not for active and mute
// time intervals based on the specified current time.
type TimeMuter interface {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I made changes to the interface based on the discussion in #3562. I'm not super happy with how this interface has turned out though as there are now two methods Active and Mutes which look like inverses of each other when they are not.

For example, it looks like if I pass a series of mute time intervals to Active and it returns false, then passing the same mute time intervals to Mute should return true. That is, Active and Mute are negations of each other. However, in this case both will instead return true, as Active is only supposed to be used with active time intervals and Mute is only supposed to be used with mute time intervals. I have attempted to document their behavior as such, but I'm still not happy with it.

if err != nil {
return false, err
}
for _, a := range alerts {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The marking here is incorrect and will need to be fixed if we agree to continue having both Active and Mute.

@jasuade
Copy link

jasuade commented Feb 19, 2024

Hey team!
Is there any progress on this task?

thank you!

@grobinson-grafana
Copy link
Contributor Author

I'm closing this to start over, as it's been almost 6 months since we last worked on it. There are some conflicts with main that I need to fix, and I am considering doing something a little different this time.

@grobinson-grafana grobinson-grafana deleted the grobinson/add-muted-by-and-surppressed branch April 16, 2024 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Muted alerts are not suppressed in API
3 participants