-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Muted alerts are supressed and add "mutedBy" to APIv2 #3520
Muted alerts are supressed and add "mutedBy" to APIv2 #3520
Conversation
641df46
to
dc56210
Compare
Should I also update apiv1? I'm not sure if that's in maintenance mode or still receiving features? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation-wise, this LGTM - I skimmed through the API changes and they look fine but I decided to put an emphasis in what we're doing as part of the notifier package.
Let's try to address that first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're almost there!
I chatted with George offline for a bit, and it turns out this doesn't do what he thought it did. It suffers from a "race" where an alert from one route can override the status from the same alert going through another way. Because time intervals are at a route level, the filtering must show on which route was the time interval active so that underlying data structure needs to change to accommodate that requirement. |
This commit updates TimeMuteStage to mark muted alerts as suppressed, and adds the names of all mutes that mute the alert to a field called mutedBy. This field is also returned in APIv2. Signed-off-by: George Robinson <[email protected]>
Signed-off-by: George Robinson <[email protected]>
ca33ca9
to
a31b20c
Compare
I've decided to make this a draft for the time being as there is a lot of work still to do and there are still some decisions to make about how to best add this feature. |
Signed-off-by: George Robinson <[email protected]>
a31b20c
to
cbeca0c
Compare
Signed-off-by: George Robinson <[email protected]>
State AlertState `json:"state"` | ||
SilencedBy []string `json:"silencedBy"` | ||
InhibitedBy []string `json:"inhibitedBy"` | ||
MutedBy map[string][]string `json:"mutedBy"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, changed from []string
to map[string][]string
. The key I have used is the group key. This unfortunately means we are storing the names of the active and mute time intervals redundantly for each alert group, instead of just once per route. However, since routes do not have names I'm not sure what we can do to fix this.
// intervals are provided and both InhibitedBy and SilencedBy are empty, | ||
// it sets the provided alert to AlertStateActive. Otherwise, it sets the | ||
// provided alert to AlertStateSuppressed. | ||
SetMuted(groupKey string, alert model.Fingerprint, intervalNames ...string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I have added groupKey
to SetMuted
as active and mute time intervals are marked for each aggregation group individually, unlike inhibitions and silences.
@@ -381,9 +424,18 @@ type Muter interface { | |||
Mutes(model.LabelSet) bool | |||
} | |||
|
|||
// TimeMuter determines if alerts should be muted based on the specified current time and active time interval on the route. | |||
// TimeMuter determines if alerts should be muted or not for active and mute | |||
// time intervals based on the specified current time. | |||
type TimeMuter interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where I made changes to the interface based on the discussion in #3562. I'm not super happy with how this interface has turned out though as there are now two methods Active
and Mutes
which look like inverses of each other when they are not.
For example, it looks like if I pass a series of mute time intervals to Active
and it returns false, then passing the same mute time intervals to Mute
should return true. That is, Active
and Mute
are negations of each other. However, in this case both will instead return true, as Active
is only supposed to be used with active time intervals and Mute
is only supposed to be used with mute time intervals. I have attempted to document their behavior as such, but I'm still not happy with it.
if err != nil { | ||
return false, err | ||
} | ||
for _, a := range alerts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The marking here is incorrect and will need to be fixed if we agree to continue having both Active
and Mute
.
Hey team! thank you! |
I'm closing this to start over, as it's been almost 6 months since we last worked on it. There are some conflicts with main that I need to fix, and I am considering doing something a little different this time. |
This commit updates TimeMuteStage to mark muted alerts as suppressed, and adds the names of all mutes that mute the alert to a field called mutedBy. This field is also returned in APIv2.
Fixes #3513