
component: prometheus: enable/disable http jobs like rules #316

Draft
wants to merge 1 commit into base: main

Conversation

ykaliuta

Jira: https://issues.redhat.com/browse/RHOAIENG-87

Some jobs scrape the http endpoint /probe (unlike those that use kubernetes_sd_configs, for example) and they start collecting metrics as soon as they are configured. While an enabled component's rules are deployed only when the deployment is available, that does not help, since the rules fetch stale metrics from those jobs.

Configure them in the prometheus.yml file (a field in the mounted ConfigMap) in a similar way to how rules_files are configured: put them into separate data fields and substitute the array of scrape_configs in the unmarshaled prometheusContent map.

Description

How Has This Been Tested?

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

Jira: https://issues.redhat.com/browse/RHOAIENG-87

Some jobs scrape the http endpoint /probe (unlike those that use
kubernetes_sd_configs, for example) and they start collecting metrics
as soon as they are configured. While an enabled component's rules are
deployed only when the deployment is available, that does not help,
since the rules fetch stale metrics from those jobs.

Configure them in the prometheus.yml file (a field in the mounted
ConfigMap) in a similar way to how `rules_files` are configured: put
them into separate data fields and substitute the array of
`scrape_configs` in the unmarshaled prometheusContent map.

Signed-off-by: Yauheni Kaliuta <[email protected]>
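
To make the described mechanism concrete, here is a minimal, self-contained sketch of the approach, not the PR's actual code: the helper name updateScrapeJob, the example job, and the use of gopkg.in/yaml.v2 are assumptions (yaml.v2 is consistent with the map[any]any signatures visible in the diff below). It unmarshals the prometheus.yml content from the ConfigMap and splices a job, stored in its own data field, into or out of the scrape_configs array.

package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

// updateScrapeJob is a hypothetical helper (not the PR's code) that enables or
// disables a named job inside the unmarshaled prometheus.yml content, mirroring
// how rule files are toggled.
func updateScrapeJob(prometheusContent map[any]any, job map[any]any, name string, enable bool) {
	scrapeConfigs, _ := prometheusContent["scrape_configs"].([]any)

	// Locate the job by its job_name, if it is already present.
	idx, exists := -1, false
	for i, sc := range scrapeConfigs {
		if m, ok := sc.(map[any]any); ok && m["job_name"] == name {
			idx, exists = i, true
			break
		}
	}

	switch {
	case enable && !exists:
		scrapeConfigs = append(scrapeConfigs, job)
	case !enable && exists:
		scrapeConfigs = append(scrapeConfigs[:idx], scrapeConfigs[idx+1:]...)
	default:
		return // already in the desired state
	}
	prometheusContent["scrape_configs"] = scrapeConfigs
}

func main() {
	// Stand-in for the prometheus.yml field of the mounted ConfigMap.
	promYAML := []byte("scrape_configs:\n- job_name: existing\n  metrics_path: /metrics\n")

	var content map[any]any
	if err := yaml.Unmarshal(promYAML, &content); err != nil {
		panic(err)
	}

	// Stand-in for a component's /probe job kept in a separate data field.
	job := map[any]any{"job_name": "example-component", "metrics_path": "/probe"}
	updateScrapeJob(content, job, "example-component", true)

	out, _ := yaml.Marshal(content)
	fmt.Println(string(out))
}

The sketch only illustrates the substitution step; how the per-job data fields are laid out in the ConfigMap is left to the actual change.
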

openshift-ci bot commented Jul 31, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ykaliuta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

openshift-ci bot commented Jul 31, 2024

@ykaliuta: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/rhods-operator-e2e
Commit: aa23e6f
Details: link
Required: true
Rerun command: /test rhods-operator-e2e

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ykaliuta ykaliuta marked this pull request as draft July 31, 2024 12:57
@ykaliuta ykaliuta requested a review from asanzgom July 31, 2024 12:59
@@ -97,6 +97,73 @@ func (c *Component) ConfigComponentLogger(logger logr.Logger, component string,
return logger.WithName("DSC.Components." + component)
}

func getJobName(job map[any]any) (string, bool) {

I have not looked at the details of this PR, but should the component.go change get into ODH first?

Author

I have not looked at the details of this PR, but should the component.go change get into ODH first?

100%

It's a branch to test it first.

Author

I was thinking that downstream they should come as one patch anyway (the upstream changes do not break anything there, so no problem), but it looks like having the code without the prometheus changes, even if it does not work as expected, does not break the existing setup.


Yeah, this is a similar situation to MR's PR.
We can use this one to test first and backport to ODH, as long as we have the logic applied to both upstream and downstream.

Author

I'll split it for the real submission, it's fine.

return name, true
}
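
For orientation, a minimal sketch of what a helper with this signature might look like, assuming the unmarshaled scrape config entry carries a job_name key; the body is collapsed in the diff, so this is an illustration rather than the PR's actual code:

func getJobName(job map[any]any) (string, bool) {
	// Hypothetical body, for illustration only: pull the job_name field
	// out of the unmarshaled scrape config entry.
	name, ok := job["job_name"].(string)
	if !ok {
		return "", false
	}
	return name, true
}
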

func getJobIdx(scrapeConfigs *[]any, jobName string) (int, bool) {

I am thinking about this function: is the "bool" really needed? If it is there just to get the index so that the "removal" in updateJob() works, then it either returns 0 (not found in the job map) or a non-0 index; e.g. "scrape_configs element is not array" and the final return are treated the same.

so

idx, exists := getJobIdx(&scrapeConfigs, name)
switch {
case enable && !exists:
	scrapeConfigs = append(scrapeConfigs, job)
case !enable && exists:
	scrapeConfigs = append(scrapeConfigs[:idx], scrapeConfigs[idx+1:]...)
default:
	return
}

can be

idx := getJobIdx(&scrapeConfigs, name)
if idx != 0 {
	if enable {
		(*prometheusContent)["scrape_configs"] = append(scrapeConfigs, job)
	} else {
		(*prometheusContent)["scrape_configs"] = append(scrapeConfigs[:idx], scrapeConfigs[idx+1:]...)
	}
}


right?
}

Author
@ykaliuta ykaliuta Aug 1, 2024

Let's split it into 2 parts.

  1. A special idx value for "not found" vs. a flag.
    0 is a valid index, but a negative value could be used, for example. However, I find that very much C-style (I use it in C all the time), whereas in Go one would idiomatically use a flag and/or an error value.

  2. Code refactoring.
    2.1) It should not append if the job already exists, so it must still be a separate branch. At least for rules, this check makes enabling work correctly when it is already enabled.
    2.2) Although it can be written as

if enable && idx < 0 {
	(*prometheusContent)["scrape_configs"] = append(scrapeConfigs, job)
} else if !enable && idx >= 0 {
	(*prometheusContent)["scrape_configs"] = append(scrapeConfigs[:idx], scrapeConfigs[idx+1:]...)
}

and it's a matter of taste. I find preparing the list and then updating it in one place nicer.
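
As a purely illustrative sketch of the (index, found) shape being defended here (the PR's actual getJobIdx body is not shown in this excerpt), the signature matches the one in the diff and the explicit boolean keeps index 0 usable as a real position:

func getJobIdx(scrapeConfigs *[]any, jobName string) (int, bool) {
	// Illustrative body: return the position of the entry whose job_name
	// matches jobName, plus an explicit "found" flag so 0 stays a valid index.
	for i, sc := range *scrapeConfigs {
		job, ok := sc.(map[any]any)
		if !ok {
			continue
		}
		if name, ok := job["job_name"].(string); ok && name == jobName {
			return i, true
		}
	}
	return 0, false
}

With the flag, the caller can distinguish "already present at index 0" from "absent", which a 0-as-not-found convention could not.
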

@zdtsw

zdtsw commented Sep 20, 2024

do we still need this PR?

@ykaliuta
Author

do we still need this PR?

Yes. Nothing has changed on this topic: the problem should still be there, and the solution is still not verified.
