Shift generated cluster e2e image validation jobs to config-forker/rotator #33927

Draft · wants to merge 1 commit into base: master
923 changes: 0 additions & 923 deletions config/jobs/kubernetes/generated/generated.yaml

This file was deleted.

@@ -0,0 +1,27 @@
# How to generate the k8sbeta job in this folder

When a release branch of kubernetes is first cut, the jobs defined in [`cloud_provider_image_validation.yaml`]
must be forked to use the new release branch. Use [`releng/config-forker`] to
accomplish this, eg:
Comment on lines +3 to +5
Member: Some wording improvements:

Suggested change
When a release branch of kubernetes is first cut, the jobs defined in [`cloud_provider_image_validation.yaml`]
must be forked to use the new release branch. Use [`releng/config-forker`] to
accomplish this, eg:
When cutting a new Kubernetes release branch, the jobs defined in
[`cloud_provider_image_validation.yaml`] must be forked for the new
release branch. Use [`releng/config-forker`] to accomplish this, e.g.:


```sh
# from test-infra root
$ go run ./releng/config-forker \
  --job-config $(pwd)/releng/cloud_provider_image_validation.yaml \
  --version 1.27 \
  --go-version 1.31 \
Comment on lines +11 to +12
Member: Let's put placeholders here, e.g.:

Suggested change
--version 1.27 \
--go-version 1.31 \
--version <Kubernetes major.minor version> \
--go-version <used Go major.minor version> \

  --output $(pwd)/config/jobs/kubernetes/sig-release/release-branch-jobs/cloud-provider/image-validation-1.31.yaml
Member:
Suggested change
--output $(pwd)/config/jobs/kubernetes/sig-release/release-branch-jobs/cloud-provider/image-validation-1.31.yaml
--output $(pwd)/config/jobs/kubernetes/sig-release/release-branch-jobs/cloud-provider/image-validation-<Kubernetes major.minor version>.yaml

```
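
Optionally, a hedged sanity check before opening a PR, assuming the repo's job-config tests live under `config/tests`:

```sh
# assumed workflow: validate the forked job config with the repo's tests
$ go test ./config/tests/...
```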

# How to rotate the k8sbeta job to stable1
Member: We should explain why we are doing this.


```sh
# from test-infra root
$ go run ./releng/config-rotator \
  --config-file ./config/jobs/kubernetes/sig-release/release-branch-jobs/cloud-provider/image-validation-1.31.yaml \
  --new stable1 --old beta
```
Comment on lines +18 to +23
Member: We should also mention that this needs to be done for each release branch.
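
A hedged sketch of what that could look like, assuming each maintained release branch keeps its own image-validation-<version>.yaml and shifts one version marker older when a new branch is cut (file names and marker pairs are illustrative):

```sh
# illustrative only: repeat the rotation for every maintained release branch,
# shifting each file one marker older (beta -> stable1, stable1 -> stable2, ...)
$ go run ./releng/config-rotator \
    --config-file ./config/jobs/kubernetes/sig-release/release-branch-jobs/cloud-provider/image-validation-1.31.yaml \
    --new stable1 --old beta
$ go run ./releng/config-rotator \
    --config-file ./config/jobs/kubernetes/sig-release/release-branch-jobs/cloud-provider/image-validation-1.30.yaml \
    --new stable2 --old stable1
```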



[`releng/config-forker`]: /releng/config-forker
[`cloud_provider_image_validation.yaml`]: /releng/cloud_provider_image_validation.yaml
@@ -0,0 +1,236 @@
periodics:
Member: why is this file called "image-validation"? isn't this just release-branched jobs?

and they're not really sig-release, even if release is doing the forking, cc @aojea
these are just kubernetes/kubernetes jobs, with multiple SIGs involved.

Contributor Author: replied here

- annotations:
    fork-per-release-periodic-interval: ""
    testgrid-dashboards: sig-release-1.29-blocking
    testgrid-tab-name: gce-cos-k8sstable3-alphafeatures
  cluster: k8s-infra-prow-build
  decorate: true
  decoration_config:
    timeout: 3h20m0s
  interval: 24h
  labels:
    preset-k8s-ssh: "true"
    preset-service-account: "true"
  name: ci-kubernetes-e2e-gce-cos-k8sstable3-alphafeatures
  spec:
    containers:
    - args:
      - --cluster=test-gce-cos-k8sstable3-alphafeatures
Member: Prior to this change, this used to be a randomly generated string (I think). Do we care about this change, is there a chance that this might break something?

cc @BenTheElder

Member: I don't think it will break anything, we are fully on boskos. I don't think we even need to set this flag anymore, but I'd prefer to leave it for now.

Member: My biggest concern would be generating a value that is too long or improper character set, if we're changing the generation.

Contributor Author:
The value for the --cluster flag was previously the SHA1 hash of the job name, generated by this line.

Since job names varied only in their version markers (e.g., beta, stable1) and were otherwise stable, the --cluster value stayed aligned with the job name across version changes.

Given that consistency, I thought using the job name directly as the --cluster value should suffice.
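
For illustration, a rough sketch of the two derivations (the hashed form is inferred from the description above; details such as truncation are an assumption):

```sh
# hypothetical comparison of the old and new --cluster derivations
$ JOB=ci-kubernetes-e2e-gce-cos-k8sstable3-alphafeatures
# old (inferred): a SHA1 digest of the job name
$ printf '%s' "$JOB" | sha1sum | awk '{print $1}'
# new (this PR): a readable value derived from the job name itself,
# yielding test-gce-cos-k8sstable3-alphafeatures
$ echo "test-${JOB#ci-kubernetes-e2e-}"
```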

      - --check-leaked-resources
      - --provider=gce
      - --gcp-zone=us-west1-b
      - --gcp-node-image=gci
      - --extract=ci/latest-1.29
      - --extract-ci-bucket=k8s-release-dev
      - --timeout=180m
      - --env=KUBE_PROXY_DAEMONSET=true
      - --env=ENABLE_POD_PRIORITY=true
      - --env=KUBE_FEATURE_GATES=AllAlpha=true
      - --env=ENABLE_CACHE_MUTATION_DETECTOR=true
      - --runtime-config=api/all=true
      - --test_args=--ginkgo.focus=\[Feature:(Audit|BlockVolume|PodPreset|ExpandCSIVolumes|ExpandInUseVolumes)\]|Networking --ginkgo.skip=\[Feature:(SCTPConnectivity|Volumes|Networking-Performance|Networking-IPv6)\]|csi-hostpath-v0 --minStartupPods=8
      command:
      - runner.sh
      - /workspace/scenarios/kubernetes_e2e.py
      image: gcr.io/k8s-staging-test-infra/kubekins-e2e:v20241128-8df65c072f-1.29
      name: ""
      resources:
        limits:
          cpu: "1"
          memory: 3Gi
        requests:
          cpu: "1"
          memory: 3Gi
- annotations:
    fork-per-release-periodic-interval: ""
    testgrid-dashboards: sig-release-1.29-blocking
    testgrid-tab-name: gce-cos-k8sstable3-default
    testgrid-num-failures-to-alert: "6"
  cluster: k8s-infra-prow-build
  decorate: true
  decoration_config:
    timeout: 2h20m0s
  interval: 24h
  labels:
    preset-k8s-ssh: "true"
    preset-service-account: "true"
  name: ci-kubernetes-e2e-gce-cos-k8sstable3-default
  spec:
    containers:
    - args:
      - --cluster=test-gce-cos-k8sstable3-default
      - --check-leaked-resources
      - --provider=gce
      - --gcp-zone=us-west1-b
      - --gcp-node-image=gci
      - --extract=ci/latest-1.29
      - --extract-ci-bucket=k8s-release-dev
      - --timeout=120m
      - --test_args=--ginkgo.skip=\[Driver:.gcepd\]|\[Slow\]|\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\] --minStartupPods=8
      - --ginkgo-parallel=30
      command:
      - runner.sh
      - /workspace/scenarios/kubernetes_e2e.py
      image: gcr.io/k8s-staging-test-infra/kubekins-e2e:v20241128-8df65c072f-1.29
      name: ""
      resources:
        limits:
          cpu: "2"
          memory: 6Gi
        requests:
          cpu: "2"
          memory: 6Gi
- annotations:
    fork-per-release-periodic-interval: ""
    testgrid-dashboards: sig-release-1.29-blocking
    testgrid-tab-name: gce-cos-k8sstable3-ingress
  cluster: k8s-infra-prow-build
  decorate: true
  decoration_config:
    timeout: 2h50m0s
  interval: 24h
  labels:
    preset-k8s-ssh: "true"
    preset-service-account: "true"
  name: ci-kubernetes-e2e-gce-cos-k8sstable3-ingress
  spec:
    containers:
    - args:
      - --cluster=test-gce-cos-k8sstable3-ingress
      - --check-leaked-resources
      - --provider=gce
      - --gcp-zone=us-west1-b
      - --gcp-node-image=gci
      - --extract=ci/latest-1.29
      - --extract-ci-bucket=k8s-release-dev
      - --timeout=150m
      - --test_args=--ginkgo.focus=\[Feature:Ingress\] --minStartupPods=8
      command:
      - runner.sh
      - /workspace/scenarios/kubernetes_e2e.py
      image: gcr.io/k8s-staging-test-infra/kubekins-e2e:v20241128-8df65c072f-1.29
      name: ""
      resources:
        limits:
          cpu: "1"
          memory: 3Gi
        requests:
          cpu: "1"
          memory: 3Gi
- annotations:
    fork-per-release-periodic-interval: ""
    testgrid-dashboards: sig-release-1.29-blocking
    testgrid-tab-name: gce-cos-k8sstable3-reboot
  cluster: k8s-infra-prow-build
  decorate: true
  decoration_config:
    timeout: 3h20m0s
  interval: 24h
  labels:
    preset-k8s-ssh: "true"
    preset-service-account: "true"
  name: ci-kubernetes-e2e-gce-cos-k8sstable3-reboot
  spec:
    containers:
    - args:
      - --cluster=test-gce-cos-k8sstable3-reboot
      - --check-leaked-resources
      - --provider=gce
      - --gcp-zone=us-west1-b
      - --gcp-node-image=gci
      - --extract=ci/latest-1.29
      - --extract-ci-bucket=k8s-release-dev
      - --timeout=180m
      - --test_args=--ginkgo.focus=\[Feature:Reboot\] --minStartupPods=8
      command:
      - runner.sh
      - /workspace/scenarios/kubernetes_e2e.py
      image: gcr.io/k8s-staging-test-infra/kubekins-e2e:v20241128-8df65c072f-1.29
      name: ""
      resources:
        limits:
          cpu: "1"
          memory: 3Gi
        requests:
          cpu: "1"
          memory: 3Gi
- annotations:
    fork-per-release-periodic-interval: ""
    testgrid-dashboards: sig-release-1.29-informing
    testgrid-num-failures-to-alert: "6"
    testgrid-tab-name: gce-cos-k8sstable3-serial
  cluster: k8s-infra-prow-build
  decorate: true
  decoration_config:
    timeout: 11h20m0s
  interval: 24h
  labels:
    preset-k8s-ssh: "true"
    preset-service-account: "true"
  name: ci-kubernetes-e2e-gce-cos-k8sstable3-serial
  spec:
    containers:
    - args:
      - --cluster=test-gce-cos-k8sstable3-serial
      - --check-leaked-resources
      - --provider=gce
      - --gcp-zone=us-west1-b
      - --gcp-node-image=gci
      - --extract=ci/latest-1.29
      - --extract-ci-bucket=k8s-release-dev
      - --timeout=660m
      - --ginkgo-parallel=1
      - --test_args=--ginkgo.focus=\[Serial\]|\[Disruptive\] --ginkgo.skip=\[Driver:.gcepd\]|\[Flaky\]|\[Feature:.+\] --minStartupPods=8
      command:
      - runner.sh
      - /workspace/scenarios/kubernetes_e2e.py
      image: gcr.io/k8s-staging-test-infra/kubekins-e2e:v20241128-8df65c072f-1.29
      name: ""
      resources:
        limits:
          cpu: "1"
          memory: 3Gi
        requests:
          cpu: "1"
          memory: 3Gi
- annotations:
    fork-per-release-periodic-interval: ""
    testgrid-dashboards: sig-release-1.29-informing
    testgrid-num-failures-to-alert: "6"
    testgrid-tab-name: gce-cos-k8sstable3-slow
  cluster: k8s-infra-prow-build
  decorate: true
  decoration_config:
    timeout: 2h50m0s
  interval: 24h
  labels:
    preset-k8s-ssh: "true"
    preset-service-account: "true"
  name: ci-kubernetes-e2e-gce-cos-k8sstable3-slow
  spec:
    containers:
    - args:
      - --cluster=test-gce-cos-k8sstable3-slow
      - --check-leaked-resources
      - --provider=gce
      - --gcp-zone=us-west1-b
      - --gcp-node-image=gci
      - --extract=ci/latest-1.29
      - --extract-ci-bucket=k8s-release-dev
      - --timeout=150m
      - --test_args=--ginkgo.focus=\[Slow\] --ginkgo.skip=\[Driver:.gcepd\]|\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\] --minStartupPods=8
      - --ginkgo-parallel=30
      command:
      - runner.sh
      - /workspace/scenarios/kubernetes_e2e.py
      image: gcr.io/k8s-staging-test-infra/kubekins-e2e:v20241128-8df65c072f-1.29
      name: ""
      resources:
        limits:
          cpu: "1"
          memory: 6Gi
        requests:
          cpu: "1"
          memory: 6Gi
postsubmits: {}
presubmits: {}