Bug 2029836
Summary: OLM collect-profiles job tries to access non-existent volumes after upgrade from 4.8 to 4.9
Product: OpenShift Container Platform
Component: OLM
OLM sub component: OLM
Version: 4.9
Type: Bug
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Reporter: Stefan Seifried <s.seifried>
Assignee: Alexander Greene <agreene>
QA Contact: Jian Zhang <jiazha>
CC: agreene, alosingh, cris.teneyck, danijel.tudek, grekeh, k.bohdan.v, krizza, parodrig, powersg, sdodson, sparpate, vrutkovs, wiha1292, wking
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-03-11 19:33:49 UTC
Description (Stefan Seifried, 2021-12-07 12:32:22 UTC)
https://search.ci.openshift.org/?search=object+%22openshift-operator-lifecycle-manager%22%2F%22pprof-cert%22+not+registered&maxAge=168h&context=1&type=junit&name=.*upgrade.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Seems to be happening in both 4.8 -> 4.9 and 4.9 -> 4.10 nightly upgrades.

Noticed it on my OKD 4.8 -> 4.9 upgrade; checking whether it also occurs on a 4.9.10 -> 4.10 nightly OCP upgrade.

Reproduced on 4.9.10 -> 4.10.0-0.ci-2021-12-06-061923. A clean install didn't throw any MountVolume.SetUp error events, but installing any operator (ArgoCD in my case) triggers it. This doesn't seem to affect any OLM functionality, though.

I'm also seeing these messages appear daily on OCP v4.9.5. This was an IPI test install on VMware which installed v4.9.0; the system was subsequently upgraded to 4.9.4 and then 4.9.5. All three messages are listed against pod/collect-profiles-27317325--1-49xm2. I'm new to the OCP game; hopefully the above info is useful.

The same issue happens in all of our OKD clusters (four of them) after updating to the 2021-11-28 build. In our case, however, the same volume mount errors also happen in our own projects when scaling DeploymentConfigs and executing CronJobs. They disappear if I update the DC/CronJob, or if I delete and redeploy the same version from our CI/CD. Also, in some projects (not all), the Secret that contains the pull credentials for our image registry has disappeared.

Same issue here: OKD 4.9 bare-metal clean install, 4.9.0-0.okd-2021-12-12-025847.

Same issue on OCP 4.9.11.

Still present in OKD 4.9 as of 2022-01-14.

Looks like a workaround: I created a test Pod that consumes the same volumes ("config-volume" and "secret-volume", the latter backed by the "pprof-cert" Secret). The test pod mounted the volumes without a problem, and afterwards the collect-profiles-xxxx pods started to create, mount, run, and complete properly, without any errors. (Just copy the XXXXX parts from your failed collect-profiles-xxxx pods.)

apiVersion: v1
kind: Pod
metadata:
  name: centos
  namespace: openshift-operator-lifecycle-manager
spec:
  restartPolicy: Never
  serviceAccountName: collect-profiles
  imagePullSecrets:
    - name: collect-profiles-dockercfg-XXXXX
  schedulerName: default-scheduler
  enableServiceLinks: true
  terminationGracePeriodSeconds: 3000000
  preemptionPolicy: PreemptLowerPriority
  nodeName: okd-XXXXX-worker-XXXXX
  securityContext:
    seLinuxOptions:
      level: 's0:c20,c0'
    fsGroup: 1000380000
  containers:
    - name: test-test
      image: centos:8
      command: ["/bin/sleep", "3650d"]
      resources: {}
      volumeMounts:
        - name: config-volume
          mountPath: /etc/config
        - name: secret-volume
          mountPath: /var/run/secrets/serving-cert
  volumes:
    - name: config-volume
      configMap:
        name: collect-profiles-config
        defaultMode: 420
    - name: secret-volume
      secret:
        secretName: pprof-cert
        defaultMode: 420

*** This bug has been marked as a duplicate of bug 1999325 ***

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
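For anyone triaging this on their own cluster, a minimal sketch of how to surface the failing mounts, assuming cluster-admin access via oc; MountVolume.SetUp failures are recorded as events with reason=FailedMount, and pod names will differ per cluster.

# List FailedMount events in the OLM namespace (MountVolume.SetUp
# errors are recorded under reason=FailedMount).
oc get events -n openshift-operator-lifecycle-manager \
  --field-selector reason=FailedMount

# Find the affected collect-profiles pods and inspect one of them.
oc get pods -n openshift-operator-lifecycle-manager | grep collect-profiles
oc describe pod -n openshift-operator-lifecycle-manager <collect-profiles-pod>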
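For the reports above where the same errors hit user projects, the "update the DC/CronJob" workaround amounts to forcing a fresh rollout or a fresh Job; a sketch with placeholder names, using standard oc verbs:

# Force a new deployment of an affected DeploymentConfig.
oc rollout latest dc/<name> -n <project>

# Spawn a one-off Job from an affected CronJob instead of waiting
# for its next scheduled run.
oc create job <name>-retest --from=cronjob/<name> -n <project>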
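To apply the workaround Pod above, one possible sequence follows; this is a sketch, with the manifest file name (test-pod.yaml) and the placeholder lookup assumed rather than prescribed.

# Read the XXXXX values (pull-secret name and node) off a failed pod.
oc get pod -n openshift-operator-lifecycle-manager <failed-collect-profiles-pod> \
  -o yaml | grep -E 'collect-profiles-dockercfg|nodeName'

# Save the manifest above (with placeholders filled in) and create the pod.
oc apply -f test-pod.yaml
oc get pod centos -n openshift-operator-lifecycle-manager -w   # wait for Running

# Optionally trigger a fresh collect-profiles run instead of waiting for
# the next scheduled one, then clean up the test pod.
oc create job collect-profiles-retest \
  --from=cronjob/collect-profiles -n openshift-operator-lifecycle-manager
oc delete pod centos -n openshift-operator-lifecycle-manager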