Bug 2029836

Summary: OLM collect-profiles job tries to access non-existent volumes after upgrade from 4.8 to 4.9
Product: OpenShift Container Platform
Reporter: Stefan Seifried <s.seifried>
Component: OLM
Assignee: Alexander Greene <agreene>
OLM sub component: OLM
QA Contact: Jian Zhang <jiazha>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: medium
Priority: medium
CC: agreene, alosingh, cris.teneyck, danijel.tudek, grekeh, k.bohdan.v, krizza, parodrig, powersg, sdodson, sparpate, vrutkovs, wiha1292, wking
Version: 4.9
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Type: Bug
Last Closed: 2022-03-11 19:33:49 UTC

Description Stefan Seifried 2021-12-07 12:32:22 UTC
Description of problem:
I noticed that the OLM collect-profiles job tries and fails to access some volumes.

---
MountVolume.SetUp failed for volume "config-volume" : object "openshift-operator-lifecycle-manager"/"collect-profiles-config" not registered
MountVolume.SetUp failed for volume "kube-api-access-d4kzh" : [object "openshift-operator-lifecycle-manager"/"kube-root-ca.crt" not registered, object "openshift-operator-lifecycle-manager"/"openshift-service-ca.crt" not registered]
MountVolume.SetUp failed for volume "secret-volume" : object "openshift-operator-lifecycle-manager"/"pprof-cert" not registered
---

I noticed that the same log messages can be found here:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-ovn-upgrade/1428509007814856704

Version-Release number of selected component (if applicable):
4.9.0-0.okd-2021-11-28-035710

How reproducible:
Upgrade an existing 4.8 OKD cluster to 4.9.
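
Roughly, on the command line (the version string is just the one reported below; the payload actually offered by the cluster's update graph may differ):

---
oc adm upgrade --to=4.9.0-0.okd-2021-11-28-035710
oc get clusterversion        # wait for the upgrade to finish
oc -n openshift-operator-lifecycle-manager get events | grep FailedMount
---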

Actual results:
Errors as described above

Expected results:
No errors


Additional info:
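For triage, the objects referenced by the failing mounts can be checked directly. The commands below are only a suggestion and assume nothing beyond the namespace and object names appearing in this report (the CronJob name is inferred from the pod names):

---
oc -n openshift-operator-lifecycle-manager get cronjob collect-profiles
oc -n openshift-operator-lifecycle-manager get configmap collect-profiles-config
oc -n openshift-operator-lifecycle-manager get secret pprof-cert
oc -n openshift-operator-lifecycle-manager get events --field-selector reason=FailedMount
---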

Comment 1 Stefan Seifried 2021-12-07 12:33:08 UTC
See also https://github.com/openshift/okd/issues/1004

Comment 2 Vadim Rutkovsky 2021-12-07 14:03:25 UTC
https://search.ci.openshift.org/?search=object+%22openshift-operator-lifecycle-manager%22%2F%22pprof-cert%22+not+registered&maxAge=168h&context=1&type=junit&name=.*upgrade.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Seems to be happening in 4.8 -> 4.9 and 4.9 -> 4.10 nightly upgrades. I noticed it on my OKD 4.8 -> 4.9 upgrade; checking whether it also occurs on a 4.9.10 -> 4.10 nightly OCP upgrade.

Comment 3 Vadim Rutkovsky 2021-12-07 15:36:16 UTC
Reproduced on 4.9.10 -> 4.10.0-0.ci-2021-12-06-061923. A clean install didn't throw MountVolume.SetUp error events, but installing any operator (ArgoCD in my case) triggers them.

This doesn't seem to affect any OLM functionality, though.
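
For the "install any operator" reproduction step, a minimal Subscription is enough. Everything below is a sketch with placeholder package and channel names, not values taken from this report:

---
cat <<'EOF' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-operator                  # placeholder name
  namespace: openshift-operators     # has a cluster-wide OperatorGroup by default
spec:
  name: my-operator                  # placeholder package name from the catalog
  channel: stable                    # placeholder; check the PackageManifest for real channels
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF
---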

Comment 4 Steve Power 2021-12-09 08:59:35 UTC
I'm also seeing these messages appear daily on OCP v4.9.5. This was an IPI test install on VMware which installed v4.9.0; the system was subsequently upgraded to 4.9.4 and then 4.9.5. All three messages are listed against pod/collect-profiles-27317325--1-49xm2. I'm new to the OCP game, so hopefully the above info is useful.

Comment 6 Danijel Tudek 2021-12-16 11:46:57 UTC
This same issue happens in all four of our OKD clusters after updating to the 2021-11-28 release. However, in our case the same volume mount errors also happen in our own projects when scaling DeploymentConfigs and running CronJobs. They disappear if I update the DC/CronJob, or if I delete and redeploy the same version from our CI/CD.

Also, in some projects (not all), the Secret which contains pull credentials for our image registry has disappeared.
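
In case it helps, one way to script the "update or delete-and-redeploy" workaround described above is shown below; the commands are illustrative, project and object names are placeholders, and the report does not confirm this is exactly what cleared the errors:

---
oc -n <project> rollout latest dc/<name>
oc -n <project> get cronjob <name> -o yaml > cronjob.yaml
oc -n <project> replace --force -f cronjob.yaml
---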

Comment 7 Vladislav Babin 2022-01-05 07:09:57 UTC
Same issue here: clean bare-metal install of OKD 4.9 (4.9.0-0.okd-2021-12-12-025847).

Comment 8 Ronald DC 2022-01-11 03:22:05 UTC
Same issue on OCP 4.9.11

Comment 9 Danijel Tudek 2022-01-18 08:51:33 UTC
Still present in OKD 4.9 2022-01-14.

Comment 11 BoZi 2022-01-28 16:55:39 UTC
Looks like a workaround:
I created a test Pod that consumes the same volumes: "config-volume" (from the collect-profiles-config ConfigMap) and "secret-volume" (from the pprof-cert Secret).
The test pod mounted the volumes without a problem.
After the test, the "collect-profiles-xxxx" pods started to be created, mount their volumes, run, and complete properly, without any errors.

(just copy the XXXXX parts from your failed "collect-profiles-xxxx" pods)
apiVersion: v1
kind: Pod
metadata:
  name: centos
  namespace: openshift-operator-lifecycle-manager
spec:
  restartPolicy: Never
  serviceAccountName: collect-profiles
  imagePullSecrets:
    - name: collect-profiles-dockercfg-XXXXX
  schedulerName: default-scheduler
  enableServiceLinks: true
  terminationGracePeriodSeconds: 3000000
  preemptionPolicy: PreemptLowerPriority
  nodeName: okd-XXXXX-worker-XXXXX
  securityContext:
    seLinuxOptions:
      level: 's0:c20,c0'
    fsGroup: 1000380000
  containers:
  - image: centos:8
    name: test-test
    command: ["/bin/sleep", "3650d"]
    resources: {}
    volumeMounts:
      - name: config-volume
        mountPath: /etc/config
      - name: secret-volume
        mountPath: /var/run/secrets/serving-cert
  volumes:
    - name: config-volume
      configMap:
        name: collect-profiles-config
        defaultMode: 420
    - name: secret-volume
      secret:
        secretName: pprof-cert
        defaultMode: 420
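
To try this, save the manifest (e.g. as test-pod.yaml), apply it, and remove the test pod again once collect-profiles runs cleanly. The commands below are illustrative; the manual job trigger is optional and just avoids waiting for the next scheduled run:

---
oc apply -f test-pod.yaml
oc -n openshift-operator-lifecycle-manager get pod centos
oc -n openshift-operator-lifecycle-manager create job collect-profiles-manual --from=cronjob/collect-profiles
oc -n openshift-operator-lifecycle-manager get pods | grep collect-profiles
oc -n openshift-operator-lifecycle-manager delete pod centos
---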

Comment 18 Alexander Greene 2022-03-11 19:33:49 UTC

*** This bug has been marked as a duplicate of bug 1999325 ***

Comment 19 Red Hat Bugzilla 2023-09-15 01:17:55 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days