Bug 2004568 - Cluster-version operator does not remove unrecognized volume mounts
Summary: Cluster-version operator does not remove unrecognized volume mounts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.9.0
Assignee: W. Trevor King
QA Contact: Yang Yang
URL:
Whiteboard:
Depends On: 2002834
Blocks:
 
Reported: 2021-09-15 14:52 UTC by Scott Dodson
Modified: 2022-05-06 12:34 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If this bug requires documentation, please select an appropriate Doc Type value.
Clone Of: 2002834
Environment:
Last Closed: 2021-10-18 17:51:49 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 657 0 None open [release-4.9] Bug 2004568: lib/resourcemerge/core: Remove unrecognized volumes and mounts 2021-09-16 01:26:03 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:52:07 UTC

Comment 1 Yang Yang 2021-09-16 05:10:21 UTC
I'm running the pre-merge test by launching a cluster with cluster-bot:

launch openshift/cluster-version-operator#657 gcp

# oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.ci.test-2021-09-16-032740-ci-ln-zrdr9p2-latest   True        False         4m37s   Cluster version is 4.9.0-0.ci.test-2021-09-16-032740-ci-ln-zrdr9p2-latest

The 4.9 deployment does not have the configmap cluster-autoscaler-operator-ca as a volume.

# oc get deploy -n openshift-machine-api cluster-autoscaler-operator -o json | jq .spec.template.spec.volumes[]
{
  "name": "cert",
  "secret": {
    "defaultMode": 420,
    "items": [
      {
        "key": "tls.crt",
        "path": "tls.crt"
      },
      {
        "key": "tls.key",
        "path": "tls.key"
      }
    ],
    "secretName": "cluster-autoscaler-operator-cert"
  }
}
{
  "configMap": {
    "defaultMode": 420,
    "name": "kube-rbac-proxy-cluster-autoscaler-operator"
  },
  "name": "auth-proxy-config"
}

The 4.9 cluster does not have the configmap "cluster-autoscaler-operator-ca".

[root@preserve-yangyangmerrn-1 tmp]# oc get cm cluster-autoscaler-operator-ca -n openshift-machine-api
Error from server (NotFound): configmaps "cluster-autoscaler-operator-ca" not found

[root@preserve-yangyangmerrn-1 tmp]# oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.ci-ln-zrdr9p2-f76d1.origin-ci-int-gce.dev.openshift.com:6443".

Inject the volume and volumeMount into the cluster-autoscaler-operator deployment.

# oc edit deployment.apps/cluster-autoscaler-operator
deployment.apps/cluster-autoscaler-operator edited
        name: cluster-autoscaler-operator
        ports:
        - containerPort: 8443
          protocol: TCP
        resources:
          requests:
            cpu: 20m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - name: ca-cert
          mountPath: /etc/cluster-autoscaler-operator/tls/service-ca
          readOnly: true
        - mountPath: /etc/cluster-autoscaler-operator/tls
          name: cert
          readOnly: true
      ...
      volumes:
      - name: ca-cert
        configMap:
          name: cluster-autoscaler-operator-ca
          items:
          - key: service-ca.crt
            path: ca-cert.pem

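The same injection could be scripted instead of done through an interactive edit. A minimal sketch that builds a strategic-merge-patch body mirroring the edit above (the `oc patch` invocation in the note below is illustrative, not part of the original test):

```python
import json

# Strategic-merge patch mirroring the interactive edit: it adds the ca-cert
# volumeMount to the operator container and a ca-cert volume backed by the
# (intentionally non-existent) cluster-autoscaler-operator-ca configmap.
# Kubernetes merges the containers/volumes lists by their "name" key, so the
# existing entries are left alone.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "cluster-autoscaler-operator",
                        "volumeMounts": [
                            {
                                "name": "ca-cert",
                                "mountPath": "/etc/cluster-autoscaler-operator/tls/service-ca",
                                "readOnly": True,
                            }
                        ],
                    }
                ],
                "volumes": [
                    {
                        "name": "ca-cert",
                        "configMap": {
                            "name": "cluster-autoscaler-operator-ca",
                            "items": [
                                {"key": "service-ca.crt", "path": "ca-cert.pem"}
                            ],
                        },
                    }
                ],
            }
        }
    }
}

print(json.dumps(patch))
```

Feeding that JSON to something like `oc -n openshift-machine-api patch deployment cluster-autoscaler-operator -p "$(python3 inject.py)"` would perform the same injection non-interactively.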
Now the deployment has the configMap cluster-autoscaler-operator-ca as a volume.

# oc get deploy -n openshift-machine-api cluster-autoscaler-operator -o json | jq .spec.template.spec.volumes[]
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "service-ca.crt",
        "path": "ca-cert.pem"
      }
    ],
    "name": "cluster-autoscaler-operator-ca"
  },
  "name": "ca-cert"
}
{
  "name": "cert",
  "secret": {
    "defaultMode": 420,
    "items": [
      {
        "key": "tls.crt",
        "path": "tls.crt"
      },
      {
        "key": "tls.key",
        "path": "tls.key"
      }
    ],
    "secretName": "cluster-autoscaler-operator-cert"
  }
}
{
  "configMap": {
    "defaultMode": 420,
    "name": "kube-rbac-proxy-cluster-autoscaler-operator"
  },
  "name": "auth-proxy-config"
}

Watching the cluster-autoscaler-operator pods:

[root@preserve-yangyangmerrn-1 tmp]# oc get po --watch
NAME                                           READY   STATUS              RESTARTS      AGE
cluster-autoscaler-operator-584764d849-gx8x9   0/2     ContainerCreating   0             21s
cluster-autoscaler-operator-6448c6b7fd-t624p   2/2     Running             0             44m
cluster-baremetal-operator-9dbcfcff9-t4448     2/2     Running             1 (76m ago)   81m
machine-api-controllers-56f7897445-d9k8z       7/7     Running             0             76m
machine-api-operator-5fc7876cdf-25g75          2/2     Running             0             81m
cluster-autoscaler-operator-584764d849-gx8x9   0/2     Terminating         0             112s
cluster-autoscaler-operator-584764d849-gx8x9   0/2     Terminating         0             2m4s
cluster-autoscaler-operator-584764d849-gx8x9   0/2     Terminating         0             2m4s

# oc get po
NAME                                           READY   STATUS    RESTARTS      AGE
cluster-autoscaler-operator-6448c6b7fd-t624p   2/2     Running   0             52m
cluster-baremetal-operator-9dbcfcff9-t4448     2/2     Running   1 (83m ago)   88m
machine-api-controllers-56f7897445-d9k8z       7/7     Running   0             83m
machine-api-operator-5fc7876cdf-25g75          2/2     Running   0             88m

The new pod is terminated because the configmap "cluster-autoscaler-operator-ca" is not found.

# oc get event -n openshift-machine-api | grep cluster-autoscaler-operator-584764d849-gx8x9
6m57s       Normal    Scheduled           pod/cluster-autoscaler-operator-584764d849-gx8x9    Successfully assigned openshift-machine-api/cluster-autoscaler-operator-584764d849-gx8x9 to ci-ln-zrdr9p2-f76d1-cpg5p-master-0
5m53s       Warning   FailedMount         pod/cluster-autoscaler-operator-584764d849-gx8x9    MountVolume.SetUp failed for volume "ca-cert" : configmap "cluster-autoscaler-operator-ca" not found
4m54s       Warning   FailedMount         pod/cluster-autoscaler-operator-584764d849-gx8x9    Unable to attach or mount volumes: unmounted volumes=[ca-cert], unattached volumes=[kube-api-access-tzh9w ca-cert auth-proxy-config cert]: timed out waiting for the condition
6m57s       Normal    SuccessfulCreate    replicaset/cluster-autoscaler-operator-584764d849   Created pod: cluster-autoscaler-operator-584764d849-gx8x9
5m5s        Normal    SuccessfulDelete    replicaset/cluster-autoscaler-operator-584764d849   Deleted pod: cluster-autoscaler-operator-584764d849-gx8x9

The volume "ca-cert" (configmap "cluster-autoscaler-operator-ca") is removed from the deployment.

# oc get deploy -n openshift-machine-api cluster-autoscaler-operator -o json | jq .spec.template.spec.volumes[]
{
  "name": "cert",
  "secret": {
    "defaultMode": 420,
    "items": [
      {
        "key": "tls.crt",
        "path": "tls.crt"
      },
      {
        "key": "tls.key",
        "path": "tls.key"
      }
    ],
    "secretName": "cluster-autoscaler-operator-cert"
  }
}
{
  "configMap": {
    "defaultMode": 420,
    "name": "kube-rbac-proxy-cluster-autoscaler-operator"
  },
  "name": "auth-proxy-config"
}
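This is the behavior the linked PR #657 adds: when reconciling a deployment, the CVO now prunes volumes and volumeMounts that the release manifest does not declare. The real fix lives in Go (lib/resourcemerge/core); a rough Python sketch of the pruning idea, not the actual implementation:

```python
def prune_unrecognized(existing, required):
    """Keep only entries whose "name" appears in the required manifest list.

    `existing` and `required` are lists of volume (or volumeMount) dicts,
    matched by "name" as in a pod template spec.
    """
    wanted = {item["name"] for item in required}
    return [item for item in existing if item["name"] in wanted]

# Volumes on the cluster after the injection above (names only, for brevity):
existing = [{"name": "ca-cert"}, {"name": "cert"}, {"name": "auth-proxy-config"}]
# Volumes the release manifest actually declares:
required = [{"name": "cert"}, {"name": "auth-proxy-config"}]

print(prune_unrecognized(existing, required))
# → [{'name': 'cert'}, {'name': 'auth-proxy-config'}]
```

The injected ca-cert volume is dropped, matching the deployment state observed above.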

Comment 2 Yang Yang 2021-09-16 06:21:47 UTC
To prove that the procedure in comment#1 is suitable for testing the change, I performed a similar test on a 4.8 cluster.

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-09-15-162303   True        False         129m    Cluster version is 4.8.0-0.nightly-2021-09-15-162303

Inject the volume ca-cert (configMap cluster-autoscaler-operator-ca) into the cluster-autoscaler-operator deployment.

# oc get pod
NAME                                           READY   STATUS              RESTARTS   AGE
cluster-autoscaler-operator-695cfd657-rfss9    2/2     Running             0          155m
cluster-autoscaler-operator-766f6648bd-sh2f7   0/2     ContainerCreating   0          10m
cluster-baremetal-operator-6468998c6b-tdwpt    2/2     Running             1          155m
machine-api-controllers-58cb4f598-hzm4t        7/7     Running             2          148m
machine-api-operator-b8cc66c9b-xj7gn           2/2     Running             1          155m

# oc get deploy -n openshift-machine-api cluster-autoscaler-operator -o json | jq .spec.template.spec.volumes[]
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "service-ca.crt",
        "path": "ca-cert.pem"
      }
    ],
    "name": "cluster-autoscaler-operator-ca"
  },
  "name": "ca-cert"
}
{
  "name": "cert",
  "secret": {
    "defaultMode": 420,
    "items": [
      {
        "key": "tls.crt",
        "path": "tls.crt"
      },
      {
        "key": "tls.key",
        "path": "tls.key"
      }
    ],
    "secretName": "cluster-autoscaler-operator-cert"
  }
}
{
  "configMap": {
    "defaultMode": 420,
    "name": "kube-rbac-proxy-cluster-autoscaler-operator"
  },
  "name": "auth-proxy-config"
}

The cluster-autoscaler-operator pod gets stuck in ContainerCreating, and the deployment does not get the volume removed.
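For contrast with the sketch in comment#1, the pre-fix merge behavior exercised here only ensures the required entries exist and leaves extras in place; a hypothetical Python sketch (not the real Go code):

```python
def merge_without_pruning(existing, required):
    """Pre-fix style merge: ensure each required entry is present (matched by
    "name"), but keep any extra entries the cluster already has."""
    by_name = {item["name"]: item for item in existing}
    for item in required:
        by_name[item["name"]] = item
    return list(by_name.values())

# Same situation as the 4.8 test: an injected ca-cert volume that the
# release manifest does not declare.
existing = [{"name": "ca-cert"}, {"name": "cert"}, {"name": "auth-proxy-config"}]
required = [{"name": "cert"}, {"name": "auth-proxy-config"}]

# ca-cert survives the merge, so the pod referencing the missing configmap
# stays stuck in ContainerCreating.
print(merge_without_pruning(existing, required))
```

This matches the 4.8 result above: the unrecognized volume persists and the new pod never starts.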

Comment 4 Yang Yang 2021-09-17 06:24:16 UTC
Following comment#1, I verified it with 4.9.0-0.nightly-2021-09-16-215330 and it passed. Moving it to the verified state.

Comment 7 errata-xmlrpc 2021-10-18 17:51:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

