Bug 2016286 - cascading mounts happening exponentially when deleting openstack-cinder-csi-driver-node pods
Summary: cascading mounts happening exponentially when deleting openstack-cinder-cs...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.z
Assignee: Martin André
QA Contact: rlobillo
URL:
Whiteboard:
Duplicates: 2012765 2025444
Depends On: 1952211
Blocks: 2026197
Reported: 2021-10-21 08:14 UTC by OpenShift BugZilla Robot
Modified: 2021-12-01 13:35 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-01 13:35:22 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift openstack-cinder-csi-driver-operator pull 58 0 None open [release-4.7] Bug 2016286: Fix error when mounting /var/lib/kubelet/pods 2021-10-21 08:14:29 UTC
Red Hat Product Errata RHBA-2021:4802 0 None None None 2021-12-01 13:35:43 UTC

Comment 1 Martin André 2021-10-21 08:15:36 UTC
*** Bug 2012765 has been marked as a duplicate of this bug. ***

Comment 5 rlobillo 2021-11-18 13:05:33 UTC
Verified on 4.7.0-0.nightly-2021-11-17-094737 over OSP16.1 (RHOS-16.1-RHEL-8-20210903.n.0).

The storage clusteroperator is fully functional after IPI installation:

$ oc get clusteroperators storage
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
storage   4.7.0-0.nightly-2021-11-17-094737   True        False         False      118m


and the manifest changes are present (no volume targets /var/lib/kubelet/pods):

$ export WORKER=ostest-8m2wp-worker-0-d4bvx

$ oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=$WORKER -o json | jq '.items[0].spec.volumes[]'
{
  "hostPath": {
    "path": "/var/lib/kubelet/plugins/cinder.csi.openstack.org",
    "type": "DirectoryOrCreate"
  },
  "name": "socket-dir"
}
{
  "hostPath": {
    "path": "/var/lib/kubelet/plugins_registry/",
    "type": "Directory"
  },
  "name": "registration-dir"
}
{
  "hostPath": {
    "path": "/var/lib/kubelet",
    "type": "Directory"
  },
  "name": "kubelet-dir"
}
{
  "hostPath": {
    "path": "/dev",
    "type": "Directory"
  },
  "name": "pods-probe-dir"
}
{
  "name": "secret-cinderplugin",
  "secret": {
    "defaultMode": 420,
    "items": [
      {
        "key": "clouds.yaml",
        "path": "clouds.yaml"
      }
    ],
    "secretName": "openstack-cloud-credentials"
  }
}
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "cloud.conf",
        "path": "cloud.conf"
      }
    ],
    "name": "openstack-cinder-config"
  },
  "name": "config-cinderplugin"
}
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "ca-bundle.pem",
        "path": "ca-bundle.pem"
      }
    ],
    "name": "cloud-provider-config",
    "optional": true
  },
  "name": "cacert"
}
{
  "name": "openstack-cinder-csi-driver-node-sa-token-7jztr",
  "secret": {
    "defaultMode": 420,
    "secretName": "openstack-cinder-csi-driver-node-sa-token-7jztr"
  }
}
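
The visual check of the volume list above could be turned into a pass/fail check. The following is only a minimal sketch: `has_hostpath` is a hypothetical helper name (not part of oc or jq), and it assumes the JSON is pretty-printed with one `"path": "..."` entry per line, as in the output above.

```shell
# has_hostpath is a hypothetical helper, not part of oc or jq.
# Reads pretty-printed pod JSON on stdin and succeeds (exit 0) if any
# "path" value equals the directory given as $1, with or without a
# trailing slash. The directory is used as a regex, so "." matches any
# character; that is good enough for a sketch.
has_hostpath() {
  local dir="${1%/}"
  grep -Eq "\"path\": *\"${dir}/?\""
}
```

It could be fed from the same query, for example `oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=$WORKER -o json | has_hostpath /var/lib/kubelet/pods`, where a non-zero exit status would mean no volume targets that directory.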


Inside the csi-driver container, as expected, nothing is mounted at /var/lib/kubelet/pods; the mount is at /var/lib/kubelet instead:

$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=$WORKER -o NAME)
Defaulting container name to csi-driver.
Use 'oc describe pod/openstack-cinder-csi-driver-node-2zbqn -n openshift-cluster-csi-drivers' to see all of the containers in this pod.
sh-4.4# findmnt -D | grep '/var/lib/kubelet/pods$'
sh-4.4# findmnt -D | grep '/var/lib/kubelet$'
/dev/vda4[/ostree/deploy/rhcos/var/lib/kubelet]                                                                                               xfs      39.5G  8.2G 31.3G  21% /var/lib/kubelet
sh-4.4# 
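
The two grep checks above can also be expressed as a count, which is easier to compare across restarts. This is a sketch under assumptions: `count_mounts_at` is a hypothetical helper, it expects `findmnt -D` style text on stdin with the mount target in the last column, and it does not handle targets that contain spaces.

```shell
# count_mounts_at is a hypothetical helper.
# Reads `findmnt -D` style output on stdin and prints how many entries
# have a TARGET (last column) exactly equal to $1.
count_mounts_at() {
  awk -v target="$1" '$NF == target { n++ } END { print n + 0 }'
}
```

Inside the container it could be used as `findmnt -D | count_mounts_at /var/lib/kubelet/pods`; with the fix in place the expectation from the output above is 0 for /var/lib/kubelet/pods and 1 for /var/lib/kubelet.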


After restarting the pod 100 times, the system remains stable:

$ for i in {1..100}; do echo $i; oc delete -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=$WORKER -o name); done
...

(shiftstack) [stack@undercloud-0 ~]$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=$WORKER -o NAME)
Defaulting container name to csi-driver.
Use 'oc describe pod/openstack-cinder-csi-driver-node-cwmrh -n openshift-cluster-csi-drivers' to see all of the containers in this pod.
sh-4.4# findmnt -D | grep '/var/lib/kubelet$'
/dev/vda4[/ostree/deploy/rhcos/var/lib/kubelet]                                                                                               xfs      39.5G  8.4G 31.1G  21% /var/lib/kubelet
sh-4.4# 
                                                                                      xfs      39.5G  8.8G 30.7G  22% /var/lib/kubelet
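
After a restart loop like the one above, a quick way to look for the original symptom is to list any mount target that appears more than once. This is a rough sketch, assuming the cascading mounts would show up as repeated entries for the same target; `duplicated_targets` is a hypothetical helper name.

```shell
# duplicated_targets is a hypothetical helper.
# Reads one mount target per line on stdin and prints "<count> <target>"
# for every target that appears more than once, sorted by target.
duplicated_targets() {
  awk '{ c[$0]++ } END { for (t in c) if (c[t] > 1) print c[t], t }' | sort -k2
}
```

Inside the container, `findmnt -n -l -o TARGET | duplicated_targets` would supply the input. Whether some duplicates are normal on a given node is not covered by this report, so comparing the output before and after the restarts is probably more telling than the absolute result.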

(shiftstack) [stack@undercloud-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-11-17-094737   True        False         164m    Cluster version is 4.7.0-0.nightly-2021-11-17-094737
(shiftstack) [stack@undercloud-0 ~]$ oc get clusteroperator storage
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
storage   4.7.0-0.nightly-2021-11-17-094737   True        False         False      3h3m

Comment 7 Pierre Prinetti 2021-11-22 09:02:16 UTC
*** Bug 2025444 has been marked as a duplicate of this bug. ***

Comment 8 rlobillo 2021-11-22 09:24:07 UTC
This fix will be included in the upcoming 4.7.38 release. Estimated ship date: 01-Dec.

Comment 10 ShiftStack Bugwatcher 2021-11-25 16:12:48 UTC
Removing the Triaged keyword because:

* the QE automation assessment (flag qe_test_coverage) is missing

Comment 12 errata-xmlrpc 2021-12-01 13:35:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.38 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4802

