Bug 2016286

Summary: cascading mounts grow exponentially when deleting openstack-cinder-csi-driver-node pods
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Storage
Assignee: Martin André <m.andre>
Storage sub component: OpenStack CSI Drivers
QA Contact: rlobillo
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: adduarte, aos-bugs, m.andre, mbagga, mfedosin, palonsor, pprinett, ssonigra, tkimura, tsmetana, wking
Version: 4.7
Target Milestone: ---
Target Release: 4.7.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-12-01 13:35:22 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1952211
Bug Blocks: 2026197

Comment 1 Martin André 2021-10-21 08:15:36 UTC
*** Bug 2012765 has been marked as a duplicate of this bug. ***

Comment 5 rlobillo 2021-11-18 13:05:33 UTC
Verified on 4.7.0-0.nightly-2021-11-17-094737 over OSP16.1 (RHOS-16.1-RHEL-8-20210903.n.0).

clusteroperator storage is fully functional after IPI installation: 

$ oc get clusteroperators storage
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
storage   4.7.0-0.nightly-2021-11-17-094737   True        False         False      118m


and the manifest changes are present (no volume targets /var/lib/kubelet/pods):

$ export WORKER=ostest-8m2wp-worker-0-d4bvx

$ oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=$WORKER -o json | jq '.items[0].spec.volumes[]'
{
  "hostPath": {
    "path": "/var/lib/kubelet/plugins/cinder.csi.openstack.org",
    "type": "DirectoryOrCreate"
  },
  "name": "socket-dir"
}
{
  "hostPath": {
    "path": "/var/lib/kubelet/plugins_registry/",
    "type": "Directory"
  },
  "name": "registration-dir"
}
{
  "hostPath": {
    "path": "/var/lib/kubelet",
    "type": "Directory"
  },
  "name": "kubelet-dir"
}
{
  "hostPath": {
    "path": "/dev",
    "type": "Directory"
  },
  "name": "pods-probe-dir"
}
{
  "name": "secret-cinderplugin",
  "secret": {
    "defaultMode": 420,
    "items": [
      {
        "key": "clouds.yaml",
        "path": "clouds.yaml"
      }
    ],
    "secretName": "openstack-cloud-credentials"
  }
}
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "cloud.conf",
        "path": "cloud.conf"
      }
    ],
    "name": "openstack-cinder-config"
  },
  "name": "config-cinderplugin"
}
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "ca-bundle.pem",
        "path": "ca-bundle.pem"
      }
    ],
    "name": "cloud-provider-config",
    "optional": true
  },
  "name": "cacert"
}
{
  "name": "openstack-cinder-csi-driver-node-sa-token-7jztr",
  "secret": {
    "defaultMode": 420,
    "secretName": "openstack-cinder-csi-driver-node-sa-token-7jztr"
  }
}
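The check above can be reduced to a simple assertion. As a minimal sketch (the path list below is simulated from the hostPath volumes shown above; on a live cluster it would come from the same `oc get pods ... -o json` query), none of the hostPath volumes may point exactly at /var/lib/kubelet/pods:

```shell
# Simulated hostPath paths from the DaemonSet pod spec shown above.
spec_paths='/var/lib/kubelet/plugins/cinder.csi.openstack.org
/var/lib/kubelet/plugins_registry/
/var/lib/kubelet
/dev'

# Count paths that exactly match the forbidden mount target.
# grep exits non-zero on zero matches, hence the `|| true`.
bad=$(printf '%s\n' "$spec_paths" | grep -cxF '/var/lib/kubelet/pods' || true)
echo "offending volumes: $bad"   # prints "offending volumes: 0"
```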


Inside the csi-driver pod, as expected, there is no separate mount for /var/lib/kubelet/pods, while /var/lib/kubelet itself is mounted from the host:

$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=$WORKER -o NAME)
Defaulting container name to csi-driver.
Use 'oc describe pod/openstack-cinder-csi-driver-node-2zbqn -n openshift-cluster-csi-drivers' to see all of the containers in this pod.
sh-4.4# findmnt -D | grep '/var/lib/kubelet/pods$'
sh-4.4# findmnt -D | grep '/var/lib/kubelet$'
/dev/vda4[/ostree/deploy/rhcos/var/lib/kubelet]                                                                                               xfs      39.5G  8.2G 31.3G  21% /var/lib/kubelet
sh-4.4# 
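The two findmnt greps above can be turned into a scripted check. A minimal sketch follows, with the findmnt output simulated from the session above (on the live node the input would be `findmnt -D` inside the csi-driver pod): there must be no mount ending in /var/lib/kubelet/pods, and exactly one for /var/lib/kubelet itself.

```shell
# Simulated single-line findmnt output, copied from the verified session.
findmnt_output='/dev/vda4[/ostree/deploy/rhcos/var/lib/kubelet] xfs 39.5G 8.2G 31.3G 21% /var/lib/kubelet'

# Count lines whose mount target matches each path (grep counts lines,
# and exits non-zero when nothing matches, hence `|| true`).
pods_mounts=$(printf '%s\n' "$findmnt_output" | grep -c '/var/lib/kubelet/pods$' || true)
kubelet_mounts=$(printf '%s\n' "$findmnt_output" | grep -c '/var/lib/kubelet$' || true)
echo "pods: $pods_mounts kubelet: $kubelet_mounts"   # prints "pods: 0 kubelet: 1"
```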


After restarting the pod 100 times, the system remains stable:

$ for i in {1..100}; do echo $i; oc delete -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=$WORKER -o name); done
...

(shiftstack) [stack@undercloud-0 ~]$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=$WORKER -o NAME)
Defaulting container name to csi-driver.
Use 'oc describe pod/openstack-cinder-csi-driver-node-cwmrh -n openshift-cluster-csi-drivers' to see all of the containers in this pod.
sh-4.4# findmnt -D | grep '/var/lib/kubelet$'
/dev/vda4[/ostree/deploy/rhcos/var/lib/kubelet]                                                                                               xfs      39.5G  8.4G 31.1G  21% /var/lib/kubelet
sh-4.4# 
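The 100-restart loop above confirms the post-fix behavior. As illustrative arithmetic only (the doubling model is an assumption inferred from the bug summary's "cascading mounts happening exponentially", not measured data), the contrast between the buggy and fixed behavior can be sketched as:

```shell
# Hypothetical model: pre-fix, each pod restart re-propagated the existing
# mounts, doubling the count; post-fix, a single mount survives any restart.
buggy=1
fixed=1
for restart in 1 2 3 4 5; do
  buggy=$((buggy * 2))   # assumed pre-fix growth: 2^restarts mounts
  fixed=1                # verified post-fix behavior: count stays at 1
done
echo "after 5 restarts -> buggy: $buggy fixed: $fixed"   # prints "after 5 restarts -> buggy: 32 fixed: 1"
```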

(shiftstack) [stack@undercloud-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-11-17-094737   True        False         164m    Cluster version is 4.7.0-0.nightly-2021-11-17-094737
(shiftstack) [stack@undercloud-0 ~]$ oc get clusteroperator storage
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
storage   4.7.0-0.nightly-2021-11-17-094737   True        False         False      3h3m

Comment 7 Pierre Prinetti 2021-11-22 09:02:16 UTC
*** Bug 2025444 has been marked as a duplicate of this bug. ***

Comment 8 rlobillo 2021-11-22 09:24:07 UTC
This fix will be included in the upcoming 4.7.38 release. Estimated ship date: 01-Dec.

Comment 10 ShiftStack Bugwatcher 2021-11-25 16:12:48 UTC
Removing the Triaged keyword because:

* the QE automation assessment (flag qe_test_coverage) is missing

Comment 12 errata-xmlrpc 2021-12-01 13:35:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.38 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4802