Description of problem:

This is about the upstream issue - https://github.com/kubernetes/cloud-provider-openstack/issues/772

When the CSI driver pod is created through its DaemonSet, it contains the following mounts -

~~~
volumeMounts:
- mountPath: /var/lib/kubelet/pods
  mountPropagation: Bidirectional
  name: pods-mount-dir
- mountPath: /var/lib/kubelet
  mountPropagation: Bidirectional
  name: kubelet-dir
volumes:
- hostPath:
    path: /var/lib/kubelet
    type: Directory
  name: kubelet-dir
- hostPath:
    path: /var/lib/kubelet/pods
    type: Directory
  name: pods-mount-dir
~~~

When an `openstack-cinder-csi-driver-node` pod created through this DaemonSet is deleted multiple times, the mount entries for `/var/lib/kubelet/pods` keep growing exponentially with every restart of the pod.

~~~
[root@vm ~]# findmnt -D | grep '/var/lib/kubelet/pods$' | wc -l
127
~~~

When this number exceeds 255, the following error is seen -

~~~
Warning  Failed  9s  kubelet, master2  Error: container create failed: time="2021-04-07T09:41:40Z" level=warning msg="unable to terminate initProcess" error="exit status 1"
time="2021-04-07T09:41:41Z" level=error msg="container_linux.go:366: starting container process caused: process_linux.go:472: container init caused: rootfs_linux.go:60: mounting \"/var/lib/kubelet/pods\" to rootfs at \"/var/lib/containers/storage/overlay/bed6a1f4bd7769f025ce7179358d7ad61cf1af681e4b9ba65ca07e0048584e45/merged/var/lib/kubelet/pods\" caused: no space left on device"
~~~

When this hostPath volume and mount block is removed from the DaemonSet, the number of mount entries is far lower -

~~~
[root@vm ~]# findmnt -D | grep '/var/lib/kubelet/pods$' | wc -l
3
~~~

There is a pull request upstream that simply removes this volume and mount block from the DaemonSet - https://github.com/kubernetes/cloud-provider-openstack/pull/773

Although this is the Kubernetes repository, the fix should be applied on the OpenShift side as well, in https://github.com/openshift/cloud-provider-openstack

Let me know if anything else is required.
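For context, the upstream PR fixes this by deleting the redundant `pods-mount-dir` hostPath volume and its mount: `/var/lib/kubelet/pods` already sits under `/var/lib/kubelet`, so the bidirectional `kubelet-dir` mount covers it, and the overlap between the two bidirectional mounts appears to be what duplicated the entries on every restart. A sketch of what the trimmed block would look like, reconstructed from the manifest above rather than copied verbatim from the PR:

~~~
# Hypothetical post-fix excerpt of the DaemonSet pod spec: pods-mount-dir
# is gone; the bidirectional kubelet-dir mount already propagates
# everything under /var/lib/kubelet, including /var/lib/kubelet/pods.
volumeMounts:
- mountPath: /var/lib/kubelet
  mountPropagation: Bidirectional
  name: kubelet-dir
volumes:
- hostPath:
    path: /var/lib/kubelet
    type: Directory
  name: kubelet-dir
~~~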
As far as I can see, we just need to backport the upstream fix, right? Let's do it then.
(In reply to Mike Fedosin from comment #1)
> As far as I can see, we just need to backport the upstream fix, right? Let's do it then.

Yes, it seems so. Please keep me apprised of the progress.

Along with that, please also check why a few mount entries were still present after restarting the pod even with the hostPath volume and mount block removed -

> When this hostPath volume and mount block is removed from the DaemonSet, the number of mount entries is far lower -
>
> ~~~
> [root@vm ~]# findmnt -D | grep '/var/lib/kubelet/pods$' | wc -l
> 3
> ~~~

Are these expected?
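One way to tell would be to look at the source and propagation flags of the surviving entries instead of just counting them; duplicates of the same source with shared/slave propagation would point at normal mount propagation rather than a leak. A minimal sketch, assuming a util-linux findmnt new enough to expose these output columns:

~~~
# Show where each remaining /var/lib/kubelet/pods entry comes from and how
# it propagates; identical sources differing only in propagation flags are
# usually propagation copies, not leaked mounts.
findmnt -o TARGET,SOURCE,FSTYPE,PROPAGATION | grep '/var/lib/kubelet/pods$'
~~~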
Verified on 4.8.0-0.nightly-2021-05-29-114625 over OSP 16.1 (RHOS-16.1-RHEL-8-20210323.n.0).

The storage clusteroperator is fully functional after IPI installation:

~~~
$ oc get clusteroperators storage
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
storage   4.8.0-0.nightly-2021-05-29-114625   True        False         False      46h
~~~

and the manifest changes are present (there is no volume targeting /var/lib/kubelet/pods):

~~~
$ oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=ostest-snz9z-worker-0-jgqhd -o json | jq '.items[0].spec.volumes[]'
{
  "hostPath": {
    "path": "/var/lib/kubelet/plugins/cinder.csi.openstack.org",
    "type": "DirectoryOrCreate"
  },
  "name": "socket-dir"
}
{
  "hostPath": {
    "path": "/var/lib/kubelet/plugins_registry/",
    "type": "Directory"
  },
  "name": "registration-dir"
}
{
  "hostPath": {
    "path": "/var/lib/kubelet",
    "type": "Directory"
  },
  "name": "kubelet-dir"
}
{
  "hostPath": {
    "path": "/dev",
    "type": "Directory"
  },
  "name": "pods-probe-dir"
}
{
  "name": "secret-cinderplugin",
  "secret": {
    "defaultMode": 420,
    "items": [
      {
        "key": "clouds.yaml",
        "path": "clouds.yaml"
      }
    ],
    "secretName": "openstack-cloud-credentials"
  }
}
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "cloud.conf",
        "path": "cloud.conf"
      }
    ],
    "name": "openstack-cinder-config"
  },
  "name": "config-cinderplugin"
}
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "ca-bundle.pem",
        "path": "ca-bundle.pem"
      }
    ],
    "name": "cloud-provider-config",
    "optional": true
  },
  "name": "cacert"
}
{
  "name": "openstack-cinder-csi-driver-node-sa-token-2pn5x",
  "secret": {
    "defaultMode": 420,
    "secretName": "openstack-cinder-csi-driver-node-sa-token-2pn5x"
  }
}
~~~

Inside the csi-driver pod, as expected, there is no mount entry for /var/lib/kubelet/pods, only the one for /var/lib/kubelet:

~~~
$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=ostest-snz9z-worker-0-jgqhd -o name)
Defaulted container "csi-driver" out of: csi-driver, node-driver-registrar
sh-4.4# findmnt -D | grep '/var/lib/kubelet/pods$'
sh-4.4# findmnt -D | grep '/var/lib/kubelet$'
/dev/vda4[/ostree/deploy/rhcos/var/lib/kubelet] xfs   39.5G  8.9G 30.6G  23% /var/lib/kubelet
sh-4.4#
~~~

After restarting the pod 100 times, the system remains stable:

~~~
$ for i in {1..100}; do echo $i; oc delete -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=ostest-snz9z-worker-0-jgqhd -o name); done
...
[stack@undercloud-0 ~]$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=ostest-snz9z-worker-0-jgqhd -o name)
sh-4.4# findmnt -D | grep '/var/lib/kubelet$'
/dev/vda4[/ostree/deploy/rhcos/var/lib/kubelet] xfs   39.5G  8.8G 30.7G  22% /var/lib/kubelet

[stack@undercloud-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-29-114625   True        False         47h     Cluster version is 4.8.0-0.nightly-2021-05-29-114625

[stack@undercloud-0 ~]$ oc get clusteroperator storage
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
storage   4.8.0-0.nightly-2021-05-29-114625   True        False         False      2d
~~~
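For anyone re-running this verification, the restart loop and the host-side mount count can be combined into a single check. A rough sketch (the node name below is the one from this run; substitute your own, and `oc debug node/...` is used here to run findmnt on the host via chroot):

~~~
#!/bin/sh
# Delete the CSI node pod repeatedly and count the host's mount entries for
# /var/lib/kubelet/pods after each restart; on a fixed build the count
# should stay flat instead of growing.
NODE=ostest-snz9z-worker-0-jgqhd   # node under test
NS=openshift-cluster-csi-drivers
for i in $(seq 1 10); do
  oc -n "$NS" delete $(oc -n "$NS" get pods \
      -l app=openstack-cinder-csi-driver-node \
      --field-selector spec.nodeName="$NODE" -o name)
  sleep 10   # give the DaemonSet time to recreate the pod
  oc debug "node/$NODE" -- chroot /host \
      sh -c 'findmnt -D | grep -c "/var/lib/kubelet/pods$"'
done
~~~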
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
I see that there is a NEEDINFO open on this bug; however, it seems that in the meantime this bug has been closed as fixed. If the solution did not work, or if additional information is still required, please follow up below. Otherwise, the team considers this fixed and closed.