Bug 1952211
Summary: cascading mounts happening exponentially when deleting openstack-cinder-csi-driver-node pods

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Anshul Verma <ansverma> |
| Component: | Storage | Assignee: | Mike Fedosin <mfedosin> |
| Storage sub component: | OpenStack CSI Drivers | QA Contact: | rlobillo |
| Status: | CLOSED ERRATA | Severity: | medium |
| Priority: | medium | CC: | adduarte, aos-bugs, mbagga, mfedosin, palonsor, pprinett, tkimura, wking |
| Version: | 4.7 | Keywords: | Triaged |
| Target Release: | 4.8.0 | Type: | Bug |
| Hardware: | Unspecified | OS: | Unspecified |
| Doc Type: | Bug Fix | Last Closed: | 2021-07-27 23:02:36 UTC |
| Clones: | 2025444, 2026197 (view as bug list) | Bug Blocks: | 2016286 |

Doc Text:

Cause: The directory /var/lib/kubelet was mounted twice in the Cinder CSI node controller container, once directly and once through a separate /var/lib/kubelet/pods volume.
Consequence: The Cinder CSI node controller failed to start, reporting that it could not mount /var/lib/kubelet/pods because no space was left.
Fix: The duplicate mounts of /var/lib/kubelet and /var/lib/kubelet/pods that caused the error were removed; only /var/lib/kubelet is mounted now.
Result: The driver always runs successfully.
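The duplicate mount described in the Doc Text can be checked directly against the DaemonSet manifest. A minimal sketch, assuming the DaemonSet is named openstack-cinder-csi-driver-node (inferred from the pod label used later in this report) and that oc and jq are available:

~~~
# List every hostPath volume in the node DaemonSet that targets
# /var/lib/kubelet itself or anything under /var/lib/kubelet/pods.
# Before the fix this returned both paths; after the fix it should
# print only ["/var/lib/kubelet"].
oc get daemonset openstack-cinder-csi-driver-node \
  -n openshift-cluster-csi-drivers -o json \
  | jq '[.spec.template.spec.volumes[].hostPath.path // empty
         | select(. == "/var/lib/kubelet" or startswith("/var/lib/kubelet/pods"))]'
~~~

A second entry for /var/lib/kubelet/pods in the output indicates the pre-fix manifest is still deployed.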
Description
Anshul Verma, 2021-04-21 18:46:04 UTC
Mike Fedosin (comment #1):

As I see, we just need to backport the upstream fix, right? Let's do it then.

(In reply to Mike Fedosin from comment #1)
> As I see, we just need to backport the upstream fix, right? Let's do it then.

Yes, it seems so. Please keep me apprised of the progress. Along with that, please check the fact that a few mounts were still present after restarting the pod even with the hostPath volume and mount block removed. When this hostPath mount block is removed from the DaemonSet, the mount entries are much lower:

~~~
[root@vm ~]# findmnt -D | grep '/var/lib/kubelet/pods$' | wc -l
3
~~~

Are these expected?

Verified on 4.8.0-0.nightly-2021-05-29-114625 over OSP16.1 (RHOS-16.1-RHEL-8-20210323.n.0).

clusteroperator storage is fully functional after IPI installation:

~~~
$ oc get clusteroperators storage
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
storage   4.8.0-0.nightly-2021-05-29-114625   True        False         False      46h
~~~

and the manifest changes are present (there is no volume targeting /var/lib/kubelet/pods):

~~~
$ oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=ostest-snz9z-worker-0-jgqhd -o json | jq '.items[0].spec.volumes[]'
{
  "hostPath": {
    "path": "/var/lib/kubelet/plugins/cinder.csi.openstack.org",
    "type": "DirectoryOrCreate"
  },
  "name": "socket-dir"
}
{
  "hostPath": {
    "path": "/var/lib/kubelet/plugins_registry/",
    "type": "Directory"
  },
  "name": "registration-dir"
}
{
  "hostPath": {
    "path": "/var/lib/kubelet",
    "type": "Directory"
  },
  "name": "kubelet-dir"
}
{
  "hostPath": {
    "path": "/dev",
    "type": "Directory"
  },
  "name": "pods-probe-dir"
}
{
  "name": "secret-cinderplugin",
  "secret": {
    "defaultMode": 420,
    "items": [
      {
        "key": "clouds.yaml",
        "path": "clouds.yaml"
      }
    ],
    "secretName": "openstack-cloud-credentials"
  }
}
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "cloud.conf",
        "path": "cloud.conf"
      }
    ],
    "name": "openstack-cinder-config"
  },
  "name": "config-cinderplugin"
}
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "ca-bundle.pem",
        "path": "ca-bundle.pem"
      }
    ],
    "name": "cloud-provider-config",
    "optional": true
  },
  "name": "cacert"
}
{
  "name": "openstack-cinder-csi-driver-node-sa-token-2pn5x",
  "secret": {
    "defaultMode": 420,
    "secretName": "openstack-cinder-csi-driver-node-sa-token-2pn5x"
  }
}
~~~

Inside the csi-driver pod, as expected, there is no separate mount on /var/lib/kubelet/pods, while /var/lib/kubelet is mounted:

~~~
$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=ostest-snz9z-worker-0-jgqhd -o NAME)
Defaulted container "csi-driver" out of: csi-driver, node-driver-registrar
sh-4.4# findmnt -D | grep '/var/lib/kubelet/pods$'
sh-4.4# findmnt -D | grep '/var/lib/kubelet$'
/dev/vda4[/ostree/deploy/rhcos/var/lib/kubelet] xfs 39.5G 8.9G 30.6G 23% /var/lib/kubelet
sh-4.4#
~~~

After restarting the pod 100 times, the system remains stable:

~~~
$ for i in {1..100}; do echo $i; oc delete -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=ostest-snz9z-worker-0-jgqhd -o name); done
...
~~~
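A variant of the loop above that also samples the host's mount table after each deletion makes the cascade (or its absence) visible directly. A minimal sketch, assuming oc debug access to the node; the node name and pod selector are taken from the verification output above:

~~~
# Delete the node pod repeatedly and count host mounts under
# /var/lib/kubelet/pods after each restart. On an affected build the
# count grows with every iteration; on a fixed build it stays flat.
NODE=ostest-snz9z-worker-0-jgqhd
for i in {1..10}; do
  oc delete -n openshift-cluster-csi-drivers \
    $(oc get pods -n openshift-cluster-csi-drivers \
        -l app=openstack-cinder-csi-driver-node \
        --field-selector spec.nodeName=$NODE -o name)
  sleep 30  # give the DaemonSet time to recreate the pod
  oc debug node/$NODE -- chroot /host \
    sh -c 'findmnt -D | grep -c "/var/lib/kubelet/pods$"'
done
~~~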
After the restart loop, the mount table inside the pod and the overall cluster state were checked again:

~~~
[stack@undercloud-0 ~]$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l app=openstack-cinder-csi-driver-node --field-selector spec.nodeName=ostest-snz9z-worker-0-jgqhd -o NAME)
sh-4.4# findmnt -D | grep '/var/lib/kubelet$'
/dev/vda4[/ostree/deploy/rhcos/var/lib/kubelet] xfs 39.5G 8.8G 30.7G 22% /var/lib/kubelet

[stack@undercloud-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-29-114625   True        False         47h     Cluster version is 4.8.0-0.nightly-2021-05-29-114625

[stack@undercloud-0 ~]$ oc get clusteroperator storage
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
storage   4.8.0-0.nightly-2021-05-29-114625   True        False         False      2d
~~~

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

I see that there is a NEEDINFO open on this bug; however, it seems that in the meantime this bug has been closed as fixed. If the solution did not work or if additional information is required, please ask again below. Otherwise, the team considers this fixed and closed.