Bug 1877812
| Field | Value |
|---|---|
| Summary | [ROKS] IBM cloud failed to terminate OSDs when upgraded between internal builds of OCS 4.5 |
| Product | OpenShift Container Platform |
| Component | Storage |
| Storage sub component | Storage |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Version | 4.5 |
| Target Release | 4.7.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Petr Balogh <pbalogh> |
| Assignee | Jan Safranek <jsafrane> |
| QA Contact | Qin Ping <piqin> |
| CC | akgunjal, aos-bugs, cewong, jritter, jsafrane, madam, mumutyal, ocs-bugs, owasserm, rgeorge, rphillips, sabose, shan, tnielsen |
| Doc Type | No Doc Update |
| Clones | 1898811 (view as bug list) |
| Last Closed | 2021-02-24 15:17:43 UTC |
| Type | Bug |
| Bug Blocks | 1892623, 1894836 |
Description (Petr Balogh, 2020-09-10 13:54:19 UTC)
moving to 4.5.z as discussed with Petr.

The operator log shows it isn't able to start the OSDs, which is aligned with what has already been observed:

```
2020-09-10 09:49:13.565762 E | op-osd: failed to update osd deployment 1. gave up waiting for deployment "rook-ceph-osd-1" to update
2020-09-10 09:49:13.565827 I | op-osd: started deployment for osd 1 on pvc
2020-09-10 09:49:13.772174 I | op-osd: osd orchestration status for node ocs-deviceset-2-data-0-xwndp is completed
2020-09-10 09:49:13.772190 I | op-osd: starting 1 osd daemons on pvc ocs-deviceset-2-data-0-xwndp
2020-09-10 09:49:13.772217 I | op-osd: OSD will have its main bluestore block on "ocs-deviceset-2-data-0-xwndp"
2020-09-10 09:49:14.104057 I | cephclient: getting or creating ceph auth key "osd.2"
2020-09-10 09:49:15.912490 I | op-osd: deployment for osd 2 already exists. updating if needed
2020-09-10 09:49:17.024065 I | op-mon: checking if we can stop the deployment rook-ceph-osd-2
2020-09-10 09:49:18.907557 I | op-mon: The osd daemon 2 is not ok-to-stop but 'continueUpgradeAfterChecksEvenIfNotHealthy' is true, so proceeding to stop...
...
2020-09-10 10:14:10.642743 E | op-osd: failed to update osd deployment 2. gave up waiting for deployment "rook-ceph-osd-2" to update
2020-09-10 10:14:10.642774 I | op-osd: started deployment for osd 2 on pvc
2020-09-10 10:14:10.881907 I | op-osd: start provisioning the osds on nodes, if needed
2020-09-10 10:14:10.881931 W | op-osd: skipping osd provisioning where no dataDirHostPath is set
2020-09-10 10:14:11.857275 I | op-osd: completed running osds in namespace openshift-storage
```

The must-gather doesn't show anything interesting in the OSD pod logs. In fact, the OSD daemons aren't showing logs for the terminating pods. The must-gather only shows logs for the OSD pods that are in Pending state, which are trying to start after the upgrade but are stuck waiting for the previous OSD to terminate. We really need to see what is blocking the OSDs from terminating.

@Petr:
1. Do you have a connection to the cluster that is in this state? Or can you get any logs for the OSDs that are in terminating state, since the must-gather didn't seem to collect them?
2. If you force delete an OSD pod, does it finally terminate and start the new pod? If that's the case, the volumes underneath the OSDs must not be releasing, and IBM would need to look into that.

The OSD deployments have been updated, but the OSD pods are stuck terminating, so the upgrade is blocked. They are refusing to terminate, even with an "oc delete <pod>", which will force delete them.

These are the OSDs that are stuck terminating:

```
rook-ceph-osd-0-5648b79f69-76b8p   0/1   Terminating   0   46d
rook-ceph-osd-1-5fb88fbfdc-q9csg   0/1   Terminating   0   46d
rook-ceph-osd-2-7cfb859c79-fp26l   0/1   Terminating   0   46d
```

These are the PVs on which the OSDs are running:

```
ocs-deviceset-0-data-0-cmw5v   Bound   local-pv-b5d93336   1787Gi   RWO   localblock   46d
ocs-deviceset-1-data-0-jbq5j   Bound   local-pv-f91eefa8   1787Gi   RWO   localblock   46d
ocs-deviceset-2-data-0-xwndp   Bound   local-pv-188f0347   1787Gi   RWO   localblock   46d
```

"oc describe" on an OSD pod shows that the container has exited, although there are no events on the pod:

```
State:       Terminated
  Reason:    Completed
  Exit Code: 0
  Started:   Thu, 06 Aug 2020 10:02:08 +0000
  Finished:  Thu, 06 Aug 2020 10:02:08 +0000
```

I would expect this means that the volumes underneath the OSDs are not being unmounted and are holding the pod hostage. Rook has deployed the OSDs in "raw" mode, meaning that LVM would not need to free the devices. @Petr, can someone from IBM investigate why their devices are not being freed?

Hey Travis, I've asked Murali from the IBM Cloud team to take a look. Tried to add his email, mumutyal.com, to the needinfo, but it's failing. So pinged Murali on Slack again. Murali, can you please take a look? Thanks

We did not use the volumes and we don't have any data on the volumes. The occurrence of this case is not frequent. Sometimes during the OCS uninstallation the OSD pods also get stuck. We still have the cluster in the same state. Please let me know if you need any logs.

Hey @Travis, I have been seeing this issue consistently now in all clusters of IBM Cloud, and it's present on both Classic and VPC Gen2. I see that the detach of volumes used by OSD pods is failing. I am pasting a snippet below from the kubelet log for reference.
```
Oct 22 06:34:33 kube-brc9edsd0jj7h6bgvdqg-ocsbm-default-00000378 kubelet.service: I1022 06:34:33.074051 15601 mount_linux.go:178] Detected OS with systemd
Oct 22 06:34:33 kube-brc9edsd0jj7h6bgvdqg-ocsbm-default-00000378 kubelet.service: E1022 06:34:33.078811 15601 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/local-volume/local-pv-fbb41034 podName:439cf04f-4144-411e-a398-bef87c93202c nodeName:}" failed. No retries permitted until 2020-10-22 06:34:33.578774714 -0500 CDT m=+51435.194625517 (durationBeforeRetry 500ms). Error: "UnmapVolume.UnmapBlockVolume failed for volume \"ocs-deviceset-2-data-0-4bt75\" (UniqueName: \"kubernetes.io/local-volume/local-pv-fbb41034\") pod \"439cf04f-4144-411e-a398-bef87c93202c\" (UID: \"439cf04f-4144-411e-a398-bef87c93202c\") : blkUtil.DetachFileDevice failed. globalUnmapPath:/var/lib/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-fbb41034, podUID: 439cf04f-4144-411e-a398-bef87c93202c, bindMount: true: failed to unmount linkPath /var/lib/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-fbb41034/439cf04f-4144-411e-a398-bef87c93202c: unmount failed: exit status 32\nUnmounting arguments: /var/lib/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-fbb41034/439cf04f-4144-411e-a398-bef87c93202c\nOutput: umount: /var/data/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-fbb41034/439cf04f-4144-411e-a398-bef87c93202c: target is busy.\n (In some cases useful info about processes that use\n the device is found by lsof(8) or fuser(1))\n"
```

What storage class is being used for the local PVs, and what is the driver? It seems the driver is failing to unmount the volume because it is busy. We haven't seen this issue in other environments such as AWS/EBS, VMware, etc., so an understanding of the underlying device driver is needed.

@Travis, I have seen this same issue on both the Classic cluster and the VPC cluster.
The Classic cluster uses local volumes, since it runs on bare-metal servers, and the VPC cluster uses remote block storage.

Storage class used for local volumes in the Classic cluster:

```
Name:                  localblock
IsDefaultClass:        No
Annotations:           <none>
Provisioner:           kubernetes.io/no-provisioner
Parameters:            <none>
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>
```

Storage class used for remote block volumes in the VPC cluster:

```
Name:                  custom-10iops-tier
IsDefaultClass:        No
Annotations:           armada-service=addon-vpc-block-csi-driver,kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"armada-service":"addon-vpc-block-csi-driver","version":"2.0.3_385"},"creationTimestamp":"2020-10-20T08:33:36Z","labels":{"addonmanager.kubernetes.io/mode":"Reconcile","app":"ibm-vpc-block-csi-driver","razee/force-apply":"true"},"name":"custom-10iops-tier","resourceVersion":"6412","selfLink":"/apis/storage.k8s.io/v1/storageclasses/ibmc-vpc-block-10iops-tier","uid":"b2ed23fb-86fc-44fa-a64e-8d135462df3c"},"parameters":{"billingType":"hourly","classVersion":"1","csi.storage.k8s.io/fstype":"ext4","encrypted":"false","encryptionKey":"","profile":"10iops-tier","region":"","resourceGroup":"","sizeRange":"[10-2000]GiB","tags":"","zone":""},"provisioner":"vpc.block.csi.ibm.io","reclaimPolicy":"Delete","volumeBindingMode":"WaitForFirstConsumer"},version=2.0.3_385
Provisioner:           vpc.block.csi.ibm.io
Parameters:            billingType=hourly,classVersion=1,csi.storage.k8s.io/fstype=ext4,encrypted=false,encryptionKey=,profile=10iops-tier,region=,resourceGroup=,sizeRange=[10-2000]GiB,tags=,zone=
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>
```

The same issue is present with these different storage classes, yet there is no issue with unmounting other volumes created with the same classes.
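For reference, the `No retries permitted until ... (durationBeforeRetry 500ms)` lines in the kubelet log above come from kubelet's nested pending operations, which back off between failed volume operations and retry later. A simplified model of that behavior (an illustrative sketch, not kubelet's actual implementation; the function and parameter names are invented):

```python
import time

def retry_with_backoff(op, initial_delay=0.5, factor=2.0, max_attempts=5):
    """Retry op() with exponential backoff, loosely mimicking kubelet's
    durationBeforeRetry behavior for failed volume operations."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except OSError as err:
            if attempt == max_attempts:
                raise
            # kubelet logs something like "No retries permitted until <now + delay>"
            print(f"attempt {attempt} failed ({err}); retrying in {delay}s")
            time.sleep(delay)
            delay *= factor

# Simulate an unmount that is "busy" twice (e.g. a transient udev probe)
# and then succeeds on the third try.
calls = {"n": 0}
def flaky_unmount():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("target is busy")
    return "unmounted"

print(retry_with_backoff(flaky_unmount, initial_delay=0.01))  # prints "unmounted"
```

This is why a lock held for only milliseconds (as leseb describes below) should not block the unmount indefinitely: a later retry would find the device free. A target that stays busy across every retry points at a persistent holder instead.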
Is this issue seen in clusters that are fresh-installed from OCS 4.5 or newer, or only when upgrading from OCS 4.4? There was an important change in OCS 4.5 where we stopped using LVM to create OSDs. LVM was causing a number of issues similar to this one, where it was still holding on to volumes. If the cluster is upgraded from 4.4, we continue to use LVM in the OSDs. So if you don't hit this issue on clusters installed fresh from OCS 4.5, it is very likely an issue with LVM.

@Travis, we have tested the scenarios below and all of them have this issue.

Scenario 1: Install fresh OCS 4.4 on an OCP 4.5 cluster, then upgrade OCS to 4.5 by updating the subscription.
Scenario 2: Install fresh OCS 4.4 on an OCP 4.4 cluster, then upgrade OCS to 4.5 by updating the subscription.

So the OCS upgrade fails with the same issue on OCP 4.4 and 4.5 clusters for us.

Scenario 3: Install fresh OCS 4.5 on an OCP 4.5 cluster, then uninstall OCS 4.5; the OSDs remain in Terminating state.

So basically OCS uninstall has issues across both OCS 4.4 and OCS 4.5.

This could be similar to something we have experienced already. When the Ceph OSD stops, it releases its lock on the block device; as a result, systemd-udevd intervenes and probes the block by taking a lock on it. Since this seems intermittent, what you observed could be a race between systemd-udevd finishing its probing and kubelet trying to unmount the block. However, the lock is only held for a few milliseconds, so eventually, when kubelet retries, the lock should be gone and the device could be unmounted. Unfortunately, this doesn't really explain why the pod gets stuck in Terminating. A bug in OCP? That's the only clue I have.

(In reply to leseb from comment #14)
> This could be similar to something we have experienced already.
> When the Ceph OSD stops, it releases its lock on the block device, as a
> result, systemd-udevd intervenes and probes the block by taking a lock on it.
> Since this seems intermittent, what you observed can be a race between
> systemd-udevd finishing its probing and kubelet trying to umount the block.
>
> However, the lock is only held for a few milliseconds so eventually when
> kubelet retries the lock should be gone and the device could be unmounted.
> Unfortunately, this doesn't really explain why the pod gets stuck in
> terminating, a bug in OCP?
>
> That's the only clue I have.

The issue seems to be consistent on every upgrade and drain, and IBM ROKS does not see this with volumes from other storage classes. Could this still be an OCP issue? How do we proceed?

Is it possible that this bug, https://bugzilla.redhat.com/show_bug.cgi?id=1892623, is related to this one?

Sahina, I don't know how to proceed, but we must understand what resource is holding that block. Also, we need to understand why the retry is not working. Akash, can we check what resource is using the device?

Created attachment 1726191 [details]
output of lsof command
@Sahina, please find the lsblk output in addition to lsof.

```
sh-4.2# lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda               8:0    0   1.8T  0 disk
|-sda1            8:1    0   256M  0 part  /boot
|-sda2            8:2    0     1G  0 part
`-sda3            8:3    0   1.8T  0 part  /
sdb               8:16   0 893.8G  0 disk
`-sdb1            8:17   0 893.8G  0 part
  `-docker_data 253:0    0 893.8G  0 crypt /var/data
sdc               8:32   0   1.8T  0 disk
loop0             7:0    0   1.8T  0 loop
loop1             7:1    0   1.8T  0 loop
sh-4.2#
```

OK, so far I still don't have the answer on what is accessing the block periodically. I can see that something is; I just haven't found which process it is. Something strange, though: the periodic task keeps complaining about a "busy" target even when the block is not mounted anymore. I still need to investigate the fs namespace route; I just didn't have the time. Yesterday was a national holiday in France, and I'm busy with the Rook 1.5 release until tomorrow. So I can resume next week, "maybe" a bit before the end of the week; it depends on how the release goes.

I have found similar issues:

* https://github.com/kubernetes/kubernetes/issues/85585
* https://github.com/longhorn/longhorn/issues/1719

This needs more investigation. Thanks!

Jan, could you provide any input here? This is similar to the issue you commented on in https://bugzilla.redhat.com/show_bug.cgi?id=1894836#c8

OK, so something runs "lsblk --all --noheadings --pairs --output KNAME,PKNAME,TYPE,MOUNTPOINT" every 10 seconds. I assume it's LSO. That's the only thing accessing the block device on that machine, with an OPEN and CLOSE_NOWRITE,CLOSE. I'm having a hard time thinking that this could be the reason for the "busy" target on unmount, but it's all we have so far.
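As an aside, the `--pairs` output of the lsblk invocation quoted above is a series of `KEY="value"` fields per line, which is why it is convenient for a poller to consume. A small parser for that format (an illustrative sketch; LSO's actual code is not shown here):

```python
import shlex

def parse_lsblk_pairs(output):
    """Parse `lsblk --noheadings --pairs --output KNAME,PKNAME,TYPE,MOUNTPOINT`
    output into a list of dicts, one per block device."""
    devices = []
    for line in output.strip().splitlines():
        # Each line looks like: KNAME="sda" PKNAME="" TYPE="disk" MOUNTPOINT=""
        # shlex.split handles the shell-style quoting for us.
        fields = dict(tok.split("=", 1) for tok in shlex.split(line))
        devices.append(fields)
    return devices

# Sample output shaped like the devices seen on the affected node.
sample = '''
KNAME="sda" PKNAME="" TYPE="disk" MOUNTPOINT=""
KNAME="sda1" PKNAME="sda" TYPE="part" MOUNTPOINT="/boot"
KNAME="loop0" PKNAME="" TYPE="loop" MOUNTPOINT=""
'''
devs = parse_lsblk_pairs(sample)
loops = [d["KNAME"] for d in devs if d["TYPE"] == "loop"]
print(loops)  # ['loop0']
```

Note that such a poller only opens the device node briefly for reading, which matches the OPEN / CLOSE_NOWRITE,CLOSE events observed and is unlikely to keep a target busy.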
We can repro the error by execing inside the mount namespace of the kubelet PID:

```
sh-4.2# nsenter -t 15619 --m
-sh-4.2# mount | grep /var/data/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-b3698db3/f78ff720-d582-44d3-abc8-b18a46726ef8
devtmpfs on /var/data/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-b3698db3/f78ff720-d582-44d3-abc8-b18a46726ef8 type devtmpfs (rw,seclabel,size=32826904k,nr_inodes=8206726,mode=755)
-sh-4.2# umount /var/data/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-b3698db3/f78ff720-d582-44d3-abc8-b18a46726ef8
umount: /var/data/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-b3698db3/f78ff720-d582-44d3-abc8-b18a46726ef8: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
```

The problem is that the loop device is still holding the mount:

```
-sh-4.2# losetup -a
/dev/loop3: [0005]:12315 (/var/data/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-b3698db3/f78ff720-d582-44d3-abc8-b18a46726ef8)
-sh-4.2# losetup -d /dev/loop3
-sh-4.2# losetup -a
-sh-4.2# umount --verbose /var/data/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-b3698db3/f78ff720-d582-44d3-abc8-b18a46726ef8
umount: /var/data/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-b3698db3/f78ff720-d582-44d3-abc8-b18a46726ef8 (devtmpfs) unmounted
```

Et voilà! So before attempting to umount, Kubernetes should release the loop device. Should we move this to OCP? Sahina?

"lsblk --all" can't hold a device for very long (unless something is really broken).

> We can repro the error by execing inside the mount namespace of the kubelet PID:
> sh-4.2# nsenter -t 15619 --m

Wait, do you run kubelet in a different namespace than the host? That's quite unusual, and I am not quite sure it's supported. It was very problematic in OCP 3.x, and we run kubelet without any namespaces in 4.x.
> So before attempting to umount Kubernetes should release the loop device.

Kubernetes releases the loop device under normal conditions. Can you reproduce it without OCS, i.e. just with LSO + a pod?

(In reply to Jan Safranek from comment #27)
> "lsblk --all" can't hold a device for too long (unless something is really
> broken).

Agreed; as pointed out earlier, it's not an issue in itself, but it is still something accessing the block.

> > We can repro the error but execing inside the mount namespace of the kubelet PID:
> > sh-4.2# nsenter -t 15619 --m
>
> Wait, do you run kubelet in a different namespace than the host? That's
> quite unusual and I am not quite sure it's supported. It was very
> problematic in OCP 3.x and we run kubelet without any namespaces in 4.x.

To be honest, I have no idea; I just debugged the setup and reported my findings. At least, the namespace does not seem to be the one from the host. The block path was mounted as devtmpfs inside my "oc debug" container, unmounting it there was doing nothing, and the issue was non-reproducible. Only the nsenter was reproducing what kubelet was printing in the logs.

> > So before attempting to umount Kubernetes should release the loop device.
>
> Kubernetes releases the loop device under normal conditions. Can you
> reproduce it without OCS, i.e. just with LSO + a pod?

Petr? Someone else? I think Akash tried with another application and didn't see the issue, IIRC.

I haven't tried to do any further testing on that outside of OCS. I am going to be on PTO till Tuesday, so I can take a look on or after Wednesday. Akash, maybe you can assist here to provide feedback? Thanks

@leseb, @Petr, we have tried it in an IBM Cloud VPC cluster where we don't use LSO. The volumes used by the OSDs are remote block volumes created by our VPC plugin, so this does not seem LSO-related to me. I had already indicated earlier that when I run lsblk I see loop0-type devices on the worker, which are holding the lock.
I have already tried to create a different PVC from VPC volumes and was able to unmount it easily. It's getting stuck only for the OSDs, and they are not using LSO.

@rphillips, this looks like a kubelet issue; is it something you can help with?

Created OCP bug 1898811. Ryan, please take a look.

> Its getting stuck only for OSD and its not using LSO.

I'm glad to hear that. Trying to sum up the previous findings and bug #1894836: we have a block volume that's used by something, and we still don't know what it is?

> Oct 22 06:34:33 kube-brc9edsd0jj7h6bgvdqg-ocsbm-default-00000378 kubelet.service: E1022 06:34:33.078811 15601 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/local-volume/local-pv-fbb41034 podName:439cf04f-4144-411e-a398-bef87c93202c nodeName:}" failed. No retries permitted until 2020-10-22 06:34:33.578774714 -0500 CDT m=+51435.194625517 (durationBeforeRetry 500ms). Error: "UnmapVolume.UnmapBlockVolume failed for volume \"ocs-deviceset-2-data-0-4bt75\" (UniqueName: \"kubernetes.io/local-volume/local-pv-fbb41034\") pod \"439cf04f-4144-411e-a398-bef87c93202c\" (UID: \"439cf04f-4144-411e-a398-bef87c93202c\") : blkUtil.DetachFileDevice failed. globalUnmapPath:/var/lib/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-fbb41034, podUID: 439cf04f-4144-411e-a398-bef87c93202c, bindMount: true: failed to unmount linkPath /var/lib/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-fbb41034/439cf04f-4144-411e-a398-bef87c93202c: unmount failed: exit status 32\nUnmounting arguments: /var/lib/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-fbb41034/439cf04f-4144-411e-a398-bef87c93202c\nOutput: umount: /var/data/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-fbb41034/439cf04f-4144-411e-a398-bef87c93202c: target is busy.\n (In some cases useful info about processes that use\n the device is found by lsof(8) or fuser(1))\n"

Once this volume is unmounted, does kubelet remove /dev/loop0?
The best would probably be if I could ssh into such a problematic node when it happens again and look at it from the OCP perspective.

*** Bug 1898811 has been marked as a duplicate of this bug. ***

The Red Hat team has the cluster where we get the issue, and we have kept it for debugging.

Good news, there is a bug in OCP! I debugged the cluster with API server at https://c115-e.us-south.containers.cloud.ibm.com. On node 10.240.129.31 I found three volumes that are reported as "busy" in recent kubelet logs:

```
$ journalctl -r | grep busy | head -n 100 | grep -o "pvc-[0-9a-f-]*" | sort | uniq
pvc-171db671-0f81-4279-8eba-03e09ef51b6d
pvc-32704cd6-a242-494c-8a47-7e6c213f413f
pvc-783f4fd2-27bb-4d75-a539-855cbaf59636
```

Picking pvc-32704cd6-a242-494c-8a47-7e6c213f413f and its usage by pod 2ffdeda3-8aed-4bee-9793-9719878c5677: it was started at Nov 25 10:15:44, and its volumes were set up at Nov 25 10:16:24. The pod was crashlooping (I did not dig into details) and it was deleted at Nov 25 10:17:34:

```
Nov 25 10:17:34 kube-bufu962d064ogc6i6s8g-ocsnoobaa-default-00000459 kubelet.service[100500]: I1125 10:17:34.317948 100500 kubelet.go:1942] SyncLoop (DELETE, "api"): "rook-ceph-osd-prepare-ocs-deviceset-1-0-947zv-jtwjt_openshift-storage(2ffdeda3-8aed-4bee-9793-9719878c5677)"
```

Tearing down of the PVC mounts/loop devices failed:

```
Nov 25 10:17:34 kube-bufu962d064ogc6i6s8g-ocsnoobaa-default-00000459 kubelet.service[100500]: W1125 10:17:34.467669 100500 volume_path_handler_linux.go:61] couldn't find loopback device which takes file descriptor lock. Skip detaching device. device path: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-32704cd6-a242-494c-8a47-7e6c213f413f/dev/2ffdeda3-8aed-4bee-9793-9719878c5677"
```

This means kubelet can't remove the loop device, and therefore the various bind mounts that kubelet creates can't be unmounted.
Looking into the code, "couldn't find loopback device" most probably means that kubelet can't parse the output of "losetup -j /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-32704cd6-a242-494c-8a47-7e6c213f413f/dev/2ffdeda3-8aed-4bee-9793-9719878c5677" (if losetup itself had failed, we would see some "exit code 1" in the log, but we don't).

"losetup -j <the device>" shows:

```
/dev/loop5: [0005]:193908341 (/var/data/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-32704cd6-a242-494c-8a47-7e6c213f413f/dev/2ffdeda3-8aed-4bee-9793-9719878c5677 (deleted))
```

And here is the bug: kubelet looks for "/var/lib/kubelet/plugins/...", while losetup shows "/var/data/kubelet/...", because /var/lib/kubelet is a symlink to /var/data/kubelet. There have been numerous issues with /var/lib/kubelet being a symlink (or with a symlink somewhere in the path); we gradually fix all of them, and as seen in this example, new bugs keep popping up. It is *much* safer not to have symlinks anywhere in /var/lib/kubelet. Use mounts if you really want it on a different device.

I can fix OCP; it will take some time. As a workaround, please try to use (bind?) mounts instead of symlinks. As cleanup, remove the loopback devices of deleted pods.

As a side note, using a block PVC plus the most trivial pod with "sleep 9999" that uses the PVC, I was able to reproduce this without OCS.

Upstream issue: https://github.com/kubernetes/kubernetes/issues/96888
Upstream PR: https://github.com/kubernetes/kubernetes/pull/96889 (upstream is frozen, it will take some time to merge)

Waiting for upstream to un-freeze...

*** Bug 1894836 has been marked as a duplicate of this bug. ***

Hi, I'm currently troubleshooting an extremely similar issue, not on IBM Cloud but on top of the vSphere CSI driver (version 2.0.1).
The symptoms are:
- Ceph OSD pods remain in Terminating state
- Reproducible with a dummy pod and an attached block PVC (filesystem PVCs seem to work without issues)
- The pod terminates after unmounting the devtmpfs mount in /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/ on the host
- Kubelet tells me: "couldn't find loopback device which takes file descriptor lock. Skip detaching device. device path: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-2dc62ba2-96ed-4bab-a412-b04c34f031c5/dev/3a981daa-d7a7-4369-a0ae-e223d074b3d9"
- The vsphere-csi-node pod tells me: "/csi.v1.Node/NodeUnpublishVolume: REP 0477: rpc error: code = Internal desc = error getting block device for volume: 351c47f9-6b5d-42fb-ac77-50dea17a1333, err: lstat devtmpfs: no such file or directory"

I only see this when using block PVCs on the vSphere CSI storage class; when using the ocs-storagecluster-ceph-rbd or thin (vSphere in-tree) storage classes, block PVCs work. /var/lib/kubelet is NOT a symlink in my setup (standard OCP).

Now to the bedrock of my question: above you mention that this is a bug in OCP. Is it possible that I am also hitting this bug with the vSphere CSI driver? I am currently running OCP 4.5.19, OCS 4.5.1, and CSI driver 2.0.1. I'll attach a file where I documented my steps during troubleshooting.

--Juerg

Created attachment 1740041 [details]
Block PVC detach testcases
This file includes the procedure I followed for reproducing the OCS problem I encountered while using block PVCs on a new CSI storage class.
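For contrast with the vSphere case, the symlink mismatch Jan identified in the IBM case (kubelet comparing the literal `/var/lib/kubelet/...` path against the resolved `/var/data/kubelet/...` path that losetup reports) can be demonstrated in isolation. A minimal sketch, with hypothetical temp directories and a made-up `pvc-1234` path standing in for the real kubelet layout:

```python
import os
import tempfile

# Recreate the layout: "data" is the real directory, "lib" is a symlink to it,
# mirroring /var/lib/kubelet -> /var/data/kubelet on the affected nodes.
# (No cleanup is done; this is a throwaway sketch.)
root = tempfile.mkdtemp()
data = os.path.join(root, "data")
lib = os.path.join(root, "lib")
os.mkdir(data)
os.symlink(data, lib)

# Hypothetical backing-file path, as kubelet would construct it via the symlink.
backing = os.path.join(lib, "volumeDevices", "pvc-1234", "dev")
os.makedirs(backing)

expected = backing                    # what kubelet searches for (…/lib/…)
reported = os.path.realpath(backing)  # what losetup reports (the resolved …/data/… path)

print(expected == reported)                                      # False: literal comparison misses the match
print(os.path.realpath(expected) == os.path.realpath(reported))  # True: resolving symlinks first fixes it
```

The fix presumably amounts to the second comparison: resolve symlinks on both sides before matching the loop device's backing file, rather than comparing literal path strings.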
Juerg, comment #41 looks like a different bug, unless there is a symlink somewhere in the /var/lib/kubelet path. My gut feeling is that the vSphere driver does not implement idempotency correctly: NodeUnpublish on an unmounted/missing volume should return success, and this is a common error in CSI drivers. But it could be a bug in OCP too. Found this upstream: https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/338

Jan, thanks for your feedback and the GitHub issue. I added a comment there, describing what I am observing.

Jan, comment #41 turned out to be related to an RHCOS incompatibility in NodeUnpublishVolume of the vSphere CSI driver. I came up with a simple fix and created a PR: https://github.com/kubernetes-sigs/vsphere-csi-driver/pull/572

Verified with: 4.7.0-0.nightly-2021-01-17-211555

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633