Bug 2216964 - VMs IO Errors - Unable to attach or mount volumes.
Summary: VMs IO Errors - Unable to attach or mount volumes.
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Niels de Vos
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-06-23 12:21 UTC by Boaz
Modified: 2024-06-08 04:25 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-02-08 09:31:53 UTC
Embargoed:



Description Boaz 2023-06-23 12:21:23 UTC
I'm running a scale regression setup on:
=========================================
OpenShift 4.13.2
OpenShift Virtualization 4.13.1
OpenShift Container Storage - 4.12.4-rhodf

This is a large-scale setup with 132 nodes running 6000 RHEL VMs on an external RHCS.
After the cluster was idle for a few days, I returned to find 96 VMs in the Paused state due to IO errors (Message: VMI was paused, IO error),
and the csi-addons-controller-manager was crash-looping due to memory usage:

============================================================================================================================================
NAME                                               READY   STATUS      
csi-addons-controller-manager-58f765c9ff-2pzb4     1/2     OOMKilled  
============================================================================================================================================
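One way to confirm the OOM kills (pod name taken from the listing above):

============================================================================================================================================
oc -n openshift-storage get pod csi-addons-controller-manager-58f765c9ff-2pzb4 \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
# OOMKilled
============================================================================================================================================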

I increased the memory limit to 1 GiB using "oc edit csv odf-csi-addons-operator.v4.12.4-rhodf -n openshift-storage",
which recovered the csi-addons-controller-manager; once it was back up it used ~650 MiB of VmRSS.
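The CSV edit itself isn't shown above; here is a minimal sketch of the relevant fragment, assuming the limit lives under the manager container's resources in the CSV's embedded deployment spec (the container name "manager" is an assumption, not taken from the report):

============================================================================================================================================
# Fragment of the CSV after "oc edit csv odf-csi-addons-operator.v4.12.4-rhodf -n openshift-storage":
spec:
  install:
    spec:
      deployments:
        - name: csi-addons-controller-manager
          spec:
            template:
              spec:
                containers:
                  - name: manager           # assumed container name
                    resources:
                      limits:
                        memory: 1Gi         # raised to stop the OOMKill loop
============================================================================================================================================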

Now, with the csi-addons-controller-manager up, I looked at the Paused VMs, isolating the nodes that had Paused VMs hosted on them:

============================================================================================================================================

[root@d16-h06-000-r650 ~]# oc get vm -o wide |grep Paused|xargs -l bash -c 'oc get vmi $0'|grep worker|awk '{print $4}'|uniq
worker010
worker084
worker094
worker005
worker152
worker115
worker090
worker003
worker070
worker090
worker019
============================================================================================================================================
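Side note: uniq only collapses adjacent duplicates, which is why worker090 appears twice in the list above; sorting first gives a truly unique node list:

============================================================================================================================================
oc get vm -o wide | grep Paused | xargs -l bash -c 'oc get vmi $0' | grep worker | awk '{print $4}' | sort -u
============================================================================================================================================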


Looking at dmesg on worker052, which had most of the VMs running on it:
============================================================================================================================================

[Thu Jun 22 14:30:29 2023] rbd: rbd45: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd45: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd9: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd9: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd9: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd9: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd42: write at objno 2977 4128768~16384 result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd42: write at objno 3086 393216~16384 result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd0: failed to lock header: -108
[Thu Jun 22 14:30:29 2023] rbd: rbd42: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd45: write at objno 1018 3948544~16384 result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd39: write at objno 4304 3297280~237568 result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd42: write at objno 2944 16384~4096 result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd42: write at objno 1015 2244608~16384 result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd42: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd9: write at objno 4031 3149824~8192 result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd45: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd42: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd9: write at objno 3079 360448~16384 result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd9: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd45: write at objno 2944 1024~512 result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd0: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd9: write at objno 2944 16384~4096 result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd9: write at objno 768 2101248~8192 result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd9: write result -108
[Thu Jun 22 14:30:29 2023] rbd: rbd45: write at objno 922 2019328~4096 result -108
============================================================================================================================================
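Every failure above is result -108, i.e. -ESHUTDOWN, the same errno the kubelet later reports as "Cannot send after transport endpoint shutdown". The mapping is easy to confirm on any Linux box, and since a blocklisted client is a common reason for krbd to return ESHUTDOWN, the cluster blocklist is also worth checking (neither step appears in the original report):

============================================================================================================================================
# Confirm what errno 108 means:
python3 -c 'import errno, os; print(errno.errorcode[108], "-", os.strerror(108))'
# ESHUTDOWN - Cannot send after transport endpoint shutdown

# On the RHCS side, check whether client addresses were blocklisted
# (older releases spell the subcommand "blacklist"):
ceph osd blocklist ls
============================================================================================================================================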

Despite these client-side errors, the Ceph cluster itself didn't experience any issues and reported good health:

============================================================================================================================================
[ceph: root@d28-h01-000-r750 /]# ceph health detail
HEALTH_OK
============================================================================================================================================

My next step was to verify that the VMs didn't get corrupted. I did that by exporting the RBD image of a few VMs and running xfs_repair against it; this is the output for "rhel92-vm5403":

============================================================================================================================================
# Export the VM's RBD image:
rbd export --export-format 1 ocp_pool/csi-vol-08720968-ab38-4ff0-bffb-b6104f10aa2e /mnt/my_mount_point/csi-vol-08720968-ab38-4ff0-bffb-b6104f10aa2e_format1.export
# Use qemu-nbd to attach the exported image:
qemu-nbd --connect=/dev/nbd2 -f raw csi-vol-08720968-ab38-4ff0-bffb-b6104f10aa2e_format1.export
# Get info about the VM's filesystem:
xfs_info /dev/nbd2p1
# Test (read-only) that the image is not corrupted:
xfs_repair -n /dev/nbd2p1
# Example output:
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_fdblocks 184902, counted 194911
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 2
        - agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
============================================================================================================================================
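For completeness, two steps around the qemu-nbd check that aren't shown above: the nbd kernel module must be loaded before --connect, and the device should be detached when finished:

============================================================================================================================================
modprobe nbd max_part=8          # load the nbd driver with partition support
qemu-nbd --disconnect /dev/nbd2  # detach the image when done
============================================================================================================================================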

I also verified that I can create new RBD images:

============================================================================================================================================
csi-vol-ff6e4494-14ff-49f6-9795-ea99b8f25ef6
csi-vol-ff86834a-c950-4f21-aa5e-023c7b5c6221
csi-vol-ff87e622-fe6c-41dc-aea7-84581756e43c
csi-vol-ff8914a2-9749-46c2-b409-66518a0c1784
csi-vol-ff8a385c-9400-448c-9d3e-6881ffa6e908
csi-vol-ff92a77d-dcc3-4819-a188-058352114ead
csi-vol-ffac9f5d-6ac3-4839-9fec-3f13069a6e4f
csi-vol-ffb4b71a-df89-47d9-901e-4434f532c1d0
csi-vol-ffd35b1f-a2e5-4532-9127-b1f3b3799e51
csi-vol-ffd78cb8-6a4b-4f30-8181-7de112c5b5c9
csi-vol-ffe74211-336e-4efc-a527-36d3e425c6c5
csi-vol-ffe7ff55-ceab-4311-b70d-2cbe7c8749a1
csi-vol-ffedd0c8-e2f8-44fa-a131-5349e78d70cf
csi-vol-fff85a28-8465-486e-a6fe-9b0f153e6a44
csi-vol-fffea38e-6f41-4e63-9b1c-33132a78439a
test <===== image I created.
============================================================================================================================================
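The command behind this listing isn't shown; presumably something equivalent to the following, using the same ocp_pool pool as the export above:

============================================================================================================================================
rbd create ocp_pool/test --size 1G   # create a throwaway image
rbd ls ocp_pool | tail               # "test" shows up at the end of the listing
============================================================================================================================================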

Looking at the VM itself, it looks like we are unable to attach or mount volumes; it's the same issue for all Paused VMs:

============================================================================================================================================
mount volumes: unmounted volumes=[rhel92-vm5403-root-disk], unattached volumes=[hotplug-disks private public rhel92-vm5403-root-disk ephemeral-disks container-disks libvirt-runtime sockets]: timed out waiting for the condition
  Warning  FailedMount      14m                  kubelet            Unable to attach or mount volumes: unmounted volumes=[rhel92-vm5403-root-disk], unattached volumes=[rhel92-vm5403-root-disk private public ephemeral-disks container-disks libvirt-runtime sockets hotplug-disks]: timed out waiting for the condition
  Warning  FailedMount      12m                  kubelet            Unable to attach or mount volumes: unmounted volumes=[rhel92-vm5403-root-disk], unattached volumes=[container-disks libvirt-runtime rhel92-vm5403-root-disk sockets hotplug-disks private public ephemeral-disks]: timed out waiting for the condition
  Warning  FailedMount      10m                  kubelet            Unable to attach or mount volumes: unmounted volumes=[rhel92-vm5403-root-disk], unattached volumes=[libvirt-runtime sockets hotplug-disks private public rhel92-vm5403-root-disk ephemeral-disks container-disks]: timed out waiting for the condition
  Warning  FailedMount      8m8s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[rhel92-vm5403-root-disk], unattached volumes=[libvirt-runtime rhel92-vm5403-root-disk sockets hotplug-disks private public ephemeral-disks container-disks]: timed out waiting for the condition
  Warning  FailedMount      4m1s (x2 over 6m5s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[rhel92-vm5403-root-disk], unattached volumes=[hotplug-disks private public ephemeral-disks rhel92-vm5403-root-disk container-disks libvirt-runtime sockets]: timed out waiting for the condition
  Warning  FailedMapVolume  2m (x18 over 22m)    kubelet            MapVolume.SetUpDevice failed for volume "pvc-a3ac35c2-931d-4cf7-ab91-c0487c632dbe" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 108) occurred while running rbd args: [--id csi-rbd-node -m 192.168.108.33:6789,192.168.108.153:6789,192.168.246.96:6789,192.168.246.20:6789,192.168.244.240:6789 --keyfile=**stripped** map ocp_pool/csi-vol-08720968-ab38-4ff0-bffb-b6104f10aa2e --device-type krbd --options noudev], rbd error output: rbd: sysfs write failed
rbd: map failed: (108) Cannot send after transport endpoint shutdown
  Warning  FailedMount  118s  kubelet  Unable to attach or mount volumes: unmounted volumes=[rhel92-vm5403-root-disk], unattached volumes=[ephemeral-disks container-disks libvirt-runtime sockets hotplug-disks rhel92-vm5403-root-disk private public]: timed out waiting for the condition
===============================================================================================================================================
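A useful follow-up on an affected node (not captured in the original report) is to list which rbd devices are still attached there, e.g. via a debug pod:

============================================================================================================================================
oc debug node/worker052 -- chroot /host lsblk -o NAME,SIZE,MOUNTPOINT | grep rbd
============================================================================================================================================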

I verified that all of the above RHCS monitor IPs (from the map command above) are fully reachable from each of the nodes:

===============================================================================================================================================

192.168.108.33
192.168.108.153
192.168.246.96
192.168.246.20
192.168.244.240
===============================================================================================================================================
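The reachability check itself isn't shown; a simple sketch, run from each node against the mon port used in the map command above:

============================================================================================================================================
for ip in 192.168.108.33 192.168.108.153 192.168.246.96 192.168.246.20 192.168.244.240; do
    nc -z -w 3 "$ip" 6789 && echo "$ip reachable" || echo "$ip UNREACHABLE"
done
============================================================================================================================================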

The above issue is reproducible for almost any VM I stop and then start again.
Having verified that the external RHCS is not the cause and that the VMs didn't get corrupted, I assume ODF is the culprit; however, I couldn't figure out why we are unable to mount the volumes.


The logs below were collected after I cordoned and drained worker052; that recovered the VMs that were running on it, and they became operational again.
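The exact drain invocation isn't recorded; a typical sequence would be:

============================================================================================================================================
oc adm cordon worker052
oc adm drain worker052 --ignore-daemonsets --delete-emptydir-data --force
# ...wait for the VMs to be rescheduled and recover, then:
oc adm uncordon worker052
============================================================================================================================================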
However, here are a few VMs that are still Paused:


[root@d16-h06-000-r650 ~]# oc get vm --no-headers |grep Paused|xargs -l bash -c 'oc get vmi --no-headers $0'
rhel92-vm0165   11d   Running         worker010   False
rhel92-vm0416   11d   Running         worker084   False
rhel92-vm1035   11d   Running         worker094   False
rhel92-vm2180   11d   Running         worker005   False
rhel92-vm2190   11d   Running         worker115   False
rhel92-vm2223   11d   Running         worker090   False
rhel92-vm2245   11d   Running         worker003   False
rhel92-vm2588   11d   Running         worker070   False
rhel92-vm2859   11d   Running         worker090   False
rhel92-vm3223   11d   Running         worker019   False
rhel92-vm3458   11d   Running         worker019   False

===============================================================================================================================================
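For VMs that remain Paused once storage access recovers, KubeVirt's virtctl can unpause them; a hypothetical example with one of the VMIs listed above:

============================================================================================================================================
virtctl unpause vmi rhel92-vm0165
============================================================================================================================================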

Note: this log archive is 60 GiB in size once extracted:
http://perf148h.perf.lab.eng.bos.redhat.com/share/BZ_logs/ceph_vms_cant_boot.tar.gz

I also collected an SOS report from worker052:
http://perf148h.perf.lab.eng.bos.redhat.com/share/BZ_logs/sosreport-worker052-2023-06-23-tuhkqog.tar.xz

and worker019:
http://perf148h.perf.lab.eng.bos.redhat.com/share/BZ_logs/sosreport-worker019-2023-06-23-ktqbjor.tar.xz

Comment 15 Red Hat Bugzilla 2024-06-08 04:25:09 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

