Can we get the exact actions on the system so that we can reproduce it? When the volume is detached from the VM, it should be released by vdsm. Also, you mention the LVM filter... how was that supposed to work? If the device is filtered out, then it can't be used as a VM disk, no?
(In reply to Michal Skrivanek from comment #2)
> Can we get the exact actions on the system so that we can reproduce it?
> When the volume is detached from the VM, it should be released by vdsm.

I was able to reproduce the issue in my test environment. supervdsm is holding the unmapped device open after the storageDevicesList call. The engine only initiates this call if the cluster has the "gluster service" enabled, so we won't see this issue in a normal setup (the gluster and virt services are not both enabled by default), but we will see it in RHHI.

The issue can be reproduced with the steps below:

[1] Map a LUN to the server.
[2] Create a partition on this LUN.
[3] Unmap the LUN from the storage.
[4] Flush the cache on the host:
    # echo 3 > /proc/sys/vm/drop_caches
[5] Log in to RHV-M => click on Hosts => Storage Devices => Sync

Step [5] initiates storageDevicesList in supervdsm, which uses blivet. When blivet reads the MBR of the unmapped device, it fails with an I/O error as below.

===
MainProcess|jsonrpc/1::WARNING::2020-09-15 10:38:25,870::edd::169::blivet::(collect_mbrs) edd: error reading mbrsig from disk 36001405097fdbe4d6e04e3b9bdc97014: [Errno 5] Input/output error

# multipath -ll | grep -A3 36001405097fdbe4d6e04e3b9bdc97014
36001405097fdbe4d6e04e3b9bdc97014 dm-22 LIO-ORG ,sdk
size=10G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=0 status=enabled
  `- 5:0:0:1 sde 8:64 failed faulty running
===

From the strace output, we can see that blivet opens the device and tries to read it; the read fails with EIO, but the fd is never closed:

===
10078 10:38:25.869889 open("/dev/mapper/36001405097fdbe4d6e04e3b9bdc97014", O_RDONLY <unfinished ...>
10078 10:38:25.870015 <... open resumed>) = 25
10078 10:38:25.870415 lseek(25, 440, SEEK_SET <unfinished ...>
10078 10:38:25.870600 read(25, <unfinished ...>
10078 10:38:25.870759 <... read resumed>0x7f865c0ec984, 4) = -1 EIO (Input/output error)
<-- then jumps to the next device without closing fd 25 -->
10078 10:38:25.871192 open("/dev/sda", O_RDONLY <unfinished ...>
===

lsof also shows that fd 25 is not closed by supervdsm:

===
# lsof | grep supervdsm | grep 25r
supervdsm 9592 root 25r BLK 253,22 0t440 58811969 /dev/dm-22
===

blivet does not close the device if an exception is raised while accessing it:

===
blivet/devicelibs/edd.py

153 def collect_mbrs(devices):
154     """ Read MBR signatures from devices.
155
156     Returns a dict mapping device names to their MBR signatures. It is not
157     guaranteed this will succeed, with a new disk for instance.
158     """
159     mbr_dict = {}
160     for dev in devices:
161         try:
162             fd = os.open(dev.path, os.O_RDONLY)
163             # The signature is the unsigned integer at byte 440:
164             os.lseek(fd, 440, 0)
165             mbrsig = struct.unpack('I', os.read(fd, 4))
166             os.close(fd)
167         except OSError as e:
168             log.warning("edd: error reading mbrsig from disk %s: %s",
169                         dev.name, str(e))
170             continue  <=== the fd is not closed when access to the device fails; the loop just continues with the next device
===

The latest upstream code has the same behaviour. So when the customer now tries to remove the unmapped device from multipath, it fails with the error "map in use".
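For reference, a minimal sketch of a leak-free version of the loop, moving os.close() into a finally block so the fd is released on both the success and the EIO path. This is illustrative only and not necessarily the exact change that lands upstream; the handling of mbrsig after the read is simplified here:

===
import logging
import os
import struct

log = logging.getLogger("blivet")


def collect_mbrs(devices):
    """ Read MBR signatures from devices.

    Returns a dict mapping device names to their MBR signatures.
    """
    mbr_dict = {}
    for dev in devices:
        fd = None
        try:
            fd = os.open(dev.path, os.O_RDONLY)
            # The signature is the unsigned integer at byte 440:
            os.lseek(fd, 440, 0)
            mbrsig = struct.unpack('I', os.read(fd, 4))
        except OSError as e:
            log.warning("edd: error reading mbrsig from disk %s: %s",
                        dev.name, str(e))
            continue
        finally:
            # Runs on both the success and the failure path, so a faulty
            # multipath device is never left open by supervdsm.
            if fd is not None:
                os.close(fd)
        # Stored raw here for brevity; the real function post-processes
        # the signature before storing it.
        mbr_dict[dev.name] = mbrsig
    return mbr_dict
===

With the close in a finally block, the strace above would show close(25) immediately after the failed read, and removing the stale map from multipath would no longer fail with "map in use" (assuming no other process holds the device open).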
I have submitted this PR for blivet: https://github.com/storaged-project/blivet/pull/899
The fix is available in python-blivet-0.61.15.76-1.el7_9; see bug 1879920. Closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 1879920 ***