Description of problem:
After resizing a direct iSCSI LUN, the disk name changed from /dev/vdb to /dev/vdc within the VM, which caused disk corruption.

Version-Release number of selected component (if applicable):
rhevm 3.4.4

How reproducible:
100%

Steps to Reproduce:
1. Unmount the filesystem in the guest OS related to the direct LUN.
2. Deactivate the LUN from the RHEV-M GUI.
3. Extend the LUN from the storage side.
4. Run the commands below to update tgt-admin; the correct size should then be reflected in 'multipath -ll' as well:
   # tgt-admin --update tid=3 -v -f
   # tgt-admin -s
   # multipathd -k"resize map 1IET_00030001"
   # multipath -ll
5. Check 'dmesg' to confirm the new size.
6. Activate the LUN in the RHEV-M GUI. It will be activated, but it will still show the old size of the LUN because the RHEV-M DB is not updated, due to Bugzilla#1176550; in the guest VM, however, the new size is reflected. But the disk name changed from /dev/vdb to /dev/vdc, and LVM had been created on top of the /dev/vdb disk.
=== Output from the guest VM ===

# pvs
  /dev/mydisk/mylv: read failed after 0 of 4096 at 5364449280: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 5364506624: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 0: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 4096: Input/output error
  PV         VG          Fmt  Attr PSize PFree
  /dev/vda2  vg_dhcp2108 lvm2 a--  9.51g    0
  /dev/vdc1  mydisk      lvm2 a--  5.00g    0

# vgs
  /dev/mydisk/mylv: read failed after 0 of 4096 at 5364449280: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 5364506624: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 0: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 4096: Input/output error
  VG          #PV #LV #SN Attr   VSize VFree
  mydisk        1   1   0 wz--n- 5.00g    0
  vg_dhcp2108   1   2   0 wz--n- 9.51g    0

# lvs
  /dev/mydisk/mylv: read failed after 0 of 4096 at 5364449280: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 5364506624: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 0: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 4096: Input/output error
  LV      VG          Attr       LSize Pool Origin Data% Move Log Cpy%Sync Convert
  mylv    mydisk      -wi-a----- 5.00g
  lv_root vg_dhcp2108 -wi-ao---- 8.51g
  lv_swap vg_dhcp2108 -wi-ao---- 1.00g

========

Actual results:
The disk name changes after a direct LUN resize.

Expected results:
The disk name should not change.
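As a side note on the symptoms above: LVM identifies physical volumes by UUID rather than by the /dev/vdX node, so the rename itself does not lose the metadata; the read errors come from the LV still referencing the vanished /dev/vdb. The following is only a hedged recovery sketch, reusing the VG name "mydisk" from the output above; it has not been verified against this exact setup:

```shell
# Recovery sketch (assumed VG name 'mydisk' from the report; run inside the guest).
# LVM tracks PVs by UUID, so once the device reappears (even as /dev/vdc1)
# the VG can usually be deactivated and reactivated cleanly.

pvscan                    # rescan block devices for LVM labels
vgchange -an mydisk       # deactivate the VG to drop the stale device mapping
vgchange -ay mydisk       # reactivate; LVM locates the PV by UUID on the new node
pvs -o pv_name,pv_uuid    # confirm which device node now backs the PV
```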
Created attachment 972939 [details] LVM Backup captured from the guest system after this issue.
Liron - if we unplug and replug a disk from a VM - shouldn't it get the same device?
Allon, we currently do not attempt to maintain the device name (though we report it if it's reported by the guest). Regardless, we can only "suggest" a device name; it's up to the guest OS whether to accept the offer or not. From https://libvirt.org/formatdomain.html: "The dev attribute indicates the 'logical' device name. The actual device name specified is not guaranteed to map to the device name in the guest OS. Treat it as a device ordering hint." I'd also like to point out that in BZ https://bugzilla.redhat.com/show_bug.cgi?id=1079697 it was decided to clear the address after a disk is unplugged from a VM.
Automatically using a "free" device name might be problematic. Let's assume that we unplugged disk A and then plugged disk B: do we want disk B to be "mapped" automatically as disk A? I believe the answer is no - that can be reported as a bug (as well :) ). What we do want is the option for the user to suggest a device name if he knows he needs it to be assigned to the plugged disk (or, much less preferably, to save and reuse the last device name that was assigned to that disk, by its id). Currently on the vdsm side we don't clear the used indexes used to generate the suggested name on unplug, which means the index is generally only incremented. I think the solution is to support passing the suggested device name from the engine, so it can be specified by the user, or to recommend avoiding unplugging disks that are currently "used" by mounts or anything else. If it's really needed now, it can be checked whether the index used by the unplugged disk can be cleared by using a hook.
Why aren't we clearing the index? And what happens if we suggest a "used" index as you suggest here?
Allon, that's just how it is today - if we use a used one, it depends whether it's already taken or not.

Udayendu, which OS version are you running on the guest?
Udayendu, please also specify whether you deactivated/activated only one disk or more than one. Thanks.
And if you can, please also specify the qemu/libvirt versions.
Liron, the guest was running RHEL 6.5 and I used the rhev-hypervisor6-6.5-20141017.0.el6 version of the hypervisor, so you can get the qemu/libvirt versions from that.
Initially I tried with one disk, but after hitting this issue I also tried with more than two disks.
I'm moving this to 3.6.0; if a customer asks for this, we will consider it for 3.5.z.
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Amit, does this still happen after all the fixes you had from plugging/unplugging disks in 4.0?
My changes aren't supposed to affect this scenario; however, related changes have been made in this area in libvirt and qemu which could possibly solve it. Will check.
The fact that the name changed from /dev/vdb to /dev/vdc is not our fault - it can happen when using SCSI due to races in detection (by udev?) of the drives. For safety, you should use 'by-id'. In the case of virtio-SCSI, for example: /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_2 (when the serial ID of the disk is '2'), and so on. So unless the by-id location itself changed, there is not much we can do here. Is that the case?
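To illustrate the 'by-id' point above: the entries under /dev/disk/by-id/ are udev-managed symlinks keyed on the disk serial, so they keep resolving to the right device node even if the kernel-assigned /dev/vdX name changes. A minimal sketch (the demo symlink path and the mount paths in the comments are illustrative, not taken from this bug):

```shell
# The by-id entries are just symlinks; the "stable" name resolves to
# whatever device node currently carries that serial. Simulate the
# resolution step with a plain symlink in /tmp:
ln -sf /dev/null /tmp/by-id-demo
readlink -f /tmp/by-id-demo

# On a real guest you would mount via the stable path, e.g.:
#   mount /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_2-part1 /data
# or via the filesystem UUID in /etc/fstab:
#   UUID=<fs-uuid>  /data  ext4  defaults  0 0
```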
Yet another example where things go south when you rely on SCSI discovery ordering - bug 1349696
I believe that the info I've provided in https://bugzilla.redhat.com/show_bug.cgi?id=1177229#c3 and https://bugzilla.redhat.com/show_bug.cgi?id=1177229#c9 is still relevant. Allon/Tal - how do we want to proceed with it?
(In reply to Liron Aravot from comment #23) > I believe that the info I've provided in > https://bugzilla.redhat.com/show_bug.cgi?id=1177229#c3 and > https://bugzilla.redhat.com/show_bug.cgi?id=1177229#c9 is still relevant. > > Allon/Tal - how do we want to proceed with it? See comment 21. The only thing we can and MUST do, is provide a serial number to the disks - and keep it (and use it). It's imperative for virtio-scsi.
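For reference, a disk serial is carried in the libvirt domain XML via the <serial> element of the disk definition. A minimal sketch of such a definition follows; the source path and serial value are illustrative only, not taken from this bug:

```xml
<disk type='block' device='lun'>
  <driver name='qemu' type='raw'/>
  <!-- path and serial below are illustrative assumptions -->
  <source dev='/dev/mapper/example-lun'/>
  <target dev='sda' bus='scsi'/>
  <serial>example-serial</serial>
</disk>
```

With virtio-scsi, a serial set this way is what the guest exposes under /dev/disk/by-id/ (e.g. scsi-0QEMU_QEMU_HARDDISK_&lt;serial&gt;), which is why keeping the serial stable across unplug/replug gives the guest a persistent name.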
Yaniv, thanks. For regular disk image we do pass their id in the engine as serial (unless there's an issue i'm not aware of) and for LUN disks we have BZ 957788. Is there any action item left on this BZ?
(In reply to Liron Aravot from comment #25) > Yaniv, thanks. > For regular disk image we do pass their id in the engine as serial (unless > there's an issue i'm not aware of) and for LUN disks we have BZ 957788. > > Is there any action item left on this BZ? If it's also for virtio-scsi, I suggest moving this bug to Docs, to ensure this is properly documented.
Please move to Docs.
Verified that the disk id is passed and listed for virtio-scsi as well.
FYI, this bug wasn't moved to ON_QA because it doesn't have any external tracker attached. Since there is no way to determine whether this bug was fixed in the release, it will stay in MODIFIED until the relevant patch with the fix is attached.
Moving to 4.1.1 as it is not tracked in the 4.1.0 RC and not marked as a blocker. Assigning to Liron, who moved the bug to MODIFIED.
1) Added a 50G direct LUN to a VM with a RHEL 7.3 OS; the guest saw it as /dev/sda:

[root@localhost ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0   50G  0 disk
sr0     11:0    1 1024M  0 rom
vda    253:0    0   10G  0 disk
├─vda1 253:1    0  200M  0 part /boot
├─vda2 253:2    0    2G  0 part [SWAP]
└─vda3 253:3    0  7.8G  0 part /

2) Resized the LUN from the storage server to 70G.

3) Put all hosts into maintenance and activated them in order to refresh the LUN info.

4) Started the VM; the guest sees the disk as /dev/sda:

[root@localhost ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0   70G  0 disk
sr0     11:0    1 1024M  0 rom
vda    253:0    0   10G  0 disk
├─vda1 253:1    0  200M  0 part /boot
├─vda2 253:2    0    2G  0 part [SWAP]
└─vda3 253:3    0  7.8G  0 part /

Verified using:
rhevm-4.1.1.8-0.1.el7.noarch
vdsm-4.19.10.1-1.el7ev.x86_64
libvirt-2.0.0-10.el7_3.5.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64