Bug 1177229 - Vm's disk names ('/dev/vdX') changed after direct lun resize - need to document that virtio-scsi disk can change numbering
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Documentation
Version: ---
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ovirt-4.1.1
Target Release: 4.1.1.6
Assignee: Liron Aravot
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks: 1429751
 
Reported: 2014-12-25 09:21 UTC by Udayendu Sekhar Kar
Modified: 2017-04-21 09:50 UTC
CC List: 18 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
With this update, image disks are now identifiable from within the guest by engine id, for example, by looking under /dev/disk/by-id. The disk id is now passed to the guest as the disk serial.
Clone Of:
Environment:
Last Closed: 2017-04-21 09:50:54 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: planning_ack+
rule-engine: devel_ack+
ratamir: testing_ack+


Attachments
LVM Backup captured from the guest system after this issue. (15.58 KB, application/x-bzip)
2014-12-25 09:22 UTC, Udayendu Sekhar Kar


Links
Red Hat Bugzilla 1155275 (high, CLOSED): [RFE] - Online update LUN size to the Guest after LUN resize (last updated 2023-03-24 13:30:08 UTC)

Internal Links: 1155275

Description Udayendu Sekhar Kar 2014-12-25 09:21:47 UTC
Description of problem:
After resizing a direct iSCSI LUN, the disk name changed from /dev/vdb to /dev/vdc within the VM, which caused disk corruption.

Version-Release number of selected component (if applicable):
rhevm 3.4.4

How reproducible:
100%

Steps to Reproduce:
 1. Unmount the filesystem in the guest OS related to the direct LUN.
 2. Deactivate the LUN from the rhevm GUI.
 3. Extend the LUN from the storage side.
 4. Run the commands below to update the target via tgt-admin; 'multipath -ll' should then reflect the correct size too:
     # tgt-admin --update tid=3 -v -f
     # tgt-admin -s
     # multipathd -k"resize map 1IET_00030001"
     # multipath -ll
 5. Check 'dmesg' to confirm the new size (see the example after this list).
 6. Activate the LUN in the rhevm GUI. It will be activated, but it will still show the old LUN size because the rhevm DB is not updated due to Bugzilla#1176550; in the guest VM, however, the new size is reflected.
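
As a minimal illustration of the dmesg check in step 5 (the exact kernel message wording varies by driver and kernel version, and the byte counts below are placeholders):

     # dmesg | grep -i 'capacity change'
     sdb: detected capacity change from 5368709120 to 7516192768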

But the disk name changed from /dev/vdb to /dev/vdc. LVM had been created on top of the /dev/vdb disk.

=== Output from the guest VM ===
# pvs
  /dev/mydisk/mylv: read failed after 0 of 4096 at 5364449280: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 5364506624: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 0: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 4096: Input/output error
  PV         VG          Fmt  Attr PSize PFree
  /dev/vda2  vg_dhcp2108 lvm2 a--  9.51g    0 
  /dev/vdc1  mydisk      lvm2 a--  5.00g    0 

# vgs
  /dev/mydisk/mylv: read failed after 0 of 4096 at 5364449280: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 5364506624: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 0: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 4096: Input/output error
  VG          #PV #LV #SN Attr   VSize VFree
  mydisk        1   1   0 wz--n- 5.00g    0 
  vg_dhcp2108   1   2   0 wz--n- 9.51g    0 

# lvs
  /dev/mydisk/mylv: read failed after 0 of 4096 at 5364449280: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 5364506624: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 0: Input/output error
  /dev/mydisk/mylv: read failed after 0 of 4096 at 4096: Input/output error
  LV      VG          Attr       LSize Pool Origin Data%  Move Log Cpy%Sync Convert
  mylv    mydisk      -wi-a----- 5.00g                                             
  lv_root vg_dhcp2108 -wi-ao---- 8.51g                                             
  lv_swap vg_dhcp2108 -wi-ao---- 1.00g             
========                        


Actual results:
The disk name changes after a direct LUN resize.

Expected results:
The disk name should not change.

Comment 1 Udayendu Sekhar Kar 2014-12-25 09:22:54 UTC
Created attachment 972939 [details]
LVM Backup captured from the guest system after this issue.

Comment 2 Allon Mureinik 2014-12-28 14:29:12 UTC
Liron - if we unplug and replug a disk from a VM - shouldn't it get the same device?

Comment 3 Liron Aravot 2014-12-29 08:30:26 UTC
Allon,
we currently do not attempt to maintain the device name (though we report it if it's reported by the guest).

Regardless, we can only "suggest" a device name; it's up to the guest OS whether to accept the offer or not (from https://libvirt.org/formatdomain.html - "The dev attribute indicates the 'logical' device name. The actual device name specified is not guaranteed to map to the device name in the guest OS. Treat it as a device ordering hint.").
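
For reference, the element being discussed is the disk's <target> in the libvirt domain XML; one quick way to see it (the domain name 'myvm' is only a placeholder) is:

# virsh dumpxml myvm | grep "<target dev"
    <target dev='vdb' bus='virtio'/>

The dev value shown is only the ordering hint described above, not a guarantee of the name inside the guest.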

I'd also like to point out that in BZ https://bugzilla.redhat.com/show_bug.cgi?id=1079697 it was decided to clear the address after a disk is unplugged from a VM.

Comment 9 Liron Aravot 2015-01-18 14:37:27 UTC
Automatically reusing a "free" device name might be problematic. Let's assume that we unplugged disk A and then plugged disk B: do we want disk B to be "mapped" automatically as disk A? I believe the answer is no - that could be reported as a bug (as well :) ). What we do want is the option for the user to suggest a device name if he knows that he needs it to be assigned to the plugged device (or, much less preferably, to save and use the last device name that was assigned to that plugged disk, by its id).

Currently, on the vdsm side, we don't clear the indexes used to generate the suggested name on unplug, which means the index is generally only ever incremented.

I think the solution is either to support passing the suggested device name from the engine, so it can be specified by the user, or to advise against unplugging disks that are currently "used" by mounts or anything else.

If it's really needed now, it can be checked whether the index used by the unplugged disk can be cleared by using a hook.

Comment 10 Allon Mureinik 2015-01-18 15:06:19 UTC
Why aren't we clearing the index?
And what happens if we suggest a "used" index as you suggest here?

Comment 11 Liron Aravot 2015-01-20 10:19:47 UTC
Allon, that's just how it is today - if we suggest a used one, the outcome depends on whether it's already taken or not.

Udayendu, which OS version are you running on the guest?

Comment 12 Liron Aravot 2015-01-20 10:38:03 UTC
Udayendu, please also specify whether you deactivated/activated only one disk or more.
thanks.

Comment 13 Liron Aravot 2015-01-20 10:47:19 UTC
And if you can, please also specify the qemu/libvirt versions.

Comment 14 Udayendu Sekhar Kar 2015-01-22 21:33:40 UTC
Liron,

The guest was running RHEL 6.5 and I used hypervisor version rhev-hypervisor6-6.5-20141017.0.el6, so you can get the qemu/libvirt versions from that.

Comment 15 Udayendu Sekhar Kar 2015-01-22 21:36:15 UTC
Initially I tried with one disk, but after hitting this issue I also tried with more than two disks.

Comment 16 Yaniv Lavi 2015-01-25 15:15:00 UTC
I am moving this to 3.6.0; if a customer asks for this we will consider it for 3.5.z.

Comment 18 Red Hat Bugzilla Rules Engine 2015-11-30 19:07:01 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 19 Allon Mureinik 2016-03-28 14:44:08 UTC
Amit, does this still happen after all the fixes you had from plugging/unplugging disks in 4.0?

Comment 20 Amit Aviram 2016-03-28 15:00:33 UTC
My changes aren't supposed to affect this scenario; however, related changes have been made in this area in libvirt and qemu which could possibly solve it. Will check.

Comment 21 Yaniv Kaul 2016-11-21 10:21:19 UTC
The fact the name was changed from /dev/vdb to /dev/vdc is not our fault - it can happen when using SCSI due to races in detection (by udev?) of the drives. For safety, you should use 'by-id'.
In case of virtio-SCSI, for example: /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_2 (when the serial ID of the disk is '2') and so on.
So unless it changed location, there is not much we can do here. Is that the case?
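
As an illustration of those stable paths inside the guest (the symlink targets below are placeholders; the actual names depend on the serial the platform passes and on the bus type):

# ls -l /dev/disk/by-id/
scsi-0QEMU_QEMU_HARDDISK_2 -> ../../sdb
virtio-<disk-serial> -> ../../vdb

These by-id paths can be used in /etc/fstab or as LVM physical volumes instead of /dev/vdX, so a change in discovery order does not break mounts.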

Comment 22 Yaniv Kaul 2016-11-21 10:30:20 UTC
Yet another example where things go south when you rely on SCSI discovery ordering - bug 1349696

Comment 23 Liron Aravot 2016-12-28 15:48:28 UTC
I believe that the info I've provided in  https://bugzilla.redhat.com/show_bug.cgi?id=1177229#c3 and https://bugzilla.redhat.com/show_bug.cgi?id=1177229#c9 is still relevant.

Allon/Tal - how do we want to proceed with it?

Comment 24 Yaniv Kaul 2016-12-28 15:56:02 UTC
(In reply to Liron Aravot from comment #23)
> I believe that the info I've provided in 
> https://bugzilla.redhat.com/show_bug.cgi?id=1177229#c3 and
> https://bugzilla.redhat.com/show_bug.cgi?id=1177229#c9 is still relevant.
> 
> Allon/Tal - how do we want to proceed with it?

See comment 21. The only thing we can and MUST do is provide a serial number to the disks - and keep it (and use it). It's imperative for virtio-scsi.

Comment 25 Liron Aravot 2016-12-28 16:02:44 UTC
Yaniv, thanks.
For regular disk images we do pass their engine id as the serial (unless there's an issue I'm not aware of), and for LUN disks we have BZ 957788.

Is there any action item left on this BZ?

Comment 26 Yaniv Kaul 2016-12-28 16:06:32 UTC
(In reply to Liron Aravot from comment #25)
> Yaniv, thanks.
> For regular disk image we do pass their id in the engine as serial (unless
> there's an issue i'm not aware of) and for LUN disks we have BZ 957788.
> 
> Is there any action item left on this BZ?

If it's also for virtio-scsi, I suggest moving this bug to Docs, to ensure this is properly documented.

Comment 27 Yaniv Kaul 2017-01-22 09:25:12 UTC
Please move to Docs.

Comment 28 Liron Aravot 2017-01-22 17:13:41 UTC
Verified that the disk id is passed and listed for virtio-scsi as well.
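
A minimal way to check this from inside the guest (a sketch; the serial shown is a placeholder for the engine disk id):

# lsblk -o NAME,SERIAL
NAME SERIAL
sda  <engine-disk-id>
vda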

Comment 29 Eyal Edri 2017-01-23 09:01:38 UTC
FYI,
This bug wasn't moved to ON_QA because it doesn't have any external tracker attached.

Since there is no way to determine if this bug was fixed in the release, it will stay in MODIFIED until the relevant patch with the fix is attached.

Comment 30 Sandro Bonazzola 2017-01-23 14:33:10 UTC
Moving to 4.1.1 as it is not tracked in the 4.1.0 RC and not marked as a blocker.
Assigning to Liron who moved the bug to Modified.

Comment 31 Elad 2017-04-12 08:32:20 UTC
1) Added a 50G direct LUN to a VM with a RHEL 7.3 OS; the guest saw it as /dev/sda:

[root@localhost ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0   50G  0 disk 
sr0     11:0    1 1024M  0 rom  
vda    253:0    0   10G  0 disk 
├─vda1 253:1    0  200M  0 part /boot
├─vda2 253:2    0    2G  0 part [SWAP]
└─vda3 253:3    0  7.8G  0 part /

2) Resized the LUN from the storage server to 70G
3) Put all hosts into maintenance and activated them in order to refresh the LUN info
4) Started the VM; the guest still sees the disk as /dev/sda:

[root@localhost ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0   70G  0 disk 
sr0     11:0    1 1024M  0 rom
vda    253:0    0   10G  0 disk
├─vda1 253:1    0  200M  0 part /boot
├─vda2 253:2    0    2G  0 part [SWAP]
└─vda3 253:3    0  7.8G  0 part /
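
As an additional, purely illustrative check (not part of the original verification), the device serial could be compared before and after the resize to confirm it is the same disk:

# udevadm info --query=property --name=/dev/sda | grep ID_SERIAL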


Verified using:
rhevm-4.1.1.8-0.1.el7.noarch
vdsm-4.19.10.1-1.el7ev.x86_64
libvirt-2.0.0-10.el7_3.5.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64

