Bug 1614064 - Disk UI and lettering / scsi ids are inconsistent
Summary: Disk UI and lettering / scsi ids are inconsistent
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 4.2.5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: bugs@ovirt.org
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-08-08 22:44 UTC by Nathan March
Modified: 2023-09-14 04:32 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-22 08:14:22 UTC
oVirt Team: Storage
Embargoed:


Attachments
How the disks look in the GUI (25.86 KB, image/png)
2018-08-08 22:44 UTC, Nathan March
The boot menu showing the drives in an unknown order (#5 is my boot device) (15.74 KB, image/png)
2018-08-08 22:44 UTC, Nathan March
virsh dumpxml from before me removing + reattaching (15.19 KB, text/plain)
2018-08-08 22:44 UTC, Nathan March
virsh dumpxml from after (15.19 KB, text/plain)
2018-08-08 22:45 UTC, Nathan March

Description Nathan March 2018-08-08 22:44:07 UTC
Created attachment 1474547
How the disks look in the GUI

I have a basic config in the GUI (see gui.png): four drives, one of them marked bootable. This looks fairly sensible, with nathantest_boot as the primary device that contains grub, /, /boot, etc.

The main issue is that the devices are enumerated in no logical order within the OS:

! nathantest ~ # lsblk
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
hdc     22:0    1   4G  0 disk 
sda      8:0    0  40G  0 disk /var
sdc      8:32   0   6G  0 disk 
sdb      8:16   0   2G  0 disk /tmp
sdd      8:48   0   6G  0 disk 
`-sdd1   8:49   0   6G  0 part /

Here nathantest_boot became sdd, nathantest_oldroot became sdc, nathantest_tmp became sdb and nathantest_var became sda.

I've attached a copy of the XML file used to generate the VM (nathantest.xml). In it you can see some "dev='sdX'" parameters that seem to line up with the UI, and some SCSI unit numbers that seem random:

      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='4'/>

      <target dev='sdb' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>

      <target dev='sdc' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>

      <target dev='sdd' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>

For clarity, here's the actual drive to scsi ID mapping from within the VM:

lrwxrwxrwx 1 root root   9 Aug  8 14:47 pci-0000:00:04.0-virtio-pci-virtio1-scsi-0:0:0:1 -> ../../sdb
lrwxrwxrwx 1 root root   9 Aug  8 14:47 pci-0000:00:04.0-virtio-pci-virtio1-scsi-0:0:0:2 -> ../../sda
lrwxrwxrwx 1 root root   9 Aug  8 14:47 pci-0000:00:04.0-virtio-pci-virtio1-scsi-0:0:0:3 -> ../../sdc
lrwxrwxrwx 1 root root   9 Aug  8 14:47 pci-0000:00:04.0-virtio-pci-virtio1-scsi-0:0:0:4 -> ../../sdd

That is not at all what the XML chunk above would imply.
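To make the mismatch easier to see side by side, here is a minimal Python sketch (not part of oVirt; all helper names are hypothetical) that parses the <target dev>/unit pairs from the domain XML and the -scsi-H:C:T:L names from /dev/disk/by-path, then lines them up by SCSI unit. It assumes you paste in the XML <disk> entries and the `ls -l` output trimmed to the link names; the sample values are copied from above.

```python
import re
import xml.etree.ElementTree as ET

# <disk> entries from the domain XML (values copied from this report, trimmed).
DOMAIN_XML = """
<devices>
  <disk><target dev='sda' bus='scsi'/><address type='drive' controller='0' bus='0' target='0' unit='4'/></disk>
  <disk><target dev='sdb' bus='scsi'/><address type='drive' controller='0' bus='0' target='0' unit='3'/></disk>
  <disk><target dev='sdc' bus='scsi'/><address type='drive' controller='0' bus='0' target='0' unit='1'/></disk>
  <disk><target dev='sdd' bus='scsi'/><address type='drive' controller='0' bus='0' target='0' unit='2'/></disk>
</devices>
"""

# Guest-side `ls -l /dev/disk/by-path` output, trimmed to "link -> target".
BY_PATH = """
pci-0000:00:04.0-virtio-pci-virtio1-scsi-0:0:0:1 -> ../../sdb
pci-0000:00:04.0-virtio-pci-virtio1-scsi-0:0:0:2 -> ../../sda
pci-0000:00:04.0-virtio-pci-virtio1-scsi-0:0:0:3 -> ../../sdc
pci-0000:00:04.0-virtio-pci-virtio1-scsi-0:0:0:4 -> ../../sdd
"""

BY_PATH_RE = re.compile(r"-scsi-\d+:\d+:\d+:(\d+) -> \.\./\.\./(\w+)")


def xml_unit_to_dev(xml_text):
    """Map SCSI unit -> the name oVirt put in <target dev=...>."""
    root = ET.fromstring(xml_text.strip())
    mapping = {}
    for disk in root.iter("disk"):
        dev = disk.find("target").get("dev")
        unit = int(disk.find("address").get("unit"))
        mapping[unit] = dev
    return mapping


def guest_unit_to_dev(by_path_text):
    """Map SCSI LUN -> the kernel name the guest actually assigned."""
    return {int(m.group(1)): m.group(2) for m in BY_PATH_RE.finditer(by_path_text)}


def compare(xml_text, by_path_text):
    """Return (unit, name hinted in XML, name used by guest) sorted by unit."""
    xml_map = xml_unit_to_dev(xml_text)
    guest_map = guest_unit_to_dev(by_path_text)
    return [(unit, xml_map[unit], guest_map.get(unit)) for unit in sorted(xml_map)]


if __name__ == "__main__":
    for unit, hinted, actual in compare(DOMAIN_XML, BY_PATH):
        flag = "" if hinted == actual else "  <-- mismatch"
        print(f"unit {unit}: XML says {hinted}, guest has {actual}{flag}")
```

With the data above, every one of the four units is flagged as a mismatch.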

If I then shut down my VM, remove all the drives, and reattach them one by one in the order nathantest_boot, nathantest_tmp, nathantest_var, nathantest_oldroot, I end up with something that looks functionally identical in the GUI, but the drive ordering has changed (see nathantest-after.xml) to match the order in which I attached the drives:

NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
hdc     22:0    1   4G  0 disk 
sda      8:0    0   6G  0 disk 
`-sda1   8:1    0   6G  0 part /
sdb      8:16   0  40G  0 disk /tmp
sdc      8:32   0   2G  0 disk 
sdd      8:48   0   6G  0 disk 

Another issue I'm hitting is that at the boot menu (see screenshot bootmenu.png) there's no way to identify which drive is which except by trial and error. You would think the one marked as "OS" would be first, or maybe the one at the top of the list? Nope, it's actually #5, and it was #5 both before and after the renumbering above.

If this were just me, I could handle removing and re-adding everything to get them in the order I want, but for anyone less technical it's very confusing that the order in the GUI doesn't match the order within the machine. I could also easily see this impacting a production server: someone provisions a machine with two identically sized drives and then deletes the wrong one because the order in the OS is the opposite of the order in the GUI.

I've read a lot of posts describing similar issues, and I feel the easiest fix is simply to expose the SCSI ID in the GUI and allow users to update it (VMware does this). You could then make the SCSI ID the default sort column on the VM disk tab. This would also help with Windows VMs, where multiple disks may be the same size and you're not sure which one to resize in oVirt (on VMware we match the disk number in Windows to the SCSI ID in vSphere).

Comment 1 Nathan March 2018-08-08 22:44:31 UTC
Created attachment 1474548
The boot menu showing the drives in an unknown order (#5 is my boot device)

Comment 2 Nathan March 2018-08-08 22:44:51 UTC
Created attachment 1474549
virsh dumpxml from before me removing + reattaching

Comment 3 Nathan March 2018-08-08 22:45:06 UTC
Created attachment 1474550
virsh dumpxml from after

Comment 4 Tal Nisan 2018-08-16 12:39:59 UTC
Hi Nathan,
Isn't the enumeration the same as the alphanumeric order of the disk aliases (minus the boot disk which is first)?

Comment 5 Nathan March 2018-08-20 23:02:52 UTC
I just shut down my VM and edited the existing disk aliases to prefix them with a number, in the same order as they're currently listed in the GUI:

1_nathantest_boot  cd9d89c5-395f-4b32-8cda-521bb9be8333 (bootable)
2_nathantest_oldroot ef296784-57cc-4405-8e3b-03d47faaa4e9
3_nathantest_tmp 17ca073f-de6f-490b-ba66-42b34a1188f2
4_nathantest_var 74b1da12-1c3b-4deb-a2ef-74e479eacc9b

If I then remove all the disks and reattach them in an arbitrary order, say #3 #2 #1 #4, this is how the xml looks:

    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/mnt/10.1.32.10:_gto__vana__sas01/2060a19f-f26c-4dba-a559-83541a4d0c7a/images/cd9d89c5-395f-4b32-8cda-521bb9be8333/4c609016-07e9-4d4b-9c94-72b588583bf4'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>cd9d89c5-395f-4b32-8cda-521bb9be8333</serial>
      <alias name='ua-cd9d89c5-395f-4b32-8cda-521bb9be8333'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/mnt/10.1.32.10:_gto__vana__sas01/2060a19f-f26c-4dba-a559-83541a4d0c7a/images/ef296784-57cc-4405-8e3b-03d47faaa4e9/65d0bf35-64a5-4b75-aefd-eabb2f4be893'/>
      <backingStore/>
      <target dev='sdb' bus='scsi'/>
      <serial>ef296784-57cc-4405-8e3b-03d47faaa4e9</serial>
      <alias name='ua-ef296784-57cc-4405-8e3b-03d47faaa4e9'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/mnt/10.1.32.10:_gto__vana__sas01/2060a19f-f26c-4dba-a559-83541a4d0c7a/images/17ca073f-de6f-490b-ba66-42b34a1188f2/5d541384-4961-4a75-a19b-270b120f4a2c'/>
      <backingStore/>
      <target dev='sdc' bus='scsi'/>
      <serial>17ca073f-de6f-490b-ba66-42b34a1188f2</serial>
      <alias name='ua-17ca073f-de6f-490b-ba66-42b34a1188f2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/mnt/10.1.32.10:_gto__vana__sas01/2060a19f-f26c-4dba-a559-83541a4d0c7a/images/74b1da12-1c3b-4deb-a2ef-74e479eacc9b/8c5056d2-3f1f-43bf-9495-d388227de7e0'/>
      <backingStore/>
      <target dev='sdd' bus='scsi'/>
      <serial>74b1da12-1c3b-4deb-a2ef-74e479eacc9b</serial>
      <alias name='ua-74b1da12-1c3b-4deb-a2ef-74e479eacc9b'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>

This seems to be as you say: the disks are listed in alphabetical order by alias. I'm no longer able to reproduce the odd SCSI numbering from my original report, and I'm not sure why; it doesn't look like any updates have been applied. I'll keep trying to reproduce it and will follow up if I manage.

So I guess my question comes down to: why does the virtio driver not assign drive letters in the same order as the SCSI IDs?

Example boot logs from 2.6.32-504.8.1.el6.x86_64:

sd 2:0:0:0: [sda] Attached SCSI disk
sd 2:0:0:3: [sdb] Attached SCSI disk
sd 2:0:0:1: [sdd] Attached SCSI disk
sd 2:0:0:2: [sdc] Attached SCSI disk

Latest 4.9.122 mainline kernel:

[    1.347626] sd 0:0:0:0: [sda] Attached SCSI disk
[    1.349386] sd 0:0:0:1: [sdd] Attached SCSI disk
[    1.350958] sd 0:0:0:2: [sdc] Attached SCSI disk
[    1.364957] sd 0:0:0:3: [sdb] Attached SCSI disk

Spot-checking a random assortment of other hardware boxes, I found that other device drivers seem to assign letters in the same order as the SCSI IDs.
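The check being done by eye here (do the sdX letters increase in the same order as the SCSI address?) can be scripted. A minimal Python sketch, assuming plain dmesg text as input; it only parses lines of the form `sd H:C:T:L: [sdX] Attached SCSI disk` and says nothing about why the kernel picked that order. The function names are hypothetical.

```python
import re

# "Attached SCSI disk" lines copied from the 4.9.122 boot log above.
DMESG = """
[    1.347626] sd 0:0:0:0: [sda] Attached SCSI disk
[    1.349386] sd 0:0:0:1: [sdd] Attached SCSI disk
[    1.350958] sd 0:0:0:2: [sdc] Attached SCSI disk
[    1.364957] sd 0:0:0:3: [sdb] Attached SCSI disk
"""

ATTACH_RE = re.compile(r"sd (\d+):(\d+):(\d+):(\d+): \[(sd[a-z]+)\] Attached SCSI disk")


def attached_disks(log):
    """Return [((host, channel, target, lun), name)] sorted by SCSI address."""
    disks = []
    for m in ATTACH_RE.finditer(log):
        hctl = tuple(int(x) for x in m.group(1, 2, 3, 4))
        disks.append((hctl, m.group(5)))
    return sorted(disks)


def letters_follow_scsi_order(log):
    """True if kernel names increase in the same order as the SCSI address."""
    names = [name for _, name in attached_disks(log)]
    # Compare by (length, name) so that e.g. sdz sorts before sdaa.
    return names == sorted(names, key=lambda n: (len(n), n))


if __name__ == "__main__":
    for (h, c, t, lun), name in attached_disks(DMESG):
        print(f"{h}:{c}:{t}:{lun} -> {name}")
    print("letters follow SCSI order:", letters_follow_scsi_order(DMESG))
```

Both the el6 and the 4.9.122 logs above come out as False.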

Comment 6 Tal Nisan 2018-08-21 14:57:21 UTC
Arik, can you answer that?

Comment 7 Arik 2018-08-23 13:26:58 UTC
(In reply to Nathan March from comment #5)
> So I guess my question comes down to: Why does the virtio driver not assign
> drive letters in the same order as the scsi ID?

The drive letters are assigned by ovirt-engine.
The sorting takes into account the following: whether or not the disk is bootable, the disk address, whether or not it is a disk snapshot and its alias [1].

That logic is meant to sort disks of all types attached to the VM, not just SCSI disks. I think it's possible to enhance it so that it considers the SCSI ID when comparing SCSI disks.

[1] https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/VmInfoBuildUtils.java#L644-L648
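To illustrate the difference between the current ordering and the enhancement suggested above, here is a deliberately simplified Python model. It is NOT the ovirt-engine code (that logic is the Java linked in [1]); it only models two criteria from this thread (boot disk first, then alias, per comments 4 and 9) and ignores the snapshot and non-SCSI address checks. The Disk class, the key functions, and the sdX naming helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disk:
    alias: str
    bootable: bool = False
    scsi_unit: Optional[int] = None  # None if the disk has no SCSI address yet


def current_order_key(disk):
    # Simplified model of the described behaviour: boot disk first, then alias.
    return (not disk.bootable, disk.alias)


def proposed_order_key(disk):
    # Hypothetical enhancement: boot disk first, then SCSI unit when known,
    # then alias as a tie-breaker / fallback for disks without an address.
    has_unit = disk.scsi_unit is not None
    return (not disk.bootable, not has_unit, disk.scsi_unit if has_unit else 0, disk.alias)


def assign_dev_names(disks, key):
    """Hand out sda, sdb, ... in sorted order (only handles up to 26 disks)."""
    ordered = sorted(disks, key=key)
    return {d.alias: "sd" + chr(ord("a") + i) for i, d in enumerate(ordered)}


if __name__ == "__main__":
    # Units that don't follow alias order, as in the CentOS 7 test later on.
    disks = [
        Disk("nathanc7_Disk1", bootable=True, scsi_unit=0),
        Disk("nathanc7_Disk2", scsi_unit=2),
        Disk("nathanc7_Disk3", scsi_unit=1),
    ]
    print("current: ", assign_dev_names(disks, current_order_key))
    print("proposed:", assign_dev_names(disks, proposed_order_key))
```

With those three disks, the alias-based key gives Disk2 sdb and Disk3 sdc, while the SCSI-aware key gives Disk3 sdb and Disk2 sdc, so the <target dev> hints would at least agree with the unit numbers.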

Comment 8 Nathan March 2018-08-23 16:32:26 UTC
(In reply to Arik from comment #7)
> The drive letters are assigned by ovirt-engine.
> The sorting takes into account the following: whether or not the disk is
> bootable, the disk address, whether or not it is a disk snapshot and its
> alias [1].

Thanks for the link to the code, that's useful to see. I'm not clear, though, on what counts as the device address in that code, compared to the XML I posted above. The target dev and SCSI ID are both in the correct order in the XML, but if the address is supposed to be the serial number (UUID), then it isn't sorting by that either. Am I missing something here?

Comment 9 Tal Nisan 2018-09-02 14:44:26 UTC
The sorting is done by disk alias (with the exception of the boot disk, which comes first).

Comment 10 Tal Nisan 2018-09-16 12:23:38 UTC
Nathan, did you get all the info required? Can I close the bug?

Comment 11 Nathan March 2018-09-17 23:58:47 UTC
The SCSI IDs are assigned in alphabetical order of the drive alias, but the drive letters in the guest are not.

If you look at the example in comment 5, you can see that both the SCSI IDs and the drive letters in the XML are set correctly. However, the drive letter the guest assigns at boot doesn't correspond to the drive letter in the XML, and isn't assigned in SCSI ID order.

Comment 12 Michal Skrivanek 2018-09-18 05:38:57 UTC
That would then be on the guest side. Can you check your guest OS configuration/rules?
The names in <target dev='sdd'> are just hints, unused by the guest AFAIK.
The SCSI address is what most OSes use to enumerate disks, but it could be arbitrarily changed by the guest, out of our control.
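Since the SCSI address is what the guest goes by, one way to see the mapping from inside a Linux guest is to read it from sysfs: /sys/block/<dev>/device resolves to a directory whose name is the device's H:C:T:L address. Below is a minimal Python sketch assuming a Linux guest with sysfs mounted at /sys; the function names are hypothetical, and it only reports what the guest kernel chose, not anything oVirt controls.

```python
import glob
import os


def hctl_from_device_link(resolved_path):
    """Parse (host, channel, target, lun) from a resolved /sys/block/<dev>/device
    path, whose last component looks like '2:0:0:1'."""
    return tuple(int(part) for part in os.path.basename(resolved_path).split(":"))


def list_scsi_disks(sys_root="/sys"):
    """Return [((h, c, t, lun), 'sdX'), ...] for every sd* disk, sorted by SCSI address."""
    rows = []
    for block_path in glob.glob(os.path.join(sys_root, "block", "sd*")):
        dev = os.path.basename(block_path)
        resolved = os.path.realpath(os.path.join(block_path, "device"))
        try:
            rows.append((hctl_from_device_link(resolved), dev))
        except ValueError:
            # Skip anything whose device link doesn't end in an H:C:T:L name.
            continue
    return sorted(rows)


if __name__ == "__main__":
    for (h, c, t, lun), dev in list_scsi_disks():
        print(f"{h}:{c}:{t}:{lun} -> /dev/{dev}")
```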

Comment 13 Nathan March 2018-09-18 20:35:26 UTC
Wouldn't the SCSI ID -> drive letter mapping be decided by the virtio kernel driver?

I tested both the CentOS kernel 2.6.32-504.8.1.el6.x86_64 and Gentoo with 4.9.122; both enumerated the drives in the same incorrect order.

Comment 14 Nathan March 2018-09-18 20:48:26 UTC
Just to clarify, I mean the kernel assigns the incorrect drive letters before udev or anything like that is loaded. See the dmesg lines in comment 5.

Comment 15 Michal Skrivanek 2018-09-19 07:22:59 UTC
TBH I don't know exactly how this differs between RHEL, Gentoo and mainline kernels; it could be different. IIRC it did change between the RHEL 6 and RHEL 7 kernels (at least for NICs). It depends on the kernel compile options in the guest you are booting, so we can't really guarantee anything from the management side. That's why we added guest agent reporting, so the mapping can be shown back in the GUI.

Is it also wrong with an el7 kernel?

Comment 16 Nathan March 2018-09-20 18:52:16 UTC
I've now done the following:

- Installed a CentOS 7 (3.10.0-862.el7.x86_64) VM with a single 40 GB drive named "nathanc7_Disk1". This shows up as /dev/sda, as expected.
- While the VM was online, added a new 50 GB drive named "nathanc7_Disk2". This shows up as /dev/sdb.
- Rebooted the VM; afterwards, confirmed the drive letters were unchanged.
- While the VM was online, added a new 60 GB drive named "nathanc7_Disk3". Although it reported success, for some reason the disk showed as "Inactive" and didn't appear in the OS.
- Rebooted the VM, then noticed the latest disk was still inactive, so I shut down the VM and activated the disk in the GUI. I also disconnected the CentOS install CD at this point.
- Booted it up, and now I see 40 GB sda, 60 GB sdb, 50 GB sdc.

So just by adding drives one at a time, their drive letters were not assigned in order.

Just to be completely clear, in the GUI I have: nathanc7_Disk1 40gb, nathanc7_Disk2 50gb, nathanc7_Disk3 60gb

In the VM, I have:

[root@localhost ~]# dmesg | grep 'logical blo'  
[    1.777692] sd 2:0:0:0: [sda] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
[    1.777798] sd 2:0:0:1: [sdb] 125829120 512-byte logical blocks: (64.4 GB/60.0 GiB)
[    1.778783] sd 2:0:0:2: [sdc] 104857600 512-byte logical blocks: (53.6 GB/50.0 GiB)
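Matching guest devices back to GUI disks here comes down to size arithmetic: logical blocks × block size ÷ 2^30 gives GiB (e.g. 83886080 × 512 B = 42949672960 B = 40 GiB). A minimal Python sketch of that matching, assuming the GUI sizes are GiB and that every disk has a distinct size; as the description points out, this breaks down with identically sized disks, so the sketch returns None when a size is ambiguous. The names are hypothetical.

```python
import re

# `dmesg | grep 'logical blo'` output from the CentOS 7 guest above.
DMESG = """
[    1.777692] sd 2:0:0:0: [sda] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
[    1.777798] sd 2:0:0:1: [sdb] 125829120 512-byte logical blocks: (64.4 GB/60.0 GiB)
[    1.778783] sd 2:0:0:2: [sdc] 104857600 512-byte logical blocks: (53.6 GB/50.0 GiB)
"""

# Disk sizes as shown in the oVirt GUI, assumed here to be GiB.
GUI_DISKS = {"nathanc7_Disk1": 40, "nathanc7_Disk2": 50, "nathanc7_Disk3": 60}

LINE_RE = re.compile(r"\[(sd[a-z]+)\] (\d+) (\d+)-byte logical blocks")


def guest_sizes_gib(log):
    """Map guest device -> size in GiB (blocks * block_size / 2**30)."""
    sizes = {}
    for m in LINE_RE.finditer(log):
        dev, blocks, block_size = m.group(1), int(m.group(2)), int(m.group(3))
        sizes[dev] = blocks * block_size / 2**30
    return sizes


def match_by_size(log, gui_disks):
    """Map guest device -> GUI alias, or None when the size is not unique."""
    by_size = {}
    for alias, gib in gui_disks.items():
        by_size.setdefault(gib, []).append(alias)
    result = {}
    for dev, gib in guest_sizes_gib(log).items():
        candidates = by_size.get(round(gib), [])
        result[dev] = candidates[0] if len(candidates) == 1 else None
    return result


if __name__ == "__main__":
    for dev, alias in sorted(match_by_size(DMESG, GUI_DISKS).items()):
        print(f"/dev/{dev} -> {alias}")
```

With the dmesg lines above, it maps sda to nathanc7_Disk1, sdb to nathanc7_Disk3 and sdc to nathanc7_Disk2.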

It looks like, for some reason, oVirt has now put the latest 60 GB drive at SCSI unit 1:

    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/mnt/10.1.32.10:_gto__vana__sas01/2060a19f-f26c-4dba-a559-83541a4d0c7a/images/91a4f290-9de9-4a0c-b3db-bcbe7de49e45/58a43470-e392-4034-ab1b-8f85debfb0f5'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>91a4f290-9de9-4a0c-b3db-bcbe7de49e45</serial>
      <boot order='1'/>
      <alias name='ua-91a4f290-9de9-4a0c-b3db-bcbe7de49e45'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/mnt/10.1.32.10:_gto__vana__sas01/2060a19f-f26c-4dba-a559-83541a4d0c7a/images/db5c0b80-e43e-49e3-b038-02f465e1486b/1708357f-978b-45f9-ae6b-1bdc812069c2'/>
      <backingStore/>
      <target dev='sdb' bus='scsi'/>
      <serial>db5c0b80-e43e-49e3-b038-02f465e1486b</serial>
      <alias name='ua-db5c0b80-e43e-49e3-b038-02f465e1486b'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/mnt/10.1.32.10:_gto__vana__sas01/2060a19f-f26c-4dba-a559-83541a4d0c7a/images/9414ea06-3a9f-48df-8c89-6c3b4ae09f86/285d04fe-2f00-4184-b3b7-e526ac6a7f47'/>
      <backingStore/>
      <target dev='sdc' bus='scsi'/>
      <serial>9414ea06-3a9f-48df-8c89-6c3b4ae09f86</serial>
      <alias name='ua-9414ea06-3a9f-48df-8c89-6c3b4ae09f86'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    
And checking the underlying files, to be 100% sure which file is which drive:
    
40G     /rhev/data-center/mnt/10.1.32.10:_gto__vana__sas01/2060a19f-f26c-4dba-a559-83541a4d0c7a/images/91a4f290-9de9-4a0c-b3db-bcbe7de49e45/58a43470-e392-4034-ab1b-8f85debfb0f5
50G     /rhev/data-center/mnt/10.1.32.10:_gto__vana__sas01/2060a19f-f26c-4dba-a559-83541a4d0c7a/images/db5c0b80-e43e-49e3-b038-02f465e1486b/1708357f-978b-45f9-ae6b-1bdc812069c2
60G     /rhev/data-center/mnt/10.1.32.10:_gto__vana__sas01/2060a19f-f26c-4dba-a559-83541a4d0c7a/images/9414ea06-3a9f-48df-8c89-6c3b4ae09f86/285d04fe-2f00-4184-b3b7-e526ac6a7f47

In the logs I see that on the previous boot, for some reason, the 50 GB drive was given 2:0:0:2. While making the above changes I also detached the CentOS install CD, but I don't think that's related, since it was assigned 1:0:0:0:

[root@localhost log]# grep :0:0: dmesg.old 
[    1.584228] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5
[    1.693004] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[    1.693291] sr 1:0:0:0: Attached scsi CD-ROM sr0
[    1.796439] scsi 2:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[    1.796839] scsi 2:0:0:2: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[    1.828173] sd 2:0:0:0: [sda] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
[    1.828288] sd 2:0:0:2: [sdb] 104857600 512-byte logical blocks: (53.6 GB/50.0 GiB)
[    1.828399] sd 2:0:0:0: [sda] Write Protect is off
[    1.828401] sd 2:0:0:0: [sda] Mode Sense: 63 00 00 08
[    1.828436] sd 2:0:0:2: [sdb] Write Protect is off
[    1.828438] sd 2:0:0:2: [sdb] Mode Sense: 63 00 00 08
[    1.828452] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.829202] sd 2:0:0:2: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.831362] sd 2:0:0:0: [sda] Attached SCSI disk
[    1.831795] sd 2:0:0:2: [sdb] Attached SCSI disk
[    3.594083] sr 1:0:0:0: Attached scsi generic sg0 type 5
[    3.594131] sd 2:0:0:0: Attached scsi generic sg1 type 0
[    3.594170] sd 2:0:0:2: Attached scsi generic sg2 type 0

At this point I then powered off the machine completely and did this:

 - Removed nathanc7_Disk3
 - Removed nathanc7_Disk2
 - Attached nathanc7_Disk2
 - Attached nathanc7_Disk3
 - Booted the VM
 
Now the SCSI IDs are ordered correctly by alias name, but the drive letters are still wrong:

[root@localhost ~]# dmesg | grep 'logical bl'
[    1.779133] sd 2:0:0:0: [sda] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
[    1.779210] sd 2:0:0:2: [sdb] 125829120 512-byte logical blocks: (64.4 GB/60.0 GiB)
[    1.779755] sd 2:0:0:1: [sdc] 104857600 512-byte logical blocks: (53.6 GB/50.0 GiB)

Hopefully this info is helpful, but let me know if there's something specific you'd like me to test.

Comment 17 Michal Skrivanek 2018-09-24 11:03:39 UTC
It would be best not to mix this up with hotplugs. They are handled slightly differently from offline configuration edits.

I tried a VM with one disk (unit 0) and hotplugged another one: it got unit 2. When adding a third disk, it got unit 1, resulting in the order you see above.
Looks like a bug in hotplug. It also happened to me that the third disk was not activated out of the box.
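For comparison, a naive allocator that gives each hotplugged disk the lowest free unit would hand out units 0, 1, 2 in that scenario, rather than the observed 0, 2, 1. The Python sketch below is a hypothetical model only, not oVirt's or libvirt's actual allocation logic; the function names are made up for illustration.

```python
import itertools


def lowest_free_unit(used_units):
    """Smallest non-negative SCSI unit number not already in use."""
    return next(u for u in itertools.count() if u not in used_units)


def hotplug_sequence(n_disks):
    """Units handed out, in order, if every new disk takes the lowest free unit."""
    units = []
    for _ in range(n_disks):
        units.append(lowest_free_unit(set(units)))
    return units


# Unit 0 for the original disk, then two hotplugs (as reported in comment 17).
OBSERVED = [0, 2, 1]

if __name__ == "__main__":
    print("lowest-free model:", hotplug_sequence(3))
    print("observed:         ", OBSERVED)
```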

Can you try without that?

Comment 18 Tal Nisan 2018-10-22 08:14:22 UTC
Closing as insufficient data; please reopen if you can provide the needed info.

Comment 19 Red Hat Bugzilla 2023-09-14 04:32:56 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

