Bug 1317490

Summary: [engine-backend] Disks are alphabetically ordered instead of numerically, which causes the guest to see them this way (1,10,2..) instead of (1,2,10)
Product: [oVirt] ovirt-engine Reporter: Elad <ebenahar>
Component: BLL.Storage Assignee: Tal Nisan <tnisan>
Status: CLOSED CURRENTRELEASE QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.6.3.3 CC: ahadas, amureini, bugs, ebenahar, kgoldbla, ratamir, tnisan, ylavi
Target Milestone: ovirt-4.1.1-1 Flags: rule-engine: ovirt-4.1+
rule-engine: exception+
rule-engine: planning_ack+
tnisan: devel_ack+
ratamir: testing_ack+
Target Release: 4.1.1.6   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-04-21 09:48:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
server, engine, vdsm logs none

Description Elad 2016-03-14 12:03:28 UTC
Description of problem:
The same as Bug 1315878, only for disks:
The guest sees its attached disks in alphabetical order when the disk names include numbers. For example, when disks named 1, 2, 3, 4, 10 are attached, the engine exposes them to the guest in the following order:
1, 10, 2, 3, 4.
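The difference between the two orderings can be shown with a small Python sketch: a plain string sort compares character by character, so "10" lands before "2", while a natural-order key compares runs of digits as integers. This is an illustrative sketch, not the engine's actual fix; `natural_key` is a hypothetical helper.

```python
import re


def natural_key(name):
    """Split a name into text and number chunks so "Disk10" sorts after "Disk2"."""
    # re.split with a capturing group alternates text, digits, text, ...,
    # so odd positions are always digit runs and can be compared as integers.
    return [int(chunk) if i % 2 else chunk.lower()
            for i, chunk in enumerate(re.split(r"(\d+)", name))]


names = ["1", "2", "3", "4", "10"]
print(sorted(names))                   # ['1', '10', '2', '3', '4']
print(sorted(names, key=natural_key))  # ['1', '2', '3', '4', '10']
```

Because the text and number chunks always alternate, two keys are compared string-to-string and int-to-int at each position, so mixed names such as vm1_Disk10 sort safely too.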
  
Version-Release number of selected component (if applicable):
rhevm-3.6.3.4-0.1.el6.noarch
vdsm-4.17.23-0.el7ev.noarch
libvirt-daemon-1.2.17-13.el7_2.4.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.8.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a VM, install a RHEL OS on it, and stop the VM.
2. Create disks with the following names and sizes and attach them to the VM: disk1 (1G), disk2 (2G), disk10 (10G). The sizes are used later to match each disk name to its device in the guest.
3. Start the VM
4. Check 'lsblk' on the guest

Actual results:
The disks are exposed to the guest in alphabetical order instead of numerical order.

In my case, the disk 'disk2' is exposed as /dev/vdb (as expected), 'disk10' as /dev/vdc (not as expected), and 'disk3' as /dev/vdd (not as expected):

[root@RHEL6 ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0     11:0    1 1024M  0 rom  
vda    252:0    0   10G  0 disk 
├─vda1 252:1    0  200M  0 part /boot
├─vda2 252:2    0    2G  0 part [SWAP]
└─vda3 252:3    0  7.8G  0 part /
vdb    252:16   0    2G  0 disk 
vdc    252:32   0   10G  0 disk 
vdd    252:48   0    3G  0 disk
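The size-based correlation described in the steps can be made mechanical. A minimal Python sketch, assuming the default `lsblk` column layout shown above (NAME, MAJ:MIN, RM, SIZE, RO, TYPE, MOUNTPOINT) and that every test disk has a distinct size; `parse_lsblk` and `correlate_by_size` are hypothetical helpers. The OS disk vda is excluded explicitly because it is also 10G.

```python
# The lsblk output from this report, as captured in the guest.
LSBLK = """NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0     11:0    1 1024M  0 rom
vda    252:0    0   10G  0 disk
├─vda1 252:1    0  200M  0 part /boot
├─vda2 252:2    0    2G  0 part [SWAP]
└─vda3 252:3    0  7.8G  0 part /
vdb    252:16   0    2G  0 disk
vdc    252:32   0   10G  0 disk
vdd    252:48   0    3G  0 disk
"""


def parse_lsblk(output):
    """Return {device: size} for whole disks (TYPE == "disk") in default lsblk output."""
    disks = {}
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 6 and fields[5] == "disk":
            disks[fields[0]] = fields[3]
    return disks


def correlate_by_size(disks, expected_sizes, exclude=()):
    """Map guest device -> engine disk name, assuming each expected size is unique."""
    by_size = {size: name for name, size in expected_sizes.items()}
    return {dev: by_size.get(size) for dev, size in disks.items() if dev not in exclude}


mapping = correlate_by_size(parse_lsblk(LSBLK),
                            {"disk2": "2G", "disk3": "3G", "disk10": "10G"},
                            exclude={"vda"})
print(mapping)  # {'vdb': 'disk2', 'vdc': 'disk10', 'vdd': 'disk3'}
```

Partition rows (TYPE part) and the CD-ROM (TYPE rom) are skipped, so only whole disks take part in the match.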


Expected results:
In such cases, disks should be ordered and exposed numerically.

Additional info:

Comment 1 Yaniv Kaul 2016-03-15 07:28:29 UTC
Sounds quite severe to me - presenting the wrong disks to a VM might cause havoc, no? BTW, they should have natural ordering.

Comment 2 Tal Nisan 2016-03-15 09:34:34 UTC
Elad, is this a regression? If users have been living with this across versions, I guess it's OK to keep the 4.0 target.

Comment 3 Elad 2016-03-17 07:47:57 UTC
AFAIK, this scenario wasn't tested in previous versions. 
We can live with it.

Comment 4 Yaniv Kaul 2016-11-21 11:02:22 UTC
(In reply to Yaniv Kaul from comment #1)
> Sounds quite severe to me - presenting the wrong disks to a VM might cause
> havoc, no? BTW, they should have natural ordering.

Raising severity a bit. Sounds problematic - although if users are not already referring to disks by ID, they might get them mixed up anyway (at least with SCSI).
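Referring to disks by ID avoids depending on the vdX/sdX name altogether. A minimal sketch, assuming a virtio-blk disk whose udev link is named `/dev/disk/by-id/virtio-<serial>` and whose serial is the engine disk ID truncated to 20 characters (the virtio-blk serial limit); SCSI disks get a differently formatted by-id link that this sketch does not handle. `engine_disk_for_by_id` is a hypothetical helper, and the IDs are the serials that appear in the libvirt XML in this bug.

```python
VIRTIO_PREFIX = "virtio-"
VIRTIO_SERIAL_LEN = 20  # assumption: virtio-blk exposes at most 20 serial characters

# Disk IDs taken from the <serial> elements of the libvirt XML in this bug.
DISK_IDS = [
    "935f9965-3a0f-497e-a085-861e245514f1",
    "129e2893-0698-4007-8a5e-6627c80ecda8",
    "88ccbe07-10d2-4152-be72-34bf18a8735f",
]


def engine_disk_for_by_id(link_name, disk_ids):
    """Return the engine disk ID whose truncated serial matches a by-id link, or None."""
    if not link_name.startswith(VIRTIO_PREFIX):
        return None  # not a virtio-blk by-id link; out of scope for this sketch
    serial = link_name[len(VIRTIO_PREFIX):]
    matches = [d for d in disk_ids if d[:VIRTIO_SERIAL_LEN] == serial]
    # Only trust an unambiguous match.
    return matches[0] if len(matches) == 1 else None


print(engine_disk_for_by_id("virtio-129e2893-0698-4007-8", DISK_IDS))
```

Unlike /dev/vdb or /dev/sdb, the by-id link is derived from the disk's serial, so it does not change when the guest enumerates devices in a different order.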

Comment 5 Tal Nisan 2017-02-07 16:07:51 UTC
Arik, does your latest fix regarding disk order solve this bug?

Comment 6 Arik 2017-02-07 20:23:25 UTC
(In reply to Tal Nisan from comment #5)
> Arik, does your latest fix regarding disk order solve this bug?

I'm afraid not. My change did not alter the disk data that the engine sends to vdsm, so I expect vdsm to set the same names (vda, vdb and so on).
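Arik's reasoning is that the guest-side names depend only on the disk list the engine sends. A hypothetical sketch of why the order of that list matters (this is not vdsm's code; `disk_letters` and `assign_names` are illustrative names, and the boot disk is ignored): if names are handed out sequentially in list order, using the same a..z, aa, ab, ... suffix scheme Linux uses for vd*/sd* devices, a lexicographically sorted list gives disk10 the second name.

```python
def disk_letters(index):
    """Bijective base-26 suffix: 0 -> 'a', 25 -> 'z', 26 -> 'aa', 27 -> 'ab'."""
    letters = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return letters


def assign_names(disk_aliases, prefix="vd"):
    """Hypothetical: name disks strictly in the order they are listed."""
    return {prefix + disk_letters(i): alias for i, alias in enumerate(disk_aliases)}


# A plain string sort puts "disk10" before "disk2", so disk10 gets the second name.
print(assign_names(sorted(["disk1", "disk2", "disk10"])))
# {'vda': 'disk1', 'vdb': 'disk10', 'vdc': 'disk2'}
```

Under this model, changing how names are generated would not help; only the order of the list the engine sends changes the result.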

Comment 7 Tal Nisan 2017-03-01 13:00:30 UTC
Fixed and verified by looking at the Libvirt XML; I've cut out the irrelevant parts.

Before the fix, with disks named 1_Disk1, 1_Disk2, 1_Disk3, the disk order is as follows:
1_Disk1
1_Disk2
1_Disk3
        <disk device="disk" snapshot="no" type="file">
            <target bus="scsi" dev="sda" />
            <serial>935f9965-3a0f-497e-a085-861e245514f1</serial>
        </disk>
        <disk device="disk" snapshot="no" type="file">
            <target bus="scsi" dev="sdb" />
            <serial>129e2893-0698-4007-8a5e-6627c80ecda8</serial>
        </disk>
        <disk device="disk" snapshot="no" type="file">
            <target bus="scsi" dev="sdc" />
            <serial>88ccbe07-10d2-4152-be72-34bf18a8735f</serial>
        </disk>

Before the fix, with disks named 1_Disk1, 1_Disk2, 1_Disk10, the disk order is as follows (faulty):
1_Disk1
1_Disk10
1_Disk2
        <disk device="disk" snapshot="no" type="file">
            <target bus="scsi" dev="sda" />
            <serial>935f9965-3a0f-497e-a085-861e245514f1</serial>
        </disk>
        <disk device="disk" snapshot="no" type="file">
            <target bus="scsi" dev="sdb" />
            <serial>88ccbe07-10d2-4152-be72-34bf18a8735f</serial>
        </disk>
        <disk device="disk" snapshot="no" type="file">
            <target bus="scsi" dev="sdc" />
            <serial>129e2893-0698-4007-8a5e-6627c80ecda8</serial>
        </disk>

After the fix, with disks named 1_Disk1, 1_Disk2, 1_Disk10, the disk order is as follows:
1_Disk1
1_Disk2
1_Disk10
        <disk device="disk" snapshot="no" type="file">
            <target bus="scsi" dev="sda" />
            <serial>935f9965-3a0f-497e-a085-861e245514f1</serial>
        </disk>
        <disk device="disk" snapshot="no" type="file">
            <target bus="scsi" dev="sdb" />
            <serial>129e2893-0698-4007-8a5e-6627c80ecda8</serial>
        </disk>
        <disk device="disk" snapshot="no" type="file">
            <target bus="scsi" dev="sdc" />
            <serial>88ccbe07-10d2-4152-be72-34bf18a8735f</serial>
        </disk>

Comment 8 Kevin Alon Goldblatt 2017-03-09 18:53:21 UTC
Verified with the following code:
--------------------------------------------
ovirt-engine-4.1.1.3-0.1.el7.noarch
rhevm-4.1.1.3-0.1.el7.noarch
vdsm-4.19.7-1.el7ev.x86_64


Verified with the following scenario:
-------------------------------------------
1. Create a VM, install a RHEL OS on it, and stop the VM.
2. Create disks with the following names and sizes and attach them to the VM: disk1 (1G), disk2 (2G), disk10 (10G). The sizes are used later to match each disk name to its device in the guest.
3. Start the VM
4. Check 'lsblk' on the guest

disk3 10g is /dev/sda
disk2  2g is /dev/sdb
disk1  1g is /dev/sdc

This should be as follows:
disk1  1g should be /dev/sda
disk2  2g should be /dev/sdb
disk3 10g should be /dev/sdc

Moving to ASSIGNED!

Comment 9 Red Hat Bugzilla Rules Engine 2017-03-09 18:53:27 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 10 Kevin Alon Goldblatt 2017-03-09 18:58:36 UTC
Created attachment 1261679
server, engine, vdsm logs

Adding logs

Comment 11 Tal Nisan 2017-03-13 13:09:14 UTC
Kevin, the logs you've attached span 17 hours of activity; why aren't they trimmed to the period from when you created the VM until after you ran it?

Comment 12 Tal Nisan 2017-03-13 13:11:45 UTC
Also, it seems that the VDSM log is not from the host that was used to run the VM, so for this bug it's quite useless.

Comment 13 Tal Nisan 2017-03-13 13:49:37 UTC
I need to correlate the disk aliases and IDs. Please send the results of the query "select disk_id, disk_alias from base_disks;"

Comment 14 Allon Mureinik 2017-03-15 13:57:57 UTC
Pushing out to 4.1.2 to complete the investigation. This isn't a blocker for 4.1.1.

Comment 15 Kevin Alon Goldblatt 2017-03-19 14:10:04 UTC
(In reply to Tal Nisan from comment #12)
> Also it seems that the VDSM log is not from the host that was used to run
> the VM so in that bug's case it's quite useless

Tal, there was only one vdsm host here, so this is definitely the correct vdsm log.

Comment 16 Kevin Alon Goldblatt 2017-03-19 14:12:07 UTC
(In reply to Tal Nisan from comment #13)
> I need to correlate the disks aliases and ids, please send the results of
> the query "select disk_id, disk_alias from base_disks;"

              disk_id                |     disk_alias     
--------------------------------------+--------------------
 ac052ca4-4b56-4b55-a2ff-b78095233e66 | OVF_STORE
 a1b3b176-b81a-43b5-a6d4-7c696e7f2a8d | OVF_STORE
 445cedc2-a601-471e-beb6-10eb3e77d345 | GlanceDisk-9c3e573
 1ece6f4f-e0b7-48ef-ab00-a62b873313e9 | vm1_Disk1
 b18f18d6-75f9-4b02-ad37-1080a1989459 | vm1_Disk2
 446580b4-ca3c-49aa-95e0-0d5966e51aca | vm1_Disk3
 47d4a98f-8d91-4f10-a9ce-702296c9d620 | GlanceDisk-9c3e573
 283d2da2-56cd-4365-8681-79e494e609f7 | d1
 45c0d935-205a-4ac0-b85d-55b3371a5445 | d2
 ab066eb6-6339-48e9-afe7-81669abca7bb | d3
 432245de-2555-474b-991b-e87efeef2323 | d4
 c5edfbfd-175d-4571-9b54-cf3286de55d5 | vm2_Disk1
 bd9b7a6e-8c3e-4748-bbb9-e42902e5b0a9 | vm2_Disk1
 8a5bf60c-d4db-4114-a87d-1ad82d94cb26 | cephvol1
 e5b59815-5f52-4566-be2f-01aa2e38b33c | cephvol1
 8e9f3a13-1212-4289-ad50-1a0a463a397e | vm_local1_Disk1
 a0a3f6e2-6fc5-4715-967f-448e4dc2fe1f | vm_local2_Disk1
 41a77246-86c9-4a6d-8aac-60b4678a6790 | GlanceDisk-9c3e573
 20bee0ae-ecb3-485b-be80-62d47908da1f | GlanceDisk-9c3e573
 cdbb4df3-9b14-49ee-a020-3119db2b5729 | GlanceDisk-9c3e573

Comment 17 Tal Nisan 2017-03-20 10:34:27 UTC
Looking at the Libvirt XML, this is the drive info sent:
            <target bus="scsi" dev="sda" />
            <serial>1ece6f4f-e0b7-48ef-ab00-a62b873313e9</serial>

            <target bus="scsi" dev="sdb" />
            <serial>b18f18d6-75f9-4b02-ad37-1080a1989459</serial>

            <target bus="scsi" dev="sdc" />
            <serial>446580b4-ca3c-49aa-95e0-0d5966e51aca</serial>

This corresponds to the following order:
vm1_Disk1
vm1_Disk2
vm1_Disk3

This means that we are sending the correct order and that the bug was fixed as it should be. There is not much we can do about the order inside the guest differing from the one we sent, since Libvirt clearly states that the device name is merely an ordering hint:

"
<target>
The target element controls the bus / device under which the disk is exposed to the guest OS. The dev attribute indicates the "logical" device name. The actual device name specified is not guaranteed to map to the device name in the guest OS. Treat it as a device ordering hint
"

So this bug should be verified.
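The correlation in comment 17 (matching each `<serial>` in the libvirt XML to a disk_id from the base_disks query in comment 16) can be sketched with Python's standard `xml.etree.ElementTree`. The trimmed target/serial pairs are wrapped in `<disk>` elements under a `<devices>` root so they parse; in a full domain XML the disk elements also sit under `<devices>`. `dev_to_alias` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# <target>/<serial> pairs from comment 17, wrapped so the fragment is well-formed XML.
DOMAIN_DISKS = """<devices>
  <disk device="disk"><target bus="scsi" dev="sda"/><serial>1ece6f4f-e0b7-48ef-ab00-a62b873313e9</serial></disk>
  <disk device="disk"><target bus="scsi" dev="sdb"/><serial>b18f18d6-75f9-4b02-ad37-1080a1989459</serial></disk>
  <disk device="disk"><target bus="scsi" dev="sdc"/><serial>446580b4-ca3c-49aa-95e0-0d5966e51aca</serial></disk>
</devices>"""

# Subset of the base_disks query result from comment 16 (disk_id -> disk_alias).
ALIASES = {
    "1ece6f4f-e0b7-48ef-ab00-a62b873313e9": "vm1_Disk1",
    "b18f18d6-75f9-4b02-ad37-1080a1989459": "vm1_Disk2",
    "446580b4-ca3c-49aa-95e0-0d5966e51aca": "vm1_Disk3",
}


def dev_to_alias(xml_text, aliases):
    """Map each libvirt target dev to the alias of the disk whose serial it carries."""
    root = ET.fromstring(xml_text)
    result = {}
    for disk in root.iter("disk"):
        dev = disk.find("target").get("dev")
        serial = disk.findtext("serial")
        result[dev] = aliases.get(serial, "<unknown>")
    return result


print(dev_to_alias(DOMAIN_DISKS, ALIASES))
# {'sda': 'vm1_Disk1', 'sdb': 'vm1_Disk2', 'sdc': 'vm1_Disk3'}
```

This only checks what the engine sent to libvirt; as the quoted documentation says, the guest may still enumerate the devices differently.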

Comment 18 Kevin Alon Goldblatt 2017-03-21 13:01:12 UTC
Verified with the following code:
--------------------------------------------
ovirt-engine-4.1.1.3-0.1.el7.noarch
rhevm-4.1.1.3-0.1.el7.noarch
vdsm-4.19.7-1.el7ev.x86_64


Verified with the following scenario:
-------------------------------------------
1. Create a VM, install a RHEL OS on it, and stop the VM.
2. Create disks with the following names and sizes and attach them to the VM: disk1 (1G), disk2 (2G), disk10 (10G). The sizes are used later to match each disk name to its device in the guest.
3. Start the VM
4. Check 'lsblk' on the guest


Based on comment 17, moving to VERIFIED!