Bug 1474730

Summary: Can't boot from a migrated/cloned boot disk
Product: Red Hat Enterprise Linux 7
Component: virt-manager
Version: 7.3
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: low
Priority: medium
Reporter: Nikola <nmarjano>
Assignee: Pavel Hrdina <phrdina>
QA Contact: Virtualization Bugs <virt-bugs>
CC: aliang, chayang, coli, famz, hhuang, juzhang, knoel, michen, pbonzini, phrdina, qzhang, rbalakri, syangsao, virt-maint, xuwei
Target Milestone: rc
Type: Bug
Last Closed: 2017-09-05 13:45:06 UTC

Description Nikola 2017-07-25 09:33:44 UTC
Description of problem:

When migrating /boot from sda1 to sdb1, the KVM guest fails to boot with "No bootable device found" when directed to boot from sdb1.
When the same procedure is done using IDE disks, the system boots successfully.

Workaround: changing the disk bus from SCSI to IDE (or VirtIO) allows the system to boot. Notably, changing it back from IDE to SCSI afterwards still allows the system to boot.
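
For reference, the disk bus is selected by the bus attribute of the disk's <target> element in the domain XML; an illustrative snippet (device name assumed, not taken from this report):

  <disk type='file' device='disk'>
    ...
    <target dev='sdb' bus='scsi'/>  <!-- change to bus='ide' or bus='virtio'; by convention the dev name follows (hdb / vdb) -->
    ...
  </disk>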


Version-Release number of selected component (if applicable):

RHEL 7.3, 3.10.0-514.21.2.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
libvirt-2.0.0-10.el7_3.9.x86_64
virt-manager-1.4.0-2.el7.noarch

Disk layout:
sda             8:0    0     1G  0 disk
└─sda1          8:1    0     1G  0 part /boot
sdb             8:16   0     1G  0 disk 
sdc             8:32   0     8G  0 disk 
└─sdc1          8:33   0     7G  0 part
  ├─rhel-root 253:0    0     6G  0 lvm  /
  └─rhel-swap 253:1    0     1G  0 lvm  [SWAP]

sda - original (old) boot device
sdb - new device to inherit the old boot device function
sdc - root LVM


Steps to Reproduce:
1. Create a boot partition on the new disk (this replays sda's partition layout and bootable flag onto sdb):
# sfdisk -d /dev/sda | grep -E 'sectors|bootable' | sfdisk --force /dev/sdb
2. Clone the /boot partition:
# dd if=/dev/sda1 of=/dev/sdb1 bs=512 conv=noerror,sync
3. Install the bootloader on the new disk (done in a rescue environment):
# grub2-install /dev/sdb
4. Detach the sda device and reboot (a quick sanity check before rebooting is sketched below)
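
A quick sanity check before rebooting (illustrative commands; device names assume the layout above):
# sfdisk -d /dev/sdb
(sdb1 should exist and carry the bootable flag)
# cmp /dev/sda1 /dev/sdb1
(the cloned partitions should be byte-identical)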

Actual results:
System does not boot (no bootable device found), unless the workaround is applied (changing the disk bus to IDE or VirtIO via VMM).

Expected results:
System boots successfully, ultimately allowing the permanent removal of the old boot device, with the new one taking its place.

Additional info:
Reproduced on RHEL 7 and Fedora 25, both times using Virtual Machine Manager (virt-manager).

Comment 2 Ademar Reis 2017-07-26 13:21:50 UTC
CongLi, can you please validate this by reproducing it in our QE environment? Testing with the latest RHEL-7.4 packages should be enough.

Comment 3 CongLi 2017-07-31 13:00:57 UTC
Reproduced this bug on the following version:
kernel-3.10.0-693.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.3.x86_64
libvirt-3.2.0-14.el7.x86_64
virt-manager-1.4.1-7.el7.noarch

The steps are the same as in comment 0, via virt-manager.
1. virtio-scsi  --> can not boot up
2. virtio-blk   --> boot up successfully
3. ide          --> boot up successfully
4. virtio-scsi(failed) -> ide(successfully) -> virtio-scsi(successfully)


Thanks.

Comment 4 Fam Zheng 2017-08-17 12:49:23 UTC
Seems a duplicate of bug 1020622? Cong, could you please retest with seabios-1.10.2-3.el7 or above?

Comment 5 Sam Yangsao 2017-08-17 19:53:37 UTC
Just verified this occurs in RHV 4.1, and the workaround addresses this same issue; please test there as well.

Let me know if you need packaging info and if I should file a separate bz.

Thanks much.

Comment 6 CongLi 2017-08-18 02:47:22 UTC
(In reply to Fam Zheng from comment #4)
> Seems a duplicate of bug 1020622? Cong, could you please retest with
> seabios-1.10.2-3.el7 or above?

I could reproduce this bug with seabios-1.10.2-3.el7.x86_64.

Comment 7 CongLi 2017-08-18 02:51:31 UTC
(In reply to Sam Yangsao from comment #5)
> Just verified this occurs in RHV 4.1, and the workaround addresses this same
> issue; please test there as well.

Could you reproduce this problem with an IDE drive for the boot disk?

If yes, please provide your qemu, seabios, libvirt and virt-manager versions, and I will have a try.

Thanks.

> Let me know if you need packaging info and if I should file a separate bz.
> 
> Thanks much.

Comment 8 Fam Zheng 2017-08-18 10:50:23 UTC
Looks like a virt-manager/libvirt issue. After the steps, the bootindex= property is set on the initial disks but not on the cloned one.

Setting "<boot order=... />" attributes on the disks explicitly with "virsh edit" fixes it, as long as seabios is new enough to handle booting from a non-zero LUN; alternatively, set the new boot disk as LUN 0.

FYI, the reproducer I have yields this final QEMU command line:

...

-drive file=/stor/images/3.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-1,cache=unsafe -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1,bootindex=1 -drive file=/stor/images/2.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-2 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-0-2,id=scsi0-0-0-2
...

Which corresponds to this libvirt XML:

  <os>
    <boot dev='hd'/>
  </os>

  ...

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='unsafe'/>
      <source file='/stor/images/3.qcow2'/>
      <target dev='sdb' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/stor/images/2.qcow2'/>
      <target dev='sdc' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>

As said above, the workaround is to remove the <boot dev='hd'/> line and add a <boot order='...'/> line to each <disk> node. For old seabios it is also necessary to change the boot disk's "unit='...'" to "unit='0'".
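
Applied to the XML above, the cloned boot disk would end up roughly like this (an illustrative sketch, not the exact fix):

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='unsafe'/>
      <source file='/stor/images/3.qcow2'/>
      <target dev='sdb' bus='scsi'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>

with the <boot dev='hd'/> line removed from <os> (and, for old seabios, unit='1' changed to unit='0').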

Reassigning to virt-manager for further investigation.

Comment 10 Sam Yangsao 2017-08-21 14:14:32 UTC
(In reply to CongLi from comment #7)
> (In reply to Sam Yangsao from comment #5)
> > Just verified this occurs in RHV 4.1, and the workaround addresses this same
> > issue; please test there as well.
> 
> Could you reproduce this problem with ide drive for the boot disk?
> 
> If yes, please help provide your qemu, seabios, libvirt and virt-manager
> versions, I will have a try.
> 
> Thanks.
> 
> > Let me know if you need packaging info and if I should file a separate bz.
> > 
> > Thanks much.

Original first disk was virtio-scsi.  I added a second disk that was also virtio-scsi to test disk mirroring.  When I de-activated the first disk and attempted to boot off the second disk, it failed.  After changing the second disk to "ide", the OS booted up fine.

Here is the list of packages from my RHEL-H:

# rpm -qa |grep qemu
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.10.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.9.x86_64
qemu-kvm-common-rhev-2.6.0-28.el7_3.10.x86_64
qemu-kvm-tools-rhev-2.6.0-28.el7_3.10.x86_64
qemu-img-rhev-2.6.0-28.el7_3.10.x86_64

# rpm -qa |grep seabios
seabios-bin-1.9.1-5.el7_3.3.noarch

# rpm -qa |grep virt
fence-virt-0.3.2-5.el7.x86_64
libvirt-daemon-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-config-nwfilter-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-nodedev-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.9.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
virt-what-1.13-8.el7.x86_64
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-network-2.0.0-10.el7_3.9.x86_64
libvirt-python-2.0.0-2.el7.x86_64
libvirt-daemon-driver-secret-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-interface-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-storage-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-kvm-2.0.0-10.el7_3.9.x86_64
virt-v2v-1.32.7-3.el7_3.2.x86_64
collectd-virt-5.7.1-4.el7.x86_64
libvirt-daemon-driver-nwfilter-2.0.0-10.el7_3.9.x86_64
libvirt-lock-sanlock-2.0.0-10.el7_3.9.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch

# uname -a
Linux XXXX 3.10.0-514.21.2.el7.x86_64 #1 SMP Sun May 28

Comment 11 Paolo Bonzini 2017-08-22 16:44:44 UTC
Sam, what is the QEMU command line when boot fails?

It may be that you are booting from non-zero LUNs, which was only fixed in 7.4 (bug 1020622).

Comment 12 Sam Yangsao 2017-09-05 00:16:42 UTC
(In reply to Paolo Bonzini from comment #11)
> Sam, what is the QEMU command line when boot fails?
> 
> It may be that you are booting from non-zero LUNs, which was only fixed in
> 7.4 (bug 1020622).

Sorry, I didn't see this comment. I'm not sure how you can get the QEMU command line when using RHV.
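
(For reference, on the hypervisor host the command line of a running guest can usually be recovered with either of the following; <domain> is a placeholder for the guest name:)
# ps -ef | grep qemu-kvm
# grep qemu-kvm /var/log/libvirt/qemu/<domain>.log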

Comment 13 Pavel Hrdina 2017-09-05 06:38:48 UTC
Hi, so the issue is in virt-manager.  When installing the guest, virt-manager uses the old libvirt syntax, which always marks only the first disk as bootable.  The fix would be to update the virt-manager and virt-install code to always use the new per-device syntax.

Old syntax:

  ...
  <os>
    ...
    <boot dev='hd'/>
    ...
  </os>

New syntax:

  ...
  <devices>
    ...
    <disk ...>
      ...
      <boot order='1'/>
      ...
    </disk>
    ...
  </devices>

In addition, when the old syntax is used and there are multiple disk devices, virt-manager incorrectly shows all of them as configured to be bootable.

There is a workaround: if you modify the boot order using virt-manager, it will write the new syntax, but only in that case.
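
One way to check which syntax a domain currently uses (illustrative; <domain> is a placeholder for the guest name):
# virsh dumpxml <domain> | grep -E '<boot '
A <boot dev='hd'/> match inside <os> indicates the old syntax; <boot order='...'/> matches inside <disk> elements indicate the new one.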

Comment 14 Pavel Hrdina 2017-09-05 13:45:06 UTC
Disregard my previous comment :) I've given it some more testing and checked the QEMU command line, and after installing seabios-1.10.2-3.el7 it works.  It doesn't matter whether the old boot XML or the new boot XML is used; the correct disk is always marked as bootable.  The issue is indeed the LUN being different from 0.  I'm closing this bug as a duplicate of BZ 1020622.

For the virt-manager issue where it incorrectly shows all disk devices as bootable when the old XML syntax is used, I've created upstream BZ 1488480.

*** This bug has been marked as a duplicate of bug 1020622 ***