Bug 1410391

Summary: [ppc64le] SLOF: with "-device virtio-blk-pci,bootindex=0 -boot order=cdn,once=n,menu=off,strict=off", after booting from virtio-blk-pci fails, SLOF does not fall back to booting from the network
Product: Red Hat Enterprise Linux 7 Reporter: xianwang <xianwang>
Component: SLOF Assignee: Thomas Huth <thuth>
Status: CLOSED ERRATA QA Contact: xianwang <xianwang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.3 CC: knoel, michen, mrezanin, qzhang, thuth, virt-maint, yhong, zhengtli
Target Milestone: rc   
Target Release: ---   
Hardware: ppc64le   
OS: Linux   
Whiteboard:
Fixed In Version: SLOF-20170303 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 22:33:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description xianwang 2017-01-05 11:50:12 UTC
Description of problem:
qemu cli:

-drive id=drive_image1,if=none,format=qcow2,snapshot=off,file=/root/r.qcow2 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1 \
-boot order=cdn,once=n,menu=off,strict=off \

SLOF first tries to boot from virtio-blk-pci,bootindex=0. If this
device is not bootable, if the disk is empty, or if the boot fails,
SLOF stops booting and does not fall back to any other boot method.

Version-Release number of selected component (if applicable):
Host:
3.10.0-514.el7.ppc64le
qemu-kvm-rhev-2.6.0-28.el7_3.2.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

Guest:
3.10.0-514.6.1.el7.ppc64le

How reproducible:
100%

Steps to Reproduce:
1. Boot a guest with an empty disk r.qcow2 and set bootindex=1 on this device:
/usr/libexec/qemu-kvm \
    -name 'RHEL7.3-vm1'  \
    -machine pseries \
    -nodefaults  \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03  \
    -device usb-ehci,id=usb1,bus=pci.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0 \
    -drive id=drive_image1,if=none,format=qcow2,snapshot=off,file=/root/r.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1 \
    -device virtio-net-pci,mac=9a:7b:7c:7d:7e:71,id=idtlLxAk,vectors=4,netdev=idlkwV8e,bus=pci.0  \
    -netdev tap,id=idlkwV8e,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 \
    -m 2G  \
    -smp 4,maxcpus=8 \
    -cpu host \
    -vga std \
    -vnc :1 \
    -qmp tcp:0:8881,server,nowait \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=n,menu=off,strict=off \
    -monitor stdio \
    -enable-kvm \
    -usb -device usb-host,hostbus=1,hostaddr=3,id=usb0


Actual results:
(1) SLOF first tries to boot from virtio-blk-pci,bootindex=1. If this
device is not bootable, if the disk is empty, or if the boot fails,
SLOF stops booting and does not fall back to any other boot method.

(2) If "bootindex" is removed, SLOF first tries to boot from the network
because of "once=n". If booting from the network fails, it stops. After a
reboot, the guest then boots according to order=cdn.


Expected results:
If the disk attached to "virtio-blk-pci,bootindex=1" is empty or booting from it fails,
SLOF should try to boot from the network because of "once=n".

Additional info:

Comment 2 Thomas Huth 2017-01-08 08:05:34 UTC
Just a note: According to
https://github.com/qemu/qemu/blob/master/docs/bootindex.txt :

"If the bootindex property is not set for a device, it gets
lowest boot priority."

Since we're using "strict=off" here, I think SLOF should try to boot from the NIC anyway after failing to boot from the block device, even without "once=n".
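
A minimal Python sketch of the quoted priority rule follows. It is not QEMU or SLOF code; the function name `order_by_bootindex` and the device names in the usage note are hypothetical and serve only as an illustration of "unset bootindex sorts last".

```python
from typing import Dict, List, Optional


def order_by_bootindex(bootindex: Dict[str, Optional[int]]) -> List[str]:
    """Return device names sorted by bootindex; devices without one come last."""
    # The key sorts on (is_unset, bootindex): False < True, so every device
    # that has a bootindex is placed before every device that has none.
    return sorted(
        bootindex,
        key=lambda name: (bootindex[name] is None, bootindex[name] or 0),
    )
```

For the setup in this report, `order_by_bootindex({"net0": None, "image1": 1})` puts `image1` ahead of `net0`, which is why the NIC would be the natural fallback once the disk fails.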

Comment 3 Thomas Huth 2017-01-11 07:28:18 UTC
FWIW, here are some technical details on what's going on here: QEMU builds a list of all devices that have the "bootindex" parameter and passes it to SLOF in the /chosen/qemu,boot-list device tree property. The value from "-boot order=xxx" (or the value from "-boot once=x" during the first boot) gets passed in /chosen/qemu,boot-device instead.
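
The choice between "order" and "once" described above can be sketched in a few lines of Python. This is only a model of the described behaviour, not QEMU code; the function name `effective_boot_device` and its parameters are hypothetical.

```python
from typing import Optional


def effective_boot_device(order: str, once: Optional[str], first_boot: bool) -> str:
    """Model the value that ends up in /chosen/qemu,boot-device."""
    # "once" only applies to the first boot; afterwards "order" is used.
    if once and first_boot:
        return once
    return order
```

With the command line from this report, `effective_boot_device("cdn", "n", first_boot=True)` yields `"n"`, and after a reboot the same call with `first_boot=False` yields `"cdn"`.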

Now, when SLOF detects the /chosen/qemu,boot-list property, it currently only uses this list for booting, and completely ignores /chosen/qemu,boot-device. I think we've got to fix this so that SLOF combines both approaches instead (or "once=x" will never work properly): It should look at /chosen/qemu,boot-device first to see which *classes* of devices should be considered for booting. Devices of these classes should then get extracted from the /chosen/qemu,boot-list property, so that they are considered for booting in the right order according to their "bootindex" parameter. Then, if we started QEMU with "strict=off", we should finally also add the other remaining devices to the SLOF-internal list of possible boot devices.
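
The combined approach proposed above can be sketched as follows. This is a hedged Python model, not the SLOF patch (SLOF itself is written in Forth and C): the `Device` type, the single-letter classes, and `build_boot_order` are hypothetical names, and the order in which leftover devices are appended for "strict=off" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    path: str                 # device tree path (illustrative only)
    dev_class: str            # 'c' = disk, 'd' = cdrom, 'n' = network
    bootindex: Optional[int]  # None if no bootindex was given


def build_boot_order(devices: List[Device], boot_device: str, strict: bool) -> List[str]:
    """Combine the class letters of boot-device with the bootindex list."""
    # Analogue of /chosen/qemu,boot-list: devices with a bootindex, sorted by it.
    boot_list = sorted(
        (d for d in devices if d.bootindex is not None),
        key=lambda d: d.bootindex,
    )
    # Step 1: keep only devices whose class is named in boot-device,
    # preserving their bootindex order.
    order = [d.path for d in boot_list if d.dev_class in boot_device]
    # Step 2: with strict=off, append all remaining devices as fallbacks
    # (assumed order: rest of the boot list, then devices without bootindex).
    if not strict:
        leftovers = boot_list + [d for d in devices if d.bootindex is None]
        for d in leftovers:
            if d.path not in order:
                order.append(d.path)
    return order
```

For the guest in this report (disk with bootindex=1, NIC without bootindex, first boot with "once=n"), the sketch returns the disk followed by the NIC when `strict` is false, and an empty list when `strict` is true, since no network device appears in the boot list.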

Comment 4 Thomas Huth 2017-01-14 07:10:26 UTC
I've now suggested a patch on the SLOF mailing list:
https://lists.ozlabs.org/pipermail/slof/2017-January/001426.html

Comment 5 Thomas Huth 2017-02-06 11:00:18 UTC
The SLOF maintainer suggested fixing this issue in QEMU instead, so I'm changing the component to qemu-kvm-rhev now.

Comment 6 Thomas Huth 2017-03-03 08:22:01 UTC
Finally, at least my patch that fixes the "strict=off" problem has been accepted in SLOF:
http://git.qemu-project.org/?p=SLOF.git;a=commitdiff;h=ef5286f020d850f47fe196297f673769f6d63198
... so moving the component back to SLOF now.

Comment 7 Thomas Huth 2017-03-03 08:26:16 UTC
We'll get the fix via rebase to SLOF-20170303

Comment 8 Miroslav Rezanina 2017-03-14 13:51:56 UTC
Fixed by rebase

Comment 10 Yongxue Hong 2017-03-23 08:15:26 UTC
The verification steps are as follows:

1.Version:
Host:3.10.0-623.el7.ppc64le
Qemu:qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848
SLOF:SLOF.noarch  20170303-1.git66d250e.el7

2.Steps to Verify:
Same as in the Description above

3.Actual results:
SLOF **********************************************************************
QEMU Starting
 Build Date = Mar 14 2017 08:36:17
 FW Version = mockbuild@ release 20170303
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
                     00 0000 (D) : 1234 1111    qemu vga
                     00 0800 (D) : 1033 0194    serial bus [ usb-xhci ]
                     00 1000 (D) : 1af4 1004    virtio [ scsi ]
Populating /pci@800000020000000/scsi@2
       SCSI: Looking for devices
                     00 1800 (D) : 1af4 1003    virtio [ serial ]
                     00 2000 (D) : 1af4 1001    virtio [ block ]
                     00 2800 (D) : 1af4 1000    virtio [ net ]
                     00 4800 (D) : 1af4 1002    unknown-legacy-device*
Installing QEMU fb
Scanning USB 
  XHCI: Initializing
    USB Keyboard 
    USB mouse 
    USB Storage 
       SCSI: Looking for devices
USB-DISK: Bulk commad failed!
USB-DISK: Bulk commad failed!
No console specified using screen & keyboard
   Welcome to Open Firmware

  Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php
Trying to load:  from: /pci@800000020000000/scsi@4 ... 
E3404: Not a bootable device!
Trying to load:  from: /pci@800000020000000/scsi@4 ... 
E3404: Not a bootable device!
Trying to load:  from: cdrom ... 
E3405: No such device
Trying to load:  from: /pci@800000020000000/ethernet@5 ... 
 Initializing NIC
  Reading MAC address from device: 9a:7b:7c:7d:7e:71
  Requesting information via DHCP: done
  Using IPv4 address: 10.19.112.128
  Requesting file "pxelinux.0" via TFTP from 10.19.42.13
  Receiving data:  25 KBytes
  TFTP: Received pxelinux.0 (25 KBytes)
E3403: Bad executable:   No boot partition found
E3406: Client application returned an error:    No boot partition found

        ..`. ..     .......  ..           ......      .......
    ..`...`''.`'. .''``````..''.       .`''```''`.  `''``````
       .`` .:' ': `''.....  .''.       ''`     .''..''.......
         ``.':.';. ``````''`.''.      .''.      ''``''`````'`
         ``.':':`   .....`''.`'`...... `'`.....`''.`'`       
        .`.`'``   .'`'`````.  ``''''''  ``''`'''`. `'`       
  Type 'boot'  and press return  to  continue  booting  the system.
  Type 'reset-all'  and  press  return  to   reboot   the   system.


        ..`. ..     .......  ..           ......      .......
    ..`...`''.`'. .''``````..''.       .`''```''`.  `''``````
       .`` .:' ': `''.....  .''.       ''`     .''..''.......
         ``.':.';. ``````''`.''.      .''.      ''``''`````'`
         ``.':':`   .....`''.`'`...... `'`.....`''.`'`       
        .`.`'``   .'`'`````.  ``''''''  ``''`'''`. `'`       
  Type 'boot'  and press return  to  continue  booting  the system.
  Type 'reset-all'  and  press  return  to   reboot   the   system.
Ready! 
0 > 

The guest tries to boot from the network, so this bug is fixed; changing the status to VERIFIED.

Comment 11 errata-xmlrpc 2017-08-01 22:33:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2093