Bug 1846975 - Failed to boot up a s390x guest with virtio-blk-ccw if attaching a virtio-scsi-ccw bus in previous
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.3
Hardware: s390x
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: 8.4
Assignee: Thomas Huth
QA Contact: virt-qe-z
URL:
Whiteboard:
Depends On:
Blocks: 1796871 1842946
Reported: 2020-06-15 10:02 UTC by Gu Nini
Modified: 2020-10-20 04:00 UTC

Fixed In Version: qemu-kvm-4.2.0-35.module+el8.4.0+8453+f5da6c50
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:


Links
System | ID | Priority | Status | Summary | Last Updated
IBM Linux Technology Center | 187001 | None | None | None | 2020-07-21 13:59:40 UTC
Red Hat Bugzilla | 1653554 | medium | CLOSED | Pass more devices via the interface between QEMU and firmware while booting up a guest | 2020-10-14 00:28:05 UTC
Red Hat Bugzilla | 1846960 | medium | VERIFIED | Failed to boot up a s390x guest without attaching 'bootindex=' | 2020-10-19 20:17:21 UTC

Description Gu Nini 2020-06-15 10:02:45 UTC
Description of problem:
When a guest is booted with a virtio-scsi-ccw bus specified before the virtio-blk-ccw device that holds the system disk, the guest fails to boot and produces the following serial output:

# nc -U /var/tmp/avocado_2
LOADPARM=[        ]
Using virtio-scsi.

! Cannot locate virtio-scsi device !

Ncat: Broken pipe.


Version-Release number of selected component (if applicable):
Host kernel: 4.18.0-211.el8.s390x
Guest kernel: 4.18.0-211.el8.s390x
Qemu: qemu-kvm-4.2.0-19.module+el8.3.0+6473+93e27135.s390x

How reproducible:
100%

Steps to Reproduce:
1. Boot a guest with the following qemu command line. Note that the virtio-scsi-ccw bus is specified before the virtio-blk-ccw device:

/usr/libexec/qemu-kvm \
    -S \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine s390-ccw-virtio  \
    -nodefaults  \
    -vga none \
    -m 3072  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'host' \
    -chardev socket,id=qmp_id_qmpmonitor1,server,path=/var/tmp/avocado_1,nowait  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=chardev_serial0,server,path=/var/tmp/avocado_2,nowait \
    -device sclpconsole,id=serial0,chardev=chardev_serial0 \
    -device virtio-scsi-ccw,id=virtio_scsi_ccw0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/ngu/rhel830-s390x-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device virtio-blk-ccw,id=image1,drive=drive_image1,write-cache=on,bootindex=1 \
    -device virtio-net-ccw,mac=9a:9f:19:fe:49:13,id=idh8Nq82,netdev=idVx9QzC  \
    -netdev tap,id=idVx9QzC,vhost=on  \
    -nographic  \
    -rtc base=utc \
    -boot strict=on \
    -enable-kvm \
    -device virtio-mouse-ccw,id=input_mouse1 \
    -device virtio-keyboard-ccw,id=input_keyboard1 \
    -monitor stdio

2. Connect to its serial console and issue 'cont' in the HMP monitor:
# nc -U /var/tmp/avocado_2

sh vm1.sh
QEMU 4.2.0 monitor - type 'help' for more information
(qemu)  
(qemu) 
(qemu) cont

3. Check the serial output.


Actual results:
The guest fails to boot and quits automatically, with the following serial output:

# nc -U /var/tmp/avocado_2
LOADPARM=[        ]
Using virtio-scsi.

! Cannot locate virtio-scsi device !

Ncat: Broken pipe.


Expected results:
The guest boots up successfully.

Additional info:
Please note that this bug is different from bz1846960. It can also be reproduced with guest kernels 4.18.0-193.8.1.el8_2.s390 and 4.18.0-203.el8.s390x, so it is not a regression.

Comment 1 Gu Nini 2020-06-15 10:08:50 UTC
(In reply to Gu Nini from comment #0)
> Description of problem:
> When a guest is booted with a virtio-scsi-ccw bus specified before the
> virtio-blk-ccw device that holds the system disk, the guest fails to boot
> and produces the following serial output:

Please note that the bug occurs only when no 'bootindex=' is specified for the 'virtio-blk-ccw' device. If it is specified, the bug does not occur.

>     -device
> virtio-blk-ccw,id=image1,drive=drive_image1,write-cache=on,bootindex=1 \

Please disregard 'bootindex=1' in the line above; it should not be there. Omitting 'bootindex' is a key factor in triggering the bug. Sorry for the mistake.


Comment 2 Thomas Huth 2020-06-15 13:10:26 UTC
I can reproduce the problem, using a slightly simplified command line:

/usr/libexec/qemu-kvm -m 2G -device virtio-scsi-ccw \
  -blockdev node-name=file_image1,driver=file,filename=/autofs/s390x_team_storage/thuth/images/rhel8x11.qcow2 \
  -blockdev node-name=drive_image1,driver=qcow2,file=file_image1 \
  -device virtio-blk-ccw,id=image1,drive=drive_image1 -nographic

Guest output:

 LOADPARM=[        ]
 Using virtio-scsi.

 ! Cannot locate virtio-scsi device !

The problem goes away if I either remove the "-device virtio-scsi-ccw" or add "bootindex=1" to the virtio-blk-ccw device.

Comment 3 Thomas Huth 2020-06-17 13:13:55 UTC
I think the problem is in the s390-ccw bios of the guest. If there is a virtio-scsi device and no boot index, it only scans the SCSI bus but does not look for other boot devices like virtio-blk anymore. Thus moving the component from "kernel" to "qemu-kvm".
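
The behavior described in this comment can be modeled roughly as follows. This is a minimal Python sketch, not the actual s390-ccw bios code (which is written in C). The `Device` class and the `pick_boot_device` function are hypothetical names introduced only for illustration, and the rule that an explicit bootindex takes precedence is a simplifying assumption based on the observations in comment 2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    """Hypothetical, simplified view of a guest device as the firmware sees it."""
    kind: str                        # e.g. "virtio-scsi" or "virtio-blk"
    bootable: bool                   # True if a bootable disk is reachable through it
    bootindex: Optional[int] = None  # value of the QEMU "bootindex" property, if set


def pick_boot_device(devices: List[Device]) -> Optional[Device]:
    """Model of the reported behavior: commit to one candidate and never retry."""
    indexed = [d for d in devices if d.bootindex is not None]
    if indexed:
        # Assumption: an explicit bootindex selects the device directly.
        candidate = min(indexed, key=lambda d: d.bootindex)
    else:
        # No bootindex: take the first block device or SCSI controller found.
        candidates = [d for d in devices if d.kind in ("virtio-scsi", "virtio-blk")]
        if not candidates:
            return None
        candidate = candidates[0]
    # If the single candidate is not bootable, nothing else is tried.
    return candidate if candidate.bootable else None
```

In this model, an empty virtio-scsi controller followed by a bootable virtio-blk disk yields no boot device, while removing the controller or giving the disk a bootindex selects the disk, which matches the three observations in comment 2.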

Comment 5 smitterl 2020-06-30 09:03:53 UTC
(In reply to Thomas Huth from comment #3)
> I think the problem is in the s390-ccw bios of the guest. If there is a
> virtio-scsi device and no boot index, it only scans the SCSI bus but does
> not look for other boot devices like virtio-blk anymore. Thus moving the
> component from "kernel" to "qemu-kvm".

Thomas, can you confirm the expected behavior will be:

Given I start the vm both with a virtio-blk and a virtio-scsi device
But only the virtio-blk device is bootable
When I start the vm
Then it boots from the virtio-blk device

Comment 6 Thomas Huth 2020-06-30 09:33:27 UTC
(In reply to smitterl from comment #5)
> Given I start the vm both with a virtio-blk and a virtio-scsi device
> But only the virtio-blk device is bootable
> When I start the vm
> Then it boots from the virtio-blk device

Yes, I think that would be the most user-friendly behavior. But please note that this bug is not a regression: the guest has never booted if you specify a virtio-scsi controller first and a bootable virtio-blk device second (without "bootindex"). So I don't treat this as a high-priority problem right now.

Actually, I had a look at the source code, and the problem is even somewhat worse: for example, the guest also does not boot if you specify two virtio-blk devices where only the second one is bootable. The s390-ccw bios simply stops at the first block device (or SCSI controller) that it finds and does not try to continue booting from other devices if there is a failure. To fix this problem, I think I need to rewrite quite a bit of the logic in the s390-ccw bios, so this will likely take a while until it's done.
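
The difference between stopping at the first device and continuing to the next one can be sketched as below. This is a simplified Python model under stated assumptions, not the bios implementation: the `Device` class and both function names are hypothetical, and "bootable" stands in for whatever checks the real firmware performs on a device.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    """Hypothetical stand-in for a block device or SCSI controller."""
    name: str
    bootable: bool


def boot_first_only(devices: List[Device]) -> Optional[str]:
    """Current behavior as described: only the first device is ever tried."""
    if not devices:
        return None
    first = devices[0]
    return first.name if first.bootable else None


def boot_with_fallthrough(devices: List[Device]) -> Optional[str]:
    """Desired behavior: keep trying later devices when one is not bootable."""
    for dev in devices:
        if dev.bootable:
            return dev.name
    return None
```

For two virtio-blk devices where only the second is bootable, `boot_first_only` finds nothing, whereas `boot_with_fallthrough` returns the second device.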

Comment 7 smitterl 2020-06-30 09:46:06 UTC
(In reply to Thomas Huth from comment #6)

Thank you very much for the quick answer, Thomas. We'll plan accordingly.

Comment 9 IBM Bug Proxy 2020-07-21 15:11:58 UTC
------- Comment From MIHAJLOV@de.ibm.com 2020-07-21 11:03 EDT-------
Please keep in mind that the s390x architecture only supports a single boot device, namely the one specified using Diagnose 308. I agree that this should be enhanced, but this should be within the architecture (which might need extensions). Adding some ad-hoc traversal of devices is NOT the way to go, as it is likely to not respect the guest's intentions.

Comment 11 Thomas Huth 2020-07-22 06:13:24 UTC
(In reply to IBM Bug Proxy from comment #9)
> ------- Comment From MIHAJLOV@de.ibm.com 2020-07-21 11:03 EDT-------
> Adding some ad-hoc traversal of devices is NOT the way to
> go, as it is likely to not respect the guest's intentions.

Well, the s390-ccw bios is already searching through all available devices if no "bootindex" has been specified. And I think that's a good idea since this is what is also done by other firmware implementations on x86 and ppc64, so this is what most users likely expect. The confusing thing is that the s390-ccw bios stops at the very first device that it finds. If there is no bootable device here (e.g. because it is a scsi controller without attached scsi disks), it simply refuses to boot. All other firmware implementations that I've seen so far continue to look at the other available devices in that case. So I think the s390-ccw bios should do it as well to close yet another gap between s390x and other architectures.

Comment 12 IBM Bug Proxy 2020-07-22 08:31:14 UTC
------- Comment From MIHAJLOV@de.ibm.com 2020-07-22 04:28 EDT-------
Right, but this is a legacy behavior (of the BIOS) for a corner case not covered by the real architecture, where you HAVE to enter a load device (through the HMC or the OS through Diag308). If the load device can't be IPLed from, the system has to go into disabled wait.
Adding the possibility to traverse a boot device list is our goal as well. We should, however, do it in a way that maintains architecture compliance, which will definitely require changes in the architecture, a todo for IBM. Otherwise, effort would be spent only to cover the said corner case.

Comment 13 Thomas Huth 2020-07-29 07:06:34 UTC
(In reply to IBM Bug Proxy from comment #12)
> ------- Comment From MIHAJLOV@de.ibm.com 2020-07-22 04:28 EDT-------
> Right, but this is a legacy behavior (of the BIOS) for a corner case not
> covered by the real architecture, where you HAVE to enter a load device
> (through the HMC or the OS through Diag308). If the load device can't be
> IPLed from, the system has to go into disabled wait.

That may be what traditional s390x admins expect, but it is very confusing for everybody who is used to QEMU/KVM on one of the other major architectures (x86, POWER, ...). So I think it is worthwhile to improve the booting at least in the most confusing situations (e.g. when specifying an "empty" virtio-scsi controller followed by a bootable virtio-blk disk). Most of the related code reworks (e.g. not panic()-ing anymore in the low-level routines if a non-bootable device has been found) will be needed anyway if you add the "bootindex" boot device list traversal later.
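
The kind of rework mentioned above, where a low-level routine reports a non-bootable device instead of panicking, might look roughly like this. It is only an illustrative Python sketch and not the patch series itself; `NotBootable`, `panic`, `load_or_panic`, `try_load` and `boot` are hypothetical names, and `SystemExit` is used here merely to model the firmware halting the guest.

```python
from typing import Dict, List


class NotBootable(Exception):
    """Recoverable error: this particular device cannot be booted from."""


def panic(message: str) -> None:
    """Model of a firmware panic: execution stops here."""
    raise SystemExit(message)


def load_or_panic(dev: Dict) -> str:
    """Old style: the low-level routine halts on a non-bootable device."""
    if not dev["bootable"]:
        panic("! Cannot locate %s device !" % dev["name"])
    return dev["name"]


def try_load(dev: Dict) -> str:
    """Reworked style: report the failure and let the caller decide."""
    if not dev["bootable"]:
        raise NotBootable(dev["name"])
    return dev["name"]


def boot(devices: List[Dict]) -> str:
    """Caller that moves on to the next device and panics only at the end."""
    for dev in devices:
        try:
            return try_load(dev)
        except NotBootable:
            continue
    panic("No bootable device found")
```

Keeping the panic at the top level means the same low-level routine can serve both a single explicit boot device and a later traversal of a boot device list.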

So I've suggested a patch series for this problem now here:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg08015.html

