Bug 1757250

Summary: grub errors out with "failure reading sector" in ofdisk.c when booting a VM from an attached hard disk image on power9
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: SLOF
Assignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 30
CC: berrange, crobinso, crosa, dan, dgibson, jforbes, normand, pbonzini, pjones, rjones, virt-maint
Hardware: ppc64le   
OS: Linux   
Last Closed: 2019-10-03 22:08:01 UTC
Type: Bug
Bug Blocks: 1071880    

Description Adam Williamson 2019-10-01 00:27:51 UTC
We just got a couple of new power9 boxes intended for use as openQA worker hosts. Unfortunately, they don't seem to be able to boot VMs from a hard disk image attached as a virtio block device(?). That is how the 10% of openQA tests that *don't* run into https://bugzilla.redhat.com/show_bug.cgi?id=1757249 boot.

This scenario gets a bit further than 1757249: we actually boot and reach the grub menu. However, grub then chokes while trying to load the kernel, with a series of errors like this:

virtioblk_transfer failed! type=0, status = 1
[repeats several times]
error: ../../grub-core/disk/ieee1275/ofdisk.c:586:failure reading sector 0xNNNNNN from `ieee1275/disk'.

That sequence repeats several times with different NNNNNN values, and finally grub prints "Press any key to continue..."

The qemu command looks like this:

/usr/bin/qemu-system-ppc64 -g 1024x768 -vga virtio -only-migratable \
    -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 \
    -soundhw ac97 -global isa-fdc.driveA= -m 4096 -machine usb=off -cpu host \
    -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 \
    -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 \
    -boot order=c,menu=on,splash-time=5000 \
    -device nec-usb-xhci -device usb-tablet -device usb-kbd \
    -smp 1 -enable-kvm -no-shutdown -vnc :93,share=force-shared \
    -device virtio-serial \
    -chardev socket,path=virtio_console,server,nowait,id=virtio_console,logfile=virtio_console.log,logappend=on \
    -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console \
    -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on \
    -qmp chardev:qmp_socket -S \
    -device virtio-scsi-pci,id=scsi0 \
    -blockdev driver=file,node-name=hd0-overlay0-file,filename=/var/lib/openqa/pool/3/raid/hd0-overlay0,cache.no-flush=on \
    -blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on \
    -device virtio-blk,id=hd0-device,drive=hd0-overlay0,bootindex=0,serial=hd0 \
    -blockdev driver=file,node-name=cd0-overlay0-file,filename=/var/lib/openqa/pool/3/raid/cd0-overlay0,cache.no-flush=on \
    -blockdev driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on \
    -device scsi-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0

I'm filing against openbios rather than grub for now, as this seems to be specific to qemu and host-dependent: it does not happen on our existing power8 worker host, which runs the same thing just fine. Only the new power9 worker hosts are affected.

CCing Cleber Rosa and David Gibson (at Cleber's suggestion), and pjones for the grub side.

Comment 1 Adam Williamson 2019-10-01 00:28:16 UTC
As with the other bug, this is with:

[root@openqa-ppc64le-02 adamwill][PROD]# rpm -q qemu-system-ppc
qemu-system-ppc-3.1.1-2.fc30.ppc64le
[root@openqa-ppc64le-02 adamwill][PROD]# rpm -q openbios
openbios-20181005-2.git441a84d.fc30.noarch
[root@openqa-ppc64le-02 adamwill][PROD]#

Comment 2 David Gibson 2019-10-01 01:36:50 UTC
The pseries machine does not use openbios; its firmware is SLOF. Moving to SLOF for now, though I suspect a bug in qemu is more likely.

I'm not sure it's the problem here, but shouldn't you have "virtio-blk-pci" instead of "virtio-blk"?

Comment 3 Adam Williamson 2019-10-01 01:50:12 UTC
I'm not an expert, but that's the command os-autoinst (openQA's job runner) generates, and it works on the power8 box. The commands generated for x86_64 and aarch64 are similar (also using 'virtio-blk'), and both of those work fine too.

Comment 4 Adam Williamson 2019-10-03 22:08:01 UTC
I think this was the same issue as 1757249 in the end.

*** This bug has been marked as a duplicate of bug 1757249 ***