Bug 1769445

Summary: can't open qmp at /usr/libexec/os-autoinst/OpenQA/Qemu/Proc.pm line 405
Product: [Fedora] Fedora Reporter: Michel Normand <normand>
Component: qemu Assignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED DUPLICATE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 31 CC: amit, awilliam, berrange, cfergeau, dan, dwmw2, itamar, pbonzini, rjones, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: ppc64le   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-07 08:42:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1071880    
Attachments:
Description Flags
qmp_connect_autoinst.log none

Description Michel Normand 2019-11-06 16:09:08 UTC
Created attachment 1633372 [details]
qmp_connect_autoinst.log

After upgrading a P8 openQA server from f30 to f31, openQA is unable to connect to the qemu guest because it cannot open the QMP socket, as shown in this log extract:

===
[2019-11-06T15:37:26.727 CET] [debug] starting: /usr/bin/qemu-system-ppc64 -g 1024x768 -vga virtio -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -soundhw ac97 -global isa-fdc.driveA= -m 4096 -machine usb=off -cpu host -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -boot order=c,menu=on,splash-time=5000 -device nec-usb-xhci -device usb-tablet -device usb-kbd -smp 1 -enable-kvm -no-shutdown -vnc :91,share=force-shared -device virtio-serial -chardev socket,path=virtio_console,server,nowait,id=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-overlay0-file,filename=/var/lib/openqa/pool/1/raid/hd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on -device virtio-blk,id=hd0-device,drive=hd0-overlay0,bootindex=0,serial=hd0 -blockdev driver=file,node-name=hd1-overlay0-file,filename=/var/lib/openqa/pool/1/raid/hd1-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd1-overlay0,file=hd1-overlay0-file,cache.no-flush=on -device virtio-blk,id=hd1-device,drive=hd1-overlay0,serial=hd1
...
[2019-11-06T15:37:46.159 CET] [debug] Backend process died, backend errors are reported below in the following lines: 
can't open qmp at /usr/libexec/os-autoinst/OpenQA/Qemu/Proc.pm line 405. 
===

Before the upgrade (f30):
===
$ rpm -qa | grep -E 'qemu.3|autoinst-4|openqa-4' | sort
openqa-4.6-33.20190806git1c53390.fc30.noarch
os-autoinst-4.5-26.20190807git3391d60.fc30.ppc64le
qemu-3.1.1-2.fc30.ppc64le
===

After the upgrade to f31:
===
$ rpm -qa | grep -E 'qemu.4|autoinst-4|openqa-4' | sort
openqa-4.6-33.20190806git1c53390.fc31.noarch
os-autoinst-4.5-26.20190807git3391d60.fc31.ppc64le
qemu-4.1.0-5.fc31.ppc64le
===

I do not know how to investigate manually.

Comment 1 Adam Williamson 2019-11-06 16:36:17 UTC
IIRC, that error usually really means 'qemu dropped dead unexpectedly' or something like that. Can you try running the same qemu command (modified as minimally as possible to run outside of openqa) manually and see what happens?
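
One way to run that check, as a minimal sketch: launch the command under a short timeout and report whether it exited on its own and what it wrote to stderr. The helper name `run_briefly` and the 10-second timeout are assumptions made for illustration; nothing here comes from os-autoinst.

```python
import subprocess


def run_briefly(argv, timeout=10.0):
    """Start argv and wait up to `timeout` seconds for it to exit.

    Returns (returncode, stderr_text). returncode is None when the process
    was still running at the timeout, in which case it is killed here.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        _, err = proc.communicate(timeout=timeout)
        return proc.returncode, err
    except subprocess.TimeoutExpired:
        # Still alive after the timeout: stop it and collect what it printed.
        proc.kill()
        _, err = proc.communicate()
        return None, err
```

Passing the argument list from the `starting:` log line (with the openQA-specific paths adjusted) would return an exit code plus qemu's own error message if qemu is dying on launch; `None` would mean it stayed up for the whole timeout.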

You could also try https://bodhi.fedoraproject.org/updates/FEDORA-2019-bee032bb21 , which is what's running on staging now and I think is not having this problem there.

Comment 2 Adam Williamson 2019-11-06 17:35:37 UTC
I tell a lie - our p8 worker host *is* hitting this, e.g. https://openqa.stg.fedoraproject.org/tests/665001/file/autoinst-log.txt . The p9 worker hosts are not hitting it.

Comment 3 Adam Williamson 2019-11-06 17:36:13 UTC
Note I did notice this problem with virt-install on the p8 box as well:

https://bugzilla.redhat.com/show_bug.cgi?id=1768551

these could possibly be the same bug...

Comment 4 Adam Williamson 2019-11-06 18:21:15 UTC
OK, yeah, I think my theory is adding up. Try with `-M pseries-3.1` and I bet you'll find it works. I think the issue is that qemu after this commit:

https://github.com/qemu/qemu/commit/2782ad4c4102d57f7f8e135dce0c1adb0149de77

expects firmware support for this SPAPR_CAP_WORKAROUND mitigation, but our affected machines do not have a new enough firmware. I've filed https://pagure.io/fedora-infrastructure/issue/8365 to request the firmware on our infra box be updated. Can you check this and if you think I'm right, we can probably close this as a dupe of 1768551 ?
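
For trying the `-M pseries-3.1` suggestion against the logged command line, a rough sketch of the rewrite follows. The helper name `with_machine_type` is hypothetical, and the sketch assumes any existing `-machine` value holds only properties (such as `usb=off` in the log above), not a machine type.

```python
import shlex


def with_machine_type(cmdline, machine_type):
    """Return cmdline with machine_type set as the qemu machine type.

    If a -machine/-M option is already present, machine_type is prepended
    to its value (for example "usb=off" becomes "pseries-3.1,usb=off").
    Otherwise "-M machine_type" is appended.
    """
    args = shlex.split(cmdline)
    for i, arg in enumerate(args[:-1]):
        if arg in ("-machine", "-M"):
            # Assumption: the existing value carries properties only.
            args[i + 1] = f"{machine_type},{args[i + 1]}"
            return shlex.join(args)
    return shlex.join(args + ["-M", machine_type])
```

Applied to a command containing `-machine usb=off`, this yields `-machine pseries-3.1,usb=off` and leaves every other option as logged.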

Comment 5 Michel Normand 2019-11-07 08:42:45 UTC
As per Adam Williamson's investigation of bug#1769600, I reran a test on my P8 machine after adding "QEMU_APPEND=-M pseries-3.1" to the openQA ppc64le machine settings, and was able to run tests. So I am marking this bug as a duplicate of that new bug.

*** This bug has been marked as a duplicate of bug 1769600 ***

Comment 6 Adam Williamson 2019-11-07 15:42:56 UTC
Michel: I think bug#1768551 and bug#1769600 are different. bug#1768551 happens on our P8 worker host, and in that case qemu just exits out immediately on launch. bug#1769600 happens on our P9 worker host, and in *that* case qemu runs, but cannot boot much past the bootloader. Both can be worked around by changing the machine type, but I don't think they're exactly the same bug. As the machine and symptoms here match bug#1768551 , let's make this a dupe of that, not the P9 one.

*** This bug has been marked as a duplicate of bug 1768551 ***

Comment 7 Adam Williamson 2019-11-07 15:57:09 UTC
BTW, for openQA purposes there is actually a `QEMUMACHINE` variable you can just set to `pseries-3.1` (or `pseries-3.1,usb=off` to match a baked-in default in os-autoinst, if you prefer...if you don't set `QEMUMACHINE` on ppc, os-autoinst automatically sets it to `usb=off`, so I guess there's some reason to preserve that in a custom setting). I only figured this out halfway through yesterday though :P
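
The defaulting described above can be modelled roughly as follows. This is only an illustration of the behaviour as stated in this comment, not os-autoinst's actual code; the function name `effective_machine` and the rule that any architecture string starting with "ppc" counts as ppc are assumptions.

```python
def effective_machine(settings, arch):
    """Model the QEMUMACHINE default described above (not os-autoinst code).

    settings: dict of openQA job settings.
    arch: architecture string; treating any value starting with "ppc"
          as ppc is an assumption of this sketch.
    """
    machine = settings.get("QEMUMACHINE")
    if machine:
        # An explicit setting replaces the default entirely.
        return machine
    if arch.startswith("ppc"):
        return "usb=off"
    return None
```

Under this model, setting `QEMUMACHINE` to plain `pseries-3.1` drops the `usb=off` default, which is why `pseries-3.1,usb=off` is the value that preserves it.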