Bug 1769445 - can't open qmp at /usr/libexec/os-autoinst/OpenQA/Qemu/Proc.pm line 405
Summary: can't open qmp at /usr/libexec/os-autoinst/OpenQA/Qemu/Proc.pm line 405
Keywords:
Status: CLOSED DUPLICATE of bug 1768551
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 31
Hardware: ppc64le
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PPCTracker
 
Reported: 2019-11-06 16:09 UTC by Michel Normand
Modified: 2019-11-07 15:57 UTC (History)
10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-07 08:42:45 UTC
Type: Bug
Embargoed:


Attachments
qmp_connect_autoinst.log (8.46 KB, text/plain)
2019-11-06 16:09 UTC, Michel Normand

Description Michel Normand 2019-11-06 16:09:08 UTC
Created attachment 1633372 [details]
qmp_connect_autoinst.log

After upgrading a P8 openQA server from f30 to f31, openQA is unable to connect to the qemu guest: it cannot open the QMP socket, as shown in the log extract below.

===
[2019-11-06T15:37:26.727 CET] [debug] starting: /usr/bin/qemu-system-ppc64 -g 1024x768 -vga virtio -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -soundhw ac97 -global isa-fdc.driveA= -m 4096 -machine usb=off -cpu host -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -boot order=c,menu=on,splash-time=5000 -device nec-usb-xhci -device usb-tablet -device usb-kbd -smp 1 -enable-kvm -no-shutdown -vnc :91,share=force-shared -device virtio-serial -chardev socket,path=virtio_console,server,nowait,id=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-overlay0-file,filename=/var/lib/openqa/pool/1/raid/hd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on -device virtio-blk,id=hd0-device,drive=hd0-overlay0,bootindex=0,serial=hd0 -blockdev driver=file,node-name=hd1-overlay0-file,filename=/var/lib/openqa/pool/1/raid/hd1-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd1-overlay0,file=hd1-overlay0-file,cache.no-flush=on -device virtio-blk,id=hd1-device,drive=hd1-overlay0,serial=hd1
...
[2019-11-06T15:37:46.159 CET] [debug] Backend process died, backend errors are reported below in the following lines: 
can't open qmp at /usr/libexec/os-autoinst/OpenQA/Qemu/Proc.pm line 405. 
===

before upgrade (was with f30)
===
$rpm -qa |grep -E 'qemu.3|autoinst-4|openqa-4' |sort
openqa-4.6-33.20190806git1c53390.fc30.noarch
os-autoinst-4.5-26.20190807git3391d60.fc30.ppc64le
qemu-3.1.1-2.fc30.ppc64le
===

after upgrade to f31
===
$rpm -qa |grep -E 'qemu.4|autoinst-4|openqa-4' |sort
openqa-4.6-33.20190806git1c53390.fc31.noarch
os-autoinst-4.5-26.20190807git3391d60.fc31.ppc64le
qemu-4.1.0-5.fc31.ppc64le
===

I do not know how to investigate manually.

Comment 1 Adam Williamson 2019-11-06 16:36:17 UTC
IIRC, that error usually really means 'qemu dropped dead unexpectedly' or something like that. Can you try running the same qemu command (modified as minimally as possible to run outside of openqa) manually and see what happens?
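
To check "qemu dropped dead" directly, a minimal manual test along these lines could work (a sketch only: it assumes qemu 4.1 and socat are installed, and the socket path is illustrative):

```shell
# Start a minimal guest with a QMP socket; -S pauses the CPUs at startup,
# as openQA does, so no disk image is needed just to test that qemu stays up.
qemu-system-ppc64 -machine pseries -m 1024 -nographic -S \
    -qmp unix:/tmp/qmp-test.sock,server,nowait &

# If qemu is still alive, connecting should print the QMP greeting banner:
echo '{"execute": "qmp_capabilities"}' | socat - unix-connect:/tmp/qmp-test.sock
```

If qemu instead exits immediately, its stderr should say why, which would match the "can't open qmp" symptom os-autoinst reports.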

You could also try https://bodhi.fedoraproject.org/updates/FEDORA-2019-bee032bb21 , which is what's running on staging now and I think is not having this problem there.

Comment 2 Adam Williamson 2019-11-06 17:35:37 UTC
I tell a lie - our p8 worker host *is* hitting this, e.g. https://openqa.stg.fedoraproject.org/tests/665001/file/autoinst-log.txt . The p9 worker hosts are not hitting it.

Comment 3 Adam Williamson 2019-11-06 17:36:13 UTC
Note I did notice this problem with virt-install on the p8 box as well:

https://bugzilla.redhat.com/show_bug.cgi?id=1768551

these could possibly be the same bug...

Comment 4 Adam Williamson 2019-11-06 18:21:15 UTC
OK, yeah, I think my theory is adding up. Try with `-M pseries-3.1` and I bet you'll find it works. I think the issue is that qemu after this commit:

https://github.com/qemu/qemu/commit/2782ad4c4102d57f7f8e135dce0c1adb0149de77

expects firmware support for this SPAPR_CAP_WORKAROUND mitigation, but our affected machines do not have new enough firmware. I've filed https://pagure.io/fedora-infrastructure/issue/8365 to request the firmware on our infra box be updated. Can you check this and, if you think I'm right, we can probably close this as a dupe of 1768551?
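
The theory can be checked by pinning the older machine type. A hedged sketch (disk image path and memory size are illustrative, not from the original report):

```shell
# Default machine type on qemu 4.1 is pseries-4.1, which expects the newer
# firmware SPAPR cap support -- expected to fail on the affected P8 hosts:
qemu-system-ppc64 -enable-kvm -m 2048 -nographic disk.qcow2

# Pinning the pre-4.0 machine type avoids the new firmware requirement --
# expected to boot:
qemu-system-ppc64 -enable-kvm -M pseries-3.1 -m 2048 -nographic disk.qcow2
```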

Comment 5 Michel Normand 2019-11-07 08:42:45 UTC
As per Adam Williamson's investigation of bug#1769600, I redid the test on my P8 machine, adding "QEMU_APPEND=-M pseries-3.1" to the openQA ppc64le machine, and was able to run tests. So I return this bug as a duplicate of the new bug.

*** This bug has been marked as a duplicate of bug 1769600 ***

Comment 6 Adam Williamson 2019-11-07 15:42:56 UTC
Michel: I think bug#1768551 and bug#1769600 are different. bug#1768551 happens on our P8 worker host, and in that case qemu just exits out immediately on launch. bug#1769600 happens on our P9 worker host, and in *that* case qemu runs, but cannot boot much past the bootloader. Both can be worked around by changing the machine type, but I don't think they're exactly the same bug. As the machine and symptoms here match bug#1768551, let's make this a dupe of that, not the P9 one.

*** This bug has been marked as a duplicate of bug 1768551 ***

Comment 7 Adam Williamson 2019-11-07 15:57:09 UTC
BTW, for openQA purposes there is actually a `QEMUMACHINE` variable you can just set to `pseries-3.1` (or `pseries-3.1,usb=off` to match a baked-in default in os-autoinst, if you prefer...if you don't set `QEMUMACHINE` on ppc, os-autoinst automatically sets it to `usb=off`, so I guess there's some reason to preserve that in a custom setting). I only figured this out halfway through yesterday though :P
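
For reference, the workaround described above would look something like this in an openQA machine definition (a sketch; the values are taken from the comment, but where exactly you set them depends on your setup, e.g. the web UI machine settings):

```shell
# openQA machine variables for the affected ppc64le P8 workers.
# pseries-3.1 pins the pre-4.0 machine type; usb=off matches the
# default os-autoinst would otherwise apply on ppc.
QEMUMACHINE=pseries-3.1,usb=off
```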

