Bug 1661976

Summary: qemu-kvm: failed to set irq for PMU
Product: Red Hat Enterprise Linux 8
Reporter: Li Shuang <shuali>
Component: qemu-kvm
qemu-kvm sub component: General
Assignee: Virtualization Maintenance <virt-maint>
QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED NOTABUG
Severity: high
Priority: high
CC: awilliam, chayang, drjones, juzhang, michen, rbalakri, virt-maint, yinxu
Version: 8.0
Target Milestone: rc
Target Release: 8.0
Hardware: aarch64
OS: Unspecified
Last Closed: 2018-12-27 13:32:49 UTC
Type: Bug

Description Li Shuang 2018-12-25 05:48:39 UTC
Description of problem:
# /usr/libexec/qemu-kvm -name debug \
    -drive file=/home/debug_el8.aarch64.qcow2,if=none,id=drive-virtio-disk1,media=disk,cache=none,snapshot=off,format=qcow2,aio=native \
    -device virtio-blk-pci,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=0 \
    -cpu host \
    -drive file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \
    -drive file=/var/lib/libvirt/qemu/nvram/debug_VARS.fd,if=pflash,format=raw,unit=1 \
    -netdev tap,id=hostnet0,vhost=on,script=/etc/br_wan_ifup,downscript=/etc/br_wan_ifdn,ifname=tap_debug_0 \
    -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=00:e4:a3:83:14:e3 \
    -netdev tap,id=hostnet1,vhost=on,script=/etc/br_lan_ifup,downscript=/etc/br_lan_ifdn,ifname=tap_debug_1 \
    -device virtio-net-pci,netdev=hostnet1,id=virtio-net-pci1,mac=00:a7:c8:b3:76:b2 \
    -netdev tap,id=hostnet2,vhost=on,script=/etc/br_lan_ifup,downscript=/etc/br_lan_ifdn,ifname=tap_debug_2 \
    -device virtio-net-pci,netdev=hostnet2,id=virtio-net-pci2,mac=00:c1:2b:b5:f0:e2 \
    -device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0 \
    -device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0 \
    -serial file:/home/debug_el8.console \
    -serial pty \
    -vnc :10 \
    -qmp tcp:0:11010,server,nowait \
    -smp 2,cores=1,threads=1,sockets=2 \
    -m 4096
qemu-kvm: -serial pty: char device redirected to /dev/pts/2 (label serial1)
qemu-kvm: PMU: KVM_SET_DEVICE_ATTR: Invalid argument
qemu-kvm: failed to set irq for PMU
Aborted (core dumped)

# tail -5 /var/log/messages 
Dec 25 00:44:15 qualcomm-amberwing-rep-04 kvm[30901]: 0 guests now active
Dec 25 00:44:15 qualcomm-amberwing-rep-04 systemd-coredump[30889]: Process 30855 (qemu-kvm) of user 0 dumped core.#012#012Stack trace of thread 30855:#012#0  0x0000ffff88ad2d0c raise (libc.so.6)#012#1  0x0000ffff88ac08e8 abort (libc.so.6)#012#2  0x0000aaaae1087300 kvm_arm_pmu_set_irq (qemu-kvm)#012#3  0x0000aaaae107efd8 machvirt_init (qemu-kvm)#012#4  0x0000aaaae1152570 machine_run_board_init (qemu-kvm)#012#5  0x0000aaaae0fcfe74 main (qemu-kvm)#012#6  0x0000ffff88ac0d24 __libc_start_main (libc.so.6)#012#7  0x0000aaaae0fd1ccc _start (qemu-kvm)#012#8  0x0000aaaae0fd1ccc _start (qemu-kvm)
Dec 25 00:44:49 qualcomm-amberwing-rep-04 restraintd[4676]: *** Current Time: Tue Dec 25 00:44:49 2018 Localwatchdog at:  * Disabled! *
Dec 25 00:45:49 qualcomm-amberwing-rep-04 restraintd[4676]: *** Current Time: Tue Dec 25 00:45:49 2018 Localwatchdog at:  * Disabled! *
Dec 25 00:46:49 qualcomm-amberwing-rep-04 restraintd[4676]: *** Current Time: Tue Dec 25 00:46:49 2018 Localwatchdog at:  * Disabled! *


Version-Release number of selected component (if applicable):
RHEL-8.0-20181220.1 ==> kernel-4.18.0-56.el8
qemu-kvm-core-2.12.0-49.module+el8+2586+bf759444.aarch64


How reproducible:
sometimes


Steps to Reproduce:
1. Install the packages and prepare the test environment:
# yum install kernel-kernel-networking-common -y
# yum install python3-pexpect -y
# cd /mnt/tests/kernel/networking/
# source common/vm/vm.sh
# vinit
# vstart debug
(vstart fails at this point)
2. Run qemu-kvm manually, using the same /usr/libexec/qemu-kvm command line shown under "Description of problem" above.


Actual results:
qemu-kvm aborts during machine initialization and the VM fails to start on aarch64.


Expected results:
The VM starts successfully.


Additional info:
We can reproduce this issue on the following systems:
netqe-arm-02.knqe.lab.eng.bos.redhat.com
hpe-apollo-cn99xx-07.khw3.lab.eng.bos.redhat.com
qualcomm-amberwing-rep-04.khw3.lab.eng.bos.redhat.com

Comment 1 Andrew Jones 2018-12-27 13:32:49 UTC
On machines with GICv3 that don't support GICv2 guests, you must have '-machine gic-version=3' on the QEMU command line. Also, the whole command line in comment 0 looks weird. Why isn't libvirt being used? It would ensure the command line is correct.
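A minimal sketch of applying the fix from comment 1, assuming a bash shell: the options are kept in an array so '-machine gic-version=3' can be added without re-typing the long command line. Only a few of the reporter's options are repeated here; QEMU_BIN and DRY_RUN are hypothetical variables introduced for illustration, and by default the command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch: rerun the reproducer with an explicit GIC version, per comment 1.
# QEMU_BIN and DRY_RUN are hypothetical variables for this example only.
QEMU_BIN="${QEMU_BIN:-/usr/libexec/qemu-kvm}"
DRY_RUN="${DRY_RUN:-1}"

# Abbreviated subset of the reporter's options; disks, firmware and NICs omitted.
# "-machine gic-version=3" is the option comment 1 says is required on hosts
# with GICv3 that cannot run GICv2 guests.
args=(
  -name debug
  -machine gic-version=3
  -cpu host
  -smp 2,cores=1,threads=1,sockets=2
  -m 4096
  -vnc :10
)

if [ "$DRY_RUN" = "1" ]; then
  # Print a shell-quoted version of the command instead of starting a VM.
  printf '%q ' "$QEMU_BIN" "${args[@]}"
  printf '\n'
else
  # Replace this shell with QEMU, passing each option as a separate argument.
  exec "$QEMU_BIN" "${args[@]}"
fi
```

Set DRY_RUN=0 to actually start QEMU; for a bootable guest, the remaining -drive, -netdev, -device and -serial options from the description would be appended to the array.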

Comment 2 Adam Williamson 2020-03-04 00:44:02 UTC
For the record, just ran into this on Fedora 31. There *are* use cases for running qemu without libvirt. I run the Fedora openQA instance; openQA is a test system which runs tests in qemu virtual machines, which it runs directly, not via libvirt.

Thanks to finding this bug I was able to configure it to use `-machine virt,gic-version=max` which makes qemu boot on both our older and newer aarch64 worker hosts, just figured I'd add a note here in case it helps anyone else.

Comment 3 Andrew Jones 2020-03-04 08:28:28 UTC
(In reply to Adam Williamson from comment #2)
> For the record, just ran into this on Fedora 31. There *are* use cases for
> running qemu without libvirt.

This is a RHEL8 bug so there are *no* supported use cases for running qemu without libvirt.

> I run the Fedora openQA instance; openQA is a
> test system which runs tests in qemu virtual machines, which it runs
> directly, not via libvirt.

I'm glad to hear that openQA is getting run on AArch64 Fedora, but it sounds like openQA needs to be patched to better determine how to generate AArch64 QEMU command lines.

> 
> Thanks to finding this bug I was able to configure it to use `-machine
> virt,gic-version=max` which makes qemu boot on both our older and newer
> aarch64 worker hosts, just figured I'd add a note here in case it helps
> anyone else.

We do use gic-version=max for QEMU testing quite a bit, but users still need to be a bit cautious: on GICv3 hosts that support GICv2 compatibility, guests that were running as GICv2 guests will silently become GICv3 guests. That's possibly not something the user expects. Also, with TCG, or with KVM and kernel_irqchip=off, max resolves to 3, but the GICv3 emulation isn't currently complete, so it won't fully work.
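The rules above for what gic-version=max turns into can be written down as a small decision function. This is a sketch of the behaviour as described in this comment, not QEMU's own code; resolve_gic_max and its arguments are hypothetical, and HOST_MAX (the newest vGIC version the KVM host offers) has to be determined separately.

```shell
#!/usr/bin/env bash
# resolve_gic_max ACCEL KERNEL_IRQCHIP HOST_MAX
#   ACCEL:          "kvm" or "tcg"
#   KERNEL_IRQCHIP: "on" or "off" (only meaningful with kvm)
#   HOST_MAX:       newest vGIC version the KVM host offers (2 or 3), found separately
# Hypothetical helper encoding the rules described in the comment above.
resolve_gic_max() {
  local accel="$1" irqchip="$2" host_max="$3"
  if [ "$accel" = "tcg" ] || [ "$irqchip" = "off" ]; then
    # The GIC is emulated in QEMU userspace: max means 3, and that emulation
    # is described as incomplete, so warn on stderr.
    echo "warning: gic-version=max resolves to 3 with incomplete GICv3 emulation" >&2
    echo 3
  else
    # In-kernel vGIC: max follows the host, so a GICv3 host with GICv2
    # compatibility still yields a GICv3 guest.
    echo "$host_max"
  fi
}

resolve_gic_max kvm on 2   # older GICv2 host: prints 2
resolve_gic_max kvm on 3   # GICv3 host, even if it could also run GICv2 guests: prints 3
```

The second case is the one that can surprise users: the guest-visible interrupt controller changes even though the host could have kept running it as a GICv2 guest.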

That said, upstream recently got patches to generate better error messages, as the PMU stuff is pretty confusing. And, there are patches on the list to autodetect gic-version=3 in certain cases, which should help avoid the problem if those patches get merged. Still, I think openQA should learn how to probe host gic support and then generate explicit, correct command lines like libvirt does.
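One way a test harness such as openQA could generate an explicit GIC version instead of relying on the default or on max is sketched below. It is a heuristic and an assumption on our part: it relies on the host kernel naming its GICv3 (and GICv4) irqchip "GICv3" in /proc/interrupts, and it does not detect whether a GICv3 host can also run GICv2 guests. libvirt, for instance, asks QEMU itself through the QMP query-gic-capabilities command, which is more robust. detect_host_gic and gic_machine_opt are hypothetical names, and the virt machine type follows comment 2.

```shell
#!/usr/bin/env bash
# detect_host_gic [INTERRUPTS_FILE]
# Heuristic: print 3 if the host irqchip is registered as "GICv3", else 2.
# Returns non-zero if the file cannot be read (e.g. not an Arm host).
detect_host_gic() {
  local file="${1:-/proc/interrupts}"
  [ -r "$file" ] || return 1
  if grep -qw 'GICv3' "$file"; then
    echo 3
  else
    echo 2
  fi
}

# gic_machine_opt [INTERRUPTS_FILE]
# Build an explicit -machine value, e.g. "virt,gic-version=3".
gic_machine_opt() {
  local ver
  ver="$(detect_host_gic "$@")" || return 1
  printf 'virt,gic-version=%s\n' "$ver"
}
```

A harness could then pass `-machine "$(gic_machine_opt)"` on the QEMU command line, falling back to an error rather than a guess when detection fails.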