Bug 1914805
Summary: | Check guest network link status of virtio nic with status=on, qemu core dumped: qemu-kvm: ../util/async.c:343: aio_ctx_finalize: Assertion `flags & BH_DELETED' failed. | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Lei Yang <leiyang> | ||||
Component: | qemu-kvm | Assignee: | lulu <lulu> | ||||
qemu-kvm sub component: | Networking | QA Contact: | Lei Yang <leiyang> | ||||
Status: | CLOSED CURRENTRELEASE | Docs Contact: | |||||
Severity: | high | ||||||
Priority: | high | CC: | aadam, ailan, chayang, jasowang, jinzhao, juzhang, kkoukiou, lulu, mmarusak, mpitt, smitterl, virt-maint | ||||
Version: | 8.4 | Keywords: | Regression, Triaged | ||||
Target Milestone: | rc | ||||||
Target Release: | 8.4 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | CockpitTest | ||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2021-08-11 08:51:11 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1948358 | ||||||
Attachments: |
Description
Lei Yang
2021-01-11 09:31:11 UTC
Assigned to Ariel for initial triage per bz process and age of bug created or assigned to virt-maint without triage.
Created attachment 1749334 [details]
/var/log/libvirt/qemu/subVmTest1.log
We started to see this crash in our Cockpit tests after the last refresh of our debian-testing image. That refresh updated QEMU from 5.1 to 5.2, so the regression definitely happened between these two versions.
# coredumpctl info|cat
PID: 2816 (qemu-system-x86)
UID: 64055 (libvirt-qemu)
GID: 64055 (libvirt-qemu)
Signal: 6 (ABRT)
Timestamp: Thu 2021-01-21 09:55:04 UTC (4min 10s ago)
Command Line: /usr/bin/qemu-system-x86_64 -name guest=subVmTest1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-subVmTest1/master-key.aes -machine pc-i440fx-5.2,accel=tcg,usb=off,dump-guest-core=off,memory-backend=pc.ram -cpu qemu64 -m 256 -object memory-backend-ram,id=pc.ram,size=268435456 -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 27e9d33a-76fe-4134-8dd4-60491582eeb2 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=35,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -blockdev {"driver":"file","filename":"/var/lib/libvirt/images/subVmTest1-2.img","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null} -device virtio-blk-pci,bus=pci.0,addr=0x6,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,serial=SECOND -netdev tap,fd=37,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:27:f0:b2,bus=pci.0,addr=0x3,bootindex=2 -add-fd set=2,fd=39 -chardev file,id=charserial0,path=/dev/fdset/2,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=38,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
Executable: /usr/bin/qemu-system-x86_64
Control Group: /machine.slice/machine-qemu\x2d2\x2dsubVmTest1.scope/emulator
Unit: machine-qemu\x2d2\x2dsubVmTest1.scope
Slice: machine.slice
Boot ID: 393395d7b2be485090065d58113a9337
Machine ID: a813e99297ed47cc8833e186ef538e9a
Hostname: debian
Storage: none
Message: Process 2816 (qemu-system-x86) of user 64055 dumped core.
Unfortunately I can't get a core dump out of this for the life of me. I even increased the host VM's memory to 5 GiB (the crashing QEMU guest only has 256 MiB), but I still get
systemd-coredump[2839]: Resource limits disable core dumping for process 2816 (qemu-system-x86).
systemd-coredump[2839]: Process 2816 (qemu-system-x86) of user 64055 dumped core.
It's not my configuration -- `ulimit -c` is unlimited, and I get core dumps of other processes just fine.
In /var/log/libvirt/qemu/subVmTest1.log (attached) I see this very assertion:
2021-01-21T09:55:04.364762Z qemu-system-x86_64: terminating on signal 15 from pid 2033 (/usr/sbin/libvirtd)
qemu-system-x86_64: ../../util/async.c:343: aio_ctx_finalize: Assertion `flags & BH_DELETED' failed.
Do you happen to have any hint how to pry out a core dump from this thing? Does anything in QEMU itself set the core rlimit to 0 or something? Thanks!
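A minimal sketch for checking which core limit the crashing process actually runs with: `ulimit -c` in a login shell says nothing about a QEMU process started by libvirtd, but the kernel exposes the per-process limits in /proc/<pid>/limits. The helper name `core_soft_limit` and the `pgrep` pattern in the usage comment are illustrative, not from this report. The cause is not confirmed in this thread; as an assumption worth checking, libvirt's /etc/libvirt/qemu.conf has a `max_core` setting that governs the core limit of the QEMU processes it starts.

```shell
# Print the soft "Max core file size" limit from a limits file.
# $1: path to a limits file, e.g. /proc/<pid>/limits
core_soft_limit() {
    # Fields of that line: Max core file size <soft> <hard> <units>
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Usage (hypothetical pattern, adjust to the guest name):
#   pid=$(pgrep -f 'guest=subVmTest1' | head -n 1)
#   core_soft_limit "/proc/$pid/limits"
```

A value of 0 for the running guest would match the "Resource limits disable core dumping" message from systemd-coredump.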
I'm happy to run the test on RHEL 8.4 as well to compare (maybe core dumping works better there), I can reproduce it quite reliably. However, Lei's reported version is 5.2.0-2.scrmod+el8.4.0+9296+87860477.wrb210106, while the current build in RHEL 8.4 AppStream nightly is 4.2.0-39.module+el8.4.0+9248+2cae4f71. Where do I get the 5.2 build from?
(In reply to Martin Pitt from comment #6)
> I'm happy to run the test on RHEL 8.4 as well to compare (maybe core dumping
> works better there), I can reproduce it quite reliably. However, Lei's
> reported version is 5.2.0-2.scrmod+el8.4.0+9296+87860477.wrb210106, while
> the current build in RHEL 8.4 AppStream nightly is
> 4.2.0-39.module+el8.4.0+9248+2cae4f71. Where do I get the 5.2 build from?
Hi Martin
Please try this brew for testing:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=34385922
Can you share with me your steps to reproduce this problem? Because I cannot reproduce this problem stably.
Thanks in advance.
Best Regards
Lei
@Lei: I'm happy to, but it's not *that* trivial. https://github.com/cockpit-project/cockpit/blob/master/test/README.md describes how to run our integration tests, and with injecting RPMS it's a bit more involved. Everything is safe in the sense that it does not require root privs and does not change anything permanently in your home dir or on your host (it uses transient libvirt-qemu session domains). If you don't have/don't want to install libvirt etc., all that works fine in toolbox:
toolbox create --image quay.io/cockpit/tasks
(This is in fact how most cockpit developers work these days)
I built a cockpit standard RHEL 8.4 VM in a https://github.com/cockpit-project/cockpit checkout, booted it:
$ test/image-prepare rhel-8-4
$ bots/vm-run rhel-8-4
This automatically shows a QEMU console. Log in (root:foobar), and grab the RPMs from that build and install them:
dnf install -y wget
wget -r -np http://download.eng.bos.redhat.com/brewroot/work/tasks/5982/34385982/
rpm --verbose -U download.eng.bos.redhat.com/brewroot/work/tasks/5982/34385982/*.rpm
On the host, run various integration tests against that already running VM:
TEST_OS=rhel-8-4 test/verify/check-machines-nics --machine 127.0.0.2:2201 --browser 127.0.0.2:9091 -tv TestMachinesNICs.testNICAdd
TEST_OS=rhel-8-4 test/verify/check-machines-disks --machine 127.0.0.2:2201 --browser 127.0.0.2:9091 -tv TestMachinesDisks.testAddDiskAdditionalOptions
TEST_OS=rhel-8-4 test/verify/check-machines-lifecycle --machine 127.0.0.2:2201 --browser 127.0.0.2:9091 -tv TestMachinesLifecycle.testBasic
(Drop the -tv if this is too noisy for you)
Unfortunately I didn't hit that crash yet on RHEL 8.4. Doing the same steps with "debian-testing" (and without the middle part of grabbing the RPMs, of course) reproduces the crash fairly often for me (maybe 1 out of 3 runs). At least the crash was observed [1] with these three tests on debian-testing.
I'll leave these tests running in a loop, and see if I can catch it over the day.
[1] https://github.com/cockpit-project/bots/issues/1565
That loop was successful in the sense that it finally reproduced the crash in RHEL 8.4:
# coredumpctl info /usr/libexec/qemu-kvm |cat
PID: 119536 (qemu-kvm)
UID: 107 (qemu)
GID: 107 (qemu)
Signal: 6 (ABRT)
Timestamp: Fri 2021-01-22 07:25:24 UTC (54s ago)
Command Line: /usr/libexec/qemu-kvm -name guest=subVmTest1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-subVmTest1/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=tcg,usb=off,dump-guest-core=off -cpu qemu64 -m 256 -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -uuid b51fe8b6-e0a8-4eca-a2ad-e2cebeee7015 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=37,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -blockdev {"driver":"file","filename":"/var/lib/libvirt/images/subVmTest1-2.img","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null} -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,serial=SECOND -netdev tap,fd=39,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:a0:d5:79,bus=pci.0,addr=0x3,bootindex=2 -add-fd set=2,fd=41 -chardev file,id=charserial0,path=/dev/fdset/2,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=40,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
Executable: /usr/libexec/qemu-kvm
Control Group: /machine.slice/machine-qemu\x2d2\x2dsubVmTest1.scope
Unit: machine-qemu\x2d2\x2dsubVmTest1.scope
Slice: machine.slice
Boot ID: 10827ff665c94fbdb554f48de2dcc52b
Machine ID: 04215ca07d2848eb9fcbbb65d4753240
Hostname: m1.cockpit.lan
Storage: none
Message: Process 119536 (qemu-kvm) of user 107 dumped core.
However, still "Resource limits disable core dumping for process 119536 (qemu-kvm).". I don't know how to enable core dumps for that..
FTR, during that loop I got 20(!) core dumps of libvirtd.
(In reply to Martin Pitt from comment #9)
> That loop was successful in the sense that it finally reproduced the crash
> in RHEL 8.4:
>
> # coredumpctl info /usr/libexec/qemu-kvm |cat
> PID: 119536 (qemu-kvm)
> UID: 107 (qemu)
> GID: 107 (qemu)
> Signal: 6 (ABRT)
> Timestamp: Fri 2021-01-22 07:25:24 UTC (54s ago)
> Command Line: /usr/libexec/qemu-kvm -name
> guest=subVmTest1,debug-threads=on -S -object
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-
> subVmTest1/master-key.aes -machine
> pc-i440fx-rhel7.6.0,accel=tcg,usb=off,dump-guest-core=off -cpu qemu64 -m 256
> -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -uuid
> b51fe8b6-e0a8-4eca-a2ad-e2cebeee7015 -no-user-config -nodefaults -chardev
> socket,id=charmonitor,fd=37,server,nowait -mon
> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot
> strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
> virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device
> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -blockdev
> {"driver":"file","filename":"/var/lib/libvirt/images/subVmTest1-2.img","node-
> name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev
> {"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":
> "libvirt-1-storage","backing":null} -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=libvirt-1-format,id=virtio-
> disk0,bootindex=1,serial=SECOND -netdev tap,fd=39,id=hostnet0 -device
> rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:a0:d5:79,bus=pci.0,addr=0x3,
> bootindex=2 -add-fd set=2,fd=41 -chardev
> file,id=charserial0,path=/dev/fdset/2,append=on -device
> isa-serial,chardev=charserial0,id=serial0 -chardev
> socket,id=charchannel0,fd=40,server,nowait -device
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,
> name=org.qemu.guest_agent.0 -spice
> port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-
> migration=on -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -sandbox
> on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg
> timestamp=on
> Executable: /usr/libexec/qemu-kvm
> Control Group: /machine.slice/machine-qemu\x2d2\x2dsubVmTest1.scope
> Unit: machine-qemu\x2d2\x2dsubVmTest1.scope
> Slice: machine.slice
> Boot ID: 10827ff665c94fbdb554f48de2dcc52b
> Machine ID: 04215ca07d2848eb9fcbbb65d4753240
> Hostname: m1.cockpit.lan
> Storage: none
Hi Martin
Could you please share the content of /etc/systemd/coredump.conf? "Storage: none" from man coredump.conf says the core dumps are not stored permanently. Maybe this is the cause?
> Message: Process 119536 (qemu-kvm) of user 107 dumped core.
>
> However, still "Resource limits disable core dumping for process 119536
> (qemu-kvm).". I don't know how to enable core dumps for that.. FTR, during
> that loop I got 20(!) core dumps of libvirtd.
Chao, as I wrote earlier:
> It's not my configuration -- `ulimit -c` is unlimited, and I get core dumps of other processes just fine.
/etc/systemd/coredump.conf is the stock RHEL 8.4 default, i.e. all comments only. In particular:
#Storage=external
#ProcessSizeMax=2G
#ExternalSizeMax=2G
Is this a regression? If yes, could you please help to bisect to find the first bad commit?
Thanks
(In reply to jason wang from comment #13)
> Is this a regression? If yes, could you please help to bisect to find the
> first bad commit?
>
> Thanks
Hi Jason
I will update after getting the results.
Best Regards
Lei
Hi Jason
I tried to use the steps in the bug description to reproduce the problem. Unfortunately, I did not succeed.
I will try the method provided by Martin Pitt in Comment 8 to reproduce the problem. I will update the bug as soon as I have reproduced it.
Best Regards
Lei
(In reply to Lei Yang from comment #15)
> Hi Jason
>
> I tried to use the steps in the bug description to reproduce the problem.
> Unfortunately, I did not succeed.
> I will try the method provided by Martin Pitt in Comment 8 to reproduce the
> problem. I will update the bug as soon as I have reproduced it.
>
> Best Regards
> Lei
Hi Jason,
Could you please take a look at Bug 1925047? The same assertion was observed with qemu-5.2 on RHEL 9, but with more information.
This is also happening with qemu-5.2.0-5.fc34.1.aarch64.rpm, spotted from Cockpit tests on Fedora 34 images as well. If you're still having a hard time reproducing it, I can add some debug packages to the test images, so next time it crashes, we can provide you with a proper stacktrace.
(In reply to Katerina Koukiou from comment #17)
> This is also happening with qemu-5.2.0-5.fc34.1.aarch64.rpm, spotted from
> Cockpit tests on Fedora 34 images as well. If you're still having a hard time
> reproducing it, I can add some debug packages to the test images, so next time
> it crashes, we can provide you with a proper stacktrace.
Hi Katerina:
Would you mind doing a bisection to find the first commit that introduces this issue?
Thanks
Hey @jason, bisection would be hard here as it's happening in a non-deterministic way and I definitely can't reproduce it reliably outside of the CI.
What I suggested above is that I can install the relevant debuginfo packages inside the CI test images, so next time it crashes we can provide you with a proper stacktrace from the CI tests.
Our automatic tracker [1] still sees this on Fedora 34, Debian testing, and Ubuntu 21.04; but these all have QEMU 5.2. Our CI uses the default version of QEMU in RHEL, and 8.5 still has 4.2.0, which is not affected. However, RHEL 9.0 nightly has 6.0.0-1.el9 and we have *not* seen this crash in our CI there. So this is a strong indication that 6.0 does not have this crash any more.
Thanks!
[1] https://github.com/cockpit-project/bots/issues/1565
Hi, Jason
1. I tried to use the method mentioned in Bug 1925047 to test, and it did not reproduce the problem, even though I used the same kernel and qemu versions.
Host Test Version:
qemu-kvm-5.2.0-3.el9.x86_64
kernel-5.11.0-0.rc5.138.el9.x86_64
Guest kernel Version:
kernel 5.11.0-0.rc3.124.el9.x86_64
2. Based on Comments 24 and 25, does this indicate that qemu-6.0 does not have this crash any more, and that the problem has been fixed?
Thanks
Lei
Hi Cindy
Based on Comments 24 and 25, and given that this bug was not reproduced in the RHEL 8.5 tests, does this indicate that qemu-6.0 has solved this problem? Is it possible to change the bug status to "CURRENTRELEASE"?
Best Regards
Lei
Sure
Thanks Lei, this bug was fixed in qemu 6.0, let's move this to CURRENTRELEASE