Bug 1791790
Summary: | Duplicate key error reported in 'virQEMUDriverGetDomainCapabilities'
---|---
Product: | [Community] Virtualization Tools
Component: | libvirt
Status: | CLOSED NEXTRELEASE
Severity: | unspecified
Priority: | unspecified
Version: | unspecified
Hardware: | Unspecified
OS: | Unspecified
Reporter: | Richard W.M. Jones <rjones>
Assignee: | Michal Privoznik <mprivozn>
CC: | libvirt-maint, mprivozn, pkrempa, tburke
Keywords: | Upstream
Fixed In Version: | libvirt-6.1.0
Last Closed: | 2020-01-24 13:54:31 UTC
Type: | Bug
Bug Blocks: | 1794691
Description
Richard W.M. Jones
2020-01-16 13:35:00 UTC
I'm pretty convinced this bug is already fixed upstream although I did not yet identify which exact commit may have fixed it, but feel free to close it if you want.

I just had the error happen with upstream libvirt from git, so in fact I do NOT think it is fixed. However it is very intermittent. I am currently trying to provoke the error while at the same time using Peter Krempa's enhanced debugging patch.

With the enhanced debugging patch:

virt-v2v: error: libguestfs error: could not get libvirt domain capabilities: internal error: Duplicate hash table key '34:3:pc-i440fx-4.2:/usr/bin/qemu-system-x86_64' [code=1 int1=-1]

This error is definitely *very* intermittent. It seems to happen one time in 20 or less.

According to the hash key the error comes from virQEMUDriverGetDomainCapabilities in src/qemu/qemu_conf.c, but I have no idea why that is happening as I didn't deal with the qemu domain caps cache much.

In theory this should work to reproduce the bug, but I've run this for thousands of iterations and it didn't reproduce it for me:

$ while killall lt-libvirtd libvirtd >& /dev/null; tools/virsh domcapabilities >& /tmp/log; do echo -n .; done

Created attachment 1653096
test.c
I have a reproducer, although the bug still happens very infrequently.
*Note* that this reproducer will start 256 processes on your system
using a fork bomb approach, so you may want to adjust down the
"NR_JOBS" setting at the top before running, although it may take
even longer to reproduce the bug if you do that.
$ gcc -O2 -g -Wall `pkgconf libvirt --cflags --libs` test.c -o test
$ ./test
$ ./test
$ ./test
$ ./test
$ ./test
[repeat many times until ...]
$ ./test
libvirt: error : internal error: Duplicate hash table key '34:3:pc-i440fx-4.2:/usr/bin/qemu-system-x86_64'
The problem is that we try to save a couple of CPU cycles by caching domain capabilities. However, we don't lock the hash table that holds the cached values. So if two threads race, both can miss the lookup and then both try to add a value under the same key. This is the problematic function:

https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_conf.c;h=b62dd1df52d54ca9f8386beb8b786a9f39dd4854;hb=HEAD#l1368

Patches posted upstream:

https://www.redhat.com/archives/libvir-list/2020-January/msg01004.html

I've pushed patches upstream:

c76009313f qemu_capabilities: Rework domain caps cache
cc361a34c5 qemu_conf: Avoid dereferencing NULL in virQEMUDriverGetHost{NUMACaps,CPU}
609acf1f5d cpu.c: Check properly for virCapabilitiesGetNodeInfo() retval

v6.0.0-91-gc76009313f
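For readers who want to see the shape of the race, here is a minimal sketch of a lookup-or-insert cache where the lookup and the insert happen under a single mutex. This is not libvirt code and not the actual fix (the pushed patches rework the domain caps cache itself); the names `struct cache`, `cache_get` and `run_racing_lookups` are hypothetical, and a small array stands in for the hash table. It only illustrates why holding one lock across both steps stops two threads from adding the same key.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 16
#define NR_THREADS 8
#define NR_LOOKUPS 1000
/* Key copied from the error message in this report. */
#define DEMO_KEY "34:3:pc-i440fx-4.2:/usr/bin/qemu-system-x86_64"

struct entry {
    char *key;
    char *value;
};

/* Hypothetical stand-in for the cache; an array replaces the hash table. */
struct cache {
    pthread_mutex_t lock;
    struct entry slots[CACHE_SLOTS];
    size_t used;
    int duplicate_errors;
};

/* Caller must hold c->lock. */
static const char *cache_lookup(const struct cache *c, const char *key)
{
    for (size_t i = 0; i < c->used; i++) {
        if (strcmp(c->slots[i].key, key) == 0)
            return c->slots[i].value;
    }
    return NULL;
}

/* Caller must hold c->lock. Refuses duplicate keys, like the failing insert. */
static int cache_add(struct cache *c, const char *key, const char *value)
{
    if (cache_lookup(c, key) || c->used == CACHE_SLOTS)
        return -1;
    c->slots[c->used].key = strdup(key);
    c->slots[c->used].value = strdup(value);
    assert(c->slots[c->used].key && c->slots[c->used].value);
    c->used++;
    return 0;
}

/* Lookup and insert under ONE lock, so no other thread can insert the
 * same key between our failed lookup and our add. */
static const char *cache_get(struct cache *c, const char *key)
{
    const char *val;

    pthread_mutex_lock(&c->lock);
    val = cache_lookup(c, key);
    if (!val) {
        if (cache_add(c, key, "<domainCapabilities/>") < 0)
            c->duplicate_errors++;
        val = cache_lookup(c, key);
    }
    pthread_mutex_unlock(&c->lock);
    return val;
}

static void *worker(void *opaque)
{
    struct cache *c = opaque;

    for (int i = 0; i < NR_LOOKUPS; i++) {
        const char *val = cache_get(c, DEMO_KEY);
        assert(val != NULL);
    }
    return NULL;
}

/* Runs NR_THREADS threads asking for the same key; returns the number of
 * rejected duplicate inserts and stores the entry count in *entries. */
int run_racing_lookups(size_t *entries)
{
    struct cache c = { .used = 0, .duplicate_errors = 0 };
    pthread_t threads[NR_THREADS];
    int errors;

    pthread_mutex_init(&c.lock, NULL);
    for (int i = 0; i < NR_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, worker, &c) != 0)
            abort();
    }
    for (int i = 0; i < NR_THREADS; i++)
        pthread_join(threads[i], NULL);

    if (entries)
        *entries = c.used;
    errors = c.duplicate_errors;
    for (size_t i = 0; i < c.used; i++) {
        free(c.slots[i].key);
        free(c.slots[i].value);
    }
    pthread_mutex_destroy(&c.lock);
    return errors;
}
```

Calling `run_racing_lookups()` from a small `main()` should leave exactly one entry in the cache and report no rejected inserts. If `cache_get()` instead dropped the lock between `cache_lookup()` and `cache_add()`, two threads could both miss and both try to add, which is the pattern behind the "Duplicate hash table key" error.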