Bug 1791790 - Duplicate key error reported in 'virQEMUDriverGetDomainCapabilities'
Summary: Duplicate key error reported in 'virQEMUDriverGetDomainCapabilities'
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Michal Privoznik
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1794691
Reported: 2020-01-16 13:35 UTC by Richard W.M. Jones
Modified: 2020-01-24 13:54 UTC

Fixed In Version: libvirt-6.1.0
Clone Of:
: 1794691
Environment:
Last Closed: 2020-01-24 13:54:31 UTC
Embargoed:


Attachments
test.c (1.00 KB, text/plain)
2020-01-17 15:11 UTC, Richard W.M. Jones

Description Richard W.M. Jones 2020-01-16 13:35:00 UTC
Description of problem:

Several virt-v2v tests fail with:

virt-v2v: error: libguestfs error: could not get libvirt domain 
capabilities: internal error: Duplicate key [code=1 int1=-1]

Version-Release number of selected component (if applicable):

qemu-4.2.0-2.fc32.x86_64
libvirt-daemon-6.0.0-1.fc32.x86_64

How reproducible:

100% with libvirt from Fedora.
Does NOT appear to happen with libvirt from git, so this
may already be fixed upstream.

Steps to Reproduce:
1. In virt-v2v source, make && make check

Comment 1 Richard W.M. Jones 2020-01-16 14:30:20 UTC
I'm pretty convinced this bug is already fixed upstream, although
I have not yet identified which commit fixed it. Feel free to
close it if you want.

Comment 2 Richard W.M. Jones 2020-01-16 17:17:11 UTC
I just had the error happen with upstream libvirt from git, so
in fact I do NOT think it is fixed.  However, it is very intermittent.
I am currently trying to provoke the error while at the same time
using Peter Krempa's enhanced debugging patch.

Comment 3 Richard W.M. Jones 2020-01-16 17:34:18 UTC
With the enhanced debugging patch:

virt-v2v: error: libguestfs error: could not get libvirt domain 
capabilities: internal error: Duplicate hash table key 
'34:3:pc-i440fx-4.2:/usr/bin/qemu-system-x86_64' [code=1 int1=-1]

This error is definitely *very* intermittent.  It seems to happen
at most about one time in 20.

Comment 4 Peter Krempa 2020-01-17 11:02:54 UTC
According to the hash key the error comes from:

virQEMUDriverGetDomainCapabilities in src/qemu/qemu_conf.c

but I have no idea why that is happening, as I haven't worked much with the qemu domain caps cache.

Comment 5 Richard W.M. Jones 2020-01-17 13:46:31 UTC
In theory this should work to reproduce the bug, but I've run this for
thousands of iterations and it didn't reproduce it for me.

while killall lt-libvirtd libvirtd >& /dev/null; tools/virsh domcapabilities >& /tmp/log; do echo -n .; done

Comment 6 Richard W.M. Jones 2020-01-17 15:11:04 UTC
Created attachment 1653096
test.c

I have a reproducer, although the bug still happens very infrequently.

*Note* that this reproducer will start 256 processes on your system
using a fork bomb approach, so you may want to lower the
"NR_JOBS" setting at the top before running, although the bug may
then take even longer to reproduce.

$ gcc -O2 -g -Wall `pkgconf libvirt --cflags --libs` test.c -o test
$ ./test
$ ./test
$ ./test
$ ./test
$ ./test
[repeat many times until ...]
$ ./test
libvirt:  error : internal error: Duplicate hash table key '34:3:pc-i440fx-4.2:/usr/bin/qemu-system-x86_64'

Comment 7 Michal Privoznik 2020-01-23 15:58:47 UTC
The problem is that we try to save a couple of CPU cycles by caching domain capabilities. However, we don't lock the hash table that holds the cached values. So if two threads race, both can miss the cache lookup and then both try to add a value under the same key; the second insertion fails with the duplicate key error. This is the problematic function: https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_conf.c;h=b62dd1df52d54ca9f8386beb8b786a9f39dd4854;hb=HEAD#l1368
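
For illustration only, here is a minimal sketch of the check-then-insert pattern described above, with the lookup and the insertion done under one mutex so that two threads cannot both add the same key. This is not libvirt code and not necessarily how the bug was fixed: the toy `struct cache`, `cache_lookup`, `cache_add`, `cache_get_or_create` and `run_demo` are all hypothetical names, and the value 42 stands in for the expensive capabilities probe. Only the key string is taken from the error message in this bug.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16
#define NR_THREADS 8

/* Hypothetical toy cache: a stand-in for a hash table that rejects
 * duplicate keys. This is NOT libvirt's data structure. */
struct cache {
    pthread_mutex_t lock;
    const char *keys[MAX_ENTRIES];
    int values[MAX_ENTRIES];
    size_t n;
    int duplicate_errors;   /* number of adds that hit an existing key */
};

static struct cache caps_cache = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Caller must hold c->lock. Returns the slot index, or -1 if absent. */
static int cache_lookup(const struct cache *c, const char *key)
{
    for (size_t i = 0; i < c->n; i++)
        if (strcmp(c->keys[i], key) == 0)
            return (int)i;
    return -1;
}

/* Caller must hold c->lock. Fails with -1 on a duplicate key (the
 * situation reported in this bug) or when the table is full. */
static int cache_add(struct cache *c, const char *key, int value)
{
    if (cache_lookup(c, key) >= 0) {
        c->duplicate_errors++;
        return -1;
    }
    if (c->n == MAX_ENTRIES)
        return -1;
    c->keys[c->n] = key;
    c->values[c->n] = value;
    c->n++;
    return 0;
}

/* Lookup and insert happen under the same lock, so no other thread
 * can add the key between our failed lookup and our add. */
static int cache_get_or_create(struct cache *c, const char *key)
{
    int value = -1;

    pthread_mutex_lock(&c->lock);
    int idx = cache_lookup(c, key);
    if (idx < 0 && cache_add(c, key, 42) == 0)  /* 42: placeholder probe result */
        idx = cache_lookup(c, key);
    if (idx >= 0)
        value = c->values[idx];
    pthread_mutex_unlock(&c->lock);
    return value;
}

static void *worker(void *arg)
{
    int value = cache_get_or_create(&caps_cache, arg);
    assert(value == 42);
    (void)value;
    return NULL;
}

/* Starts NR_THREADS threads that all request the same key and
 * returns how many duplicate-key errors were recorded. */
int run_demo(void)
{
    static char key[] = "34:3:pc-i440fx-4.2:/usr/bin/qemu-system-x86_64";
    pthread_t threads[NR_THREADS];
    int started = 0;

    for (int i = 0; i < NR_THREADS; i++) {
        if (pthread_create(&threads[started], NULL, worker, key) == 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return caps_cache.duplicate_errors;
}
```

Call `run_demo()` from a `main` of your own and build with `-pthread`. If the lock were instead dropped between `cache_lookup` and `cache_add`, two threads could both see the key as missing and the second `cache_add` would fail, which matches the intermittent failure seen here.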

Comment 8 Michal Privoznik 2020-01-24 10:28:40 UTC
Patches posted upstream:

https://www.redhat.com/archives/libvir-list/2020-January/msg01004.html

Comment 9 Michal Privoznik 2020-01-24 13:54:31 UTC
I've pushed patches upstream:

c76009313f qemu_capabilities: Rework domain caps cache
cc361a34c5 qemu_conf: Avoid dereferencing NULL in virQEMUDriverGetHost{NUMACaps,CPU}
609acf1f5d cpu.c: Check properly for virCapabilitiesGetNodeInfo() retval

v6.0.0-91-gc76009313f

