Bug 1794691
Summary: Duplicate key error reported in 'virQEMUDriverGetDomainCapabilities'
Product: Red Hat Enterprise Linux Advanced Virtualization
Component: libvirt
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Version: 8.1
Reporter: Michal Privoznik <mprivozn>
Assignee: Michal Privoznik <mprivozn>
QA Contact: Lili Zhu <lizhu>
Docs Contact:
CC: chhu, jdenemar, libvirt-maint, lmen, mprivozn, pkrempa, rjones, tburke, xuzhang, yalzhang
Target Milestone: rc
Target Release: 8.0
Keywords: Upstream
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-6.0.0-3.el8
Doc Type: Bug Fix
Doc Text:
Cause:
When generating domain capabilities XML (e.g. for virsh domcapabilities), libvirt caches the generated capabilities for possible future use. This saves a few CPU cycles, because the capabilities need to be constructed only once. However, because the cache lookup and the subsequent insert were not protected by mutual exclusion, a typical time-of-check to time-of-use (TOCTOU) race could occur. If two threads were asked to return domain capabilities XML at the same time, both could find that no cached data was available, so both proceeded to generate it and tried to add it to the cache. The first one succeeded, while the second failed with a cryptic error message ("internal error: Duplicate key").
Consequence:
Generating domain capabilities XML might have failed with a cryptic error message.
Fix:
Add mutexes around the cache, so that only one thread can work with it at a time.
Result:
Caching works as designed and generating domain capabilities succeeds.
Story Points: ---
Clone Of: 1791790
Environment:
Last Closed: 2020-05-05 09:55:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1791790
Bug Blocks:
Attachments:
Description
Michal Privoznik
2020-01-24 11:00:51 UTC
Tested with:
libvirt-daemon-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64

1. Download the attached C program (test.c) and compile it
    $ gcc -O2 -g -Wall `pkgconf libvirt --cflags --libs` test.c -o test

2. Run the compiled program in a loop
    $ i=0
    $ while true; do i=$((i+1)); echo $i; ./test; done
    1
    2
    3
    ....
    10212
    10213
    10214
    ^C

Repeated 10000+ times, could not reproduce.

Retested with:
libvirt-daemon-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64
qemu-kvm-4.2.0-17.module+el8.2.0+6141+0f540f16.x86_64

1. Download the attached C program (test.c) and compile it
    $ gcc -O2 -g -Wall `pkgconf libvirt --cflags --libs` test.c -o test

2. Run the compiled program in a loop
    $ i=0
    $ while true; do i=$((i+1)); echo $i; ./test; done
    1
    2
    3
    ....
    3025
    3026
    3027
    ^C

Repeated 3000+ times, did not hit the error.

Hi, Michal

The attached file is test.c.
I tried to reproduce this bug with libvirt-daemon-6.0.0-1 and also with libvirt-daemon-5.6.0-10, but could not reproduce it. Please help to check whether there is something wrong with the testing steps.
Thanks very much.

Created attachment 1678916 [details]
test.c
I think that test is the one which I wrote ...?

I think Peter had a much better program which used threads to reproduce the bug more reliably. Unfortunately I cannot find it in the mailing list.

(https://www.redhat.com/archives/libvir-list/2020-January/msg01006.html)

Created attachment 1679013 [details]
repro.c

Yeah, this bug is not easy to reproduce because it is a race condition. What I did when debugging this was to put sleep(5) into a specific place and then get domcaps from two terminals at once. More details here:

https://www.redhat.com/archives/libvir-list/2020-January/msg01004.html

However, with the attached file and a tuned worker pool (min_workers = 2 and max_workers = 2 in libvirtd.conf) I am able to reproduce it occasionally, that is, with the fix reverted and without the sleep():

https://www.redhat.com/archives/libvir-list/2020-January/msg01007.html

Thanks very much to Michal and Richard, I can reproduce this bug now.

1. Put a sleep in the following lines, then build from source
    # git diff
    diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
    index 30637b21ac..f76bbe4b39 100644
    --- a/src/qemu/qemu_conf.c
    +++ b/src/qemu/qemu_conf.c
    @@ -1372,6 +1372,7 @@ virQEMUDriverGetDomainCapabilities(virQEMUDriverPtr driver,
             key = g_strdup_printf("%d:%d:%s:%s", data.arch, data.virttype, NULLSTR(data.machine), NULLSTR(data.path));
    +        sleep(5);
             if (virHashAddEntry(domCapsCache, key, domCaps) < 0)
                 return NULL;
         }

2. Then try to get domcaps from two consoles at once

Terminal 1:
    # virsh domcapabilities
    <domainCapabilities>
      <path>/usr/libexec/qemu-kvm</path>
      <domain>kvm</domain>
      <machine>pc-i440fx-rhel7.6.0</machine>
      <arch>x86_64</arch>
      <vcpu max='240'/>
      <iothreads supported='yes'/>
    ....

Terminal 2:
    # virsh domcapabilities
    error: failed to get emulator capabilities
    error: internal error: Duplicate key

Verify this bug with:
libvirt-daemon-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64.rpm

1. Apply the following patch to rebuild libvirt
    From 75610bfea795c4988cccb84491d718492e67474f Mon Sep 17 00:00:00 2001
    From: rpm-build <rpm-build>
    Date: Sun, 19 Apr 2020 10:06:33 -0400
    Subject: [PATCH] Verify bug 1794691

    Signed-off-by: rpm-build <rpm-build>
    ---
     src/qemu/qemu_capabilities.c | 1 +
     1 file changed, 1 insertion(+)

    diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
    index 0b4ed42..1e22c10 100644
    --- a/src/qemu/qemu_capabilities.c
    +++ b/src/qemu/qemu_capabilities.c
    @@ -2108,6 +2108,7 @@ virQEMUCapsGetDomainCapsCache(virQEMUCapsPtr qemuCaps,
             key = g_strdup_printf("%d:%d:%s:%s", arch, virttype, NULLSTR(machine), path);
    +        sleep(5);
             if (virHashAddEntry(cache->cache, key, tempDomCaps) < 0)
                 goto cleanup;
    --
    2.18.2

2. Start libvirtd

3. Try to get domcapabilities from two consoles at once

Terminal 1:
    # virsh domcapabilities
    <domainCapabilities>
      <path>/usr/libexec/qemu-kvm</path>
      <domain>kvm</domain>
      <machine>pc-i440fx-rhel7.6.0</machine>
      <arch>x86_64</arch>
      <vcpu max='240'/>
      <iothreads supported='yes'/>
    .....

Terminal 2:
    # virsh domcapabilities
    <domainCapabilities>
      <path>/usr/libexec/qemu-kvm</path>
      <domain>kvm</domain>
      <machine>pc-i440fx-rhel7.6.0</machine>
      <arch>x86_64</arch>
      <vcpu max='240'/>
      <iothreads supported='yes'/>
      <os supported='yes'>
    ....

As the testing result matches the expected result, mark the bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017
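
For readers unfamiliar with this kind of race, the check-then-insert pattern discussed in this bug can be sketched in isolation. The following is a minimal sketch only, not libvirt's code: the cache structure, the helper names (caps_cache, cache_lookup, cache_add, get_caps, race_demo) and the key string are all hypothetical, and a plain pthread mutex stands in for whatever locking the actual fix uses. It shows the lookup and the insert being done under one lock, so that a second thread waits and then finds the cached entry instead of hitting a duplicate key.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 8

/* Hypothetical stand-in for the real cache: a tiny fixed-size table
 * guarded by one mutex. Keys and values must outlive the cache. */
struct caps_cache {
    pthread_mutex_t lock;
    const char *keys[CACHE_SLOTS];
    const char *values[CACHE_SLOTS];
    size_t used;
};

static struct caps_cache cache = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Return the cached value for key, or NULL if it is not cached. */
static const char *cache_lookup(const struct caps_cache *c, const char *key)
{
    for (size_t i = 0; i < c->used; i++) {
        if (strcmp(c->keys[i], key) == 0)
            return c->values[i];
    }
    return NULL;
}

/* Add an entry; fails with -1 on a duplicate key or a full table. */
static int cache_add(struct caps_cache *c, const char *key, const char *value)
{
    if (cache_lookup(c, key) || c->used == CACHE_SLOTS)
        return -1;
    c->keys[c->used] = key;
    c->values[c->used] = value;
    c->used++;
    return 0;
}

/* Lookup-or-create with the check and the insert under the same lock. */
static const char *get_caps(struct caps_cache *c, const char *key)
{
    const char *ret;

    pthread_mutex_lock(&c->lock);
    ret = cache_lookup(c, key);
    if (!ret && cache_add(c, key, "<domainCapabilities/>") == 0)
        ret = cache_lookup(c, key);
    pthread_mutex_unlock(&c->lock);
    return ret;
}

static void *worker(void *arg)
{
    const char *key = arg;
    return (void *)get_caps(&cache, key);
}

/* Two threads request the same key; returns the number of failed requests. */
int race_demo(void)
{
    static const char key[] = "1:2:pc:/usr/libexec/qemu-kvm"; /* made-up key */
    pthread_t threads[2];
    int started[2] = { 0, 0 };
    int failures = 0;

    for (int i = 0; i < 2; i++)
        started[i] = pthread_create(&threads[i], NULL, worker, (void *)key) == 0;

    for (int i = 0; i < 2; i++) {
        void *result = NULL;
        if (started[i])
            pthread_join(threads[i], &result);
        if (!result)
            failures++;
    }
    return failures;
}
```

Calling race_demo() from a main() function (built with -pthread) should return 0, because whichever thread takes the lock second finds the entry already cached. If the lock were dropped between cache_lookup() and cache_add(), both threads could miss the cache and the second cache_add() would fail, which is the same shape as the "Duplicate key" failure reported above.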