Bug 1576464

Summary: Hash operation not allowed during iteration
Product: Red Hat Enterprise Linux 7
Reporter: Derek Higgins <derekh>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: yafu <yafu>
Severity: high
Priority: high
Version: 7.4
CC: bfournie, chhu, dyuan, hjensas, jdenemar, jherrman, kchamart, kevin, lmen, mflusche, michele, mprivozn, mtessun, sasha, wznoinsk, xuzhang
Target Milestone: rc
Keywords: Upstream, ZStream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-4.3.0-1.el7
Doc Text:
Prior to this update, guest virtual machine actions that use a Python library failed in some cases, and "Hash operation not allowed during iteration" error messages were logged. Several redundant thread access checks have been removed, and the problem no longer occurs.
Last Closed: 2018-10-30 09:55:31 UTC
Type: Bug
Bug Blocks: 1579460, 1581364
Attachments:
  libvirtd logs (flags: none)
  vbmc errors (flags: none)

Description Derek Higgins 2018-05-09 14:03:52 UTC
Created attachment 1433877 [details]
libvirtd logs

Description of problem:
When a Python library [1] is used to control libvirt domains, domain actions fail from time to time.

See the attached logs. Each of the power-on failures in vbmc.errors appears to be associated with a "Hash operation not allowed during iteration" error in libvirtd.logs.


[1] https://github.com/openstack/virtualbmc

Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux Server release 7.4 (Maipo)
libvirt-3.9.0-14.el7_5.2.x86_64
libvirt-libs-3.9.0-14.el7_5.2.x86_64
libvirt-client-3.9.0-14.el7_5.2.x86_64


I'm told that this problem has only appeared (or at least gotten worse) with the move from RHEL 7.3 to 7.4. I'll attempt to verify this and follow up with more details.

Comment 2 Derek Higgins 2018-05-09 14:04:27 UTC
Created attachment 1433878 [details]
vbmc errors

Comment 3 Michal Privoznik 2018-05-09 14:11:10 UTC
I think this is fixed upstream by:

commit 4d7384eb9ddef2008cb0cc165eb808f74bc83d6b
Author:     Vincent Bernat <vincent>
AuthorDate: Tue Apr 10 08:27:15 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Wed Apr 11 11:18:37 2018 +0200

    util: don't check for parallel iteration in hash-related functions
    
    This is the responsability of the caller to apply the correct lock
    before using these functions. Moreover, the use of a simple boolean
    was still racy: two threads may check the boolean and "lock" it
    simultaneously.
    
    Users of functions from src/util/virhash.c have to be checked for
    correctness. Lookups and iteration should hold a RO
    lock. Modifications should hold a RW lock.
    
    Most important uses seem to be covered. Callers have now a greater
    responsability, notably the ability to execute some operations while
    iterating were reliably forbidden before are now accepted.
    
    Signed-off-by: Vincent Bernat <vincent>

libvirt.git $ git describe --contains 4d7384eb9ddef2008cb0cc165eb808f74bc83d6b
v4.3.0-rc1~369
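
To illustrate the caller-side discipline the commit message describes (lookups and iteration under a shared RO lock, modifications under an exclusive RW lock), here is a minimal Python sketch. It is not libvirt's C code: `RWLock` and `DomainTable` are hypothetical names made up for this illustration, and the lock is deliberately simple (no writer preference or fairness).

```python
import threading
from contextlib import contextmanager


class RWLock:
    """Minimal reader-writer lock (illustrative only, no writer preference)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def read(self):
        # Shared access: wait only while a writer holds the lock.
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        # Exclusive access: wait until there is no writer and no reader.
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


class DomainTable:
    """Hypothetical stand-in for a hash table of domains; callers do the locking."""

    def __init__(self):
        self.lock = RWLock()
        self._table = {}

    def add(self, name, state):
        with self.lock.write():   # modification: exclusive (RW) lock
            self._table[name] = state

    def lookup(self, name):
        with self.lock.read():    # lookup: shared (RO) lock
            return self._table.get(name)

    def names(self):
        with self.lock.read():    # iteration: shared (RO) lock
            return sorted(self._table)
```

With this arrangement any number of readers can look up or iterate at the same time, and correctness depends entirely on callers taking the right lock, which is the trade-off the commit message points out.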

Comment 4 Kashyap Chamarthy 2018-05-09 14:20:39 UTC
A reproducer would be useful.

That said, it seems that something in this area was very recently fixed in upstream libvirt, in this commit:

    4d7384e -- "util: don't check for parallel iteration in 
    hash-related functions"

But note that the above commit isn't in the libvirt version (3.9.0, package: 14.el7_5.2) running in your setup.

Maybe Michal will be able to tell us a bit more.

Comment 5 Kashyap Chamarthy 2018-05-09 14:23:52 UTC
(In reply to Kashyap Chamarthy from comment #4)

Ah, Michal already pointed to the same commit in his earlier comment.  And he elaborated further on IRC:

    I think the problem is that we've switched to RW locks which allows
    multiple readers to work over list of domains (in fact a hash 
    table), however the rest of the code (hash table impl) had this
    check preventing access from multiple threads.

    For instance, if one thread is listing running VMs while the other 
    is fetching stats for all running VMs it may so happen that these
    threads will clash on the bogus check - hence the error message.
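
The clash Michal describes can be sketched with a toy model. This is a Python illustration under stated assumptions, not the virhash.c implementation: `CheckedHash` and `second_reader_during_first` are hypothetical names, and the second reader is simulated by starting a nested iteration from inside the first one so that the outcome is deterministic (real threads would hit the same check only from time to time).

```python
class CheckedHash:
    """Toy hash table with a boolean 'iterating' guard (illustrative only)."""

    def __init__(self, items):
        self._table = dict(items)
        self._iterating = False

    def for_each(self, callback):
        if self._iterating:
            # The guard cannot tell a harmless second reader from a writer.
            raise RuntimeError("Hash operation not allowed during iteration")
        self._iterating = True
        try:
            for key, value in self._table.items():
                callback(key, value)
        finally:
            self._iterating = False


def second_reader_during_first(table):
    """Simulate reader B starting while reader A is mid-iteration."""
    errors = []

    def reader_a(key, value):
        # Stand-in for a second thread (for example a stats query) that
        # holds the same read lock and iterates the table at this moment.
        try:
            table.for_each(lambda k, v: None)
        except RuntimeError as exc:
            errors.append(str(exc))

    table.for_each(reader_a)
    return errors
```

Calling `second_reader_during_first(CheckedHash({"test1": "running"}))` returns a list with one "Hash operation not allowed during iteration" message, even though neither reader modifies the table.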

Comment 9 Michal Privoznik 2018-05-11 07:58:11 UTC
Moving to POST per comment 3.

Comment 10 yafu 2018-05-14 07:35:06 UTC
Reproduced with libvirt-3.9.0-14.el7_5.4.x86_64.

Test steps:
1. Start a guest:
# virsh start test1

2. Run 'virsh list' in a loop:
# for i in {1..1000}; do virsh list; done

3. Open another terminal and run 'virsh domstats' in a loop:
# for i in {1..1000}; do virsh domstats; done

4. Check libvirtd.log:
# cat /var/log/libvirt/libvirtd.log | grep -i virhash
2018-05-14 07:28:48.812+0000: 2177: error : virHashForEach:597 : Hash operation not allowed during iteration
2018-05-14 07:28:50.426+0000: 2175: error : virHashForEach:597 : Hash operation not allowed during iteration
2018-05-14 07:29:19.708+0000: 2175: error : virHashForEach:597 : Hash operation not allowed during iteration
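
For anyone who prefers a single script over two terminals, steps 2 and 3 could be driven concurrently along these lines. This is only a sketch: `run_loop` and `run_concurrently` are hypothetical helper names, and the report does not say whether virsh exits non-zero when the error is hit, so checking libvirtd.log as in step 4 is still needed.

```python
import subprocess
import threading


def run_loop(command, iterations, failures):
    """Run `command` repeatedly, recording any non-zero exit codes."""
    for _ in range(iterations):
        result = subprocess.run(command, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures.append((tuple(command), result.returncode))


def run_concurrently(commands, iterations=1000):
    """Run each command in its own thread, `iterations` times each."""
    failures = []
    threads = [
        threading.Thread(target=run_loop, args=(cmd, iterations, failures))
        for cmd in commands
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return failures


# Example call mirroring steps 2 and 3 (needs virsh and a running libvirtd):
#   run_concurrently([["virsh", "list"], ["virsh", "domstats"]])
```

The example call is left as a comment so that the helpers can be imported or tried with harmless commands on a machine without libvirt.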

Comment 11 Bob Fournier 2018-05-14 20:56:15 UTC
Do you know when you will have a build available with this fix, or is it possible to get another scratch build built?  The output from the scratch build from comment no longer seems to be there.

Comment 13 yafu 2018-05-15 05:36:06 UTC
Cannot reproduce the issue with the scratch build in comment 12.

Comment 14 Bob Fournier 2018-05-15 13:02:40 UTC
*** Bug 1571384 has been marked as a duplicate of this bug. ***

Comment 15 Bob Fournier 2018-05-15 14:06:00 UTC
Does this bz need to be cloned for 7.5?

Comment 17 Bob Fournier 2018-05-17 16:53:12 UTC
We would like this patch to be considered for 7.5.z. This bug has been causing many failures of OpenStack deployments that use VirtualBMC. As shown in comments 7 and 8, this patch works well and fixes these deployment failures.

Comment 20 yafu 2018-06-01 08:16:33 UTC
Verified with libvirt-4.3.0-1.el7.x86_64.

Test steps:
1. Start a guest:
# virsh start test1

2. Run 'virsh list' in a loop:
# for i in {1..1000}; do virsh list; done

3. Open another terminal and run 'virsh domstats' in a loop:
# for i in {1..1000}; do virsh domstats; done

4. Check libvirtd.log after steps 2 and 3; there is no "Hash operation not allowed during iteration" error:
# cat /var/log/libvirt/libvirtd.log | grep -i "Hash operation not allowed during iteration"
(no output)
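
The log check can also be scripted so that a test run reports which threads hit the error. The sketch below assumes the line layout seen in the libvirtd.log excerpt from the reproduction (timestamp, thread id, "error", source location, message); `find_hash_errors` is a hypothetical helper name, and other libvirt log formats may need a different pattern.

```python
import re

ERROR_TEXT = "Hash operation not allowed during iteration"

# Assumed layout: "<date> <time>: <thread id>: error : <func>:<line> : <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+): (?P<thread>\d+): error : "
    r"(?P<location>\S+) : (?P<message>.*)$"
)


def find_hash_errors(lines):
    """Return (timestamp, thread, location) for each matching error line."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("message") == ERROR_TEXT:
            hits.append((match.group("timestamp"),
                         match.group("thread"),
                         match.group("location")))
    return hits
```

Passing an open file object for /var/log/libvirt/libvirtd.log works, since the function only iterates over lines; an empty result corresponds to the "no output" outcome of the grep.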

Comment 22 errata-xmlrpc 2018-10-30 09:55:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113