Bug 1711789
Summary: | Starting a libvirt pool blocks listing pools | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | YunmingYang <yunyang> |
Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
Status: | CLOSED ERRATA | QA Contact: | gaojianan <jgao> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 8.1 | CC: | jdenemar, jsuchane, leiwang, mpitt, qiyuan, rbalakri, wshi, xuzhang, yalzhang |
Target Milestone: | rc | Keywords: | Upstream |
Target Release: | 8.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | libvirt-6.0.0-1.el8 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-05-05 09:46:14 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | |
Description (YunmingYang, 2019-05-20 06:32:37 UTC)
This is a libvirt issue. Rewriting the description for the libvirt devs.

Description of the problem:
When trying to start a storage pool and the operation is going to time out (for example, when the pool's source is unreachable), the StoragePoolList API cannot be used in parallel.

How reproducible: 100%

Steps to Reproduce:

1. Define an NFS pool with an unreachable source:

```
virsh pool-define /tmp/netfs-pool.xml
```

Example XML:

```xml
<pool type='netfs'>
  <name>nfsimages</name>
  <source>
    <host name='127.0.0.10'/>
    <dir path='/tmp/nfs-pool-dir'/>
    <format type='nfs'/>
  </source>
  <target>
    <path>/mnt</path>
  </target>
</pool>
```

2. Try to start the pool; the operation will hang and eventually time out:

```
virsh pool-start nfsimages
```

3. In a second terminal, try to list the existing storage pools:

```
virsh pool-list --all
```

Actual results:
The operation to list the storage pools hangs until the creation operation exits with a timeout.

Expected results:
The StoragePoolCreate and ListAllStoragePools APIs should not contend on the same mutex; they should be allowed to run in parallel.

As discussed on IRC yesterday, this indeed is a locking problem. When libvirt is starting a pool, the pool object's lock is acquired. Then, depending on the pool's backend, libvirt does whatever is needed to start the pool; in this case it calls 'mount -t nfs ...' (with the lock held). This is suboptimal because it effectively blocks another thread that is trying to get the list of storage pools: in order to access any of a pool's info (be it the name, the UUID, or a CheckACL() call), the pool object MUST be locked.
Patches posted upstream: https://www.redhat.com/archives/libvir-list/2019-May/msg00731.html

And finally pushed upstream:

```
985f035fbf storage: Drop and reacquire pool obj lock in some backends
13284a6b83 storage_driver: Protect pool def during startup and build
9342bc626b storagePoolCreateXML: Don't lose persistent storage on failed create
1340327f48 virstorageobj: Introduce VIR_STORAGE_POOL_OBJ_LIST_ADD_LIVE flag
bc281fec0f virStoragePoolObjListAdd: Separate out definition assignment
8c04707058 virStoragePoolObjListAdd: Turn boolean arg into flags
7e08447e8f virstorageobj: Rename virStoragePoolObjAssignDef
c7df2437d2 virStoragePoolUpdateInactive: Don't call virStoragePoolObjEndAPI
62ec38518f virStoragePoolUpdateInactive: Fix variable name in comment
e1cb98b4e9 virStoragePoolObjListForEach: Grab a reference for pool object
c63315789f virStoragePoolObjRemove: Don't unlock pool object upon return
```

v5.6.0-311-g985f035fbf

Verified on: libvirt-6.0.0-2.virtcov.el8.x86_64

Steps:

1. Define and start a pool whose start operation is going to time out:

```
# cat pool.xml1
<pool type='netfs'>
  <name>nfsimages</name>
  <source>
    <host name='$ip'/>
    <dir path='/nfs'/>
    <format type='nfs'/>
  </source>
  <target>
    <path>/mnt</path>
  </target>
</pool>

[root@jgao-test1 ~]# virsh pool-define pool.xml1
Pool nfsimages defined from pool.xml1

[root@jgao-test1 ~]# virsh pool-start nfsimages
```

2. In a second terminal:

```
# virsh pool-list --all
 Name        State      Autostart
-----------------------------------
 default     active     no
 images-1    active     yes
 luks        active     no
 nfsimages   inactive   no
```

The listing still returns its result, including the nfsimages pool, while the start operation is in progress. Works as expected.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017