Description of problem:
If creating a storage pool fails with a timeout error (such as entering a nonexistent IP when creating an NFS/iSCSI storage pool), the error 'Error message: method call Create timed out' is shown. After refreshing the page, no information about storage pools can be retrieved for a long time.

Version-Release number of selected component (if applicable):
cockpit-191-1.el7.x86_64
cockpit-bridge-191-1.el7.x86_64
cockpit-ws-191-1.el7.x86_64
cockpit-machines-193-1.el7.x86_64
cockpit-system-191-1.el7.noarch
libvirt-dbus-1.3.0-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an NFS/iSCSI storage pool with a nonexistent IP.

Actual results:
1. cockpit-machines cannot get any information about storage pools for a long time.

Expected results:
1. cockpit-machines shows the information about storage pools correctly.

Additional info:
This is a libvirt issue. Rewriting the description for the libvirt devs.

Description of the problem:
When trying to start a storage pool and the operation is going to time out (for example, when the pool's source is unreachable), the ListAllStoragePools API can't be used in parallel.

How reproducible:
100%

Steps to Reproduce:
1. Define an NFS pool with an unreachable source:

   virsh pool-define /tmp/netfs-pool.xml

   Example XML:
   <pool type='netfs'>
     <name>nfsimages</name>
     <source>
       <host name='127.0.0.10'/>
       <dir path='/tmp/nfs-pool-dir'/>
       <format type='nfs'/>
     </source>
     <target>
       <path>/mnt</path>
     </target>
   </pool>

2. Try to start the pool; the operation will hang and eventually time out:

   virsh pool-start nfsimages

3. In a second terminal tab, try to list the existing storage pools:

   virsh pool-list --all

Actual results:
The operation to list the storage pools just hangs until the creation operation exits with a timeout.

Expected results:
StoragePoolCreate and ListAllStoragePools should not contend on the same mutex; they should be allowed to run in parallel.
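The same contention can also be shown through the public libvirt C API. This is a minimal sketch, not part of the original report: it assumes the 'nfsimages' pool from the XML above is already defined, and the file name is made up. One thread blocks in virStoragePoolCreate() while the main thread times how long virConnectListAllStoragePools() takes to return.

/* repro.c - build with: gcc repro.c -o repro -lvirt -lpthread */
#include <libvirt/libvirt.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static void *start_pool(void *arg)
{
    virConnectPtr conn = arg;
    virStoragePoolPtr pool = virStoragePoolLookupByName(conn, "nfsimages");

    if (pool) {
        virStoragePoolCreate(pool, 0);  /* blocks until the NFS mount times out */
        virStoragePoolFree(pool);
    }
    return NULL;
}

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virStoragePoolPtr *pools = NULL;
    pthread_t thr;
    time_t begin;
    int n, i;

    if (!conn)
        return 1;

    pthread_create(&thr, NULL, start_pool, conn);
    sleep(2);  /* give the create call time to reach the mount */

    begin = time(NULL);
    n = virConnectListAllStoragePools(conn, &pools, 0);
    printf("listing %d pools took %ld seconds\n",
           n, (long)(time(NULL) - begin));
    /* before the fix the delay is close to the mount timeout;
       after the fix the call returns almost immediately */

    for (i = 0; i < n; i++)
        virStoragePoolFree(pools[i]);
    free(pools);
    pthread_join(thr, NULL);
    virConnectClose(conn);
    return 0;
}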
As discussed on IRC yesterday, this indeed is a locking problem. When libvirt is starting a pool, the pool object's lock is acquired. Then, depending on the pool's backend, libvirt does whatever is needed to start the pool - in this case it calls 'mount -t nfs ...' (with the lock held). This is suboptimal because it effectively blocks any other thread that is trying to get the list of storage pools: in order to access any kind of pool info (be it name, UUID, or a CheckACL() call), the pool object MUST be locked.
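To make the eventual fix easier to picture, here is a simplified sketch of the "drop and reacquire" pattern that commit 985f035fbf (in the commit list below) applies in the storage backends. It is an illustration only, not the actual libvirt code: poolStartBlocking() and run_mount() are hypothetical names, while virObjectLock()/virObjectUnlock() are libvirt's real locking helpers.

/* Hypothetical sketch of the drop-and-reacquire pattern; not libvirt code. */
static int
poolStartBlocking(virStoragePoolObjPtr obj)
{
    int ret;

    /* Entered with the object lock held. Drop it before the
     * long-running operation so that other threads (e.g. a pool
     * listing) can lock the object to read its name, UUID, or run
     * ACL checks in the meantime. */
    virObjectUnlock(obj);

    ret = run_mount(obj);  /* e.g. 'mount -t nfs ...' */

    /* Reacquire the lock before touching the object's state again. */
    virObjectLock(obj);

    return ret;
}

The companion commit "storage_driver: Protect pool def during startup and build" then guards the pool definition against changing while the lock is dropped.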
Patches posted upstream: https://www.redhat.com/archives/libvir-list/2019-May/msg00731.html
And finally pushed upstream:

985f035fbf storage: Drop and reacquire pool obj lock in some backends
13284a6b83 storage_driver: Protect pool def during startup and build
9342bc626b storagePoolCreateXML: Don't lose persistent storage on failed create
1340327f48 virstorageobj: Introduce VIR_STORAGE_POOL_OBJ_LIST_ADD_LIVE flag
bc281fec0f virStoragePoolObjListAdd: Separate out definition assignment
8c04707058 virStoragePoolObjListAdd: Turn boolean arg into flags
7e08447e8f virstorageobj: Rename virStoragePoolObjAssignDef
c7df2437d2 virStoragePoolUpdateInactive: Don't call virStoragePoolObjEndAPI
62ec38518f virStoragePoolUpdateInactive: Fix variable name in comment
e1cb98b4e9 virStoragePoolObjListForEach: Grab a reference for pool object
c63315789f virStoragePoolObjRemove: Don't unlock pool object upon return

v5.6.0-311-g985f035fbf
Moving to POST per comment 5.
Verified on:
libvirt-6.0.0-2.virtcov.el8.x86_64

Steps:
1. Define and start a pool whose start operation is going to time out:

# cat pool.xml1
<pool type='netfs'>
  <name>nfsimages</name>
  <source>
    <host name='$ip'/>
    <dir path='/nfs'/>
    <format type='nfs'/>
  </source>
  <target>
    <path>/mnt</path>
  </target>
</pool>

[root@jgao-test1 ~]# virsh pool-define pool.xml1
Pool nfsimages defined from pool.xml1

[root@jgao-test1 ~]# virsh pool-start nfsimages

2. In a second terminal:

# virsh pool-list --all
 Name        State      Autostart
-----------------------------------
 default     active     no
 images-1    active     yes
 luks        active     no
 nfsimages   inactive   no

The listing still returns, including the nfsimages pool. Works as expected.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017