Bug 1711789 - Starting a libvirt pool blocks listing pools
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: Michal Privoznik
QA Contact: gaojianan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-20 06:32 UTC by YunmingYang
Modified: 2020-11-19 08:59 UTC (History)

Fixed In Version: libvirt-6.0.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-05 09:46:14 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description YunmingYang 2019-05-20 06:32:37 UTC
Description of problem:
If creating a storage pool fails with a timeout error (for example, when a nonexistent IP address is entered while creating an NFS/iSCSI storage pool), the error 'Error message: method call Create timed out' is shown. After the page is refreshed, no information about storage pools can be retrieved for a long time.

Version-Release number of selected component (if applicable):
cockpit-191-1.el7.x86_64
cockpit-bridge-191-1.el7.x86_64
cockpit-ws-191-1.el7.x86_64
cockpit-machines-193-1.el7.x86_64
cockpit-system-191-1.el7.noarch
libvirt-dbus-1.3.0-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an NFS/iSCSI storage pool with a nonexistent IP address.

Actual results:
1. cockpit-machines cannot retrieve any information about storage pools for a long time.

Expected results:
1. cockpit-machines shows the storage pool information correctly.

Additional info:

Comment 2 Katerina Koukiou 2019-05-20 13:57:18 UTC
This is a libvirt issue. Rewriting the description for the libvirt devs.

Description of the problem:
When a storage pool is being started and the operation is going to time out (for example, because the pool's source is unreachable), the StoragePoolList API cannot be used in parallel.

How reproducible:
100%

Steps to Reproduce:
1: Define an NFS pool with an unreachable source.

virsh pool-define /tmp/netfs-pool.xml

Example XML:
 
<pool type='netfs'>
  <name>nfsimages</name>
  <source>
    <host name='127.0.0.10'/>
    <dir path='/tmp/nfs-pool-dir'/>
    <format type='nfs'/>
  </source>
  <target>
    <path>/mnt</path>
  </target>
</pool>


2: Try to start the pool; the operation will hang and eventually time out.

virsh pool-start nfsimages

3: In a second terminal tab, try to list the existing storage pools.

virsh pool-list --all

Actual results:

The operation to list the storage pools hangs until the pool-start operation exits with a timeout.

Expected results:

The StoragePoolCreate and ListAllStoragePools APIs should not contend on the same mutex; they should be able to run in parallel.
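
For anyone who wants to put a number on the hang, the steps above could be scripted roughly as follows. This is only a sketch: the helper names timed_run and list_latency_during_start and the one-second settle delay are assumptions for illustration, and it expects the pool to be already defined and virsh to be in PATH.

```python
import subprocess
import time


def timed_run(cmd, timeout=None):
    """Run cmd and return (exit_code, seconds_elapsed)."""
    start = time.monotonic()
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL, timeout=timeout)
    return proc.returncode, time.monotonic() - start


def list_latency_during_start(pool, settle=1.0):
    """Start `pool` in the background, then time 'virsh pool-list --all'."""
    starter = subprocess.Popen(["virsh", "pool-start", pool],
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)
    try:
        # Assumed delay so that pool-start has reached its slow step.
        time.sleep(settle)
        return timed_run(["virsh", "pool-list", "--all"])
    finally:
        # Let the start attempt finish (or time out) before returning.
        starter.wait()
```

Calling list_latency_during_start("nfsimages") on an affected build would be expected to report a latency close to the mount timeout; on a fixed build it should stay small.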

Comment 3 Michal Privoznik 2019-05-21 10:58:12 UTC
As discussed on IRC yesterday, this is indeed a locking problem. When libvirt is starting a pool, the pool object's lock is acquired. Then, depending on the pool's backend, libvirt does whatever is needed to start the pool - in this case it calls 'mount -t nfs ...' (with the lock held). This is suboptimal because it effectively blocks any other thread that is trying to get the list of storage pools. In order to access any kind of pool info (be it name, uuid, or a CheckACL() call) the pool object MUST be locked.
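
The contention described in this comment can be modelled in a few lines. The following is a toy sketch, not libvirt code: the Pool class, start_pool, list_pools and measure_list_delay are hypothetical names, and time.sleep stands in for the blocking mount call.

```python
import threading
import time


class Pool:
    """Stand-in for a pool object guarded by its own lock (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()


def start_pool(pool, mount_seconds, started):
    # The slow backend step runs with the pool lock held.
    with pool.lock:
        started.set()              # signal that the lock is now taken
        time.sleep(mount_seconds)  # stands in for 'mount -t nfs ...'


def list_pools(pools):
    # Reading any pool attribute requires that pool's lock.
    names = []
    for pool in pools:
        with pool.lock:
            names.append(pool.name)
    return names


def measure_list_delay(mount_seconds=0.5):
    """Return (names, seconds the listing took while a start is running)."""
    pools = [Pool("default"), Pool("nfsimages")]
    started = threading.Event()
    starter = threading.Thread(target=start_pool,
                               args=(pools[1], mount_seconds, started))
    starter.start()
    started.wait()  # make sure the starter owns the lock first
    begin = time.monotonic()
    names = list_pools(pools)
    elapsed = time.monotonic() - begin
    starter.join()
    return names, elapsed
```

In this model the listing takes roughly as long as the simulated mount, because list_pools cannot read the name of the second pool until start_pool releases its lock.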

Comment 4 Michal Privoznik 2019-05-24 14:36:37 UTC
Patches posted upstream:

https://www.redhat.com/archives/libvir-list/2019-May/msg00731.html

Comment 5 Michal Privoznik 2019-08-23 07:39:21 UTC
And finally pushed upstream:

985f035fbf storage: Drop and reacquire pool obj lock in some backends
13284a6b83 storage_driver: Protect pool def during startup and build
9342bc626b storagePoolCreateXML: Don't lose persistent storage on failed create
1340327f48 virstorageobj: Introduce VIR_STORAGE_POOL_OBJ_LIST_ADD_LIVE flag
bc281fec0f virStoragePoolObjListAdd: Separate out definition assignment
8c04707058 virStoragePoolObjListAdd: Turn boolean arg into flags
7e08447e8f virstorageobj: Rename virStoragePoolObjAssignDef
c7df2437d2 virStoragePoolUpdateInactive: Don't call virStoragePoolObjEndAPI
62ec38518f virStoragePoolUpdateInactive: Fix variable name in comment
e1cb98b4e9 virStoragePoolObjListForEach: Grab a reference for pool object
c63315789f virStoragePoolObjRemove: Don't unlock pool object upon return

v5.6.0-311-g985f035fbf
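
The title of the top commit ("Drop and reacquire pool obj lock in some backends") names a general pattern that can be sketched as below. This is a toy model under stated assumptions, not the actual patch: the Pool class, the starting flag and the function names are hypothetical, and time.sleep stands in for the slow backend call.

```python
import threading
import time


class Pool:
    """Stand-in for a pool object (hypothetical, not libvirt's struct)."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.starting = False  # assumed flag: a start is in flight
        self.active = False


def start_pool(pool, mount_seconds, mounted_ok=True):
    with pool.lock:
        if pool.starting or pool.active:
            raise RuntimeError("pool is busy or already active")
        pool.starting = True
    # Lock dropped here: other threads may inspect the pool while we block.
    ok = False
    try:
        time.sleep(mount_seconds)  # stands in for the slow mount
        ok = mounted_ok
    finally:
        with pool.lock:  # reacquire before touching pool state again
            pool.starting = False
            pool.active = ok
    return ok


def list_pools(pools):
    # Each pool is locked only for the brief moment its state is read.
    out = []
    for pool in pools:
        with pool.lock:
            out.append((pool.name, "active" if pool.active else "inactive"))
    return out
```

With the lock released during the slow step, list_pools returns promptly and reports the pool as inactive until the start completes. The flag guards against a second concurrent start while the lock is not held, which is the kind of state protection such a pattern needs.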

Comment 7 Michal Privoznik 2020-01-13 14:58:18 UTC
Moving to POST per comment 5.

Comment 9 gaojianan 2020-02-03 07:43:06 UTC
Verified on:
libvirt-6.0.0-2.virtcov.el8.x86_64

Steps:
1. Define and start a pool whose start operation is going to time out
# cat pool.xml1
<pool type='netfs'>
  <name>nfsimages</name>
  <source>
    <host name='$ip'/>
    <dir path='/nfs'/>
    <format type='nfs'/>
  </source>
  <target>
    <path>/mnt</path>
  </target>
</pool>

[root@jgao-test1 ~]# virsh pool-define pool.xml1 
Pool nfsimages defined from pool.xml1

[root@jgao-test1 ~]# virsh pool-start nfsimages 

2. In a second terminal:
# virsh pool-list --all
 Name        State      Autostart
-----------------------------------
 default     active     no
 images-1    active     yes
 luks        active     no
 nfsimages   inactive   no

The listing still returns and includes the nfsimages pool.

Works as expected.

Comment 11 errata-xmlrpc 2020-05-05 09:46:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017

