Bug 789766

Summary: #570359 patch introduces large udevadm settle waiting time when connecting to libvirtd
Product: Red Hat Enterprise Linux 6
Reporter: Pieter Hollants <pieter>
Component: libvirt
Assignee: John Ferlan <jferlan>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Priority: unspecified
Version: 6.4
CC: acathrow, cwei, dallan, dyuan, mzhan, shyu, ydu, zpeng
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2014-04-16 13:05:52 UTC

Description Pieter Hollants 2012-02-12 21:51:51 UTC
Description of problem:
In Bug #570359, a patch was introduced to minimize the possibility of a race with udev when running "lvreduce". That patch introduced a call to "udevadm settle".
However, the default timeout of "udevadm settle" is 120 seconds, and the call is now also made when connecting to libvirtd, once for EACH defined storage pool. This causes e.g. virt-manager to encounter a delay of 4 minutes for 2 volumes, giving the impression that virt-manager hangs.
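
For illustration, the delay can be reproduced from a shell. The 120-second figure is the documented default timeout of "udevadm settle"; the per-pool repetition comes from the refresh path described above:

    # Blocks until the udev event queue is empty, waiting up to 120 s by default:
    time udevadm settle
    # With the #570359 patch, this wait effectively happens once per defined pool.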

Version-Release number of selected component (if applicable):



Steps to Reproduce:
1. Define at least one storage pool backed by an LVM volume group.
2. Disconnect virt-manager.
3. Try to reconnect virt-manager.
  
Actual results:
Connecting takes n times the default timeout of "udevadm settle" (n = number of defined storage pools; e.g. 2 pools x 120 s = 240 s, the 4 minutes observed above).

Expected results:
Connecting without delay.
libvirt should:
a.) distinguish in more detail when udevadm needs to be called. Certainly not when a client connects.
b.) pass a "--timeout 15" argument or similar to "udevadm settle" (see the sketch below) and allow clients to give the user visual feedback about the ongoing settle operation.
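
A minimal sketch of the suggested invocation; the 15-second value is just the proposal above, not a tested recommendation:

    # Wait at most 15 s for the udev event queue to drain, instead of 120 s:
    udevadm settle --timeout=15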

Additional info:

Comment 2 Osier Yang 2012-04-27 03:38:41 UTC
(In reply to comment #0)
> Description of problem:
> In Bug #570359, a patch was introduced to minimize the possibility of a race
> with udev when running "lvreduce". That patch introduced a call to
> "udevadm settle".
> However, the default timeout of "udevadm settle" is 120 seconds, and the call
> is now also made when connecting to libvirtd, once for EACH defined storage
> pool. This causes e.g. virt-manager to encounter a delay of 4 minutes for 2 volumes,

Guess you mean "2 pools".

> giving the impression that virt-manager hangs.
> 
> Version-Release number of selected component (if applicable):
> 
> 
> 
> Steps to Reproduce:
> 1. Define at least one storage pool backed by an LVM volume group.
> 2. Disconnect virt-manager.
> 3. Try to reconnect virt-manager.
> 
> Actual results:
> Connecting takes n times the default timeout of "udevadm settle" (n = number
> of defined storage pools; e.g. 2 pools x 120 s = 240 s, the 4 minutes observed
> above).
> 
> Expected results:
> Connecting without delay.
> libvirt should:
> a.) distinguish in more detail when udevadm needs to be called. Certainly not
> when a client connects.

I guess it's caused by virt-manager explicitly requesting a pool refresh when "checking storage"? All of the storage backends "logical", "disk", "scsi", and "mpath" use "udevadm settle" when refreshing a pool. Should virt-manager not try to refresh the pool while connecting, but only after the connection is established?

> b.) pass a "--timeout 15" argument or similar to "udevadm settle" and allow
> clients to give the user visual feedback about the ongoing settle operation
> 

It could cause virt-manager to get incomplete volume information, I guess. Though yes, a timeout argument for "udevadm settle" is a good idea, and I think it would be useful in many places.

Osier

Comment 3 Pieter Hollants 2012-05-01 21:22:05 UTC
> > However, the default timeout of "udevadm settle" is 120 seconds, and the call
> > is now also made when connecting to libvirtd, once for EACH defined storage
> > pool. This causes e.g. virt-manager to encounter a delay of 4 minutes for 2 volumes,
> 
> Guess you mean "2 pools".

Yes.

> > Expected results:
> > Connecting without delay.
> > libvirt should:
> > a.) distinguish in more detail when udevadm needs to be called. Certainly not
> > when a client connects.
> 
> I guess it's caused by virt-manager explicitly requesting a pool refresh when
> "checking storage"? All of the storage backends "logical", "disk", "scsi", and
> "mpath" use "udevadm settle" when refreshing a pool. Should virt-manager not
> try to refresh the pool while connecting, but only after the connection is
> established?

That is one possibility, but it is a technical detail and not the core issue. The core issue is that the user should know what's going on. So even if the refresh is done after the connection is established, the protocol between virt-manager and libvirtd should support an appropriate indication to virt-manager, so that virt-manager can show a message box about the pending refresh.

> > b.) pass a "--timeout 15" argument or similar to "udevadm settle" and allow
> > clients to give the user visual feedback about the ongoing settle operation
> 
> It could cause virt-manager to get incomplete volume information, I guess.
> Though yes, a timeout argument for "udevadm settle" is a good idea, and I
> think it would be useful in many places.

Of course, the exact value is a tradeoff between waiting time and the time necessary to cope with the longest pending refresh operation. However, I wouldn't know which backend would need more than 15 seconds.

Comment 4 Osier Yang 2012-05-28 10:06:10 UTC
A similar problem was discussed upstream:

https://www.redhat.com/archives/libvir-list/2012-April/msg01215.html

Comment 11 John Ferlan 2014-04-10 17:43:54 UTC
A pointer to an "interesting" discussion about how/why settle is required, especially for libvirt's use, starts here:

http://lists.freedesktop.org/archives/systemd-devel/2013-July/011826.html

And the conclusion and direct analysis of why it's necessary is here:

http://lists.freedesktop.org/archives/systemd-devel/2013-July/011845.html

It seems that for the most part "udevadm settle" can and will return fairly quickly; however, there are instances where it needs to take some time to make sure all devices are properly vetted, or where there is some configuration issue that needs to be resolved.

Thus it would seem that in this case the problem wasn't necessarily that libvirt or virt-manager had an LVM pool to connect to, but rather that something was wrong in the udev database. Of course, too much time has gone by to determine that for sure. When no issue is present, the connection certainly doesn't take two minutes.

While this isn't necessarily a libvirt proper bug, there may be some options to lessen the pain, or at least to provide some sort of message indicating that there's potentially something wrong with the udev configuration that needs to be addressed. The message may not make it to virt-manager, but it could be sent to syslog to note the potential problem.

The udevadm utility seems to require too much knowledge of the target configuration and ordering to be useful for generic purposes. Even though it claims to print the list of events being waited on when the timeout expires, I didn't see that in one example:

http://lists.freedesktop.org/archives/systemd-devel/2013-July/011829.html

It does return 0 or 1 based on whether there are still events being waited upon, and that might be usable.
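
For example, a caller could check that exit status and log a warning instead of blocking silently. A rough sketch; the 15-second value and the log text are illustrative only, not something libvirt actually emits:

    udevadm settle --timeout=15
    if [ $? -ne 0 ]; then
        # The queue did not drain in time; device information may be incomplete.
        logger "udevadm settle timed out; udev may still be processing events"
    fi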

Comment 12 John Ferlan 2014-04-16 13:05:52 UTC
Although an adjustment to the logic was proposed upstream, see:

http://www.redhat.com/archives/libvir-list/2014-April/msg00661.html

it was not accepted, for good reasons. This really is not a libvirt bug per se, and the proposed change was more of a workaround for specific paths than anything else. Even supplying a message when the code times out could be a false positive, considering libvirt doesn't have a mechanism to determine what event or issue is causing the "udevadm settle" code to wait.

The underlying cause is that the udev event code is busy working through changes to its managed space. It is at the mercy of the underlying hardware architecture to process requests in a timely manner and properly message at a system level when/if something is wrong.

To address the original bug requests on what libvirt should do:

>Expected results:
>Connecting without delay.
>libvirt should:
>a.) distinguish in more detail when udevadm needs to be called. Certainly not
>when a client connects.

Technically, the settle call occurs during the refresh operation of a started (or autostarted) storage pool. Essentially, virt-manager is connecting to the target, finding a storage pool started, and refreshing the data within the pool; libvirt is only handling those requests. Now, perhaps a feature could be added (or one may already exist, I'm not sure) to not refresh the storage pool on virt-manager startup/connect. That would alleviate the concern regarding the pause. However, that is a virt-manager issue and would require a separate RFE against virt-manager to inhibit the pool refresh at startup time. In the long run, libvirt is only doing what it's told to do. Another obvious alternative is to stop the storage pool before starting virt-manager (see the sketch below), but I have doubts that that is a reasonable action.
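
For completeness, the stop-the-pool workaround would look something like the following, assuming a hypothetical pool named "lvmpool" (pool-destroy merely deactivates the pool; it does not delete the pool definition or any data):

    # Deactivate the pool and keep it from starting automatically with libvirtd:
    virsh pool-destroy lvmpool
    virsh pool-autostart lvmpool --disable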

>b.) pass a "--timeout 15" argument or similar to "udevadm settle" and allow
>clients to give the user visual feedback about the ongoing settle operation

I'm sure there are those who would find 15 seconds too long. However, as is pointed out in the posted patch request and in the links to the systemd-devel threads regarding the settle timeout, the timeout doesn't really matter if something is wrong or udevadm is busy processing events. Allowing udevadm and the underlying hardware architecture to be the messengers of any issues is better than false positives coming from libvirt.


I am closing this as WONTFIX. If you feel strongly that virt-manager could do something, then I suggest opening a new bz with a feature request that would allow virt-manager to inhibit the automatic pool refresh at connect time via some configuration option.