516773 – vgextend does not take blocking lock and fails if run concurrently with lvextend

Summary: vgextend does not take blocking lock and fails if run concurrently with lvextend

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	lvm2
Sub Component:
Version:	5.4
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Milan Broz
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	523440

Reported:	2009-08-11 13:38 UTC by Ayal Baron
Modified:	2016-04-26 13:45 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-03-30 09:01:23 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2010:0298	0	normal	SHIPPED_LIVE	lvm2 bug fix and enhancement update	2010-03-29 15:16:34 UTC

Description Ayal Baron 2009-08-11 13:38:46 UTC

Trying to extend a VG (add a PV) fails when there are outstanding lvextend requests, instead of blocking on the VG lock (as other LVM commands do, including lvextend).

In our setup lvextend is run automatically whenever a virtual machine runs low on space.  The VMs run on many different hosts and all lvextend requests are sent to a single host.
If the VG has no free space left, many lvextend commands are spawned continuously (how many depends on the number of running VMs that are low on disk space across the entire system).
A user wishing to extend the VG will fail almost every time because there will be at least one outstanding lvextend command.

The correct behavior is to block on the lock as do the rest of the LVM commands.

Can be easily reproduced - spawn many lvextend commands when there is no free space on VG and try to extend VG.

Comment 4 Alasdair Kergon 2009-08-11 16:33:27 UTC

As far as LVM is concerned, vgextend is behaving as designed.   We have never
guaranteed that any commands that modify the constitution (i.e. the PV list) of
VGs will succeed if other processes are run in parallel.  All we
guarantee is integrity, not success - running parallel processes will
never lead to corruption.  Commands that can't get locks should simply
be retried - this is our mechanism for avoiding deadlocks.  We do already intend to enhance the tools so that the retrying happens internally, and some preparatory changes are already in place, but until now, there has been no reason to give that work any particular priority.  I have never seen a report of a problem due to this before, even to the extent that I don't think I've needed to document it anywhere until now!
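
The retry approach described in this comment can be sketched from the caller's side as follows. This is an illustration only and not part of LVM or of the original comment: `run_with_retry` is a hypothetical helper, and since this report does not state how a lock failure is signalled, the sketch assumes that any non-zero exit status is worth retrying, which a real wrapper would want to narrow down.

```python
import subprocess
import sys
import time


def run_with_retry(cmd, attempts=5, delay=0.5, backoff=2.0):
    """Run cmd, retrying on any non-zero exit status.

    Assumption: a lock failure shows up as a non-zero exit status.
    This helper cannot tell a lock failure from any other error.
    Returns (exit_status, attempts_used).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = delay
    for attempt in range(1, attempts + 1):
        status = subprocess.run(cmd).returncode
        if status == 0:
            return 0, attempt
        if attempt < attempts:
            time.sleep(wait)   # back off before trying again
            wait *= backoff
    return status, attempts


if __name__ == "__main__":
    # Hypothetical real-world call; the VG and PV names are placeholders:
    #   run_with_retry(["vgextend", "myvg", "/dev/sdX"])
    # A harmless stand-in command is used here so the sketch runs anywhere.
    print(run_with_retry([sys.executable, "-c", "raise SystemExit(0)"]))
```

The growing delay is a design choice for the scenario in this report: with many lvextend processes queued, retrying immediately would mostly collide with the same contention again.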

Comment 7 Alasdair Kergon 2009-08-11 16:57:57 UTC

Tools like lvextend are optimised on the assumption that they will normally succeed.  But here it is being used in a mode where it is commonly expected to fail and that gives suboptimal performance.  (It uses an expensive write lock, but if you expected it normally to fail, you'd use a cheap read lock instead, and only get the write lock later.)
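
The "cheap read lock first, write lock later" idea in this comment can be illustrated with a generic sketch. It is not LVM code: `flock` on a scratch file merely stands in for a VG lock, and `extend_if_space` and its `free_extents` callback are hypothetical names invented for the example.

```python
import fcntl
import os
import tempfile


def extend_if_space(lock_path, free_extents, wanted):
    """Check under a shared lock; take the exclusive lock only if needed.

    free_extents is a hypothetical callable returning the free space.
    Returns True if the (simulated) extension would go ahead.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH)   # cheap: many readers may hold this
        if free_extents() < wanted:
            return False                 # expected-to-fail case exits early
        fcntl.flock(fd, fcntl.LOCK_EX)   # convert to exclusive; not atomic
        # Re-check: another writer may have run during the conversion.
        return free_extents() >= wanted
    finally:
        os.close(fd)                     # closing the descriptor drops the lock


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "vg.lock")
    print(extend_if_space(path, lambda: 0, 10))
```

When the VG is full, every caller returns after the shared-lock check and never contends for the exclusive lock, which is the saving the comment describes.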

Comment 8 Alasdair Kergon 2009-08-11 17:23:25 UTC

"The correct behavior is to block on the lock as do the rest of the LVM
commands."

There are two classes of LVM commands: those that need just one lock, which block as you describe, and those that need multiple locks at once, which behave as I described in comment #4.

The sequence in which multiple locks are obtained could be changed, but that means fixing one scenario (single VG) by breaking another (multiple independent VGs).
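
The difference between the two classes can be sketched generically. This is not LVM's implementation: `acquire_all` is a hypothetical helper and `threading.Lock` stands in for whatever locks a command needs. The first lock is waited for; every further lock is only tried, and on failure everything is released so the caller can retry.

```python
import threading


def acquire_all(first, others):
    """Block on the first lock, then try the rest without blocking.

    Returns True with every lock held, or False with none held.
    """
    first.acquire()                      # single-lock case: just wait
    held = [first]
    for lock in others:
        if not lock.acquire(blocking=False):
            # Waiting here while holding earlier locks could deadlock
            # against a process taking them in the opposite order, so
            # release everything and report failure instead.
            for h in reversed(held):
                h.release()
            return False
        held.append(lock)
    return True


if __name__ == "__main__":
    a, b = threading.Lock(), threading.Lock()
    b.acquire()                          # simulate another holder of b
    print(acquire_all(a, [b]))           # second lock is busy
    b.release()
```

Which lock is waited for and which is only tried is exactly the ordering question raised above: changing it helps one workload at the expense of another.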


None of the ideas proposed so far (here or elsewhere) represent complete solutions.

Comment 9 Alasdair Kergon 2009-08-11 17:25:40 UTC

Please also provide the actual command lines used for the vgextend and the lvextends.

Since they are launched in parallel, presumably they use absolute sizes not relative sizes?

Comment 18 Yaniv Kaul 2009-09-14 15:01:51 UTC

We have not seen the past issues with this LVM on the Westford cluster for ~2 weeks now, so I guess we can say this is fixed.

Comment 19 Milan Broz 2009-09-15 12:38:00 UTC

Patch in lvm2-2_02_46-9_el5.

Comment 26 Corey Marthaler 2010-02-15 22:50:14 UTC

Marking this bug verified based on comment #18.

Comment 27 Ayal Baron 2010-02-16 11:16:58 UTC

Actually we fixed this in vdsm as well (comment #13) so Yaniv's comment does not verify this bug.

Comment 29 errata-xmlrpc 2010-03-30 09:01:23 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0298.html
