578413 – vgremove does not take blocking lock and fails if run concurrently with other lvm commands

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	lvm2
Sub Component:
Version:	5.4
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Milan Broz
QA Contact:	Corey Marthaler
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	577624 582232

Reported:	2010-03-31 09:00 UTC by Ayal Baron
Modified:	2014-03-17 01:51 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-01-13 22:40:54 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
strace output of failed vgremove command (14.69 KB, application/x-bzip) 2010-03-31 10:38 UTC, Cyril Plisko	no flags
vgremove -vvvv output (3.03 KB, application/x-bzip) 2010-03-31 10:39 UTC, Cyril Plisko	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2011:0052	0	normal	SHIPPED_LIVE	lvm2 bug fix and enhancement update	2011-01-12 17:15:25 UTC

Description Ayal Baron 2010-03-31 09:00:16 UTC

Description of problem:
Running two vgremove commands concurrently (on different VGs) causes one of them to fail. The same failure occurs when vgremove runs concurrently with pvs, and probably with other LVM commands as well.
See https://bugzilla.redhat.com/show_bug.cgi?id=516773 for a similar issue.

Error is:
/var/lock/lvm/P_orphans: flock failed: Resource temporarily unavailable\n 
Can't get lock for orphan PVs\n"; <rc> = 5
This is blocking vdsm: https://bugzilla.redhat.com/show_bug.cgi?id=577624

How reproducible:
Always
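
For readers unfamiliar with the message: "Resource temporarily unavailable" is the standard text for EAGAIN, which flock() returns when a non-blocking request (LOCK_NB) finds the lock already held. The following is a minimal sketch of that behaviour only, not LVM's code. The helper name try_lock_nonblocking and the temporary directory standing in for /var/lock/lvm are assumptions made for illustration.

```python
import errno
import fcntl
import os
import tempfile


def try_lock_nonblocking(path):
    """Try an exclusive, non-blocking flock.

    Returns (fd, None) on success or (None, OSError) on failure.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        return None, exc
    return fd, None


# A temporary directory stands in for /var/lock/lvm (illustration only).
lock_path = os.path.join(tempfile.mkdtemp(), "P_orphans")

# The first "command" gets the lock; the second fails immediately
# instead of waiting for the first one to finish.
holder_fd, holder_err = try_lock_nonblocking(lock_path)
loser_fd, loser_err = try_lock_nonblocking(lock_path)
print("second attempt:", os.strerror(loser_err.errno))

os.close(holder_fd)  # closing the descriptor releases the lock
```

Each call opens the lock file separately, so the two attempts conflict even inside one process, much as two separate commands would.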

Comment 1 Milan Broz 2010-03-31 09:07:34 UTC

(I expect wait_for_locks = 1 is set in lvm.conf here - it was intended to be configurable.)

Can you please post the output of the failing command with -vvvv?
(An strace will probably help here too.)

thx

Comment 2 Cyril Plisko 2010-03-31 10:38:32 UTC

Created attachment 403697
strace output of failed vgremove command

Comment 3 Cyril Plisko 2010-03-31 10:39:01 UTC

Created attachment 403698
vgremove -vvvv output

Comment 4 Milan Broz 2010-03-31 11:03:04 UTC

Ah, OK. I missed the "on different vgs" part - I can reproduce that easily; it is clearly a bug.

Comment 5 Ayal Baron 2010-03-31 12:07:49 UTC

just for the record, the host is configured with "wait_for_locks = 1".

Comment 6 Milan Broz 2010-03-31 14:08:18 UTC

Patch sent to lvm-devel for review. It is a one-liner, but it touches the locking core...
https://www.redhat.com/archives/lvm-devel/2010-March/msg00365.html

Comment 7 Milan Broz 2010-03-31 17:33:58 UTC

The patch is now upstream; it needs some time for testing.

Comment 8 Milan Broz 2010-04-13 16:16:34 UTC

Patch added to lvm2-2.02.56-9.el5

Comment 9 Corey Marthaler 2010-04-13 19:33:15 UTC

What am I doing wrong that I can't reproduce this issue?

I created two different VGs, populated them with LVs, and then attempted to remove both simultaneously. I've tried with both VGs deactivated and with both activated, and by doing the remove from different nodes in the cluster and from the same node in the cluster.

Every time the remove works fine.

Comment 10 Milan Broz 2010-04-13 19:45:43 UTC

It is hard to reproduce because it is a race - I was able to reproduce it only by stopping in the debugger before the first vgremove released the orphan lock.

Maybe try running one loop that creates and removes VG1 and a second loop that does the same in parallel for VG2 (on different PVs), with local locking.

(explanation from commit log:)

This fixes problem with orphan locking, e.g.
    vgremove VG1    |    vgremove VG2
    lock(VG1)       |    lock(VG2)
    lock(ORPHAN)    |    lock(ORPHAN) -> fail, non-blocking
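
The interleaving above can be replayed with plain flock() calls. This is a simulation under stated assumptions, not LVM's implementation: the function simulate, the V_ prefix on the per-VG lock file names, the temporary lock directory, and the use of threads in place of separate commands are all illustrative. An event forces VG2 to reach the orphan lock while VG1 still holds it, so the race is deterministic here.

```python
import errno
import fcntl
import os
import tempfile
import threading
import time


def simulate(blocking):
    """Run two simulated vgremove commands; return {vg_name: 0 or errno}."""
    lock_dir = tempfile.mkdtemp()  # stand-in for /var/lock/lvm
    orphan_path = os.path.join(lock_dir, "P_orphans")
    orphan_held = threading.Event()
    results = {}

    def lock(path, flags):
        # Each open() creates its own open file description, so flock()
        # conflicts arise even between threads of a single process.
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            fcntl.flock(fd, flags)
        except OSError:
            os.close(fd)
            raise
        return fd

    def vgremove(vg, wins_race):
        orphan_flags = fcntl.LOCK_EX
        if not blocking:
            orphan_flags |= fcntl.LOCK_NB
        vg_fd = lock(os.path.join(lock_dir, "V_" + vg), fcntl.LOCK_EX)
        try:
            if not wins_race:
                # Arrive while the other command holds the orphan lock.
                orphan_held.wait(timeout=5)
            orphan_fd = lock(orphan_path, orphan_flags)
        except OSError as exc:
            results[vg] = exc.errno
        else:
            if wins_race:
                orphan_held.set()
                time.sleep(0.2)  # simulated work under the orphan lock
            os.close(orphan_fd)  # closing the descriptor releases the flock
            results[vg] = 0
        finally:
            os.close(vg_fd)

    threads = [
        threading.Thread(target=vgremove, args=("VG1", True)),
        threading.Thread(target=vgremove, args=("VG2", False)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


for mode in (False, True):
    print("blocking" if mode else "non-blocking", simulate(blocking=mode))
```

With a non-blocking orphan lock the second command records EAGAIN, matching the flock failure reported in this bug. With a blocking lock it simply waits until the first command releases the lock, and both complete.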

Comment 11 Corey Marthaler 2010-04-13 21:09:27 UTC

Running multiple create/delete loops with local locking triggers the issue fairly quickly:

  /var/lock/lvm/P_orphans: flock failed: Resource temporarily unavailable
  Can't get lock for orphan PVs
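
The exact commands used for these loops are not recorded in this report. The following is one possible way to script them, offered only as a sketch: the function names churn and run_parallel are hypothetical, the device paths in the usage comment are placeholders, and the PVs are assumed to be initialized already and dedicated to the test. It must run as root, and it destroys any VG with the given names.

```python
import subprocess
import threading


def churn(vg, pv, iterations, run=subprocess.run):
    """Create and remove one VG repeatedly; return the commands that failed."""
    failures = []
    for _ in range(iterations):
        for cmd in (["vgcreate", vg, pv], ["vgremove", "-f", vg]):
            proc = run(cmd, capture_output=True, text=True)
            if proc.returncode != 0:
                failures.append((cmd, proc.stderr.strip()))
    return failures


def run_parallel(jobs, iterations, run=subprocess.run):
    """Run one churn loop per (vg, pv) pair in parallel; return {vg: failures}."""
    results = {}

    def worker(vg, pv):
        results[vg] = churn(vg, pv, iterations, run)

    threads = [threading.Thread(target=worker, args=job) for job in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Example invocation (placeholder device paths, adjust before use):
#   failed = run_parallel([("VG1", "/dev/sdX1"), ("VG2", "/dev/sdY1")], 100)
#   for vg, errors in failed.items():
#       print(vg, len(errors), "failed commands")
```

The run parameter exists only so the loops can be exercised with a stand-in for subprocess.run when no spare devices are available.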

Comment 12 Corey Marthaler 2010-04-13 21:15:20 UTC

Fix verified in lvm2-2.02.56-9.el5.

Comment 15 errata-xmlrpc 2011-01-13 22:40:54 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0052.html
