Bug 578413
| Summary: | vgremove does not take blocking lock and fails if run concurrently with other lvm commands | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Ayal Baron <abaron> |
| Component: | lvm2 | Assignee: | Milan Broz <mbroz> |
| Status: | CLOSED ERRATA | QA Contact: | Corey Marthaler <cmarthal> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 5.4 | CC: | agk, antillon.maurizio, cpelland, dwysocha, heinzm, iannis, iheim, jbrassow, mbroz, prockai, pvrabec |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-01-13 22:40:54 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 577624, 582232 | | |
| Attachments: | strace output of failed vgremove command (403697), vgremove -vvvv output (403698) | | |
Description Ayal Baron 2010-03-31 09:00:16 UTC
(I expect the wait_for_locks = 1 is set in lvm.conf here - it was intended to be configurable.) Please can you post the output of the failing command with -vvvv? (strace will probably help here too.) Thanks.

Created attachment 403697 [details]
strace output of failed vgremove command
Created attachment 403698 [details]
vgremove -vvvv output
Ah, ok. I missed removing "on different vgs" - I can reproduce that easily; it is a clear bug.

Just for the record, the host is configured with "wait_for_locks = 1".

Patch sent to lvm-devel for review. It is a one-liner, but it touches the locking core: https://www.redhat.com/archives/lvm-devel/2010-March/msg00365.html

Patch is now upstream; it needs some time for testing.

Patch added to lvm2-2.02.56-9.el5.

What am I doing wrong that I can't reproduce this issue? I created two different VGs, populated them with LVs, and then attempted to remove both simultaneously. I've tried with both VGs deactivated and with both activated, and by doing the remove from different nodes in the cluster and from the same node in the cluster. Every time the remove works fine.

It is hard to reproduce because it is a race - I was able to reproduce it only when I stopped the first vgremove in a debugger before it unlocked the orphan lock.
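A hypothetical sketch of that debugger trick (the breakpoint target lock_vol is the central locking entry point in the LVM2 source tree; whether that exact symbol is available in a given build is an assumption):

```
# Hypothetical reproduction of the debugger technique described above.
# lock_vol is assumed to be the locking entry point in this LVM2 build.
gdb -ex 'break lock_vol' -ex run --args vgremove VG1
# In gdb, "continue" past breakpoints until vgremove has taken the orphan
# lock, then leave it stopped there. In a second shell:
vgremove VG2   # should now fail to get the orphan lock
```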
Maybe try running one loop creating/removing VG1 and a second loop doing the same in parallel for VG2 (on different PVs), with local locking.
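A minimal sketch of such a looping reproducer, assuming two spare partitions at the hypothetical paths /dev/sdb1 and /dev/sdc1, and local file-based locking configured in lvm.conf:

```
#!/bin/bash
# Two loops race on the shared orphan lock while working on independent VGs.
# /dev/sdb1 and /dev/sdc1 are hypothetical spare partitions; adjust as needed.

repro_loop() {
    local vg=$1 pv=$2
    while true; do
        vgcreate "$vg" "$pv"
        vgremove -f "$vg" || echo "$vg: vgremove failed"  # the race shows up here
    done
}

pvcreate /dev/sdb1 /dev/sdc1    # label both partitions as PVs once
repro_loop VG1 /dev/sdb1 &      # first loop in the background
repro_loop VG2 /dev/sdc1 &      # second loop in parallel
wait
```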
(Explanation from the commit log:)

This fixes a problem with orphan locking, e.g.:

```
vgremove VG1    |    vgremove VG2
lock(VG1)       |    lock(VG2)
lock(ORPHAN)    |    lock(ORPHAN) -> fail, non-blocking
```
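The failing step can be illustrated outside LVM with the flock(1) utility (a stand-in only: LVM takes the lock internally on /var/lock/lvm/P_orphans, and /tmp/P_orphans.demo below is a made-up scratch path):

```
LOCK=/tmp/P_orphans.demo        # made-up stand-in for /var/lock/lvm/P_orphans

flock "$LOCK" sleep 5 &         # first "command" holds the lock for 5 seconds
sleep 1                         # give it time to actually acquire the lock

# Pre-fix behavior: a non-blocking attempt fails immediately, which is the
# "Resource temporarily unavailable" error quoted below.
flock --nonblock "$LOCK" true || echo "non-blocking lock attempt failed"

# Post-fix behavior: a blocking attempt simply waits for the holder to exit.
flock "$LOCK" echo "blocking lock acquired after waiting"
wait
```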
The multiple create/delete loops with local locking cause the issue fairly fast:

```
/var/lock/lvm/P_orphans: flock failed: Resource temporarily unavailable
Can't get lock for orphan PVs
```

Fix verified in lvm2-2.02.56-9.el5.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0052.html