Bug 733522 - --virtualsize device creation shouldn't cause errors when attempted with cluster VGs
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.1
Hardware: x86_64 Linux
Priority: high, Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Milan Broz
QA Contact: Corey Marthaler
Depends On:
Blocks: 756082
Reported: 2011-08-25 17:48 EDT by Corey Marthaler
Modified: 2013-02-28 23:10 EST (History)
Fixed In Version: lvm2-2.02.95-1.el6
Doc Type: Bug Fix
Doc Text:
Previously, when a snapshot with a virtual origin was created in a clustered VG, LVM incorrectly tried to activate it on the other nodes as well, and the command failed with "Error locking on node" error messages. This has been fixed: a snapshot with a virtual origin (created using --virtualsize) is now activated exclusively, on the local node only.
Last Closed: 2012-06-20 10:59:45 EDT
Description Corey Marthaler 2011-08-25 17:48:16 EDT
Description of problem:
Locking errors occur when virtual-size LVs (lvcreate --virtualsize) are created on clustered VGs, and another node can deadlock afterwards.

[root@grant-01 ~]#  lvcreate --virtualsize 10T -L100M -n LV test
  Logical volume "LV" created

[root@grant-01 ~]#  lvcreate --virtualsize 10T -L100M -n LV1 test
  Logical volume "LV1" created

[root@grant-01 ~]# vgs
  VG         #PV #LV #SN Attr   VSize   VFree  
  test        10   2   2 wz--nc 476.78g 476.58g
  vg_grant01   1   3   0 wz--n-  74.01g      0 
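
The clustered flag is visible in the Attr column above: "test" shows wz--nc, while the local VG shows wz--n-. A minimal sketch of checking that from a script, assuming the six-character vg_attr layout shown here with "c" in the sixth position for a clustered VG. The function name vg_attr_is_clustered is illustrative and not part of lvm2.

```shell
# Return success (0) if a vg_attr string marks the VG as clustered.
# Assumption: the attr string has the six-character layout seen in the
# vgs output above, with "c" as the sixth character for a clustered VG.
vg_attr_is_clustered() {
    case "$1" in
        ?????c*) return 0 ;;   # sixth character is "c": clustered
        *)       return 1 ;;   # anything else: treat as not clustered
    esac
}
```

It could be fed from vgs, for example:
vg_attr_is_clustered "$(vgs --noheadings -o vg_attr test | tr -d ' ')" && echo clustered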

[root@grant-01 ~]# lvs -a -o +devices
  LV            VG         Attr   LSize   Origin        Snap%   Devices         
  LV            test       swi-a- 100.00m [LV_vorigin]    0.00  /dev/sdb1(0)
  LV1           test       swi-a- 100.00m [LV1_vorigin]   0.00  /dev/sdb1(25)
  [LV1_vorigin] test       vwi-a-  10.00t 
  [LV_vorigin]  test       vwi-a-  10.00t 

[root@grant-01 ~]# lvremove test/LV
Do you really want to remove active clustered logical volume LV? [y/n]: y
  Error locking on node grant-03: Unable to deactivate open test-LV_vorigin-real (253:4)
  Error locking on node grant-02: Unable to deactivate open test-LV_vorigin-real (253:4)
  Failed to resume LV_vorigin.
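
To see at a glance which nodes refused the lock, the "Error locking on node" lines can be filtered out of the command output. A rough sketch, assuming the message format shown above; locking_error_nodes is a hypothetical helper name, not an lvm2 command.

```shell
# Read lvm command output on stdin and print the name of every node that
# appears in an "Error locking on node <name>: ..." line, once per node.
# Assumption: the message format matches the lvremove output above.
locking_error_nodes() {
    sed -n 's/^[[:space:]]*Error locking on node \([^:]*\):.*/\1/p' | sort -u
}
```

Possible use (error messages are captured together with normal output):
lvremove -f test/LV 2>&1 | locking_error_nodes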


# Another node in the cluster
[root@grant-03 ~]# lvs -a -o +devices
[DEADLOCK]


Version-Release number of selected component (if applicable):
2.6.32-191.el6.x86_64

lvm2-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-libs-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-cluster-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6    BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
cmirror-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
Comment 1 Robert Peterson 2011-08-26 11:04:06 EDT
I just got some of the original symptoms on the vg, and it was
NOT clustered this time.  Observe:

[root@intec1 ../gfs2]# lvremove /dev/intec/intec 
Do you really want to remove active logical volume intec? [y/n]: y
  Logical volume "intec" successfully removed
[root@intec1 ../gfs2]# vgs
  VG        #PV #LV #SN Attr   VSize   VFree  
  intec       1   0   0 wz--n- 400.00g 400.00g
  sasdrives   5   1   0 wz--n- 683.65g 433.65g
[root@intec1 ../gfs2]# lvcreate --virtualsize 10T -L398G -n intec intec
  Unable to create LV intec_vorigin in Volume Group intec: name already in use.
  Couldn't create virtual origin for LV intec

Maybe the symptom shown here is a new bug?
Comment 2 Milan Broz 2011-08-26 11:16:48 EDT
I do not think we need a new bug for that; the problem with a local VG appears when the LV is repeatedly added and removed. All of these things should be fixed in one patch.
Comment 3 Milan Broz 2011-09-14 15:48:33 EDT
I hope it is fixed upstream now. But there are still other problems, like bug #738435.
Comment 4 Corey Marthaler 2011-09-15 15:34:36 EDT
This does not appear to be fully fixed with the latest scratch rpms.

1. virtualsize devices are still allowed to be created on clustered VGs

2. the removal of these virtualsize devices can still fail due to locking errors

However, it no longer deadlocks like in the original report.

[root@grant-01 ~]# lvcreate --virtualsize 10T -L100M -n LV test
  Logical volume "LV" created
[root@grant-01 ~]# lvcreate --virtualsize 10T -L100M -n LV1 test
  Logical volume "LV1" created

[root@grant-01 ~]# vgs
  VG         #PV #LV #SN Attr   VSize   VFree  
  test         6   2   2 wz--nc 476.80g 476.61g
[root@grant-01 ~]# lvs -a -o +devices
  LV            VG    Attr   LSize   Origin        Snap%  Devices
  LV            test  swi-a- 100.00m [LV_vorigin]    0.00 /dev/sdb1(0)
  LV1           test  swi-a- 100.00m [LV1_vorigin]   0.00 /dev/sdb1(25)
  [LV1_vorigin] test  vwi-a-  10.00t
  [LV_vorigin]  test  vwi-a-  10.00t

[root@grant-01 ~]# lvremove test/LV
Do you really want to remove active clustered logical volume LV? [y/n]: y
  Logical volume "LV" successfully removed
[root@grant-01 ~]# lvremove test/LV1
Do you really want to remove active clustered logical volume LV1? [y/n]: y
  Error locking on node grant-01: LV test/LV1 in use: not deactivating
  Unable to deactivate logical volume "LV1"


2.6.32-195.el6.x86_64

lvm2-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
lvm2-libs-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
lvm2-cluster-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
udev-147-2.38.el6    BUILT: Fri Sep  9 16:25:50 CDT 2011
device-mapper-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-libs-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-event-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-event-libs-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
cmirror-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
Comment 5 Milan Broz 2011-09-15 16:23:24 EDT
(In reply to comment #4)
> 1. virtualsize devices are still allowed to be created on clustered VGs

That should be OK, but the device must be activated exclusively (the same as snapshots).

> 2. the removal of these virtualsize devices can still fail due to locking
> errors

The fix is not complete in that build, and I think part of it needs to be moved to 6.3. Let's test it with the next build.
Comment 7 Corey Marthaler 2011-09-21 18:53:56 EDT
Looks like this may need to be pushed off until 6.3.

[root@grant-01 ~]# vgcreate test /dev/sd[bc][123]
  Clustered volume group "test" successfully created
[root@grant-01 ~]#  while true; do  echo "**********"; lvcreate --virtualsize 10T -L100M -n LV test; lvcreate --virtualsize 10T -L100M -n LV1 test; sleep 1;  lvremove -f test/LV;  lvremove -f test/LV1; done
**********
  Logical volume "LV" created
  Logical volume "LV1" created
  Logical volume "LV" successfully removed
  Logical volume "LV1" successfully removed
**********
  Logical volume "LV" created
  Logical volume "LV1" created
  Error locking on node grant-01: Unable to deactivate open test-LV (253:3)
  Unable to deactivate logical volume "LV"
  Logical volume "LV1" successfully removed
**********
  Logical volume "LV" already exists in volume group "test"
  Logical volume "LV1" created
  Logical volume "LV" successfully removed
  Logical volume "LV1" successfully removed
**********
  Unable to create LV LV_vorigin in Volume Group test: name already in use.
  Couldn't create virtual origin for LV LV
  Logical volume "LV1" created
  One or more specified logical volume(s) not found.
  Logical volume "LV1" successfully removed
**********
  Unable to create LV LV_vorigin in Volume Group test: name already in use.
  Couldn't create virtual origin for LV LV
  Couldn't find device with uuid 73CYYu-J574-IL7l-To0v-CQN0-AIxy-RsGgka.
  Cannot change VG test while PVs are missing.
  Consider vgreduce --removemissing.
  One or more specified logical volume(s) not found.
  One or more specified logical volume(s) not found.
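
For repeated runs, the loop above can be bounded and made to count failing commands instead of running until interrupted. A sketch under the assumption that lvcreate and lvremove exit non-zero whenever they print one of the errors above; the function name stress_virtualsize and its arguments are illustrative only.

```shell
# Bounded variant of the create/remove loop from this comment.
# Arguments: $1 = VG name, $2 = number of iterations.
# Prints a separator per iteration and, at the end, how many
# lvcreate/lvremove invocations exited non-zero.
# Returns success only if no command failed.
# Note: variables are global; the function is meant for a throwaway shell.
stress_virtualsize() {
    vg=$1
    rounds=$2
    failures=0
    i=0
    while [ "$i" -lt "$rounds" ]; do
        echo "**********"
        lvcreate --virtualsize 10T -L100M -n LV "$vg"  || failures=$((failures + 1))
        lvcreate --virtualsize 10T -L100M -n LV1 "$vg" || failures=$((failures + 1))
        sleep 1
        lvremove -f "$vg/LV"  || failures=$((failures + 1))
        lvremove -f "$vg/LV1" || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "failed commands: $failures"
    [ "$failures" -eq 0 ]
}
```

For example, "stress_virtualsize test 100" would run one hundred iterations against the clustered VG "test" and report how many commands failed.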


2.6.32-198.el6.x86_64

lvm2-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
lvm2-libs-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
lvm2-cluster-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
udev-147-2.38.el6    BUILT: Fri Sep  9 16:25:50 CDT 2011
device-mapper-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-libs-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-event-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-event-libs-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
cmirror-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
Comment 8 Peter Rajnoha 2011-09-22 09:02:51 EDT
OK, this needs more fixing. Moving to 6.3.
Comment 11 Corey Marthaler 2011-12-19 15:25:21 EST
Adding QA ack for 6.3. 

Devel will need to provide unit testing results however before this bug can be
ultimately verified by QA.
Comment 12 Peter Rajnoha 2012-02-16 07:10:56 EST
The --virtualsize snapshot is now correctly "exclusively activated" on one node only. However, I can't hit the error in comment #7 (and unfortunately Milan did not manage to reproduce it either).

Corey, please try the latest test build (the one that is built automatically from upstream) and see if you can still hit it. If not, we could close this bug then as the original problem is resolved already (and the remaining problematic part is tracked by bug #738435).
Comment 15 Corey Marthaler 2012-04-03 17:02:45 EDT
Fix verified in the latest rpms.

2.6.32-220.4.2.el6.x86_64
lvm2-2.02.95-3.el6    BUILT: Fri Mar 30 09:54:10 CDT 2012
lvm2-libs-2.02.95-3.el6    BUILT: Fri Mar 30 09:54:10 CDT 2012
lvm2-cluster-2.02.95-3.el6    BUILT: Fri Mar 30 09:54:10 CDT 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.74-3.el6    BUILT: Fri Mar 30 09:54:10 CDT 2012
device-mapper-libs-1.02.74-3.el6    BUILT: Fri Mar 30 09:54:10 CDT 2012
device-mapper-event-1.02.74-3.el6    BUILT: Fri Mar 30 09:54:10 CDT 2012
device-mapper-event-libs-1.02.74-3.el6    BUILT: Fri Mar 30 09:54:10 CDT 2012
cmirror-2.02.95-3.el6    BUILT: Fri Mar 30 09:54:10 CDT 2012


[...]
**********
  Logical volume "LV" created
  Logical volume "LV1" created
  Logical volume "LV" successfully removed
  Logical volume "LV1" successfully removed
**********
  Logical volume "LV" created
  Logical volume "LV1" created
  Logical volume "LV" successfully removed
  Logical volume "LV1" successfully removed
**********
  Logical volume "LV" created
  Logical volume "LV1" created
  Logical volume "LV" successfully removed
  Logical volume "LV1" successfully removed
**********
[...]
Comment 16 Milan Broz 2012-04-24 14:03:56 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, when a snapshot with a virtual origin was created in a clustered VG, LVM incorrectly tried to activate it on the other nodes as well, and the command failed with "Error locking on node" error messages.

This has been fixed: a snapshot with a virtual origin (created using --virtualsize) is now activated exclusively, on the local node only.
Comment 18 errata-xmlrpc 2012-06-20 10:59:45 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0962.html
