Bug 672314

Summary: Check for suitability of LV segment types before changing the cluster attribute of a VG containing cmirrors
Product: Red Hat Enterprise Linux 6
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED ERRATA
QA Contact: Corey Marthaler <cmarthal>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.0
CC: agk, dwysocha, heinzm, jbrassow, joe.thornber, mbroz, prajnoha, prockai, syeghiay
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: lvm2-2.02.95-9.el6
Doc Type: Bug Fix
Doc Text:
Some LVM segment types, like "mirror", have single-machine and cluster-aware variants. Others, like snapshot and the RAID types, have only single-machine variants. When switching the cluster attribute of a volume group, logical volumes of these segment types must be inactive. This allows the appropriate single-machine or cluster variant to be reloaded, or the activation to be made exclusive where that is required.
Story Points: ---
Clone Of:
: 822213 (view as bug list) Environment:
Last Closed: 2012-06-20 14:51:42 UTC Type: ---
Bug Depends On:    
Bug Blocks: 697866, 756082, 822213    

Description Corey Marthaler 2011-01-24 19:40:30 UTC
Description of problem:
This is similar to rhel4 bug 289331. When moving a clustered volume to the local domain, that volume isn't deactivated on the other nodes in the cluster. Thus, it needs to be deactivated by hand before it can be of any use.

# Change from cluster to local
[root@grant-01 ~]# pvscan
  PV /dev/sdb1                      lvm2 [34.06 GiB]
  PV /dev/sdb2                      lvm2 [34.06 GiB]
  PV /dev/sdb3                      lvm2 [34.06 GiB]
  PV /dev/sdc1                      lvm2 [45.41 GiB]
  PV /dev/sdc2                      lvm2 [45.41 GiB]
  PV /dev/sdc3                      lvm2 [45.41 GiB]
[root@grant-01 ~]# vgcreate grant /dev/sd[bc][123]
  Clustered volume group "grant" successfully created
[root@grant-01 ~]# lvcreate -n lv -L 100M grant
  Logical volume "lv" created
[root@grant-01 ~]# lvs -a -o +devices
  LV      VG         Attr   LSize   Devices
  lv      grant      -wi-a- 100.00m /dev/sdb1(0)
[root@grant-01 ~]# vgs
  VG         #PV #LV #SN Attr   VSize   VFree
  grant        6   1   0 wz--nc 238.38g 238.29g
[root@grant-01 ~]# vgchange -cn grant
  Volume group "grant" successfully changed
[root@grant-01 ~]# vgchange -an grant
  0 logical volume(s) in volume group "grant" now active
[root@grant-01 ~]# lvs -a -o +devices
  LV      VG         Attr   LSize   Devices
  lv      grant      -wi--- 100.00m /dev/sdb1(0)


# Need to deactivate by hand on other nodes
[root@grant-02 ~]# lvs -a -o +devices
  LV      VG         Attr   LSize   Devices
  lv      grant      -wi-a- 100.00m /dev/sdb1(0)
[root@grant-02 ~]# vgchange -an grant
  0 logical volume(s) in volume group "grant" now active


# Unlike bug #289331, the volume is now usable at this point
[root@grant-01 ~]# vgchange -ay grant
  1 logical volume(s) in volume group "grant" now active
[root@grant-01 ~]# lvs -a -o +devices
  LV      VG         Attr   LSize   Devices
  lv      grant      -wi-a- 100.00m /dev/sdb1(0)
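
For anyone scripting around this, the clustered state can be read from the VG Attr column that vgs prints above ("wz--nc" before the change). A minimal sketch, assuming the clustered flag is the sixth attribute character as in this transcript; vg_is_clustered is a hypothetical helper name, not an LVM command:

```shell
#!/bin/sh
# Sketch: report whether a vg_attr string (e.g. from `vgs -o vg_attr`)
# carries the clustered flag.
# Assumption: the flag is the 6th character of the attribute string.
vg_is_clustered() {
    case "$1" in
        ?????c*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Example with the attribute string shown in the transcript above.
if vg_is_clustered "wz--nc"; then
    echo "clustered"
else
    echo "local"
fi
```

On a live system the argument could come from something like `vgs --noheadings -o vg_attr grant | tr -d ' '`.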


Version-Release number of selected component (if applicable):
2.6.32-71.el6.x86_64

lvm2-2.02.72-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
lvm2-libs-2.02.72-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
lvm2-cluster-2.02.72-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
udev-147-2.29.el6    BUILT: Tue Aug 31 16:44:10 CDT 2010
device-mapper-1.02.53-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
device-mapper-libs-1.02.53-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
device-mapper-event-1.02.53-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
device-mapper-event-libs-1.02.53-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010
cmirror-2.02.72-8.el6_0.4    BUILT: Thu Dec  9 09:46:33 CST 2010


How reproducible:
Every time

Comment 1 Suzanne Logcher 2011-03-28 20:29:09 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains 
unresolved, it has been rejected as it is not proposed as an 
exception or blocker.  It has been moved to RHEL 6.2 since 
it is a FutureFeature request.

Comment 2 Milan Broz 2011-05-31 09:47:17 UTC
I think this is the same problem as bug #672317 - what should happen when you remove clustered flag from VG with active volumes?

See https://bugzilla.redhat.com/show_bug.cgi?id=672317#c1

Until there is an upstream decision on what it should do, cond nack/design.

Comment 3 Jonathan Earl Brassow 2011-06-03 16:47:29 UTC
I don't know if the current behavior is acceptable though...  Would it make sense to simply disallow the 'vgchange -cn <VG>' while there are (non-exclusively) active LVs?  Then at least we could print something sensible to the user:
"Unable to change the cluster status of <VG> while there are active logical volumes"

Comment 5 Alasdair Kergon 2012-04-24 20:02:12 UTC
(In reply to comment #3)
> I don't know if the current behavior is acceptable though...  Would it make
> sense to simply disallow the 'vgchange -cn <VG>' while there are
> (non-exclusively) active LVs?  Then at least we could print something sensible
> to the user:
> "Unable to change the cluster status of <VG> while there are active logical
> volumes"

Yes.
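
A rough sketch of the guard proposed in comment #3, written as a wrapper-side check rather than as the actual lvm2 change. It assumes the fifth lv_attr character is 'a' for an active LV (as in the listings in this bug) and it does not tell exclusive activation apart from shared activation; check_no_active_lvs is a hypothetical name:

```shell
#!/bin/sh
# Sketch of the proposed guard. Reads "lv_name lv_attr" pairs on stdin, in
# the shape produced by `lvs --noheadings -o lv_name,lv_attr <VG>`.
# Assumption: the 5th lv_attr character is 'a' for an active LV.
# Limitation: exclusive activation is not distinguished here.
check_no_active_lvs() {
    vg=$1
    # Count the lines whose attribute string marks the LV as active.
    active=$(awk 'substr($2, 5, 1) == "a" { n++ } END { print n + 0 }')
    if [ "$active" -gt 0 ]; then
        echo "Unable to change the cluster status of $vg while there are active logical volumes" >&2
        return 1
    fi
    return 0
}
```

A caller would pipe the lvs listing for the VG into the function and only run vgchange -cn when it succeeds.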

Comment 9 Jonathan Earl Brassow 2012-04-25 15:03:29 UTC
To test the solution, simply try to change the cluster attribute of a volume group while a mirror, snapshot, or RAID logical volume is active.  It should not be allowed.

(Changing the cluster attribute while linear or stripe LVs are active is harmless.)
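
For test scripts, the rule above can be approximated from the lv_attr string alone. This is a sketch under assumptions, not the check lvm2 performs: it takes the first attribute character as the volume type (m/M mirror, s/S snapshot, r/R RAID) and the fifth as the state, and it ignores snapshot origins and hidden sub-LVs. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: classify an lv_attr string (as printed by `lvs -o lv_attr`)
# into the segment classes named in this comment.
# Assumption: 1st character = volume type, 5th character = state.
seg_class() {
    case "$1" in
        [mM]*) echo mirror ;;
        [sS]*) echo snapshot ;;
        [rR]*) echo raid ;;
        *)     echo other ;;
    esac
}

# Succeeds (exit 0) when the LV is active, i.e. the 5th attr char is 'a'.
is_active() {
    case "$1" in
        ????a*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Succeeds when this LV would have to be deactivated before the cluster
# attribute of its VG is changed.
blocks_cluster_change() {
    [ "$(seg_class "$1")" != other ] && is_active "$1"
}
```

With the listings from this bug, an active mirror ("mwi-a-m-") is reported as blocking, while an active linear LV ("-wi-a---") is not.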

Comment 10 Jonathan Earl Brassow 2012-04-25 15:49:00 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Some LVM segment types, like "mirror", have single-machine and cluster-aware variants. Others, like snapshot and the RAID types, have only single-machine variants. When switching the cluster attribute of a volume group, logical volumes of these segment types must be inactive. This allows the appropriate single-machine or cluster variant to be reloaded, or the activation to be made exclusive where that is required.

Comment 15 Corey Marthaler 2012-05-14 23:21:54 UTC
This is currently *ONLY* fixed for cluster mirrors. Marking FailedQA.

# CLUSTER MIRROR VOLUME
[root@hayes-01 ~]# lvs -a -o +devices
  LV                VG     Attr     LSize   Log         Copy%  Devices                              
  mirror            hayes  mwi-a-m- 100.00m mirror_mlog 100.00 mirror_mimage_0(0),mirror_mimage_1(0)
  [mirror_mimage_0] hayes  iwi-aom- 100.00m                    /dev/etherd/e1.1p1(0)
  [mirror_mimage_1] hayes  iwi-aom- 100.00m                    /dev/etherd/e1.1p2(0)
  [mirror_mlog]     hayes  lwi-aom-   4.00m                    /dev/etherd/e1.1p2(25)

[root@hayes-01 ~]# vgs
  VG         #PV #LV #SN Attr   VSize  VFree
  hayes        2   1   0 wz--nc  8.87t 8.87t

# FIX WORKS
[root@hayes-01 ~]# vgchange -cn hayes
  Mirror logical volumes must be inactive when changing the cluster attribute.

[root@hayes-01 ~]# lvremove hayes
Do you really want to remove active clustered logical volume mirror? [y/n]: y
  Logical volume "mirror" successfully removed


# CLUSTER LINEAR VOLUME
[root@hayes-01 ~]# lvcreate -n lv1 -L 100M hayes
  Logical volume "lv1" created
[root@hayes-01 ~]# lvs -a -o +devices
  LV      VG         Attr     LSize    Devices              
  lv1     hayes      -wi-a--- 100.00m  /dev/etherd/e1.1p1(0)
[root@hayes-01 ~]# vgs
  VG         #PV #LV #SN Attr   VSize  VFree
  hayes        2   1   0 wz--nc  8.87t 8.87t


# FIX DOES NOT WORK
[root@hayes-01 ~]#  vgchange -cn hayes
  Volume group "hayes" successfully changed
[root@hayes-01 ~]# vgchange -an hayes
  0 logical volume(s) in volume group "hayes" now active
[root@hayes-01 ~]# lvs -a -o +devices
  LV    VG     Attr     LSize    Devices              
  lv1   hayes  -wi----- 100.00m  /dev/etherd/e1.1p1(0)


# STILL ACTIVE ON THE OTHER TWO NODES!
[root@hayes-02 ~]# lvs -a -o +devices
  LV    VG     Attr     LSize    Devices              
  lv1   hayes  -wi-a--- 100.00m  /dev/etherd/e1.1p1(0)
[root@hayes-03 ~]# lvs -a -o +devices
  LV    VG     Attr     LSize    Devices              
  lv1   hayes  -wi-a--- 100.00m  /dev/etherd/e1.1p1(0)


2.6.32-269.el6.x86_64
lvm2-2.02.95-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
lvm2-libs-2.02.95-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
lvm2-cluster-2.02.95-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
device-mapper-libs-1.02.74-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
device-mapper-event-1.02.74-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
device-mapper-event-libs-1.02.74-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
cmirror-2.02.95-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012

Comment 16 Alasdair Kergon 2012-05-15 00:56:29 UTC
Looking at the technical note and the revised bug summary, I think this change somehow became restricted to a subset of the original problem.

As usual when we have problems like this, there's more than one issue in play.

1. The clustered and non-clustered in-kernel target types could be different.  If they are, then you cannot switch from one to the other and so any LVs using those types must be inactive on all nodes, including the local node, before -cn or -cy can be used.

2. If using -cn to say that a VG is no longer clustered, it makes no sense for us to allow any of the LVs to remain active on any other node.  Otherwise it would still actually be clustered but we would be encouraging people to pretend it was not!

This bug was originally intended to fix both issues, but it seems that the second issue got lost.

Bug 672317 effectively deals with a 3rd problem viz. updating the clvmd locking state correctly across a -c transition.
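
To illustrate the second issue: before -cn is accepted, something has to establish that no other node still holds an LV of the VG active. A minimal sketch, assuming the caller has already gathered "node lv_name lv_attr" lines (for example by running lvs on each node) and that the fifth lv_attr character 'a' means active; nodes_with_active_lvs is a hypothetical helper:

```shell
#!/bin/sh
# Sketch for the second issue: list the nodes on which any LV of the VG is
# still active. Reads "node lv_name lv_attr" triples on stdin.
# Assumption: the 5th lv_attr character is 'a' for an active LV.
nodes_with_active_lvs() {
    # Print each node once, the first time an active LV is seen on it.
    awk 'substr($3, 5, 1) == "a" && !seen[$1]++ { print $1 }'
}

# Sample data taken from the comment 15 transcript.
nodes_with_active_lvs <<'EOF'
hayes-01 lv1 -wi-----
hayes-02 lv1 -wi-a---
hayes-03 lv1 -wi-a---
EOF
```

For the sample data this prints hayes-02 and hayes-03, the two nodes where lv1 was left active after the VG was marked non-clustered.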

Comment 17 Peter Rajnoha 2012-05-15 12:15:57 UTC
(In reply to comment #15)
> This is currently *ONLY* fixed for cluster mirrors. Marking FailedQA.
> 

Yes, the check was intended for mirrors and snapshots only (as comment #9 says), as they could cause other specific problems (not seen with linear ones) if used in a cluster environment.

It's easy to add the same check for linear volumes as well, but that was supposed to be fixed later together with other BZs, along with a final decision on what the correct behaviour should be (which is covered by bug #672317, I think; it already contains discussion about the general problem).

If we want this bug to track these other changes and decisions about what should be the correct functionality, ok, let's move this to 6.4 then...

(...maybe I was a bit mystified with comments #8, #9, #10, anyway...)

Comment 18 Corey Marthaler 2012-05-15 14:38:29 UTC
This should fix all cluster volumes (stripes, etc...), not just linears. 

Let's move this to rhel6.4...

Comment 20 Corey Marthaler 2012-05-16 16:16:32 UTC
Bug 822213 has been filed and proposed for 6.4 for the remaining volumes not addressed by this partial fix in 6.3.

This bug will now only deal with cluster mirrors.

Comment 22 Corey Marthaler 2012-05-16 20:22:36 UTC
[root@hayes-01 ~]#  vgchange -cn test
  Mirror logical volumes must be inactive when changing the cluster attribute.

[root@hayes-01 ~]# vgchange -cn test
  Snapshot logical volumes must be inactive when changing the cluster attribute.

Marking verified for cmirrors and exclusively activated volumes with snapshots only. All other volumes will be fixed in rhel6.4 (bug 822213).

2.6.32-269.el6.x86_64
lvm2-2.02.95-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
lvm2-libs-2.02.95-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
lvm2-cluster-2.02.95-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
device-mapper-libs-1.02.74-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
device-mapper-event-1.02.74-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
device-mapper-event-libs-1.02.74-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
cmirror-2.02.95-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012

Comment 24 errata-xmlrpc 2012-06-20 14:51:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0962.html