1336346 – HALVM cluster should not allow metadata changes on nodes where VG is inactive

Summary: HALVM cluster should not allow metadata changes on nodes where VG is inactive

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	lvm2
Sub Component:
Version:	6.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	LVM and device-mapper development team
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-05-16 08:44 UTC by michal novacek
Modified:	2017-12-06 10:53 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-12-06 10:53:53 UTC
Target Upstream Version:
Embargoed:

Description michal novacek 2016-05-16 08:44:46 UTC

The intent of this bug is to start a discussion about whether and how we want
to resolve this problem for pacemaker clusters on RHEL 6.

Current situation:

We support cluster-managed VG activation using the ocf:heartbeat:LVM resource
agent (3). How to create LVM volumes managed by the cluster is described
in (1). We support clusters of up to sixteen nodes (2).

The ocf:heartbeat:LVM resource agent performs exclusive activation without a
proper lock manager. The agent does not support exclusive activation through
clvmd (4).

The cluster can manage several VGs, which may be activated on different nodes
of the cluster.

The problem is that any node with direct access to the shared storage/LVM
volumes can change the VG metadata, regardless of where the VG is currently
active (5). This problem might become more serious once we support LVM RAID.


Preferred solution:
-- the VG system_id (6), changed by the LVM resource agent

-----

(1):
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/s1-halvm-tagging-CA.html

(2):
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Cluster_Administration/index.html#s1-clust-config-considerations-CA

(3):
http://linux-ha.org/doc/man-pages/re-ra-LVM.html

(4):
# resource-agents-3.9.5-34.el6.x86_64
# cut -f617-620 -d$'\n' /usr/lib/ocf/resource.d/heartbeat/LVM 
        2)  # exclusive activation with clvmd

                ocf_log err "Exclusive activation via clvmd is not supported by this agent."
                exit $OCF_ERR_CONFIGURED

(5):
[root@host-076 ~]# pcs status
Cluster name: STSRHTS5318
Last updated: Wed May  4 09:39:10 2016          Last change: Tue May  3 12:04:19 2016 by root via crm_resource on host-076
Stack: corosync
Current DC: host-077 (version 1.1.14-11.el7-2cccd43) - partition with quorum
3 nodes and 4 resources configured

Online: [ host-076 host-077 host-078 ]

Full list of resources:

 fence-host-076 (stonith:fence_xvm):    Started host-076
 fence-host-077 (stonith:fence_xvm):    Started host-077
 fence-host-078 (stonith:fence_xvm):    Started host-078
 my_lvm_resource        (ocf::heartbeat:LVM):   Started host-078


# *active* node 078
[root@host-078 ~]# lvs -a -o +devices
  LV                  VG            Attr       LSize   Cpy%Sync Devices
  mirror_1            revolution_9  mwi-a-m---  20.00g 46.50    mirror_1_mimage_0(0),mirror_1_mimage_1(0)
  [mirror_1_mimage_0] revolution_9  Iwi-aom---  20.00g          /dev/sda1(0)
  [mirror_1_mimage_1] revolution_9  Iwi-aom---  20.00g          /dev/sdb1(0)


# *non* active node
[root@host-076 ~]# lvs -a -o +devices
  LV                  VG            Attr       LSize   Cpy%Sync Devices
  mirror_1            revolution_9  mwi---m---  20.00g          mirror_1_mimage_0(0),mirror_1_mimage_1(0)
  [mirror_1_mimage_0] revolution_9  Iwi---m---  20.00g          /dev/sda1(0)
  [mirror_1_mimage_1] revolution_9  Iwi---m---  20.00g          /dev/sdb1(0)

[root@host-076 ~]# lvconvert -m 2 revolution_9/mirror_1
 Conversion starts after activation.

[root@host-076 ~]# lvextend -L +100M revolution_9/mirror_1
  Extending 3 mirror images.
  Size of logical volume revolution_9/mirror_1 changed from 20.00 GiB (5120 extents) to 20.10 GiB (5145 extents).
  Logical volume mirror_1 successfully resized.

[root@host-076 ~]# lvs -a -o +devices
  LV                  VG            Attr       LSize   Cpy%Sync Devices
  mirror_1            revolution_9  mwi---m---  20.10g          mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_0] revolution_9  Iwi---m---  20.10g          /dev/sda1(0)
  [mirror_1_mimage_1] revolution_9  Iwi---m---  20.10g          /dev/sdb1(0)
  [mirror_1_mimage_2] revolution_9  Iwi---m---  20.10g          /dev/sdc1(0)

[root@host-076 ~]# lvremove revolution_9/mirror_1
  Logical volume "mirror_1" successfully removed

[root@host-076 ~]# lvs -a -o +devices
  LV   VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  p

(6):
http://man7.org/linux/man-pages/man7/lvmsystemid.7.html

Comment 2 David Teigland 2016-05-16 19:25:07 UTC

I would think that device filters should be used with the current scripts to prevent hosts that are not supposed to be using a VG from processing it (even for reading).

With RHEL 6 and RHEL 7, using system ID is possible. It is similar to filters but a little more user-friendly.  Example:

Two hosts A and B with VG foo on shared storage.

host A has system id "hostA", plus "foo-hostA" in lvmlocal.conf/extra_system_ids.

host B has system id "hostB", plus "foo-hostB" in lvmlocal.conf/extra_system_ids.

To fail over the VG from one host to the other, the host taking over the VG forcibly changes the system ID on the VG to match its own.

For example, host A is using VG foo, and then fails, so you want host B to use foo instead.  After you are sure that host A is no longer using foo, you would run this command on host B:

vgchange --config 'local/extra_system_ids=["foo-hostA"]' --systemid foo-hostB foo

This allows host B to take over host A's system ID for this command so that it can change the system ID on the VG to its own.

Comment 3 Zdenek Kabelac 2016-05-17 08:44:30 UTC

I'd consider that 'taking over' a VG should not impose any metadata change. A metadata change has various associated costs, such as updating archives and doing various 'metadata' repair operations...

Device filtering seems the cleanest option, because it just makes the whole device 'invisible'
and the VG does not need to be read and parsed at all.

In general, this leads to the 'RHEV' use case, where every command gets an associated --config (which suggests that enhancements to --profile support may simplify usage here).

The interesting case, however, will be the possibility of various 'collisions' (handling of duplicate/missing devices).

Taking over a VG from hostB to hostA should probably not just 'merge' B's devices into the list of A's devices. The VG should still be manipulated independently, as there is no initial 'synchronization' at 'creation' time.

And when hosts A and B both use a VG named 'vg' with an LV named 'lv', both volumes obviously cannot be activated on a single host, because there is just a single DM namespace.

(Yep, clvmd is 'expensive', but it prevents such disasters from the start.)

ATM it's unclear to me how the namespace is managed outside of lvm2 and whether the costs of this management aren't higher than direct use of clvmd.

So, for a start: how is it 'enforced' that each host in this 'HA cluster' creates an 'independent' VG, so that no name collision can happen?

Comment 4 David Teigland 2016-05-17 15:45:06 UTC

The big limitation with using device filters is that you can't change the filter on a failed node to remove the devices.  (Until the failed node comes back, at which point you should probably remove the devices from lvm.conf, but it's obviously a bit late to truly protect the devices.)  So, device filtering is actually not very clean because of the stale lvm.conf settings on a failed node.  Since RHEL5 will not have system ID, filters may be the only option, even though it's flawed.

Using system ID does not have that problem because the VG itself states who owns it.  Changing the metadata on the VG (to a new system ID) is precisely what should be done to do proper failover.  I don't understand the problems you have mentioned for changing the system ID:  what's the problem with updating archives?  why avoid metadata repair?  Remember, the new owner of the VG is the one who is reassigning the VG to itself and writing it.  Overall, system ID seems very well suited for this.

The VG namespace issues you mention are not unique to this bz and have no impact on how failover would work.  But, to summarize, vgcreate will not let you create a VG with an existing name, even if that existing name is used by a foreign VG.  If you do manage to create duplicate VG names (e.g. by using filters or reconnecting devices), then lvm will no longer allow you to modify or activate them until you resolve the names (see man lvm under UNIQUE NAMES.)

Comment 7 David Teigland 2017-06-07 17:37:59 UTC

Shouldn't this bz be moved to RHEL7?

I'd like to have a new system ID based failover script ready for 7.5.  The setup and usage are described below.  I presume that the current tag-based failover script could be used as a starting point for a new script; it shouldn't require much change.


1. enable system ID on each host
--------------------------------

A# lvmconfig --typeconfig current | grep system_id_source
        system_id_source="lvmlocal"


B# lvmconfig --typeconfig current | grep system_id_source
        system_id_source="lvmlocal"

A# cat /etc/lvm/lvmlocal.conf
local {
           system_id = [ "A" ]
}
       

B# cat /etc/lvm/lvmlocal.conf
local {
           system_id = [ "B" ]
}

A# lvm systemid
  system ID: A

B# lvm systemid
  system ID: B

(This example uses an explicitly defined system ID, but other system ID sources can also be used.)


2. create VG to use with failover
---------------------------------

A# vgcreate foo PVs...

A# vgs -o name,systemid foo
foo A

B# vgs -o name,systemid foo
Cannot access VG foo with system ID A with local system ID B.


3. add extra system ID to each host to use with VG failover
-----------------------------------------------------------

Each host that may want to use the VG during failover is given its own extra system ID to use specifically for accessing the VG.

The special system ID used for failover is the combination <vgname>-<systemid>, where <systemid> is the system ID of the host that should currently be allowed to use the VG.

A# cat /etc/lvm/lvmlocal.conf
local {
           system_id = [ "A" ]
           extra_system_ids = [ "foo-A" ]
}

B# cat /etc/lvm/lvmlocal.conf
local {
           system_id = [ "B" ]
           extra_system_ids = [ "foo-B" ]
}

A# lvmconfig --typeconfig current | grep extra_system_ids
        extra_system_ids="foo-A"


B# lvmconfig --typeconfig current | grep extra_system_ids
        extra_system_ids="foo-B"
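
The per-host lvmlocal.conf from steps 1 and 3 follows a fixed pattern, so it could be generated. The following is a minimal sketch that assumes a hypothetical helper named `gen_lvmlocal`, which is not part of lvm2. The helper takes the host's system ID followed by the names of the VGs that the host may fail over. It emits one "<vgname>-<systemid>" entry per VG, which also covers the "bar" case described at the end of this comment. It writes `system_id` as a plain string. Review the output before it goes into /etc/lvm/lvmlocal.conf.

```shell
# Hypothetical helper (not part of lvm2): gen_lvmlocal <host_systemid> <vg>...
# Prints a "local" section with the host's normal system ID plus one
# "<vgname>-<systemid>" extra system ID per failover VG.
# The body runs in a subshell so its variables do not leak to the caller.
gen_lvmlocal() (
    host=$1
    shift
    ids=""
    for vg in "$@"; do
        # Separate entries with ", " once the list is non-empty.
        if [ -n "$ids" ]; then
            ids="$ids, "
        fi
        ids="$ids\"$vg-$host\""
    done
    printf 'local {\n'
    printf '\tsystem_id = "%s"\n' "$host"
    printf '\textra_system_ids = [ %s ]\n' "$ids"
    printf '}\n'
)
```

For example, `gen_lvmlocal A foo bar` prints a section whose third line is `extra_system_ids = [ "foo-A", "bar-A" ]`.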


4. change the VG system ID to the special failover system ID
------------------------------------------------------------

A# vgchange --systemid foo-A foo
Set foreign system ID foo-A on volume group foo? [y/n]: y

A# vgs -o name,systemid foo
foo foo-A

B# pvscan --cache
B# vgs foo
Cannot access VG foo with system ID foo-A with local system ID B.


5. failover the VG
------------------

When host A fails, the failover script run by B will run:

B# pvscan --cache
B# vgchange -y --config 'local/extra_system_ids=["foo-B", "foo-A"]' --systemid foo-B foo


To handle another VG "bar" in addition to foo:
- add system IDs "bar-A" / "bar-B" to lvmlocal.conf extra_system_ids in addition to "foo-A" / "foo-B"
- replace foo with bar in step 5.

Comment 8 David Teigland 2017-06-07 19:31:03 UTC

I've found a couple of lvm failover scripts to see what steps are needed:

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/LVM
https://sourceware.org/git/?p=lvm2.git;a=blob_plain;f=scripts/VolumeGroup.ocf;hb=HEAD

It looks like the script doesn't necessarily know the previous owner of the VG, so there would need to be a step on the new owner to get that.  The new script using system ID should be based on the following lvm commands:

start $vg
---------

pvscan --cache

$our_systemid = `lvm systemid`

$cur_vg_systemid = `vgs --foreign -o systemid $vg`

$new_vg_systemid = $vg-$our_systemid

vgchange -y --config 'local/extra_system_ids=["$cur_vg_systemid", "$new_vg_systemid"]' --systemid $new_vg_systemid $vg

vgchange -aay $vg


stop $vg
--------

vgchange -an $vg

At this point, the last node to have used the VG would still have access to it, because the system ID on the VG has not been changed.  It might make sense for stop to change the system ID on the VG to some value that makes it inaccessible to any node, e.g. just the VG name itself.


status $vg
----------

$our_systemid = `lvm systemid`
$cur_vg_systemid = `vgs -o systemid $vg`

check that $our_systemid == $cur_vg_systemid
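
The start/stop/status outline above can be expressed as shell functions. This is a minimal sketch that assumes the hypothetical helpers `run`, `parse_systemid`, `vg_start`, `vg_stop` and `vg_status`. None of these exist in lvm2 or resource-agents. By default, `run` only prints each command, so nothing touches storage unless DRY_RUN=0 is set. Because start writes "$vg-$our_systemid" to the VG, the status check in this sketch compares against that value rather than against the bare host system ID.

```shell
# Hypothetical sketch of the start/stop/status outline; not a resource agent.
# run: print the command (default) or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Read "lvm systemid" output (e.g. "  system ID: A") on stdin, print the ID.
parse_systemid() {
    sed -n 's/^[[:space:]]*system ID:[[:space:]]*//p'
}

# vg_start <vg> <our_systemid> <cur_vg_systemid>
vg_start() (
    vg=$1 ours=$2 cur=$3
    new="$vg-$ours"
    run pvscan --cache
    # Temporarily accept the current VG system ID so it can be replaced.
    run vgchange -y --config "local/extra_system_ids=[\"$cur\", \"$new\"]" \
        --systemid "$new" "$vg"
    run vgchange -aay "$vg"
)

# vg_stop <vg>
vg_stop() {
    run vgchange -an "$1"
}

# vg_status <vg> <our_systemid> <cur_vg_systemid>: succeed if we own the VG.
vg_status() {
    [ "$3" = "$1-$2" ]
}
```

In a live run, the two IDs would come from `lvm systemid | parse_systemid` and from `vgs --foreign --noheadings -o systemid "$vg"`, with surrounding whitespace stripped. Comments 11 and 12 later found that LVs could not be activated in a VG that carries only an extra system ID. So the activation step of this particular scheme did not work in practice.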

Comment 9 Eric Ren 2017-09-29 12:26:38 UTC

Hi David,

> 
> $our_systemid = `lvm systemid`
> 
> $cur_vg_systemid = `vgs --foreign -o systemid $vg`
> 
> $new_vg_systemid = $vg-$our_systemid
> 
> vgchange -y --config 'local/extra_system_ids=["$cur_vg_systemid",
> "$new_vg_systemid"]' --systemid $new_vg_systemid $vg
> 

I'm a little confused about when to use "systemid" and when to use "extra systemid".

Here, $cur_vg_systemid is the systemid of the current owner.

But step 5 in comment #7 uses an extra systemid:
===
B# pvscan --cache
B# vgchange -y --config 'local/extra_system_ids=["foo-B", "foo-A"]' --systemid foo-B foo
===

Could you explain a bit why we need extra systemid? Thanks :)

Comment 10 David Teigland 2017-09-29 14:18:32 UTC

In the example with two nodes (A, B) and one VG (foo), there are four different system IDs being used:

system ID "A" is the normal one used by node A, e.g. for its local VGs.
system ID "B" is the normal one used by node B, e.g. for its local VGs.

These normal system IDs for the nodes will usually be set in lvmlocal.conf local/system_id="..."

system ID "foo-A" is a special system ID used by node A with VG foo.
system ID "foo-B" is a special system ID used by node B with VG foo.

The system ID given to VG foo is always either "foo-A" or "foo-B".

node A with system ID "A" could normally only access VGs with system ID "A",
but in this case, VG foo has system ID "foo-A". So, node A has to be given the extra system ID "foo-A", which then allows it to access VGs with system IDs "A" or "foo-A".
Same for node B.

Now consider A taking the VG from B. The VG currently has "foo-B", and node A needs to change the VG system ID to "foo-A". In this case, A needs to temporarily give itself system ID "foo-B" to gain access to the VG to change the system ID. So, for this one command, A has three working system IDs: "A", "foo-A" and "foo-B".

Back to your question, $cur_vg_systemid is never the system ID of either host, it is always either "foo-A" or "foo-B".

All of the VG-name-based system IDs are kept in the extra_system_ids list.

If there was a second VG named "bar", then
A would have system ID "A" and extra_system_ids=[ foo-A, bar-A ]
B would have system ID "B" and extra_system_ids=[ foo-B, bar-B ]

If A needs to take VG bar from B, then A would run:
vgchange --systemid bar-A --config 'local/extra_system_ids=["bar-A", "bar-B"]' bar
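
The access rule described above amounts to a membership test. This is a minimal sketch that assumes a hypothetical `can_access` helper. The helper takes the VG's system ID, followed by every system ID the host currently has: its own ID, plus any extra IDs, including those added with --config for a single command. It models only whether LVM lets the host access the VG. Comments 11 and 12 show that activating LVs in a VG whose system ID is only one of the host's extra system IDs was still refused.

```shell
# Hypothetical helper: can_access <vg_systemid> <host_systemid> [extra_id...]
# Exit status 0 if the VG's system ID is one the host currently has.
can_access() (
    vg_id=$1
    shift
    for id in "$@"; do
        if [ "$id" = "$vg_id" ]; then
            exit 0
        fi
    done
    exit 1
)
```

With the example IDs, `can_access foo-A A foo-A` succeeds and `can_access foo-B A foo-A` fails. During takeover from B, node A temporarily adds "foo-B" (`can_access foo-B A foo-A foo-B`), which makes access succeed for that one command.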

Comment 11 Eric Ren 2017-10-11 05:12:43 UTC

Hi David,

Thanks for your explanation! Ideally/technically, the extra systemid still doesn't seem necessary to me. Each VG has both systemid and vgname fields in its metadata, which can provide sufficient information to do exclusive activation. But the extra systemid seems much more user-friendly, doing things in an explicit/straightforward way.


I have a new issue below; I hope you can help me.

1. basic information

# lvm version
  LVM version:     2.02.175(2) (2017-09-13)
  Library version: 1.03.01 (2017-06-28)
  Driver version:  4.35.0

Two nodes cluster: tw1, tw2

VG on shared disk: vgtest3

tw1:~ # lvmconfig --typeconfig current | grep extra_system_ids
	extra_system_ids="vgtest3-tw1"
tw2:~ # lvmconfig --typeconfig current | grep extra_system_ids
	extra_system_ids="vgtest3-tw2"

tw2:~ # vgs -o+systemid vgtest3
	VG      #PV #LV #SN Attr   VSize VFree System ID  
	vgtest3   1   1   0 wz--n- 4.65g 3.65g vgtest3-tw2

2. problem: vgtest3 cannot be activated on tw2, even though its systemid "vgtest3-tw2" is in tw2's extra_system_ids list.

tw2:~ # lvchange -vv -ay vgtest3/lv1
...
	    Setting local/extra_system_ids to extra_system_ids = [ "vgtest3-tw2" ]
	    Adding vgtest3/lv1 to the list of LVs to be processed.
	    Processing LV lv1 in VG vgtest3.
	Cannot activate LVs in a foreign VG.
	    Unlocking /run/lvm/lock/V_vgtest3
	    global/notify_dbus not found in config: defaulting to 1


3. workaround: it works if I vgchange the systemid to "tw2"
tw2:~ # lvm systemid
	system ID: tw2
tw2:~ # vgchange --systemid tw2 vgtest3
	Volume group "vgtest3" successfully changed
tw2:~ # lvchange -ay vgtest3/lv1
tw2:~ # lvs vgtest3/lv1
	LV   VG      Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
	lv1  vgtest3 -wi-a----- 1.00g

Am I missing something? Thanks in advance:)

Comment 12 Eric Ren 2017-10-11 05:23:25 UTC

It still fails to activate the VG, even though I explicitly changed its systemid to the "VG-name-systemid" extra systemid.

tw2:~ # lvs vgtest3/lv1
	LV   VG      Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
	lv1  vgtest3 -wi-a----- 1.00g                                                    
tw2:~ # vgchange --systemid vgtest3-tw2 vgtest3
	Logical Volumes in VG vgtest3 must be deactivated before system ID can be changed.
tw2:~ # lvchange -an vgtest3/lv1
tw2:~ # vgchange --systemid vgtest3-tw2 vgtest3
	WARNING: Requested system ID vgtest3-tw2 does not match local system ID tw2.
	WARNING: Volume group vgtest3 might become inaccessible from this machine.
Set foreign system ID vgtest3-tw2 on volume group vgtest3? [y/n]: y
	Volume group "vgtest3" successfully changed
tw2:~ # lvchange -ay vgtest3/lv1
	Cannot activate LVs in a foreign VG.

Comment 13 David Teigland 2017-10-11 15:23:18 UTC

Hi Eric, I get the same error, sorry about that.  I thought the code allowed activation using an extra system ID, and I thought I had tried all of this manually.  Maybe I had hacked the code when I tried it.

But your comments have made me go back and think more about just using the standard system IDs directly on the VGs, and I think it will work fine and will be simpler and clearer.  I'm not sure why I originally suggested special system IDs including the VG name... maybe I was thinking the other system ID would temporarily exist in lvmlocal.conf (that would be bad) rather than just being added to the vgchange command line.  Do you see any problem with just using the standard system IDs for this?  That's what I would try at this point.
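
With standard system IDs, the VG carries the owning node's normal system ID. The node taking over passes the previous owner's ID only on the vgchange command line. This is a minimal sketch of that takeover, assuming a hypothetical `takeover_cmd` helper that only prints the commands. It is not taken from the resource-agent patches mentioned in comment 14. It assumes that the previous owner has already deactivated the VG or has been fenced.

```shell
# Hypothetical helper: takeover_cmd <vg> <owner_systemid> <our_systemid>
# Prints the commands a node would run to take a VG over from its previous
# owner when the VG carries the owner's ordinary system ID.
takeover_cmd() (
    vg=$1 owner=$2 ours=$3
    # Accept the owner's ID for this one command only, then stamp our own.
    printf "vgchange -y --config 'local/extra_system_ids=[\"%s\"]' --systemid %s %s\n" \
        "$owner" "$ours" "$vg"
    printf 'vgchange -ay %s\n' "$vg"
)
```

For example, `takeover_cmd foo A B` prints `vgchange -y --config 'local/extra_system_ids=["A"]' --systemid B foo` followed by `vgchange -ay foo`. With this approach, lvmlocal.conf needs nothing beyond each node's own system ID.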

Comment 14 Eric Ren 2017-10-12 06:46:24 UTC

(In reply to David Teigland from comment #13)
> Hi Eric, I get the same error, sorry about that.

No problem :)

> Do you see any problem with just using the standard system IDs for this?

I tried this approach today, and it works well. So far, I've worked out a working version of the LVM-activation RA for all 4 different activation modes. I can send out the patches for review today. Thanks~

Comment 15 Jan Kurik 2017-12-06 10:53:53 UTC

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/
