Bug 984054 - ocf:redhat:LVM possible data corruption
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: David Vossel
QA Contact: Cluster QE
Keywords: Regression
Blocks: 883874 1080147
Reported: 2013-07-12 11:42 EDT by michal novacek
Modified: 2014-06-17 19:55 EDT
CC List: 4 users

Fixed In Version: resource-agents-3.9.5-7
Doc Type: Bug Fix
Last Closed: 2014-06-13 08:16:03 EDT
Type: Bug


Attachments: None
Description michal novacek 2013-07-12 11:42:00 EDT
Description of problem:
It is possible to run the LVM resource agent on two nodes, and both will happily
activate the VG from the shared disk. This leads to data corruption as soon as
both nodes start writing to that VG.

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-6.x86_64
RHEL-7.0-20130708.n.0

How reproducible: always

Steps to Reproduce:
1. Have a running cluster with a shared disk visible on all nodes.
2. Run the following commands on two different nodes:
		export OCF_FUNCTIONS_DIR=/usr/lib/ocf/lib/heartbeat
		export OCF_RESKEY_volgrpname=STSRHTS14339
		export OCF_RESKEY_exclusive=true
		/usr/lib/ocf/resource.d/heartbeat/LVM start

Actual results: 
The VG is activated on both nodes; this can be checked by listing /dev/<vg>/.

Expected results:
The VG must never be activated on more than one node. Each attempt to activate
the volume group should be denied if another node in the quorate cluster
already has it activated.

Additional info:
RHEL 6 guide on achieving failover with tagged LVM on rgmanager:
https://url.corp.redhat.com/adf6e03

The VG tag must be node-specific, as it is in RHEL 6 (where 'crm_node -n' is
used). In RHEL 6 the agent checks whether the VG carries a tag. If it does, the
agent checks whether the tag belongs to another active node in the cluster and,
if so, leaves the VG alone. This check is impossible if all nodes use the same
tag. Consider an active node that activated the VG being reset (which does not
remove the tag from the VG): how would any of the remaining nodes know whether
that node is still active, or whether it died without deactivating the VG?
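The RHEL 6 node-tag check described above boils down to a three-way decision. The following is a minimal sketch of that decision as a pure bash function, not the lvm.sh code: the function name `tag_decision` and its outcomes (activate, refuse, steal) are hypothetical, and it assumes the caller has already read the VG's tag and the list of current quorate cluster members.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a RHEL 6-style node-specific tag check.
# Arguments:
#   $1  tag currently on the VG ("" if untagged)
#   $2  name of the local node
#   $3  space-separated list of nodes in the quorate cluster
# Prints one of:
#   activate  VG is untagged or already tagged for this node
#   refuse    VG is tagged for another node that is still a member
#   steal     VG is tagged for a node that has left the cluster
#             (assumed fenced), so the tag may be replaced
tag_decision() {
    local vg_tag=$1 local_node=$2 members=$3 m

    if [[ -z $vg_tag || $vg_tag == "$local_node" ]]; then
        echo activate
        return
    fi

    # $members is intentionally unquoted so it splits into node names.
    for m in $members; do
        if [[ $m == "$vg_tag" ]]; then
            echo refuse
            return
        fi
    done

    echo steal
}
```

For example, `tag_decision virt-131 virt-132 "virt-129 virt-131 virt-132"` refuses, because virt-131 still owns the VG. With one shared tag for all nodes, `$1` would never name the owner, so "refuse" and "steal" could not be told apart; that is exactly the objection raised above.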
Comment 3 David Vossel 2013-07-12 12:52:35 EDT
(In reply to michal novacek from comment #0)
> It is possible to run the LVM resource agent on two nodes and both will happily
> connect vg from shared disk.
> ...

Right, if you run the agent manually on two nodes at the same time, it will activate on both of them. That is not how this agent is designed to be used, though.

This new LVM agent requires Pacemaker to enforce exclusive activation with tags.

The agent tags the volume group in a way that ensures it cannot be activated outside of the agent (specifically on startup). At that point we are trusting Pacemaker to allow the resource to live on only one node at a time.

-- Vossel
Comment 4 michal novacek 2013-07-15 06:00:22 EDT
I believe that running the agent manually is the same thing Pacemaker does.
Your point is valid only as long as there is a single Pacemaker resource for a
given VG. The moment you create another resource for the same VG, you break
things.

Consider this scenario: create two ocf:heartbeat:LVM resources pointing at the
same VG, both with exclusive=true, and start them. Move one of the resources to
a different node, then move the other resource to a different node. The VG is
now activated on two different nodes (more than one move of a resource might be
necessary).

[virt-129 ~]# pcs resource show havg
 Resource: havg (class=ocf provider=heartbeat type=LVM)
  Attributes: volgrpname=STSRHTS14339 exclusive=true 

[virt-129 ~]# pcs resource show havg2
 Resource: havg2 (class=ocf provider=heartbeat type=LVM)
  Attributes: volgrpname=STSRHTS14339 exclusive=true 

[virt-129 ~]# pcs status
Cluster name: STSRHTS14339
Last updated: Mon Jul 15 11:32:50 2013
Last change: Mon Jul 15 11:32:01 2013 via crm_resource on virt-129.
Stack: corosync
Current DC: virt-131. (2) - partition with quorum
Version: 1.1.10-3.el7-d19719c
3 Nodes configured, unknown expected votes
3 Resources configured.

Online: [ virt-129 virt-131 virt-132 ]

Full list of resources:

 virt-fencing   (stonith:fence_xvm):    Started virt-131. 
 havg   (ocf::heartbeat:LVM):   Started virt-132. 
 havg2  (ocf::heartbeat:LVM):   Started virt-131. 

# show availability of ha-lv-0 on three nodes of the cluster
[virt-129 ~]# for a in 129 131 132; do echo -n "virt-$a ha-lv-0"; ssh virt-$a \
'lvdisplay STSRHTS14339/ha-lv-0 | grep "LV Status"'; done
virt-129 ha-lv-0  LV Status              NOT available
virt-131 ha-lv-0  LV Status              available
virt-132 ha-lv-0  LV Status              available

It is possible to mount ha-lv-0 on virt-131 AND virt-132.

This might not be the brightest idea, but it must not be possible to corrupt
data using standard procedures.

I still believe that it is necessary to use node-specific tags the way RHEL 6
does. With node-specific tags, a node cannot activate the VG while another node
in the same quorate cluster holds it activated.
Comment 5 David Vossel 2013-07-15 10:38:32 EDT
(In reply to michal novacek from comment #4)
> ...
> This might not be the brightest idea to do but it must not be possible to
> corrupt data using the standard procedures.

Ah yes, user error. Someone could do that even though it is not very smart.

If we are worried about this sort of behavior, I can make the LVM agent validate that the CIB has no other LVM resource with the same vgname attribute before allowing the volume group to activate. Would that work? The agent already knows how to detect whether the resource is a clone and prevent that scenario.
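The proposed CIB check could look roughly like the sketch below, which reads CIB XML on stdin (for example from `cibadmin -Q`) and counts nvpair elements that set volgrpname to the given VG. The function names are hypothetical and this is not the code from the upstream pull request. Plain-text matching assumes one nvpair per line and that only LVM resources use a volgrpname attribute; a real implementation would parse the XML and skip clones.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: count CIB nvpairs that set volgrpname to $1.
# Reads CIB XML on stdin. -F treats the VG name as a fixed string, so
# characters like "." in VG names are not interpreted as regex.
count_vg_resources() {
    local vg=$1
    # grep -c prints 0 and exits 1 when nothing matches; keep the 0.
    grep -cF "name=\"volgrpname\" value=\"$vg\"" || true
}

# Refuse activation if more than one resource manages the same VG.
check_unique_vg() {
    local vg=$1 count
    count=$(count_vg_resources "$vg")
    if (( count > 1 )); then
        echo "refusing: $count resources use volume group $vg" >&2
        return 1
    fi
    return 0
}
```

In the havg/havg2 configuration from comment 4, `cibadmin -Q | check_unique_vg STSRHTS14339` would return non-zero, so the second start would be refused.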

We need to avoid going back to the node name tags. That implementation is fragile (see https://bugzilla.redhat.com/show_bug.cgi?id=976443), and the upstream community has already rejected the inclusion of that behavior into the heartbeat LVM agent.

-- Vossel


Comment 6 Jaroslav Kortus 2013-07-15 11:07:57 EDT
> If we are worried about this sort of behavior, I can make the LVM agent
> validate the CIB has no other LVM resource with the same vgname attribute
> before allowing the volume group to activate.  Would that work?  The agent
> already knows how to detect if the resource is a clone and prevent that
> scenario.
> 
> We need to avoid going back to the node name tags.  That implementation is
> fragile ( see https://bugzilla.redhat.com/show_bug.cgi?id=976443) and the
> upstream community has already rejected the inclusion of that behavior into
> the heartbeat LVM agent.
> 

Will tagging still be needed then? To me, having a common tag for all nodes seems equivalent to having no tag at all. If Pacemaker does the uniqueness check and enforces (with fencing) that the service runs on only one node, it should be safe.

Any thoughts?
Comment 7 David Vossel 2013-07-15 11:33:51 EDT
(In reply to Jaroslav Kortus from comment #6)
> > ...
>
> Will there still be the need of tagging then? To me it seems that having a
> common tag for all nodes is equal to not having any tag at all. If pacemaker
> does the uniqness check and enforces (with fencing) that the service is
> running on one node only, it should be safe.
> 
> Any thoughts?

We have to have the tag. Tag filtering allows us to claim the volume group for the cluster and prevent it from being activated outside of the cluster.

Say, for instance, you have a volume group resource that needs to be exclusively activated, but we didn't use the tag. A node could get fenced, reboot, and activate the volume group before Pacemaker starts. At that point the metadata could be corrupted, because the volume group would be active both on a node in the cluster and on the node that is about to rejoin the cluster. Tag filtering prevents this scenario: it ensures that the only way to activate the volume group is through the LVM agent.

-- Vossel
Comment 8 michal novacek 2013-07-15 14:55:11 EDT
(In reply to David Vossel from comment #7)
> (In reply to Jaroslav Kortus from comment #6)
> > > If we are worried about this sort of behavior, I can make the LVM agent
> > > validate the CIB has no other LVM resource with the same vgname attribute
> > > before allowing the volume group to activate.  Would that work?  The agent
> > > already knows how to detect if the resource is a clone and prevent that
> > > scenario.

Yes, that would work. Please do see my other suggestion below.

> > ...
> 
> We have to have the tag. The tag filtering allows us to claim the volume
> group for the cluster and prevent the volume group from being activated
> outside of the cluster.
> 
> Say for instance you have a volume group resource that needs to be
> exclusively activated but we didn't use the tag. A node could get fenced,
> reboot, and activate the volume group before pacemaker starts. At that point
> the metadata could be corrupted because the volume group was both activated
> by a node in the cluster, and this node that is about to rejoin the cluster.
> The tag filtering prevents this scenario.  It ensures that the only way the
> volume group can be activated is through the use of the LVM agent.
> 
> -- Vossel

The RHEL 6 lvm.sh agent required volume_list to contain the root volume and a
hostname-specific tag. This change to lvm.conf also had to be reflected in the
initial ramdisk by regenerating it.

I think a good setup would be for each node to have its root LV (if any) as the
only member of the volume_list line. The user must make this change and
regenerate the ramdisk before running the LVM resource agent. The instructions
would be part of the manual and could be taken from the RHEL 6 Cluster
Administration guide. This would allow only the node running the LVM agent to
activate the VG. Before activation, the agent would also check that there is no
other resource for the same VG and, if there is, refuse to activate. If this
check is implemented properly, we should be safe. I also think that with this
setup the tags are not needed: restricting volume_list=['root_lv'] would not
allow VG activation other than manually or via the agent.

Note that putting volume_list=[] into lvm.conf works only as long as the initial
ramdisk is not regenerated. After that, the system will probably not boot, so I
would advise removing that recommendation from the agent's help text. In the
help, I would recommend listing only the root LV.
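The node setup suggested above (only local VGs such as the root VG in volume_list, then a rebuilt initramfs) can be sketched as a small helper that prints the lvm.conf activation stanza. The helper name and the example VG name `rhel` are hypothetical; merging the stanza into /etc/lvm/lvm.conf is left to the administrator. It refuses an empty list, since that is the volume_list=[] case warned about above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an lvm.conf "activation" stanza whose
# volume_list contains only the given local (non-shared) VGs, e.g. the
# root VG. Shared VGs are deliberately omitted so boot does not activate them.
volume_list_stanza() {
    local entries="" vg
    if (( $# == 0 )); then
        # An empty volume_list would also block the root VG once the
        # initramfs is rebuilt, leaving the node unbootable.
        echo "refusing to write an empty volume_list" >&2
        return 1
    fi
    for vg in "$@"; do
        # Add ", " before every entry except the first.
        entries+="${entries:+, }\"$vg\""
    done
    printf 'activation {\n    volume_list = [ %s ]\n}\n' "$entries"
}

# After merging the stanza into /etc/lvm/lvm.conf, the initramfs has to be
# rebuilt so early boot sees the same restriction, typically with:
#   dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
```

`volume_list_stanza rhel` prints a stanza containing `volume_list = [ "rhel" ]`.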
Comment 9 David Vossel 2013-07-15 16:15:39 EDT
(In reply to michal novacek from comment #8)
> RHEL6 lvm.sh agent required volume_list to contain
> root volume and hostname specific tag. This change in lvm.conf must have been
> also reflected in init ramdrive by regenerating it.
> 
> I think that the good scenario would be each node having it's root lv (if any)
> as the only member of the volume_list line. This change and ramdrive recreation
> must be done by user prior to run LVM resource agent. The instruction would be
> part of the manual and can be taken from RHEL6 cluster administration guide.
> This would allow only the node running the LVM agent to activate vg. Before
> activation the agent would also check that there is no other VG of the same and
> if so it would not activate. If this check is implemented properly we should be
> safe.

Yep, that all sounds good.

> I also do think that if the scenario looks like this the tags are not
> needed -- restriction of volume_list=['root_lv'] would not allow vg activation

Without tags it is difficult to validate the environment.  Say the user has a list of items in the volume_list, some of them are volume groups some are tags. At that point I believe we'd have to cross reference each tag in the volume_list with any tags associated with the volume group to know if the volume group could be activated or not... If the user used the '@*' entry in the volume_list, then we'd have to parse through all the host tags to verify volume group activation is not possible.
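The cross-referencing problem can be made concrete with a simplified model of the volume_list matching rules described in the lvm.conf documentation: a plain VG name matches that VG, "@tag" matches if the VG carries that tag, and "@*" matches if any tag on the VG is also defined as a host tag. The function name `vg_permitted` is hypothetical, and the model ignores "vg/lv" entries and LV tags. Even this reduced version has to compare every entry against every VG tag and every host tag.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical model of volume_list matching at VG level only.
# Arguments:
#   $1   VG name
#   $2   space-separated tags on the VG
#   $3   space-separated host tags (lvm.conf "tags" section)
#   $4+  volume_list entries (none means an empty list: deny all)
# Returns 0 if some entry permits activation, 1 otherwise.
vg_permitted() {
    local vg=$1 vg_tags=$2 host_tags=$3 entry t h
    shift 3
    for entry in "$@"; do
        case $entry in
            "@*")
                # Literal "@*": any VG tag that is also a host tag.
                for t in $vg_tags; do
                    for h in $host_tags; do
                        [[ $t == "$h" ]] && return 0
                    done
                done
                ;;
            @*)
                # "@tag": the VG must carry exactly this tag.
                for t in $vg_tags; do
                    [[ "@$t" == "$entry" ]] && return 0
                done
                ;;
            *)
                # Plain VG name.
                [[ $entry == "$vg" ]] && return 0
                ;;
        esac
    done
    return 1
}
```

With a boot-time list of only `rhel`, `vg_permitted STSRHTS14339 pacemaker "" rhel` fails, while adding `@pacemaker` to the entries lets the same VG through. Whether a user's `@*` admits a VG depends on the host tags as well, which is the extra state the agent would otherwise have to inspect.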

Using the tags gives us something consistent to work with.  We strip the volume group of any tags it did have and explicitly claim it for the cluster.  No surprises, no difficult edge cases to detect.
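The "strip, then claim" step might look roughly like the following dry run, which prints vgchange commands instead of executing them. The function name and the tag name `pacemaker` are assumptions for illustration; `--deltag`, `--addtag` and `--config` are standard vgchange options, but the agent's actual command lines may differ. Current tags are passed space-separated; `vgs --noheadings -o vg_tags` reports them comma-separated, so they would need converting first.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run sketch of "strip existing tags, claim the VG for
# the cluster, then activate". Commands are printed, not run.
# The tag name "pacemaker" is an assumption for illustration.
CLUSTER_TAG=pacemaker

claim_and_activate_cmds() {
    local vg=$1 current_tags=$2 t
    # Remove every tag the VG had, so no stale owner remains.
    for t in $current_tags; do
        echo "vgchange --deltag $t $vg"
    done
    # Claim the VG for the cluster.
    echo "vgchange --addtag $CLUSTER_TAG $vg"
    # Activate with a one-off volume_list override admitting the cluster
    # tag. lvm.conf and the initramfs never list it, so the VG stays
    # inactive at boot.
    echo "vgchange -ay --config 'activation { volume_list = [ \"@$CLUSTER_TAG\" ] }' $vg"
}
```

`claim_and_activate_cmds STSRHTS14339 virt-131` prints three commands: delete the old virt-131 tag, add the cluster tag, and activate with the override.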

> other than manual or via the agent.
> 
> Note that putting volume_list=[] into lvm.conf would work as long as init
> ramdrive is not recreated. After that system will probably not boot so I
> would advise to take away that recomendation from help text of the agent.

I can re-word that. It wasn't meant as a recommendation as much as a minimal requirement, but I can see someone doing that and being very confused after a reboot.

> In the help I would recommend to put root lv only.

Are we good with the following changes?

1. Search the CIB for potential duplicate (non-clone) LVM resources.
2. Reword the user text associated with the exclusive activation option so that users do not unexpectedly run into boot problems later on.

-- Vossel
Comment 10 michal novacek 2013-07-16 04:10:06 EDT
(In reply to David Vossel from comment #9)

> > I also do think that if the scenario looks like this the tags are not
> > needed -- restriction of volume_list=['root_lv'] would not allow vg
> > activation
> 
> Without tags it is difficult to validate the environment.  Say the user has
> a list of items in the volume_list, some of them are volume groups some are
> tags. At that point I believe we'd have to cross reference each tag in the
> volume_list with any tags associated with the volume group to know if the
> volume group could be activated or not... If the user used the '@*' entry in
> the volume_list, then we'd have to parse through all the host tags to verify
> volume group activation is not possible.

I do not see why you would need to do that; it should be possible to simply
check that the volume is not active on another quorate node in the cluster. The
tag is not necessary for that. The VG might be active on a node that is not
part of the cluster, but the tag will not help you there either. So having no
tag seems the same as having the same tag on all nodes. That said, I think we
can keep it or remove it and it makes no difference. I would remove it, though
:)

> 
> Using the tags gives us something consistent to work with.  We strip the
> volume group of any tags it did have and explicitly claim it for the
> cluster.  No surprises, no difficult edge cases to detect.

> Are we good with the following changes?
> 
> 1. search cib for potential duplicate (non-clone) lvm resources
> 2. reword the user txt associated with the exclusive activation option in a
> way that doesn't result in user's unexpectedly running into bootup problems
> later on.
> 
> -- Vossel

yes, that should do it.
Comment 11 David Vossel 2013-07-16 15:51:17 EDT
A patch has been posted upstream related to this issue.

https://github.com/ClusterLabs/resource-agents/pull/284
Comment 12 michal novacek 2013-07-19 06:59:30 EDT
It seems to solve the problem, thanks for the quick fix David.
Comment 13 michal novacek 2013-11-13 08:37:23 EST
According to our internal Pacemaker regression testing, the new ocf:heartbeat:LVM agent from resource-agents-3.9.5-18 behaves correctly and has the changes from comment 9 implemented.
Comment 14 Ludek Smid 2014-06-13 08:16:03 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
