Bug 1531465 - HA LVM duplicate activation
Summary: HA LVM duplicate activation
Keywords:
Status: CLOSED DUPLICATE of bug 1486888
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Oyvind Albrigtsen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-01-05 08:23 UTC by Christoph
Modified: 2018-01-08 14:45 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-08 14:43:32 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links:
Red Hat Bugzilla 1454699 (unspecified, CLOSED): LVM resource agent does not detect multipath with all paths failed (last updated 2021-02-22 00:41:40 UTC)
Red Hat Bugzilla 1507013 (high, CLOSED): Pacemaker LVM monitor causing service restarts due to flock() delays in vgscan/vgs commands (last updated 2021-08-19 07:32:33 UTC)

Description Christoph 2018-01-05 08:23:35 UTC
---++ Description of problem:

We have an HA LVM volume with clvmd locking (not tagged). Every time a node
joins the cluster, the volume cluster resource needs to run a recovery,
resulting in a restart of all dependent resources.
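
For reference, the clustered (clvmd-locked, as opposed to tagged) setup can be
confirmed from the VG attributes. A quick check, with illustrative output for
the vg_data volume group used in the steps below:

      # The 6th character of vg_attr is 'c' for a clustered (clvmd-locked) VG.
      vgs -o vg_name,vg_attr vg_data
      #   VG      Attr
      #   vg_data wz--nc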


---++ Version-Release number of selected component (if applicable):

resource-agents-3.9.5-105.el7_4.3.x86_64


---++ How reproducible:

always


---++ Steps to Reproduce:
1. set up a cluster

2. add dlm and clvmd resources

      pcs resource create dlm ocf:pacemaker:controld clone on-fail=fence interleave=true ordered=true
      pcs resource create clvmd ocf:heartbeat:clvm clone on-fail=fence interleave=true ordered=true
      pcs constraint order start dlm-clone then clvmd-clone

3. add an HA LVM resource (the dependent fs_data filesystem resource seen in the logs below is sketched after these steps)

      vgcreate -Ay -cy vg_data /dev/mapper/mpath-data
      pcs resource create vg_data ocf:heartbeat:LVM exclusive=yes volgrpname="vg_data"
      pcs constraint order start clvmd-clone then vg_data

4. reboot a node; the cluster status reports:

   * vg_data_monitor_0 on wsl007 'unknown error' (1): call=49, status=complete,
     exitreason='LVM Volume vg_data is not available',

5. check the log files; resources are restarted on nodeA:

   notice: Initiating monitor operation vg_data_monitor_0 on nodeB
   warning: Action 15 (vg_data_monitor_0) on nodeB failed (target: 7 vs. rc: 1): Error
   warning: Processing failed op monitor for vg_data on nodeB: unknown error (1)
   error: Resource vg_data (ocf::LVM) is active on 2 nodes attempting recovery
   notice:  * Start      clvmd:1          ( nodeB )
   notice:  * Recover    vg_data          ( nodeA )
   notice:  * Restart    fs_data          ( nodeA )   due to required vg_data start
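
The fs_data filesystem resource restarted above is not created in the steps; a
minimal sketch of how such a dependent resource could be defined (the resource
name comes from the logs, the device path and mount point are assumptions):

      # Hypothetical dependent resource: a filesystem on an LV inside vg_data.
      # /dev/vg_data/lv_data and /data are placeholder names.
      pcs resource create fs_data ocf:heartbeat:Filesystem \
          device="/dev/vg_data/lv_data" directory="/data" fstype="xfs"
      # Keep it on the same node as, and ordered after, the volume group.
      pcs constraint order start vg_data then fs_data
      pcs constraint colocation add fs_data with vg_data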

---++ Actual results:

Node B joins the cluster. The state of all resources on Node B is discovered
(probed). clvmd is not running yet, so all clustered volume groups are
unavailable. For this reason the LVM resource agent returns the following
(LVM_status, lines 348-349):

                ocf_exit_reason "LVM Volume $1 is not available"
                return $OCF_ERR_GENERIC

This is not the result Pacemaker expects for a probe (it wants OCF_NOT_RUNNING).
Pacemaker detects duplicate resource activation and starts a recovery.

All dependent resources are stopped and the volume is stopped on all nodes.
The volume is then started on one node and all dependent resources are started
again.
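
One way to observe the mismatch by hand is to run the agent's monitor action
directly on the freshly joined node before clvmd is up. The paths and
environment variables below assume the standard resource-agents layout on
RHEL 7:

      # Invoke the LVM agent's monitor action manually (standard OCF layout assumed).
      OCF_ROOT=/usr/lib/ocf \
      OCF_RESKEY_volgrpname=vg_data \
      OCF_RESKEY_exclusive=yes \
          /usr/lib/ocf/resource.d/heartbeat/LVM monitor
      echo $?   # observed: 1 (OCF_ERR_GENERIC); the probe target is 7 (OCF_NOT_RUNNING)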


---++ Expected results:

The agent should detect that the volume is not available and therefore not
running (OCF_NOT_RUNNING), so no recovery is triggered.
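
A minimal sketch of the desired status check, assuming the agent context where
ocf-shellfuncs is already sourced; this is not the shipped agent code, and the
function name is made up for illustration:

      # Hypothetical check: an unactivated VG is reported as "not running"
      # rather than as a generic error, so a probe on a freshly joined node
      # does not look like a duplicate activation.
      # Assumes ocf-shellfuncs has been sourced (for ocf_log and the OCF_* codes).
      LVM_status_sketch() {
          vg="$1"
          # The 5th character of lv_attr is 'a' when an LV is active on this node.
          active=$(lvs --noheadings -o lv_attr "$vg" 2>/dev/null \
                   | awk 'substr($1,5,1) == "a"' | wc -l)
          if [ "$active" -eq 0 ]; then
              ocf_log info "LVM Volume $vg is not available (not running here)"
              return $OCF_NOT_RUNNING
          fi
          return $OCF_SUCCESS
      }

Upstream later removed all LVM commands from the monitor action (see
BZ#1507013), so treat this purely as an illustration of the return-code
semantics.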


---++ Additional info:

See BZ#1454699, which introduced a patch adding vgscan to the resource agent.
Upstream has removed all LVM commands from monitor. See BZ#1507013, which also
has an issue with this patch.

We are using clvmd because we also have clusters using gfs2, where we need
clvmd/dlm, and it is simpler to run a similar setup for HA-LVM.

We have systems running an NFS server on the HA-LVM volume; each recovery
triggers an NFS lock-recovery grace period and a 90 second block.

We also have Java services running on the HA-LVM volume, with startup times of
several minutes.

Comment 2 John Ruemker 2018-01-08 14:43:32 UTC
Hello,
Thank you for reporting this. This issue is being investigated and a solution pursued in Bug #1486888, so I am marking this one as a duplicate.

That bug is Red Hat-internal, but if you need assistance, or would like ongoing updates about the state of this investigation, please feel free to open a case with Red Hat Support at https://access.redhat.com and we can help you there. 

Thanks!
John Ruemker
Principal Software Maintenance Engineer

*** This bug has been marked as a duplicate of bug 1486888 ***

Comment 3 Christoph 2018-01-08 14:45:42 UTC
The bug is private. Can I get access (or a subscription) to bug #1486888?

