Bug 1531465 - HA LVM duplicate activation
Summary: HA LVM duplicate activation
Keywords:
Status: CLOSED DUPLICATE of bug 1486888
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Oyvind Albrigtsen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-01-05 08:23 UTC by Christoph
Modified: 2018-01-08 14:45 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-08 14:43:32 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links:
Red Hat Bugzilla 1454699 (unspecified, CLOSED): LVM resource agent does not detect multipath with all paths failed (last updated 2021-02-22 00:41:40 UTC)
Red Hat Bugzilla 1507013 (high, CLOSED): Pacemaker LVM monitor causing service restarts due to flock() delays in vgscan/vgs commands (last updated 2021-08-19 07:32:33 UTC)

Description Christoph 2018-01-05 08:23:35 UTC
---++ Description of problem:

We have an HA LVM volume with clvmd locking (not tagged). Every time a node
joins the cluster, the volume cluster resource needs to run a recovery,
resulting in a restart of all dependent resources.
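
For reference, the clustered (clvmd-locked, as opposed to tagged) setup can be
confirmed from the VG attributes. A quick check, with illustrative output for
the vg_data volume group used in the steps below:

      # The 6th character of vg_attr is 'c' for a clustered (clvmd-locked) VG.
      vgs -o vg_name,vg_attr vg_data
      #   VG      Attr
      #   vg_data wz--nc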


---++ Version-Release number of selected component (if applicable):

resource-agents-3.9.5-105.el7_4.3.x86_64


---++ How reproducible:

always


---++ Steps to Reproduce:
1. set up a cluster

2. add dlm and clvmd resources

      pcs resource create dlm ocf:pacemaker:controld clone on-fail=fence interleave=true ordered=true
      pcs resource create clvmd ocf:heartbeat:clvm clone on-fail=fence interleave=true ordered=true
      pcs constraint order start dlm-clone then clvmd-clone

3. add an HA LVM resource (the dependent fs_data filesystem resource seen in the logs below is sketched after these steps)

      vgcreate -Ay -cy vg_data /dev/mapper/mpath-data
      pcs resource create vg_data ocf:heartbeat:LVM exclusive=yes volgrpname="vg_data"
      pcs constraint order start clvmd-clone then vg_data

4. reboot a node; the cluster status reports:

   * vg_data_monitor_0 on wsl007 'unknown error' (1): call=49, status=complete,
     exitreason='LVM Volume vg_data is not available',

5. check the log files; resources are restarted on nodeA:

   notice: Initiating monitor operation vg_data_monitor_0 on nodeB
   warning: Action 15 (vg_data_monitor_0) on nodeB failed (target: 7 vs. rc: 1): Error
   warning: Processing failed op monitor for vg_data on nodeB: unknown error (1)
   error: Resource vg_data (ocf::LVM) is active on 2 nodes attempting recovery
   notice:  * Start      clvmd:1          ( nodeB )
   notice:  * Recover    vg_data          ( nodeA )
   notice:  * Restart    fs_data          ( nodeA )   due to required vg_data start
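
The fs_data filesystem resource restarted above is not created in the steps; a
minimal sketch of how such a dependent resource could be defined (the resource
name comes from the logs, the device path and mount point are assumptions):

      # Hypothetical dependent resource: a filesystem on an LV inside vg_data.
      # /dev/vg_data/lv_data and /data are placeholder names.
      pcs resource create fs_data ocf:heartbeat:Filesystem \
          device="/dev/vg_data/lv_data" directory="/data" fstype="xfs"
      # Keep it on the same node as, and ordered after, the volume group.
      pcs constraint order start vg_data then fs_data
      pcs constraint colocation add fs_data with vg_data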

---++ Actual results:

Node B joins the cluster. The state of all resources on Node B is discovered
(probed). clvmd is not running yet, so all clustered volume groups are
unavailable. For this reason the LVM resource agent returns the following
(LVM_status, lines 348-349):

                ocf_exit_reason "LVM Volume $1 is not available"
                return $OCF_ERR_GENERIC

This is not the result Pacemaker expects for a probe (it wants OCF_NOT_RUNNING).
Pacemaker detects duplicate resource activation and starts a recovery.

All dependent resources are stopped and the volume is stopped on all nodes.
The volume is then started on one node and all dependent resources are started
again.
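
One way to observe the mismatch by hand is to run the agent's monitor action
directly on the freshly joined node before clvmd is up. The paths and
environment variables below assume the standard resource-agents layout on
RHEL 7:

      # Invoke the LVM agent's monitor action manually (standard OCF layout assumed).
      OCF_ROOT=/usr/lib/ocf \
      OCF_RESKEY_volgrpname=vg_data \
      OCF_RESKEY_exclusive=yes \
          /usr/lib/ocf/resource.d/heartbeat/LVM monitor
      echo $?   # observed: 1 (OCF_ERR_GENERIC); the probe target is 7 (OCF_NOT_RUNNING)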


---++ Expected results:

The agent should detect that the volume is not available and therefore not
running (OCF_NOT_RUNNING), so no recovery is triggered.
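
A minimal sketch of the desired status check, assuming the agent context where
ocf-shellfuncs is already sourced; this is not the shipped agent code, and the
function name is made up for illustration:

      # Hypothetical check: an unactivated VG is reported as "not running"
      # rather than as a generic error, so a probe on a freshly joined node
      # does not look like a duplicate activation.
      # Assumes ocf-shellfuncs has been sourced (for ocf_log and the OCF_* codes).
      LVM_status_sketch() {
          vg="$1"
          # The 5th character of lv_attr is 'a' when an LV is active on this node.
          active=$(lvs --noheadings -o lv_attr "$vg" 2>/dev/null \
                   | awk 'substr($1,5,1) == "a"' | wc -l)
          if [ "$active" -eq 0 ]; then
              ocf_log info "LVM Volume $vg is not available (not running here)"
              return $OCF_NOT_RUNNING
          fi
          return $OCF_SUCCESS
      }

Upstream later removed all LVM commands from the monitor action (see
BZ#1507013), so treat this purely as an illustration of the return-code
semantics.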


---++ Additional info:

See BZ#1454699, which introduced a patch adding vgscan to the resource agent.
Upstream has removed all LVM commands from monitor. See BZ#1507013, which also
has an issue with this patch.

We are using clvmd because we also have clusters using gfs2, where we need
clvmd/dlm, and it is simpler to run a similar setup for HA-LVM.

We have systems running an NFS server on the HA-LVM volume; each recovery
triggers an NFS lock-recovery grace period and a 90 second block.

We also have Java services running on the HA-LVM volume, with startup times of
several minutes.

Comment 2 John Ruemker 2018-01-08 14:43:32 UTC
Hello,
Thank you for reporting this. This issue is being investigated and a solution pursued in Bug #1486888, so I am marking this one as a duplicate.

That bug is Red Hat-internal, but if you need assistance, or would like ongoing updates about the state of this investigation, please feel free to open a case with Red Hat Support at https://access.redhat.com and we can help you there. 

Thanks!
John Ruemker
Principal Software Maintenance Engineer

*** This bug has been marked as a duplicate of bug 1486888 ***

Comment 3 Christoph 2018-01-08 14:45:42 UTC
The bug is private. Can I get access (or a subscription) to bug #1486888?

