Bug 1531465
| Summary: | HA LVM duplicate activation | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Christoph <c.handel> |
| Component: | resource-agents | Assignee: | Oyvind Albrigtsen <oalbrigt> |
| Status: | CLOSED DUPLICATE | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.4 | CC: | agk, c.handel, cluster-maint, fdinitto, jruemker |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-01-08 14:43:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Hello, thank you for reporting this. This issue is being investigated and a solution pursued in Bug #1486888, so I am marking this as a duplicate. That bug is Red Hat-internal, but if you need assistance, or would like ongoing updates on the state of this investigation, please feel free to open a case with Red Hat Support at https://access.redhat.com and we can help you there. Thanks!

John Ruemker
Principal Software Maintenance Engineer

*** This bug has been marked as a duplicate of bug 1486888 ***

That bug is private. Can I get access (or a subscription) to bug #1486888?
---++ Description of problem:

We have an HA LVM volume with clvmd locking (not tagged). Every time a node joins the cluster, the volume's cluster resource runs a recovery, which restarts all dependent resources.

---++ Version-Release number of selected component (if applicable):

resource-agents-3.9.5-105.el7_4.3.x86_64

---++ How reproducible:

Always

---++ Steps to Reproduce:

1. Set up the cluster.
2. Add dlm and clvmd resources:
   pcs resource create dlm ocf:pacemaker:controld clone on-fail=fence interleave=true ordered=true
   pcs resource create clvmd ocf:heartbeat:clvm clone on-fail=fence interleave=true ordered=true
   pcs constraint order start dlm-clone then clvmd-clone
3. Add an HA LVM resource:
   vgcreate -Ay -cy vg_data /dev/mapper/mpath-data
   pcs resource create vg_data ocf:heartbeat:LVM exclusive=yes volgrpname="vg_data"
   pcs constraint order start clvmd-clone then vg_data
4. Reboot a node; cluster status reports:
   * vg_data_monitor_0 on wsl007 'unknown error' (1): call=49, status=complete, exitreason='LVM Volume vg_data is not available',
5. Check the logfiles; resources are restarted on nodeA:
   notice: Initiating monitor operation vg_data_monitor_0 on nodeB
   warning: Action 15 (vg_data_monitor_0) on nodeB failed (target: 7 vs. rc: 1): Error
   warning: Processing failed op monitor for vg_data on nodeB: unknown error (1)
   error: Resource vg_data (ocf::LVM) is active on 2 nodes attempting recovery
   notice: * Start clvmd:1 ( nodeB )
   notice: * Recover vg_data ( nodeA )
   notice: * Restart fs_data ( nodeA ) due to required vg_data start

---++ Actual results:

Node B joins the cluster and the state of all resources on Node B is probed. Because clvmd is not yet running there, none of the clustered volume groups are available, so the LVM resource agent returns (LVM_status, lines 348-349):

   ocf_exit_reason "LVM Volume $1 is not available"
   return $OCF_ERR_GENERIC

This is not what pacemaker expects for a probe on a node where the resource is simply not active (it expects OCF_NOT_RUNNING). Pacemaker instead concludes the resource is active on two nodes and starts a recovery: all dependent resources are stopped, the volume is stopped on all nodes, the volume is started on one node, and then all dependent resources are started again.

---++ Expected results:

Detect that the volume is not available and therefore not running on the joining node. No recovery.

---++ Additional info:

See BZ#1454699, which introduced a patch adding vgscan to the resource agent; upstream has since removed all LVM commands from monitor. See also BZ#1507013, which has an issue with the same patch.

We use clvmd because we also have clusters using gfs2, where we need clvmd/dlm, and it is simpler to run a similar setup for HA-LVM.

We have systems running an NFS server on the HA-LVM, so each recovery causes a lock-recovery grace period and a 90 second block. We also have Java services running on the HA-LVM with startup times of several minutes.
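For illustration, here is a minimal sketch of the probe behaviour we are asking for. This is not the shipped agent code: the function name LVM_probe_status and the /dev/<vg> directory check are assumptions made for the example; only ocf_log, OCF_RESKEY_volgrpname and the OCF_* return codes come from the resource-agents shell library and the OCF standard. The point is the return code: a probe that finds the volume group inactive on the local node should report OCF_NOT_RUNNING, not OCF_ERR_GENERIC, so pacemaker does not infer a duplicate activation.

   # Sketch only, assuming ocf-shellfuncs has been sourced by the agent.
   LVM_probe_status() {
       local vg="$OCF_RESKEY_volgrpname"
       # Illustrative check: if the VG's device directory does not exist,
       # the VG is not activated on this node (e.g. clvmd not yet running).
       if [ ! -d "/dev/$vg" ]; then
           ocf_log info "LVM Volume $vg is not available on this node"
           return $OCF_NOT_RUNNING   # probe result: not active here, no error
       fi
       return $OCF_SUCCESS
   }

With a mapping like this, the monitor_0 probe on the freshly joined node comes back as "not running" instead of "unknown error", and pacemaker simply starts clvmd and leaves vg_data (and fs_data, the NFS server, and the Java services) untouched on the node where they are already running.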