Bug 1118358

Summary: Regression in resource-agents
Product: Red Hat Enterprise Linux 6
Reporter: Josef Zimek <pzimek>
Component: resource-agents
Assignee: David Vossel <dvossel>
Status: CLOSED DUPLICATE
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.7
CC: agk, cluster-maint, fdinitto
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-07-14 15:20:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Josef Zimek 2014-07-10 13:59:24 UTC
Description of problem:

A cluster service ends up in the failed state instead of relocating: it cannot unmount the filesystem because an application is still using it, so LVM cannot deactivate the VG. The customer tested with the fuser directive in /usr/share/cluster/utils/fs-lib.sh and relocation works, but the directive was removed in the official release:

* Thu Oct 3 2013 David Vossel <dvossel> - 3.9.2-38
 - Removes usage of fuser -kvm from fs-lib.sh based agents.
   This resolves issue with fuser blocking netfs mounts.

commit 3f4438cad1d6ccaa4721320d1a3bf42637f1cfe3
Author: David Vossel <dvossel>
Date:   Thu Oct 3 10:11:35 2013 -0500

    - fs-lib.sh, Removes usage of fuser -kvm from fs-lib.sh based agents.
    - tomcat-6.sh, Do not fail on stop if config validation fails.
    - tomcat-6.sh, Set tomcat usr correctly.
    
    Resolves: rhbz#981717
    Resolves: rhbz#983273
    Resolves: rhbz#1014298


The fuser directive was removed from file /usr/share/cluster/utils/fs-lib.sh:

 do_force_unmount() {
         # Proceed with fuser -kvm...
         return 1
 }
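
For comparison, here is a minimal sketch of what a fuser-based force unmount helper could look like. It is not the code that shipped in fs-lib.sh before 3.9.2-38; the function name force_unmount_sketch and the FUSER override variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the shipped fs-lib.sh implementation.
# FUSER can be overridden so the function can be exercised without real mounts.
: "${FUSER:=fuser}"

# force_unmount_sketch MOUNTPOINT [SIGNAL]
# Signals every process using the filesystem mounted at MOUNTPOINT.
# Returns 2 on a missing argument, otherwise the exit status of fuser
# (zero when at least one accessing process was found).
force_unmount_sketch() {
    mp=$1
    sig=${2:-TERM}
    [ -n "$mp" ] || return 2
    # -k: signal the processes, -v: verbose, -m: argument is a mounted filesystem
    "$FUSER" -k "-$sig" -v -m "$mp"
}
```

A caller would typically try SIGTERM first, retry umount, and only then escalate to SIGKILL, which matches the order of the messages in the log. Note that the changelog entry says the fuser call was removed because it blocked on netfs mounts, so a sketch like this carries the same risk.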



As a result, the cluster fails to unmount the filesystems on the LVM volumes, and the relocation fails to move the resource between nodes. Command and log output follow:

[root@node1]# clusvcadm -r ORAUBP-SG -m 
node1 Trying to relocate service:ORAUBP-SG to node2...Aborted; service failed

Jun 28 10:27:18 rgmanager [fs] unmounting /oracle/UBP/sapdata3
Jun 28 10:27:18 rgmanager [fs] unmounting /oracle/UBP/sapdata2
Jun 28 10:27:19 rgmanager [fs] unmounting /oracle/UBP/sapdata1
Jun 28 10:27:19 rgmanager [fs] umount failed: 1
Jun 28 10:27:19 rgmanager [fs] Sending SIGTERM to processes on /oracle/UBP/sapdata1
Jun 28 10:27:24 rgmanager [fs] unmounting /oracle/UBP/sapdata1
Jun 28 10:27:25 rgmanager [fs] umount failed: 1
Jun 28 10:27:25 rgmanager [fs] Sending SIGKILL to processes on /oracle/UBP/sapdata1
Jun 28 10:27:30 rgmanager [fs] unmounting /oracle/UBP/sapdata1
Jun 28 10:27:30 rgmanager [fs] umount failed: 1
Jun 28 10:27:30 rgmanager [fs] Sending SIGKILL to processes on /oracle/UBP/sapdata1
Jun 28 10:27:30 rgmanager [fs] 'umount /oracle/UBP/sapdata1' failed, error=1
Jun 28 10:27:30 rgmanager stop on fs "lvOraUBPData1" returned 1 (generic error)
Jun 28 10:27:31 rgmanager [fs] unmounting /oracle/UBP/sapreorg
Jun 28 10:27:31 rgmanager [fs] unmounting /oracle/UBP/oraarch
Jun 28 10:27:31 rgmanager [fs] unmounting /oracle/UBP/mirrlogB
Jun 28 10:27:32 rgmanager [fs] unmounting /oracle/UBP/mirrlogA
Jun 28 10:27:32 rgmanager [fs] unmounting /oracle/UBP/origlogB
Jun 28 10:27:32 rgmanager [fs] unmounting /oracle/UBP/origlogA
Jun 28 10:27:33 rgmanager [fs] unmounting /oracle/UBP/11203
Jun 28 10:27:33 rgmanager [fs] umount failed: 1
Jun 28 10:27:33 rgmanager [fs] Sending SIGTERM to processes on /oracle/UBP/11203
Jun 28 10:27:38 rgmanager [fs] unmounting /oracle/UBP/11203
Jun 28 10:27:38 rgmanager [fs] unmounting /oracle/UBP
Jun 28 10:27:39 rgmanager [fs] umount failed: 1
Jun 28 10:27:39 rgmanager [fs] Sending SIGTERM to processes on /oracle/UBP
Jun 28 10:27:44 rgmanager [fs] unmounting /oracle/UBP
Jun 28 10:27:44 rgmanager [fs] umount failed: 1
Jun 28 10:27:44 rgmanager [fs] Sending SIGKILL to processes on /oracle/UBP
Jun 28 10:27:49 rgmanager [fs] unmounting /oracle/UBP
Jun 28 10:27:49 rgmanager [fs] umount failed: 1
Jun 28 10:27:50 rgmanager [fs] Sending SIGKILL to processes on /oracle/UBP
Jun 28 10:27:50 rgmanager [fs] 'umount /oracle/UBP' failed, error=1
Jun 28 10:27:50 rgmanager stop on fs "lvOraUBP" returned 1 (generic error)
Jun 28 10:27:52 rgmanager stop on lvm "lvOraUBPData1" returned 1 (generic error)
Jun 28 10:27:55 rgmanager stop on lvm "lvOraUBP" returned 1 (generic error)
Jun 28 10:27:55 rgmanager [ip] Removing IPv4 address 1.2.3.4/24 from bond1
Jun 28 10:28:05 rgmanager [ip] Removing IPv4 address 1.2.3.5/26 from bond0.1
Jun 28 10:28:15 rgmanager #12: RG service:ORAUBP-SG failed to stop; intervention required
Jun 28 10:28:15 rgmanager Service service:ORAUBP-SG is failed
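
To pick out which mount points ultimately failed to unmount from a log like the one above, a small filter can help. This is a sketch that assumes the message format shown in this report; the function name failed_umounts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads rgmanager log lines on stdin and prints each
# mount point from the final "'umount <mp>' failed, error=N" messages.
# Intermediate "umount failed: 1" retry lines are ignored.
failed_umounts() {
    sed -n "s/.*\[fs\] 'umount \(.*\)' failed, error=[0-9]*.*/\1/p"
}
```

Fed the log excerpt above on stdin, it prints /oracle/UBP/sapdata1 and /oracle/UBP, the two filesystems whose fs resources returned a generic error on stop.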



Version-Release number of selected component (if applicable):

Tested with resource-agents-3.9.2-40.el6.x86_64.
However, the change was introduced in resource-agents-3.9.2-38.el6.x86_64.


How reproducible:
always

Steps to Reproduce:
1. Relocate a clustered service that uses fs-lib.sh based agents and LVM.
2. The service fails to unmount the filesystem and the VG fails to deactivate.
3. The service goes into the failed state.

If self_fence is enabled, the node is fenced and the service relocates, but this also happens during manual relocation, and the customer is not happy about this behaviour.

Actual results:
node is self_fenced during service relocation

Expected results:
service relocated without self_fence kicking in


Additional info:

Comment 2 David Vossel 2014-07-10 14:34:10 UTC
It is very likely that this issue is resolved already in the pending 6.6 release as a result of the fixes for these two issues.

https://bugzilla.redhat.com/show_bug.cgi?id=1051115
https://bugzilla.redhat.com/show_bug.cgi?id=1089004

Comment 3 David Vossel 2014-07-14 15:20:00 UTC

*** This bug has been marked as a duplicate of bug 1089004 ***