Description of problem:

rgmanager uncleanly shuts down on cman exit, leaving the rgmanager dlm group around, making it impossible to finish shutting down rgmanager and cman:

[root@node2 systemtap]# cman_tool services
type             level name       id       state
fence            0     default    00010002 none
[2]
dlm              1     rgmanager  00070001 none
[1 2]
[root@node2 systemtap]# service cman stop
Stopping cluster:
   Stopping fencing... done
   Stopping cman... failed
/usr/sbin/cman_tool: Error leaving cluster: Device or resource busy
                                                           [FAILED]
[root@node2 systemtap]# service rgmanager status
clurgmgrd dead but pid file exists
[root@node2 systemtap]# cman_tool services
type             level name       id       state
dlm              1     rgmanager  00070001 none
[1 2]

Version-Release number of selected component (if applicable):
rgmanager-2.0.52-21.el5
The workaround for this issue is to check cman_tool services before stopping cman and first stop everything in that list. After that, cman stop works fine. A minimal sketch of the workaround follows below.
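Sketch of that workaround, assuming (as in the output above) that rgmanager is the only remaining lockspace holder; any other entries reported by cman_tool services would need to be stopped the same way first:

    cman_tool services      # list the groups still joined (fence domain, dlm lockspaces)
    service rgmanager stop  # stop rgmanager cleanly so its "rgmanager" dlm lockspace is released
    cman_tool services      # confirm no dlm lockspaces remain
    service cman stop       # cman_tool leave now succeeds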
Created attachment 549095 [details]
Patch to resolve this issue

Attached Lon's patch that should resolve this.
I tested this patch and I don't think it does exactly what we want:

[root@node2 ~]# clustat
Cluster Status for adrew-test @ Tue Jan 3 16:11:51 2012
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 node1.adrew.net                 1 Online
 node2.adrew.net                 2 Online, Local, rgmanager

 Service Name                 Owner (Last)         State
 ------- ----                 ----- ------         -----
 service:script-test          node2.adrew.net      started

[root@node2 ~]# cman_tool services
type             level name       id       state
fence            0     default    00010002 none
[2]
dlm              1     rgmanager  00070001 none
[1 2]
[root@node2 ~]# service cman stop
Stopping cluster:
   Stopping fencing... done
   Stopping cman... failed
/usr/sbin/cman_tool: Error leaving cluster: Device or resource busy
                                                           [FAILED]
[root@node2 ~]# clustat
msg_receive: Broken pipe
msg_receive_simple: Broken pipe
Cluster Status for adrew-test @ Tue Jan 3 16:12:13 2012
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 node1.adrew.net                 1 Online
 node2.adrew.net                 2 Online, Local

[root@node2 ~]# service cman stop
Stopping cluster:
   Stopping fencing... done
   Stopping cman... failed
/usr/sbin/cman_tool: Error leaving cluster: Device or resource busy
                                                           [FAILED]
[root@node2 ~]# cman_tool services
type             level name       id       state
dlm              1     rgmanager  00070001 none
[1 2]

So the stop order when we stop cman is:
 - fence
 - cman
 - everything else

We successfully stop fence, but then cman stop fails. That leaves us in a similar (but new) bad position. I think we may need a patch in cman instead of rgmanager.
The risk to making rgmanager not shut down when cman asks (currently, it halts and exits uncleanly) is the stop ordering. Perhaps this is what you meant.

Today:
 - fenced exits
 - we ask cman to leave
 - rgmanager halts, allowing cman to leave
 - dlm stops cman from leaving (active lockspaces)

At this point, rgmanager is dead - it has halted services, so if the cluster node hangs, there is minimal risk.

With patch:
 - fenced exits
 - we ask cman to leave
 - rgmanager refuses

At this point, rgmanager is alive and services are running. This is problematic -- because 'fenced' has exited, the node will not be fenced if it hangs.
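Written out as an illustrative shell sequence (a paraphrase of the init script's stop path as described above, not a quote of the shipped initscript):

    # Today (illustrative paraphrase of the stop path):
    fence_tool leave   # fenced exits
    cman_tool leave    # rgmanager halts its services, but the leftover
                       # "rgmanager" dlm lockspace makes the leave fail

    # With the patch (same commands, different rgmanager behaviour):
    fence_tool leave   # fenced exits
    cman_tool leave    # rgmanager refuses to stop; services keep running
                       # on a node that can no longer be fenced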
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
It might be nice to additionally fix fence_tool leave to fail (as it does on RHEL6) when there are active lockspaces.
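As a rough sketch of what such a check could look like from the shell (a hypothetical wrapper, not the actual fence_tool implementation, reusing the cman_tool services output format shown above):

    # Hypothetical wrapper emulating the RHEL6 behaviour: refuse to leave the
    # fence domain while any dlm lockspace is still listed by cman_tool services.
    if cman_tool services | awk '$1 == "dlm"' | grep -q .; then
        echo "refusing fence_tool leave: active DLM lockspaces remain" >&2
        exit 1
    fi
    fence_tool leave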
Patch applied in RHEL59 cluster.git commit 55710722d15be8f2eafdae472086182f88b2a0d5
(In reply to comment #9)
> It might be nice to additionally fix fence_tool leave to fail (as it does on
> RHEL6) when there are active lockspaces.

Right, but we still need a fix in the stable32 branch and rhel6 because nothing stops a user from issuing cman_tool leave by hand. All daemons are expected to behave and respond correctly to the request, regardless of how that request came about.
*** Bug 697656 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0026.html