Bug 191961

Summary: clustat segfault when node is fenced
Product: [Retired] Red Hat Cluster Suite
Component: rgmanager
Version: 4
Hardware: All
OS: Linux
Severity: medium
Priority: medium
Status: CLOSED ERRATA
Reporter: Lenny Maiorani <lenny>
Assignee: Lon Hohberger <lhh>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, henry.harris
Fixed In Version: RHBA-2007:149
Doc Type: Bug Fix
Last Closed: 2007-06-21 16:13:00 UTC

Description Lenny Maiorani 2006-05-16 16:09:15 UTC
Description of problem:
At nearly the same time that node3 was fenced, clustat was run on node1 and
segfaulted.


Version-Release number of selected component (if applicable):
1.9.46-1.3speed


Actual results:
May 15 19:18:45 sqaone01 kernel: CMAN: removing node sqaone03 from the cluster : Missed too many heartbeats
May 15 19:18:45 sqaone01 kernel: clustat[22394]: segfault at 000000000000002a rip 0000003765bb1463 rsp 0000007fbffffa90 error 4
May 15 19:18:46 sqaone01 fenced: sqaone03 not a cluster member after 0 sec post_fail_delay
May 15 19:18:46 sqaone01 fenced: fencing node "sqaone03"
May 15 19:18:46 sqaone01 fenced: fence "sqaone03" success


Expected results:
no segfault

Comment 1 Lon Hohberger 2006-05-16 19:55:34 UTC
Did ccsd die as well?

Comment 2 Lenny Maiorani 2006-05-17 15:06:44 UTC
Nothing else died. The node was still running OK after the segfault.

Comment 3 Lon Hohberger 2006-05-17 15:37:06 UTC
Thank you -- I will keep looking; so far, I have not been able to reproduce it,
so I think it is a timing issue of some sort (e.g., getting a member list while
cman is handling the transition).
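
For illustration, here is a minimal sketch of the kind of race this would be.
The struct and function names below are hypothetical -- they are not the real
cman/rgmanager API -- the point is just that a membership query which fails
mid-transition can return NULL, and an unchecked dereference of a field in the
result faults at a small address, consistent with the "segfault at
000000000000002a" in the log above.

#include <stdio.h>
#include <stdlib.h>

struct member_list {
    int   count;          /* number of nodes */
    char *names[16];      /* node names */
};

/* Hypothetical stand-in for a membership query.  During a cluster
   transition, suppose the query fails and returns NULL. */
static struct member_list *query_members(int in_transition)
{
    struct member_list *ml;

    if (in_transition)
        return NULL;      /* error path the caller must check */

    ml = calloc(1, sizeof(*ml));
    if (!ml)
        return NULL;
    ml->count = 1;
    ml->names[0] = "sqaone01";
    return ml;
}

int main(void)
{
    /* Simulate clustat running exactly while cman handles the
       membership transition caused by the fenced node. */
    struct member_list *ml = query_members(1);

    /* BUG: no NULL check.  Reading ml->names[0] dereferences a field
       at a small offset past address zero, so the kernel logs a
       segfault at a small address, as in the report above. */
    printf("first member: %s\n", ml->names[0]);
    return 0;
}

If that is what is happening, the fix on the clustat side would presumably be
to check the query result for an error before walking the member list.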

Comment 4 Lenny Maiorani 2006-06-07 22:24:52 UTC
I think the new rgmanager (now using 1.9.46-1.4.2x) has fixed this.