Bug 494977 - segfault in check_rdomain_crash() during failover
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Reported: 2009-04-08 17:11 EDT by Lon Hohberger
Modified: 2010-10-23 04:53 EDT (History)
Doc Type: Bug Fix
Last Closed: 2009-09-02 07:05:06 EDT

Attachments
Patch #1 as attachment. (768 bytes, patch)
2009-04-08 17:18 EDT, Lon Hohberger
Patch #2 as attachment. (887 bytes, patch)
2009-04-08 17:18 EDT, Lon Hohberger

Description Lon Hohberger 2009-04-08 17:11:57 EDT
Description of problem: rgmanager-

If rgmanager is used with a restricted failover domain and some of the domain's nodes are offline during a failover event, rgmanager can crash with signal 11 (SIGSEGV).

Program terminated with signal 11, Segmentation fault.
#0  0x000000000042c1b1 in s_intersection (left=0xff44020, ll=3, right=0x0, 
    rl=1, ret=0x41dd5eb8, retl=0x41dd5eb4) at sets.c:135
135				if (left[l] != right[r])
(gdb) p right
$1 = (set_type_t *) 0x0
(gdb) p r
$2 = 0
(gdb) p rl
$3 = 1
(gdb) list check_rdomain_crash
427	}
428	
429	
430	int
431	check_rdomain_crash(char *svcName)
432	{
433		int *nodes = NULL, nodecount;
434		int *fd_nodes = NULL, fd_nodecount, fl;
435		int *isect = NULL, icount;
436		char fd_name[256];
(gdb) bt
#0  0x000000000042c1b1 in s_intersection (left=0xff44020, ll=3, right=0x0, 
    rl=1, ret=0x41dd5eb8, retl=0x41dd5eb4) at sets.c:135
#1  0x000000000040d31e in check_rdomain_crash (
    svcName=0x41dd6030 "service:mail3.epbfi.com") at groups.c:448
#2  0x000000000040d799 in consider_start (node=0xff3a2f0, 
    svcName=0x41dd6030 "service:mail3.epbfi.com", svcStatus=0x41dd5fd0, 
    membership=0xff426f0) at groups.c:585
#3  0x000000000040dd24 in eval_groups (local=1, nodeid=1, nodeStatus=1)
    at groups.c:765
#4  0x0000000000419b0e in node_event (local=1, nodeID=1, nodeStatus=1, clean=1)
    at rg_event.c:130
#5  0x000000000041a54f in _event_thread_f (arg=0x0) at rg_event.c:489
#6  0x00000039e3c06367 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) up
#1  0x000000000040d31e in check_rdomain_crash (
    svcName=0x41dd6030 "service:mail3.epbfi.com") at groups.c:448
448		if (s_intersection(fd_nodes, fd_nodecount, nodes, nodecount, 
(gdb) list
443			goto out_free;
444	
445		if (!(fl & FOD_RESTRICTED))
446			goto out_free;
447		
448		if (s_intersection(fd_nodes, fd_nodecount, nodes, nodecount, 
449			    &isect, &icount) < 0)
450			goto out_free;
451	
452		if (icount == 0) {
(gdb) p nodes
$4 = (int *) 0x0
(gdb) p nodecount
$5 = 1

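This happens either when malloc() fails or because rgmanager's check_rdomain_crash() function does not correctly form the nodes/nodecount values. In the gdb session above, nodes is NULL while nodecount is 1, so s_intersection() dereferences a NULL pointer at sets.c:135.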
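
The pointer/count mismatch described above can be illustrated with a small sketch. This is not rgmanager code: build_online_nodes() and its parameters are hypothetical names chosen for illustration. It only shows one way to keep an array pointer and its element count consistent, so that a caller never sees a NULL array paired with a nonzero count.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from rgmanager): collect the IDs of online nodes.
 * ids[i] is a node ID and online[i] is nonzero if that node is online.
 * Invariant: the function returns NULL if and only if *count is 0.
 */
static int *
build_online_nodes(const int *ids, const int *online, int total, int *count)
{
	int *out;
	int i, n = 0;

	/* Reset the count first so every early return leaves it at 0. */
	*count = 0;
	if (!ids || !online || total <= 0)
		return NULL;

	out = malloc(sizeof(*out) * (size_t)total);
	if (!out)
		return NULL;	/* allocation failed; *count is still 0 */

	for (i = 0; i < total; i++) {
		if (online[i])
			out[n++] = ids[i];
	}

	/* No online nodes: release the buffer rather than return an empty one. */
	if (n == 0) {
		free(out);
		return NULL;
	}

	*count = n;
	return out;
}
```

With this shape, a caller can test either the returned pointer or the count before passing both to a set operation, and the two checks always agree.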

Two patches are needed:

Patch 1 fixes a log message and also corrects the wrong nodes/nodecount values (on their own, the wrong values cause no real problem other than erroneous log messages):

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=587a851e9b8d13e36c17b44b607d1b7fdd2e4840

Patch 2 fixes the segfault in all cases (even if malloc() fails):

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=d5dba53a3bc5c20629b449bee5e3b0be4c71b538
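
For readers who do not want to open the commit, the general idea of guarding a set intersection against a NULL input can be sketched as follows. This is an assumption-laden illustration, not the content of Patch 2 and not the real s_intersection() from sets.c: guarded_intersection() is a hypothetical name, plain int stands in for set_type_t, and only the argument order follows the call shown in the backtrace.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a set-intersection helper (not rgmanager code).
 * Returns 0 on success and -1 on invalid input or allocation failure.
 * On success, *ret is a malloc()ed array of *retl common elements, or NULL
 * with *retl == 0 if the two sets share nothing.
 */
static int
guarded_intersection(const int *left, int ll, const int *right, int rl,
		     int **ret, int *retl)
{
	int *out;
	int l, r, n = 0;

	if (!ret || !retl)
		return -1;
	*ret = NULL;
	*retl = 0;

	/* Reject a NULL array or a non-positive count before any dereference. */
	if (!left || !right || ll <= 0 || rl <= 0)
		return -1;

	/* At most ll elements of left can appear in the result. */
	out = malloc(sizeof(*out) * (size_t)ll);
	if (!out)
		return -1;

	for (l = 0; l < ll; l++) {
		for (r = 0; r < rl; r++) {
			if (left[l] == right[r]) {
				out[n++] = left[l];
				break;
			}
		}
	}

	if (n == 0) {
		free(out);
		return 0;	/* empty intersection: *ret == NULL, *retl == 0 */
	}

	*ret = out;
	*retl = n;
	return 0;
}
```

Called with the values from the crash (a NULL right-hand array and a count of 1), this sketch returns -1 instead of faulting, which lets a caller shaped like check_rdomain_crash() take its existing error path.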


Version-Release number of selected component (if applicable): rgmanager-2.0.46-1

How reproducible: Occasionally
Comment 1 Lon Hohberger 2009-04-08 17:18:25 EDT
Created attachment 338803
Patch #1 as attachment.
Comment 2 Lon Hohberger 2009-04-08 17:18:49 EDT
Created attachment 338804
Patch #2 as attachment.
Comment 3 Lon Hohberger 2009-04-08 17:22:03 EDT
You can also work around this bug by enabling central_processing in cluster.conf:

  <rm central_processing="1" ... >
Comment 7 Issue Tracker 2009-04-16 13:21:11 EDT
The associated IT has been closed.

Internal Status set to 'Resolved'
Status set to: Closed by Tech

This event sent from IssueTracker by jleddy 
 issue 284271
Comment 12 Chris Ward 2009-07-03 14:29:40 EDT
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or directed to your customer or partner representative.
Comment 16 errata-xmlrpc 2009-09-02 07:05:06 EDT
An advisory has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1339.html
