Description of problem:

If rgmanager is used with a restricted failover domain and some of the domain's nodes are offline during a failover event, rgmanager can crash with signal 11.

Program terminated with signal 11, Segmentation fault.
#0  0x000000000042c1b1 in s_intersection (left=0xff44020, ll=3, right=0x0, rl=1, ret=0x41dd5eb8, retl=0x41dd5eb4) at sets.c:135
135             if (left[l] != right[r])
(gdb) p right
$1 = (set_type_t *) 0x0
(gdb) p r
$2 = 0
(gdb) p rl
$3 = 1
(gdb) list check_rdomain_crash
427     }
428
429
430     int
431     check_rdomain_crash(char *svcName)
432     {
433             int *nodes = NULL, nodecount;
434             int *fd_nodes = NULL, fd_nodecount, fl;
435             int *isect = NULL, icount;
436             char fd_name[256];
(gdb) bt
#0  0x000000000042c1b1 in s_intersection (left=0xff44020, ll=3, right=0x0, rl=1, ret=0x41dd5eb8, retl=0x41dd5eb4) at sets.c:135
#1  0x000000000040d31e in check_rdomain_crash (svcName=0x41dd6030 "service:mail3.epbfi.com") at groups.c:448
#2  0x000000000040d799 in consider_start (node=0xff3a2f0, svcName=0x41dd6030 "service:mail3.epbfi.com", svcStatus=0x41dd5fd0, membership=0xff426f0) at groups.c:585
#3  0x000000000040dd24 in eval_groups (local=1, nodeid=1, nodeStatus=1) at groups.c:765
#4  0x0000000000419b0e in node_event (local=1, nodeID=1, nodeStatus=1, clean=1) at rg_event.c:130
#5  0x000000000041a54f in _event_thread_f (arg=0x0) at rg_event.c:489
#6  0x00000039e3c06367 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) up
#1  0x000000000040d31e in check_rdomain_crash (svcName=0x41dd6030 "service:mail3.epbfi.com") at groups.c:448
448             if (s_intersection(fd_nodes, fd_nodecount, nodes, nodecount,
(gdb) list
443                     goto out_free;
444
445             if (!(fl & FOD_RESTRICTED))
446                     goto out_free;
447
448             if (s_intersection(fd_nodes, fd_nodecount, nodes, nodecount,
449                                &isect, &icount) < 0)
450                     goto out_free;
451
452             if (icount == 0) {
(gdb) p nodes
$4 = (int *) 0x0
(gdb) p nodecount
$5 = 1

This happens either when malloc() fails or when rgmanager's check_rdomain_crash() function does not correctly form the nodes/nodecount values. In the core above, nodes is NULL while nodecount is 1, so s_intersection() dereferences a NULL pointer.

Two patches are needed:

Patch 1 fixes a log message and also corrects the wrong nodes/nodecount values (apart from this crash, the wrong values only cause erroneous log messages):

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=587a851e9b8d13e36c17b44b607d1b7fdd2e4840

Patch 2 fixes the segfault in all cases (even if malloc() fails):

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=d5dba53a3bc5c20629b449bee5e3b0be4c71b538

Version-Release number of selected component (if applicable):
rgmanager-2.0.46-1

How reproducible:
Occasionally
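For illustration only, here is a minimal sketch of the kind of guard that avoids this class of crash. It is not the code from either patch and not rgmanager's s_intersection(); the function name safe_intersection() and its exact semantics are assumptions made for this example. It treats a NULL or empty input as the empty set instead of dereferencing it, which is the case seen in the core above (right=0x0 with rl=1).

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not rgmanager's s_intersection()): intersect two
 * unsorted int arrays. A NULL or empty input is treated as the empty set
 * rather than being dereferenced.
 *
 * Returns 0 on success, -1 if malloc() fails. On success, *ret is a
 * malloc()ed array the caller must free (or NULL if the result is empty)
 * and *retl is its length.
 */
int
safe_intersection(const int *left, int ll, const int *right, int rl,
                  int **ret, int *retl)
{
	int l, r, n = 0;
	int *out;

	assert(ret != NULL && retl != NULL);
	*ret = NULL;
	*retl = 0;

	/* Guard for the crashing case: a NULL pointer with a nonzero count */
	if (left == NULL || right == NULL || ll <= 0 || rl <= 0)
		return 0;

	/* Each element of left contributes at most one entry, so ll is enough */
	out = malloc(sizeof(*out) * (size_t)ll);
	if (out == NULL)
		return -1;

	for (l = 0; l < ll; l++) {
		for (r = 0; r < rl; r++) {
			if (left[l] == right[r]) {
				out[n++] = left[l];
				break;
			}
		}
	}

	if (n == 0) {
		free(out);
		return 0;
	}

	*ret = out;
	*retl = n;
	return 0;
}
```

With a guard like this, a caller shaped like check_rdomain_crash() would see an empty intersection (icount == 0) rather than a segfault when nodes is NULL. Whether an empty result is the right outcome for a failed malloc() is a design decision; the upstream patches linked above are the authoritative fix.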
Created attachment 338803 [details] Patch #1 as attachment.
Created attachment 338804 [details] Patch #2 as attachment.
You can also work around this bug by enabling central_processing in cluster.conf:

<rm central_processing="1" ... >
Test packages:

http://people.redhat.com/lhh/rgmanager-2.0.46-1.el5.3.1bz494977.i386.rpm
http://people.redhat.com/lhh/rgmanager-2.0.46-1.el5.3.1bz494977.src.rpm
http://people.redhat.com/lhh/rgmanager-2.0.46-1.el5.3.1bz494977.x86_64.rpm
The associated IT has been closed.

Internal Status set to 'Resolved'
Status set to: Closed by Tech

This event sent from IssueTracker by jleddy
issue 284271
~~ Attention - RHEL 5.4 Beta Released! ~~ RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value. Questions can be posted to this bug or your customer or partner representative.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1339.html