486711 – clusvcadm -e <service> -F kills cluster if no restricted failover domain member is available for <service>.

Keywords:
Status:	CLOSED DUPLICATE of bug 486717
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	rgmanager
Sub Component:
Version:	5.2
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Lon Hohberger
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-02-21 11:41 UTC by Yevheniy Demchenko
Modified:	2009-04-16 22:56 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-02-26 20:16:25 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
patch. (498 bytes, patch) 2009-02-21 11:44 UTC, Yevheniy Demchenko	no flags

Description Yevheniy Demchenko 2009-02-21 11:41:12 UTC

Description of problem:

If a user tries to enable a service that belongs to a restricted failover domain with clusvcadm -e <service> -F while no member of that failover domain is available, clurgmgrd dies with SIGSEGV on all nodes, effectively shutting down the whole cluster.


Version-Release number of selected component (if applicable):

The bug was initially found in rgmanager-2.0.38 and is still present in 2.0.46.

How reproducible:
Always.

Steps to Reproduce:
1. Install a cluster of at least 3 nodes (node01, node02, node03)
2. Define a restricted failover domain test_domain (node01, node02)
3. Define a service test_service for this failover domain
4. Stop rgmanager on node01 and node02
5. Run "clusvcadm -e service:test_service -F" on node03
  
Actual results:
clurgmgrd segfaults; node03 (and possibly other nodes running clurgmgrd) reboots.

Expected results:
clusvcadm reports that the service failed to start.

Additional info:

[14615] debug: Node 1 is not listening
[14615] debug: Node 2 is not listening
[16761] debug: Sent remote-start request to 0
[16757] debug: Evaluating RG service:service_test, state stopped, owner none
[16757] notice: Marking service:service_test as stopped: Restricted domain unavailable
[16757] debug: Event (0:5:0) Processed
PID 14615 Thread 14615: SIGSEGV
[16757] debug: 3 events processed

I suspect the problem is in "Sent remote-start request to 0". In any case, the attached patch makes clurgmgrd work.

diff -U 3 -r ./rgmanager-2.0.38.orig/src/daemons/rg_state.c ./rgmanager-2.0.38/src/daemons/rg_state.c
--- ./rgmanager-2.0.38.orig/src/daemons/rg_state.c	2008-03-27 21:12:36.000000000 +0100
+++ ./rgmanager-2.0.38/src/daemons/rg_state.c	2009-02-21 11:34:54.000000000 +0100
@@ -2057,7 +2057,7 @@
 		      	ret = handle_start_remote_req(svcName, request);
 			if (ret == RG_EAGAIN)
 				goto out;
-		} else if (target < 0) {
+		} else if (!(target > 0)) {
 			ret = RG_EFAIL;
 			goto out;
 		} else {
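
A minimal sketch of why the one-character change in the condition matters. It assumes, based only on the log line "Sent remote-start request to 0" and on nodes being numbered from 1, that target is 0 when no eligible node was found. The function and constant names below are hypothetical and are not taken from rg_state.c.

```c
/* Hypothetical stand-ins for the real return codes; actual values may differ. */
enum { SKETCH_OK = 0, SKETCH_EFAIL = -1 };

/* Original condition: only negative targets are rejected. */
static int check_original(int target)
{
	if (target < 0)
		return SKETCH_EFAIL;
	/* target == 0 reaches this point and would be treated as a real node */
	return SKETCH_OK;
}

/* Patched condition: anything that is not a positive node ID is rejected. */
static int check_patched(int target)
{
	if (!(target > 0))
		return SKETCH_EFAIL;
	return SKETCH_OK;
}
```

With target == 0, check_original() lets the request through while check_patched() fails it, which would match the expected result that clusvcadm reports a failure instead of the daemon crashing.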

Comment 1 Yevheniy Demchenko 2009-02-21 11:44:12 UTC

Created attachment 332821
patch.

Comment 2 Lon Hohberger 2009-02-26 15:48:42 UTC

It looks like this patch is included in your other patch.

Comment 3 Lon Hohberger 2009-02-26 20:16:25 UTC

Marking as a dup of 486717 since that patch includes the fix for this one.

*** This bug has been marked as a duplicate of bug 486717 ***
