232216 – IP failover ignoring restricted configuration


Summary:	IP failover ignoring restricted configuration

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	rgmanager
Sub Component:
Version:	4
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Lon Hohberger
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-03-14 13:35 UTC by Dave Berry
Modified:	2009-04-16 20:22 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-07-31 18:30:43 UTC
Embargoed:


Description Dave Berry 2007-03-14 13:35:30 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070226 Fedora/1.5.0.10-1.fc6 Firefox/1.5.0.10 pango-text

Description of problem:
Three-node GFS cluster sharing two virtual IPs, configured as two separate services.
Both IPs are defined as services in cluster.conf, and their failover domains are set to ordered/restricted.
The IPs fail over when a box goes down, but do not return to the correctly prioritized box when it comes back up.

<failoverdomain name="ip_domain2" ordered="1" restricted="1">
        <failoverdomainnode name="fs102" priority="1"/>
        <failoverdomainnode name="fs101" priority="2"/>
        <failoverdomainnode name="fs02" priority="3"/>
</failoverdomain>

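To make the expected failback behavior concrete, here is a rough Python model of how an ordered, restricted domain such as ip_domain2 would pick a node when a member rejoins. It is a simplified sketch under stated assumptions, not rgmanager's actual implementation: the FailoverDomain class and the preferred_node/relocation_target functions are hypothetical, lower priority numbers are assumed to be more preferred (fs102 at priority 1 is the box the service should return to), and an unordered domain is assumed never to trigger failback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverDomain:
    """Hypothetical model of a <failoverdomain> element (not rgmanager's own type)."""
    name: str
    ordered: bool
    restricted: bool
    # node name -> priority; assumption: 1 is the most preferred node
    priorities: dict[str, int]


def preferred_node(domain: FailoverDomain, online: set[str]) -> Optional[str]:
    """Most preferred online member of the domain, or None if no member is up."""
    candidates = [node for node in domain.priorities if node in online]
    if not candidates:
        return None
    if domain.ordered:
        return min(candidates, key=lambda node: domain.priorities[node])
    return candidates[0]  # unordered: any online member is acceptable


def relocation_target(domain: FailoverDomain, online: set[str], owner: str) -> Optional[str]:
    """Node a running service should move to, or None if it should stay on owner."""
    best = preferred_node(domain, online)
    if best is None:
        # No domain member is online, so there is no better node to move to.
        return None
    if domain.restricted and owner not in domain.priorities:
        # Restricted: the service must not run outside the domain.
        return best
    if domain.ordered and domain.priorities.get(owner, float("inf")) > domain.priorities[best]:
        # Ordered: fail back when a higher-priority member is available.
        return best
    return None


ip_domain2 = FailoverDomain("ip_domain2", ordered=True, restricted=True,
                            priorities={"fs102": 1, "fs101": 2, "fs02": 3})

# fs102 failed and the service moved to fs101; now fs102 rejoins.
print(relocation_target(ip_domain2, {"fs102", "fs101", "fs02"}, owner="fs101"))  # fs102
# While fs102 is still down, the service stays on fs101.
print(relocation_target(ip_domain2, {"fs101", "fs02"}, owner="fs101"))  # None
```

In this model the decision to move the service back to fs102 matches the "Relocating group nfs_ip2 to better node fs102" message in the logs; the failure in this report shows up afterwards, when stopping the service errors out with #52.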
Version-Release number of selected component (if applicable):
rgmanager-1.9.54-1

How reproducible:
Always


Steps to Reproduce:
1. Configure a failover domain with the ordered and restricted flags, and a service containing a VIP
2. Shut down the primary box and let the IP fail over
3. Bring the primary box back up and watch the logs to see whether the service returns

Actual Results:
Service does not fail back

Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Relocating group nfs_ip2 to better node fs102
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Event (0:2:1) Processed
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <notice> Stopping service nfs_ip2
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <err> #52: Failed changing RG status
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Handling failure request for RG nfs_ip2
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <err> #57: Failed changing RG status

Expected Results:
The IP should fail back to the better node

Additional info:

Comment 1 Lon Hohberger 2007-03-20 19:52:28 UTC

This isn't actually a policy bug; the cause of error #52 is the key here - that
shouldn't happen.  Could you try with the 1.9.54-3.228823 packages available here: 

http://people.redhat.com/lhh/packages.html

Comment 2 Dave Berry 2007-03-21 12:58:16 UTC

Tried with the new rgmanager package and I get the same results:

Mar 20 16:49:03 fs102 clurgmgrd[5659]: <info> State change: fs101 UP
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Evaluating RG nfs_ip1, state started, owner fs102
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Relocating group nfs_ip1 to better node fs101
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Evaluating RG nfs_ip2, state started, owner fs102
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Event (0:3:1) Processed
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <notice> Stopping service nfs_ip1
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <err> #52: Failed changing RG status
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Handling failure request for RG nfs_ip1
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <err> #57: Failed changing RG status
Mar 20 16:49:19 fs102 clurgmgrd: [5659]: <debug> Checking 172.16.1.224, Level 0
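When retesting, it may help to pick out failback attempts that broke down at the stop step, since error #52 is the part of the sequence under investigation. The following is a minimal Python sketch; the failed_failbacks helper is hypothetical, and it assumes one syslog message per line in the exact wording quoted in this report. It attributes each #52/#57 error to the service named in the most recent "Stopping service" line, so interleaved output from several services would need more careful matching.

```python
import re
from typing import Iterable

# Patterns based on the clurgmgrd messages quoted in this report.
RELOCATE = re.compile(r"Relocating group (\S+) to better node (\S+)")
STOPPING = re.compile(r"Stopping service (\S+)")
FAILED = re.compile(r"#(52|57): Failed changing RG status")


def failed_failbacks(lines: Iterable[str]) -> list[tuple[str, str, int]]:
    """Return (service, target_node, error_code) for relocations whose stop step failed."""
    targets: dict[str, str] = {}  # service -> node it was being relocated to
    stopping = None               # service named in the latest "Stopping service" line
    results = []
    for line in lines:
        if m := RELOCATE.search(line):
            targets[m.group(1)] = m.group(2)
        elif m := STOPPING.search(line):
            stopping = m.group(1)
        elif (m := FAILED.search(line)) and stopping in targets:
            results.append((stopping, targets[stopping], int(m.group(1))))
    return results


# The log excerpt from the original description.
SAMPLE_LOG = [
    "Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Relocating group nfs_ip2 to better node fs102",
    "Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Event (0:2:1) Processed",
    "Mar  8 11:03:26 fs101 clurgmgrd[5684]: <notice> Stopping service nfs_ip2",
    "Mar  8 11:03:26 fs101 clurgmgrd[5684]: <err> #52: Failed changing RG status",
    "Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Handling failure request for RG nfs_ip2",
    "Mar  8 11:03:26 fs101 clurgmgrd[5684]: <err> #57: Failed changing RG status",
]

print(failed_failbacks(SAMPLE_LOG))
# [('nfs_ip2', 'fs102', 52), ('nfs_ip2', 'fs102', 57)]
```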

Comment 3 Lon Hohberger 2007-03-26 15:13:19 UTC

Hi,

I tried to reproduce this several times - and haven't been able to.  Could you
give me some hints about your systems?  It must be some sort of a race condition.

Last Thursday, I received a patch from a community user of linux-cluster which
*may* address this if you're willing to try it (though, I must be clear, I
couldn't get it to happen with or without their patch).  The reason it *may*
address this is because it fixes two bugs in the view-formation (data
distribution) code and an error case in the rgmanager message code.

Comment 4 Lon Hohberger 2007-03-26 15:14:41 UTC

By hints, I mean things like RAM / processor speed / # of cores

Comment 5 Dave Berry 2007-04-18 14:31:43 UTC

Both boxes are identical (Dell 1950s):
2 dual-core Intel Xeon 2 GHz processors
2 GB RAM
QLogic QLA2432 Fibre Channel card
Broadcom BCM5708 Gigabit Ethernet

Comment 6 Lon Hohberger 2007-05-02 12:56:52 UTC

Ok - I'll have to build using the patch from the community users.  The patch
addresses several things - including bugs in the vft subsystem (the part that's
throwing errors :) ).

Comment 7 Lon Hohberger 2007-05-16 15:49:31 UTC

This *should* be fixed in 4.5; could you retest on the current rgmanager package?

Comment 8 Lon Hohberger 2007-07-31 18:30:43 UTC

Per comment #3.
