Bug 232216 - IP failover ignoring restricted configuration
Summary: IP failover ignoring restricted configuration
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: rgmanager
Version: 4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-03-14 13:35 UTC by Dave Berry
Modified: 2009-04-16 20:22 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-07-31 18:30:43 UTC
Embargoed:


Description Dave Berry 2007-03-14 13:35:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070226 Fedora/1.5.0.10-1.fc6 Firefox/1.5.0.10 pango-text

Description of problem:
A 3-node GFS cluster shares 2 virtual IPs as 2 separate services.
The IPs are listed as services in cluster.conf, and the failover domain is set to ordered/restricted.
The IP fails over when the box goes down, but it does not return to the higher-priority box when that box comes back up.

<failoverdomain name="ip_domain2" ordered="1" restricted="1">
        <failoverdomainnode name="fs102" priority="1"/>
        <failoverdomainnode name="fs101" priority="2"/>
        <failoverdomainnode name="fs02" priority="3"/>
</failoverdomain>
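
For reference, the behaviour expected from this domain can be sketched roughly as follows. This is only an illustrative model of ordered/restricted semantics, not rgmanager's actual code: the function name preferred_node, the online set, and the tie-breaking choice for unrestricted domains are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The failover domain from this report, closed so that it parses.
DOMAIN_XML = """
<failoverdomain name="ip_domain2" ordered="1" restricted="1">
    <failoverdomainnode name="fs102" priority="1"/>
    <failoverdomainnode name="fs101" priority="2"/>
    <failoverdomainnode name="fs02" priority="3"/>
</failoverdomain>
"""


def preferred_node(domain_xml, online):
    """Return the node a service should run on, or None if no node is allowed.

    Hypothetical helper: `online` is the set of node names currently up.
    """
    domain = ET.fromstring(domain_xml)
    ordered = domain.get("ordered") == "1"
    restricted = domain.get("restricted") == "1"

    members = [
        (int(node.get("priority")), node.get("name"))
        for node in domain.findall("failoverdomainnode")
    ]
    if ordered:
        # Lower priority number means more preferred.
        members.sort(key=lambda member: member[0])

    candidates = [name for _, name in members if name in online]
    if candidates:
        return candidates[0]
    if restricted:
        # A restricted domain never runs the service outside its members.
        return None
    # Unrestricted: any online node will do (arbitrary choice for this sketch).
    return sorted(online)[0] if online else None


if __name__ == "__main__":
    print(preferred_node(DOMAIN_XML, {"fs101", "fs02"}))
    print(preferred_node(DOMAIN_XML, {"fs102", "fs101", "fs02"}))
```

Under this model, the service sits on fs101 while fs102 is down and should move back to fs102 once it rejoins, which is the failback that does not complete here.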





Version-Release number of selected component (if applicable):
rgmanager-1.9.54-1

How reproducible:
Always


Steps to Reproduce:
1. Configure a failover domain with the ordered/restricted flags and a service containing a VIP
2. Shut down the primary box so the IP fails over
3. Bring up the primary box and watch the logs to see whether the service returns

Actual Results:
Service does not fail back

Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Relocating group nfs_ip2 to better node fs102
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Event (0:2:1) Processed
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <notice> Stopping service nfs_ip2
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <err> #52: Failed changing RG status
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Handling failure request for RG nfs_ip2
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <err> #57: Failed changing RG status

Expected Results:
The IP should fail back to better node

Additional info:

Comment 1 Lon Hohberger 2007-03-20 19:52:28 UTC
This isn't actually a policy bug; the cause of error #52 is the key here - that
shouldn't happen.  Could you try with the 1.9.54-3.228823 packages available here: 

http://people.redhat.com/lhh/packages.html

Comment 2 Dave Berry 2007-03-21 12:58:16 UTC
Tried with the new rgmanager package and I get the same results.

Mar 20 16:49:03 fs102 clurgmgrd[5659]: <info> State change: fs101 UP
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Evaluating RG nfs_ip1, state started, owner fs102
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Relocating group nfs_ip1 to better node fs101
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Evaluating RG nfs_ip2, state started, owner fs102
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Event (0:3:1) Processed
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <notice> Stopping service nfs_ip1
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <err> #52: Failed changing RG status
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Handling failure request for RG nfs_ip1
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <err> #57: Failed changing RG status
Mar 20 16:49:19 fs102 clurgmgrd: [5659]: <debug> Checking 172.16.1.224, Level 0

Comment 3 Lon Hohberger 2007-03-26 15:13:19 UTC
Hi,

I tried to reproduce this several times - and haven't been able to.  Could you
give me some hints about your systems?  It must be some sort of a race condition.

Last Thursday, I received a patch from a community user of linux-cluster which
*may* address this if you're willing to try it (though, I must be clear, I
couldn't get it to happen with or without their patch).  The reason it *may*
address this is because it fixes two bugs in the view-formation (data
distribution) code and an error case in the rgmanager message code.

Comment 4 Lon Hohberger 2007-03-26 15:14:41 UTC
By hints, I mean things like RAM / processor speed / # of cores

Comment 5 Dave Berry 2007-04-18 14:31:43 UTC
Both boxes are identical (Dell 1950s):
2 dual-core Intel Xeon 2 GHz processors
2 GB RAM
QLogic QLA2432 fibre card
Broadcom BCM5708 Gigabit Ethernet

Comment 6 Lon Hohberger 2007-05-02 12:56:52 UTC
Ok - I'll have to build using the patch from the community users.  The patch
addresses several things - including bugs in the vft subsystem (the part that's
throwing errors :) ).

Comment 7 Lon Hohberger 2007-05-16 15:49:31 UTC
This *should* be fixed in 4.5; could you retest on the current rgmanager package?

Comment 8 Lon Hohberger 2007-07-31 18:30:43 UTC
Per comment #3.

