Bug 232216 - IP failover ignoring restricted configuration
Status: CLOSED WORKSFORME
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: rgmanager
Version: 4
Hardware: x86_64 Linux
Priority: medium, Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Reported: 2007-03-14 09:35 EDT by Dave Berry
Modified: 2009-04-16 16:22 EDT

Doc Type: Bug Fix
Last Closed: 2007-07-31 14:30:43 EDT
Description Dave Berry 2007-03-14 09:35:30 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070226 Fedora/1.5.0.10-1.fc6 Firefox/1.5.0.10 pango-text

Description of problem:
A 3-node GFS cluster shares 2 virtual IPs as 2 different services.
The IPs are listed as services in cluster.conf, and the failover domain is set to ordered/restricted.
The IP fails over when the box goes down but does not return to the correctly prioritized box when that box comes back up.

<failoverdomain name="ip_domain2" ordered="1" restricted="1">
        <failoverdomainnode name="fs102" priority="1"/>
        <failoverdomainnode name="fs101" priority="2"/>
        <failoverdomainnode name="fs02" priority="3"/>
</failoverdomain>
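
For reference, here is a rough Python sketch of the behaviour I would expect from an ordered/restricted domain like the one above. It is not rgmanager's code. The helper name preferred_node and the set of online nodes are made up for illustration, and the policy is only my reading of the flags: ordered means the online member with the lowest priority number wins, restricted means the service never runs outside the domain.

```python
import xml.etree.ElementTree as ET

# Failover domain from the report, with the closing tag added so it parses.
DOMAIN_XML = """
<failoverdomain name="ip_domain2" ordered="1" restricted="1">
    <failoverdomainnode name="fs102" priority="1"/>
    <failoverdomainnode name="fs101" priority="2"/>
    <failoverdomainnode name="fs02" priority="3"/>
</failoverdomain>
"""


def preferred_node(domain_xml, online):
    """Hypothetical helper: return the node a service should run on, or None.

    `online` is the set of node names currently up. This models the assumed
    policy only; it does not reproduce rgmanager internals.
    """
    domain = ET.fromstring(domain_xml)
    members = [
        (int(node.get("priority")), node.get("name"))
        for node in domain.findall("failoverdomainnode")
    ]
    if domain.get("ordered") == "1":
        # Ordered domain: the lowest priority number is the most preferred.
        members.sort()
    candidates = [name for _, name in members if name in online]
    if candidates:
        return candidates[0]
    if domain.get("restricted") == "1":
        # Restricted domain: no member is online, so the service stays down.
        return None
    # Unrestricted domain: fall back to any online node (sorted for determinism).
    return min(online) if online else None


print(preferred_node(DOMAIN_XML, {"fs101", "fs02"}))           # fs101
print(preferred_node(DOMAIN_XML, {"fs102", "fs101", "fs02"}))  # fs102
```

With fs102 down the sketch picks fs101, and once fs102 is back it picks fs102 again, which is the fail-back that does not happen here.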

Version-Release number of selected component (if applicable):
rgmanager-1.9.54-1

How reproducible:
Always


Steps to Reproduce:
1. Configure a failover domain with the ordered/restricted flags and a service for a VIP
2. Shut down the primary box so the IP fails over
3. Bring up the primary box and watch the logs to see whether the service returns

Actual Results:
Service does not fail back

Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Relocating group nfs_ip2 to better node fs102
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Event (0:2:1) Processed
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <notice> Stopping service nfs_ip2
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <err> #52: Failed changing RG status
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Handling failure request for RG nfs_ip2
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <err> #57: Failed changing RG status
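
To pick this pattern out of a longer log, something like the following Python sketch could be used. The regular expressions are written against the clurgmgrd lines quoted above only, and the function name failed_relocations is made up. Treating the first #52 error after a "Relocating group" message as belonging to that relocation is an assumption, not something rgmanager guarantees.

```python
import re

# Matches syslog lines shaped like the clurgmgrd entries quoted in this report.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"clurgmgrd\[(?P<pid>\d+)\]: <(?P<level>\w+)> (?P<msg>.*)$"
)
RELOCATE_RE = re.compile(r"Relocating group (\S+) to better node (\S+)")


def failed_relocations(lines):
    """Hypothetical helper: list (group, target) pairs whose relocation
    attempt is followed by an error #52 before the next relocation attempt."""
    failed = []
    pending = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a clurgmgrd line in the expected shape
        msg = match.group("msg")
        relocation = RELOCATE_RE.search(msg)
        if relocation:
            pending = (relocation.group(1), relocation.group(2))
        elif pending and match.group("level") == "err" and msg.startswith("#52:"):
            failed.append(pending)
            pending = None
    return failed


SAMPLE = """\
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Relocating group nfs_ip2 to better node fs102
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Event (0:2:1) Processed
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <notice> Stopping service nfs_ip2
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <err> #52: Failed changing RG status
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <debug> Handling failure request for RG nfs_ip2
Mar  8 11:03:26 fs101 clurgmgrd[5684]: <err> #57: Failed changing RG status
"""

print(failed_relocations(SAMPLE.splitlines()))
```

Run against the lines above, it reports the nfs_ip2 relocation to fs102 as failed.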

Expected Results:
The IP should fail back to the better node

Additional info:
Comment 1 Lon Hohberger 2007-03-20 15:52:28 EDT
This isn't actually a policy bug; the cause of error #52 is the key here - that
shouldn't happen.  Could you try with the 1.9.54-3.228823 packages available here: 

http://people.redhat.com/lhh/packages.html
Comment 2 Dave Berry 2007-03-21 08:58:16 EDT
Tried with the new rgmanager package and I get the same results

Mar 20 16:49:03 fs102 clurgmgrd[5659]: <info> State change: fs101 UP
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Evaluating RG nfs_ip1, state started, owner fs102
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Relocating group nfs_ip1 to better node fs101
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Evaluating RG nfs_ip2, state started, owner fs102
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Event (0:3:1) Processed
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <notice> Stopping service nfs_ip1
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <err> #52: Failed changing RG status
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <debug> Handling failure request for RG nfs_ip1
Mar 20 16:49:04 fs102 clurgmgrd[5659]: <err> #57: Failed changing RG status
Mar 20 16:49:19 fs102 clurgmgrd: [5659]: <debug> Checking 172.16.1.224, Level 0
Comment 3 Lon Hohberger 2007-03-26 11:13:19 EDT
Hi,

I tried to reproduce this several times - and haven't been able to.  Could you
give me some hints about your systems?  It must be some sort of a race condition.

Last Thursday, I received a patch from a community user of linux-cluster which
*may* address this if you're willing to try it (though, I must be clear, I
couldn't get it to happen with or without their patch).  The reason it *may*
address this is because it fixes two bugs in the view-formation (data
distribution) code and an error case in the rgmanager message code.
Comment 4 Lon Hohberger 2007-03-26 11:14:41 EDT
By hints, I mean things like RAM / processor speed / # of cores
Comment 5 Dave Berry 2007-04-18 10:31:43 EDT
Both boxes are identical (Dell 1950s):
2 dual-core Intel Xeon 2 GHz processors
2 GB RAM
QLogic QLA2432 fibre card
Broadcom BCM5708 Gigabit Ethernet
Comment 6 Lon Hohberger 2007-05-02 08:56:52 EDT
Ok - I'll have to build using the patch from the community users.  The patch
addresses several things - including bugs in the vft subsystem (the part that's
throwing errors :) ).
Comment 7 Lon Hohberger 2007-05-16 11:49:31 EDT
This *should* be fixed in 4.5; could you retest on the current rgmanager package?
Comment 8 Lon Hohberger 2007-07-31 14:30:43 EDT
Per comment #3.
