From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux ppc; rv:1.7.3) Gecko/20041014 Firefox/0.10.1

Description of problem:
We have a 2.1AS cluster set up for cyrus. NodeA is a dual Xeon machine; NodeB was a dual P3 machine. Since that machine was removed for repair last week, I replaced it with a single-CPU Xeon machine.

The Cyrus service was running on NodeA when it crashed today and NodeB took over. But for some strange reason, NodeB keeps dropping the service's shared IP. Here's the log at info level:

Feb 17 22:53:37 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Down
Feb 17 22:53:39 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Up
Feb 17 22:53:43 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Down
Feb 17 22:53:45 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Up
Feb 17 22:53:55 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Down
Feb 17 22:53:57 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Up
Feb 17 23:17:19 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Down
Feb 17 23:17:21 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Up
Feb 17 23:23:29 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Down
Feb 17 23:23:29 nodeb clusvcmgrd: [19449]: <notice> ipalias notice: Starting cluster alias
Feb 17 23:23:29 nodeb clusvcmgrd: [19449]: <info> ipalias info: Starting IP address <shared ip>
Feb 17 23:23:30 nodeb clusvcmgrd: [19449]: <info> ipalias info: Sending Gratuitous arp for <shared ip> (00:02:B3:9D:7F:06)
Feb 17 23:23:31 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Up
Feb 17 23:23:31 nodeb clusvcmgrd: [19524]: <notice> ipalias notice: Stopping cluster alias
Feb 17 23:23:31 nodeb clusvcmgrd: [19524]: <info> ipalias info: Stopping IP address <shared ip>
Feb 17 23:29:19 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Down
Feb 17 23:29:21 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Up
... and so on.

Meanwhile on NodeA everything is OK: the interface with the shared IP (eth0:0) is down, and the machine is normally accessible. I have no idea why NodeB now changes its view of NodeA so frequently. Also, I don't see why a change of state of nodea should trigger stopping the shared IP on NodeB.

Version-Release number of selected component (if applicable):
clumanager-1.0.19-2

How reproducible:
Didn't try

Steps to Reproduce:
I haven't really tried to reproduce it, as this is the production system.

Additional info:

/etc/cluster.conf:

# This file is automatically generated. Do not manually edit!
[cluhbd]
  logLevel = 6
[clupowerd]
  logLevel = 6
[cluquorumd]
  logLevel = 6
  sameTimeNetdown = 20
  sameTimeNetup = 20
[clurmtabd]
  logLevel = 4
[cluster]
  alias_ip = <shared ip>
  name = <name>
  timestamp = 1108659552
[clusvcmgrd]
  logLevel = 7
[database]
  version = 2.0
[members]
start member0
  start chan0
    name = private-link1
    type = net
  end chan0
  id = 0
  name = <nodea>
  powerSwitchIPaddr = <nodea>
  powerSwitchPortName = unused
  quorumPartitionPrimary = /dev/raw/raw1
  quorumPartitionShadow = /dev/raw/raw2
end member0
start member1
  start chan0
    name = private-link2
    type = net
  end chan0
  id = 1
  name = <nodeb>
  powerSwitchIPaddr = <nodeb>
  powerSwitchPortName = unused
  quorumPartitionPrimary = /dev/raw/raw1
  quorumPartitionShadow = /dev/raw/raw2
end member1
[powercontrollers]
start powercontroller0
  IPaddr = <nodea>
  login = unused
  passwd = unused
  type = null
end powercontroller0
start powercontroller1
  IPaddr = <nodeb>
  login = unused
  passwd = unused
  type = null
end powercontroller1
[services]
start service0
  checkInterval = 300
  name = cyrus
  start network0
    broadcast = <broadcast ip>
    ipAddress = <shared ip>
    netmask = <netmask ip>
  end network0
  preferredNode = nodeb
  relocateOnPreferredNodeBoot = no
  userScript = /opt/scripts/cyrus
end service0

clustat output:

Cluster Status Monitor (<name>)                                    00:16:49
Cluster alias: <dns name of shared ip>

========================= M e m b e r   S t a t u s ==========================

  Member         Status     Node Id    Power Switch
  -------------- ---------- ---------- ------------
  nodea          Up         0          Good
  nodeb          Up         1          Good

========================= H e a r t b e a t   S t a t u s ====================

  Name                           Type       Status
  ------------------------------ ---------- ------------
  private-link <--> private-link network    Unknown

========================= S e r v i c e   S t a t u s ========================

                                         Last             Monitor  Restart
  Service        Status   Owner          Transition       Interval Count
  -------------- -------- -------------- ---------------- -------- -------
  cyrus          started  nodeb          15:33:32 Feb 17  300      0

I have "private-link" defined in /etc/hosts, and it's pingable from both machines. It's a dedicated network interface connected with a crossover cable. Clustat has never shown anything but "Unknown" for it...

I have already scheduled downtime next Friday to upgrade to the latest version, 1.0.28. I see there are some "nice to have" fixes in the start/stop logic. I'll report whether it fixes this problem; I hope it does :) If you have any other ideas I could try, please let me know before next Friday :)
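The clusvcmgrd log in the report shows nodea being marked Down and then Up again about two seconds later, over and over. A small sketch like the one below can pull those Down/Up pairs out of syslog and report how long each "Down" lasted. It assumes the exact line format quoted in the report; the function name `down_periods` and the `year` parameter are hypothetical, and the year is only a placeholder because syslog lines carry none.

```python
import re
from datetime import datetime

# Matches the clusvcmgrd "state change" lines quoted in the report, e.g.
# "Feb 17 22:53:37 nodeb clusvcmgrd[1789]: <info> state change: node nodea new state Down"
# The ipalias lines ("clusvcmgrd: [19449]: ...") deliberately do not match.
STATE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ clusvcmgrd\[\d+\]: <info> "
    r"state change: node (?P<node>\S+) new state (?P<state>Up|Down)$"
)


def down_periods(lines, node, year=2000):
    """Return (went_down, came_up, seconds) for each Down->Up pair of `node`.

    `year` is a placeholder (assumption): syslog timestamps have no year, so
    all lines are assumed to fall in one year and be in chronological order.
    """
    periods = []
    down_at = None
    for line in lines:
        m = STATE_RE.match(line.strip())
        if not m or m.group("node") != node:
            continue  # skip ipalias messages and other members
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        if m.group("state") == "Down":
            down_at = ts
        elif down_at is not None:
            periods.append((down_at, ts, (ts - down_at).total_seconds()))
            down_at = None
    return periods


if __name__ == "__main__":
    import sys

    # Usage sketch: python flaps.py nodea < /var/log/messages
    for down, up, secs in down_periods(sys.stdin, sys.argv[1]):
        print(f"{down:%b %d %H:%M:%S} -> {up:%H:%M:%S}  down for {secs:.0f}s")
```

Run against the excerpt above, every Down/Up pair comes out at about two seconds, which points at NodeB briefly losing track of NodeA rather than NodeA actually going down.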
Hi, have you filed a request with Red Hat Support? http://www.redhat.com/apps/support/
As it turns out, the IP alias ("cluster alias IP") always will run on node ID 0 if node ID 0 is online. From the source code:

    /*
     * Cluster alias management.  Low node ID wins the alias when both are up.
     */

Cluster alias IP != service IP, which should only move (with the rest of the service) if "preferred node" and "relocate on preferred node boot" are set.
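To make the distinction between the two IPs concrete, here is a hypothetical model of the two rules as described in the comment above. It is not clumanager's code: `Member`, `alias_owner` and `service_owner_after_boot` are illustrative names. The alias goes to the lowest node ID that is up, while the service (and its IP) only moves to the preferred node on that node's boot when relocation is enabled.

```python
from dataclasses import dataclass


@dataclass
class Member:
    node_id: int
    name: str
    up: bool


def alias_owner(members):
    """Cluster alias IP: the lowest node ID among members that are up wins.

    Sketch of the rule quoted from the source comment, not the real code.
    """
    up = [m for m in members if m.up]
    return min(up, key=lambda m: m.node_id).name if up else None


def service_owner_after_boot(current_owner, booted, preferred, relocate_on_preferred_boot):
    """Service IP: stays with the service's current owner unless the preferred
    node just booted and relocateOnPreferredNodeBoot is enabled (hypothetical model)."""
    if relocate_on_preferred_boot and booted == preferred and current_owner != preferred:
        return preferred
    return current_owner


if __name__ == "__main__":
    nodeb = Member(1, "nodeb", up=True)
    # nodea (id 0) flapping Up/Down/Up, as NodeB saw it in the log
    for state in (True, False, True):
        nodea = Member(0, "nodea", up=state)
        print("nodea", "up" if state else "down", "-> alias on", alias_owner([nodea, nodeb]))
    # cyrus: preferredNode = nodeb, relocateOnPreferredNodeBoot = no
    print("cyrus stays on", service_owner_after_boot("nodeb", "nodea", "nodeb", False))
```

Under this model, each time NodeB sees nodea go Down it takes the alias ("Starting cluster alias"), and each time nodea comes back Up it gives the alias back ("Stopping cluster alias"), while the cyrus service itself stays on nodeb.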