Bug 17511 - RedHat HA server problems...
Status: CLOSED WORKSFORME
Product: Red Hat High Availability Server
Classification: Retired
Component: ipvsadm
Version: 1.0
Platform: i686 Linux
Priority: high   Severity: medium
Assigned To: Phil Copeland
QA Contact: Wil Harris
Reported: 2000-09-14 14:04 EDT by Scott Sherman
Modified: 2007-03-26 23:35 EDT

Last Closed: 2000-09-18 20:42:15 EDT

Attachments: None
Description Red Hat Bugzilla 2000-09-14 14:04:17 EDT
I have:

1. Installed RedHat 6.2
2. Installed all Errata
3. Upgraded the kernel to 2.2.16-4 (Also have 2.2.16-3 on machine).
4. Upgraded to the following:
ipvsadm-1.11-4.i386.rpm
piranha-0.4.17-2.i386.rpm
piranha-gui-0.4.17-2.i386.rpm
piranha-docs-0.4.16-2.i386.rpm
5.  Put the following line in /etc/rc.d/rc.local (see the sketch just below):  ipchains -A forward -j MASQ -s 192.168.0.0/24 -d 0.0.0.0/0
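
For reference, steps 3-5 boil down to something like the following in /etc/rc.d/rc.local - just a minimal sketch of the NAT director prerequisites, with the forwarding switch shown here as well even though it is actually set via the sysctl.conf further down:

    # allow the director to forward packets between eth0 and eth1
    echo 1 > /proc/sys/net/ipv4/ip_forward
    # masquerade traffic coming from the real-server network (LVS-NAT)
    ipchains -A forward -j MASQ -s 192.168.0.0/24 -d 0.0.0.0/0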

I have done this on two machines. They are as follows:



lb1	
-----------------------
eth0:  	195.50.89.219
eth1: 	192.168.0.219

External floating address: 195.50.89.225
Internal floating address:   192.168.0.250


lb2
------------------------
eth0:  	195.50.89.220
eth1: 	192.168.0.220

victim (web server)
---------------------------
eth0:192.168.0.31


Note:  In reality, Load balancer 1 is currently being used for Ultra Monkey, so it's just lb2 for the time being.  I have set up lb2 as the active router.



Both have identical lvs.cf files in the /etc/sysconfig/ha/ directory. 
Apache is running on port 5080 on each as well.   Pulse seems to work ok 
(lb2 goes down, lb1 takes over its floating IP addresses).  However,  when 
I  type the Virtual Server address into a web browser using port 80, 
nothing comes up. If I use port 5080, I get lb2's index page. 
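
A quick way to narrow down where the port 80 traffic dies (only a sketch - it assumes telnet on an outside box and tcpdump on lb2) is to hit the VIP while watching the internal interface:

    telnet 195.50.89.225 80        # from outside: does the VIP answer at all on port 80?
    tcpdump -n -i eth1 port 80     # on lb2: are masqueraded packets actually sent on to 192.168.0.31?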

Everything looks ok with ipvsadm -L -n:


[root@lb2 /root]# ipvsadm -L -n
IP Virtual Server version 0.9.14 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
TCP  195.50.89.225:80 wlc persistent 300
  -> 192.168.0.31:80             Masq    100    0          0
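
(For reference, that table corresponds roughly to the following hand-typed ipvsadm calls - pulse/lvs issues the same ones itself, as the log further down shows:)

    ipvsadm -C                                                  # flush any existing virtual server table
    ipvsadm -A -t 195.50.89.225:80 -s wlc -p 300                # virtual service, wlc scheduler, 300s persistence
    ipvsadm -a -t 195.50.89.225:80 -r 192.168.0.31 -m -w 100    # real server, masquerading (NAT), weight 100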




(configuration files follow)


/etc/lvs.cf

primary = 195.50.89.220
service = lvs
rsh_command = ssh
backup_active = 0
backup = 195.50.89.219
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = nat
nat_router = 192.168.0.250 eth1:1
virtual [cluster1] {
     active = 1
     address = 195.50.89.225 eth0:1
     port = 80
     persistent = 300
     send = "GET / HTTP/1.0\r\n\r\n"
     expect = "HTTP"
     load_monitor = ruptime
     scheduler = wlc
     protocol = tcp
     persistent = 300
     timeout = 6
     reentry = 15
     server [victim] {
         address = 192.168.0.31
         active = 1
         weight = 100
     }
}


/etc/sysctl.conf

# Enables packet forwarding
net.ipv4.ip_forward = 1
# Enables source route verification
net.ipv4.conf.all.rp_filter = 1
# Enables automatic defragmentation (needed for masquerading, LVS)
net.ipv4.ip_always_defrag = 1
# Disables the magic-sysrq key
kernel.sysrq = 0
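
(These get applied at boot; to apply or double-check them by hand, something along these lines works - a sketch, assuming the stock procps sysctl binary:)

    sysctl -p /etc/sysctl.conf               # re-read the file
    cat /proc/sys/net/ipv4/ip_forward        # forwarding should read back as 1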


Output of ifconfig with pulse running:

eth0      Link encap:Ethernet  HWaddr 00:50:DA:21:CF:C3
          inet addr:195.50.89.220  Bcast:195.50.89.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11583 errors:0 dropped:0 overruns:1 frame:0
          TX packets:1084 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:10 Base address:0xc000

eth0:1    Link encap:Ethernet  HWaddr 00:50:DA:21:CF:C3
          inet addr:195.50.89.225  Bcast:195.50.89.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:10 Base address:0xc000

eth1      Link encap:Ethernet  HWaddr 00:01:29:00:05:2C
          inet addr:192.168.0.220  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13512 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4651 errors:0 dropped:0 overruns:0 carrier:0
          collisions:20 txqueuelen:100
          Interrupt:11 Base address:0xc400 Memory:e0041000-e0041900

eth1:1    Link encap:Ethernet  HWaddr 00:01:29:00:05:2C
          inet addr:192.168.0.250  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:11 Base address:0xc400 Memory:e0041000-e0041900

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:93 errors:0 dropped:0 overruns:0 frame:0
          TX packets:93 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0




Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.0.220   0.0.0.0         255.255.255.255 UH        0 0          0 eth1
195.50.89.220   0.0.0.0         255.255.255.255 UH        0 0          0 eth0
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
195.50.89.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         192.168.0.1     0.0.0.0         UG        0 0          0 eth1


Here is victim's (the real server's) routing table:

Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.0.31    0.0.0.0         255.255.255.255 UH        0 0          0 eth0
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         192.168.0.250   0.0.0.0         UG        0 0          0 eth0



Here are the results of cat'ing my /proc filesystem:

[root@lb2 /root]# cat /proc/sys/net/ipv4/ip_forward
1
cat /proc/sys/net/ipv4/ip_always_defrag
9

*this is a little weird.  I tend to get numbers like 8 and 12....

[root@lb2 /root]# cat /proc/sys/net/ipv4/conf/all/hidden
1
[root@lb2 /root]# cat /proc/sys/net/ipv4/conf/eth1/hidden
1



Here is ipchains:


ipchains -L -n 
Chain input (policy ACCEPT):
Chain forward (policy ACCEPT):
target     prot opt     source                destination           ports
MASQ       all  ------  192.168.0.0/24       0.0.0.0/0             n/a
Chain output (policy ACCEPT):


Rwhod is running:

[root@lb2 /root]# ps -aux|grep rwhod
root       534  0.0  0.1  1132  552 ?        S    11:47   0:00 rwhod




Here are some results from pulse -n -v:


[root@lb2 /root]# pulse -n -v
pulse: STARTING PULSE AS MASTER
pulse: DEBUG -- setting SEND_heartbeat timer
pulse: DEBUG -- setting SEND_heartbeat timer
pulse: DEBUG -- setting SEND_heartbeat timer
pulse: DEBUG -- setting NEED_heartbeat timer
pulse: partner dead: activating lvs
pulse: running command  "/usr/sbin/lvs" "--nodaemon" "-c" "/etc/lvs.cf"
lvs: running command  "/usr/sbin/ipvsadm" "-C"
lvs: starting virtual service [cluster1] active: 80
lvs: running command  "/usr/sbin/ipvsadm" "-A" "-t" "195.50.89.225:80" "-s" "wlc" "-p" "300"
pulse: DEBUG -- setting SEND_heartbeat timer
lvs: running command  "/usr/sbin/nanny" "-c" "-h" "192.168.0.31" "-p" "80" "-s" "GET / HTTP/1.0\r\n\r\n" "-x" "HTTP" "-a" "15" "-I" "/usr/sbin/ipvsadm" "-t" "6" "-w" "100" "-V" "195.50.89.225" "-M" "m" "-U" "ruptime" "--nodaemon"
nanny: starting LVS client monitor for 195.50.89.225:80
pulse: DEBUG -- Executing '/sbin/ifconfig eth0:1 n.n.n.n up'
pulse: DEBUG -- Executing '/sbin/ifconfig eth1:1 n.n.n.n up'
lvs: create_monitor for [cluster1]/[victim] running as pid 965
pulse: running command  "/sbin/ifconfig" "eth1:1" "192.168.0.250" "up"
pulse: running command  "/usr/sbin/send_arp" "-i" "eth1" "192.168.0.250" "00012900052C" "192.168.0.255" "ffffffffffff"
pulse: DEBUG -- Executing '/usr/sbin/send_arp'
pulse: running command  "/sbin/ifconfig" "eth0:1" "195.50.89.225" "up"
pulse: running command  "/usr/sbin/send_arp" "-i" "eth0" "195.50.89.225" "0050DA21CFC3" "195.50.89.255" "ffffffffffff"
pulse: DEBUG -- Executing '/usr/sbin/send_arp'
nanny: making 192.168.0.31:80 available
nanny: running command  "/usr/sbin/ipvsadm" "-a" "-t" "195.50.89.225:80" "-r" "192.168.0.31" "-m" "-w" "100"
nanny: running command  "ruptime" "192.168.0.31" "uptime"
nanny: bad load average returned: lb1         down   3+00:58
lb2           up      0:34,     1 user,   load 0.05, 0.01, 0.00
victim        up  15+17:37,     3 users,  load 1.01, 1.05, 1.07
pulse: gratuitous lvs arps finished
pulse: DEBUG -- setting SEND_heartbeat timer
pulse: DEBUG -- setting SEND_heartbeat timer
pulse: DEBUG -- setting NEED_heartbeat timer
pulse: DEBUG -- setting SEND_heartbeat timer
nanny: running command  "ruptime" "192.168.0.31" "uptime"
nanny: bad load average returned: lb1         down   3+00:58
lb2           up      0:34,     1 user,   load 0.05, 0.01, 0.00
victim        up  15+17:37,     3 users,  load 1.01, 1.05, 1.07
pulse: DEBUG -- setting SEND_heartbeat timer
pulse: DEBUG -- setting SEND_heartbeat timer



Lb1 is currently running ultra monkey, which is why I have it disabled in 
lvs.cf.  If I get piranha redirecting traffic the way it is supposed to, I 
will wipe Ultra Monkey and reinstall piranha in a 'heartbeat' ;).  
Thanks for all your help.

Sincerely,

Scott
Comment 1 Red Hat Bugzilla 2000-09-15 04:47:21 EDT
Added /etc/sysconfig/network-scripts/ifcfg-lo:0 to my real server and now 
everything seems to work.  This doesn't make any sense, since I am using NAT, 
does it?  Any ideas?
Comment 2 Red Hat Bugzilla 2000-09-15 11:35:16 EDT
*Scratch*
OK, the good news (for yourself) is that it now works.
The bad news is that your default network configuration is not as expected!
There is an axiom - if it's not broke, don't fix it - so for the moment we'll go with the way you currently have your setup in place, as this now seems to work for you. I have to say I'm quite sorry you've had all this distress trying to get the environment right for either ultra-monkey or piranha (which indicates that whatever is going on, it's generic).

I'm curious now. Does this mean you now have two loopback entries in your network table?
The loopback is the special device used by most unix kernels (BSD/AIX/Linux/SunOS) to help facilitate a class of IPC (inter-process communication) calls.

The default loopback configuration is

	[root@alpha network-scripts]# cat ifcfg-lo
	DEVICE=lo
	IPADDR=127.0.0.1
	NETMASK=255.0.0.0
	NETWORK=127.0.0.0
	# If you're having problems with gated making 127.0.0.0/8 a martian,
	# you can change this to something else (255.255.255.255, for example)
	BROADCAST=127.255.255.255
	ONBOOT=yes
	NAME=loopback

Would you be affronted if I asked for a copy of your ifcfg-eth0, ifcfg-lo, ifcfg-lo:0 and /etc/sysconfig/network files?

You are correct. In a NAT environment your real server would look fairly uninteresting and would communicate via the network device with no direct involvement of the loopback device. The NAT environment is supposed to cater for a low-administrative-overhead setup, not this mystery tour you've been led down 8(

(An example real host is pasted in here. You will note that this is slightly different from the routing table you pasted in for the real server above, though I would not have expected any functional difference.)
	[root@test92 /root]# netstat -nr
	Kernel IP routing table
	Destination     Gateway          Genmask         Flags   MSS Window  irtt Iface
	207.175.44.0    0.0.0.0         255.255.255.0   U         0 0          0 eth0
	127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
	0.0.0.0         207.175.44.254  0.0.0.0         UG        0 0          0 eth0

	[root@test92 /root]# ifconfig
	eth0      Link encap:Ethernet  HWaddr 00:08:C7:33:C6:63  
	          inet addr:207.175.44.92  Bcast:207.175.44.255  Mask:255.255.255.0
	          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
	          RX packets:1952443 errors:0 dropped:0 overruns:0 frame:0
	          TX packets:2747 errors:0 dropped:0 overruns:0 carrier:0
	          collisions:0 txqueuelen:100 
	          Interrupt:11 Base address:0x2c00 

	lo        Link encap:Local Loopback  
	          inet addr:127.0.0.1  Mask:255.0.0.0
	          UP LOOPBACK RUNNING  MTU:3924  Metric:1
	          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
	          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
	          collisions:0 txqueuelen:0 

Regards

Phil
=--=
Comment 3 Red Hat Bugzilla 2000-09-18 15:01:52 EDT
OK, here is where I'm at now: 


/etc/sysconfig/network-scripts/ifcfg-lo:0 looked like this:

DEVICE=lo:0
IPADDR=195.50.89.225
NETMASK=255.255.255.255
NETWORK=195.50.89.0
BROADCAST=192.168.0.255
ONBOOT=yes
NAME=loopback


This was actually an accident; I forgot to change the broadcast address. By doing this, it caused victim (the real server) to send out ARP broadcasts with this info. I'm assuming this is what got things to work.
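
(If it helps to confirm that, the ARP announcements are easy to watch from the director or another box on the same segment - just a sketch, assuming tcpdump is available there too:)

    tcpdump -n -e -i eth0 arp    # watch which MAC is claiming/announcing 195.50.89.225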

Obviously, this became a problem when it came time to add another real server.

I have taken victim out of the cluster and now have a new real server called 
oki1.  I have added the lines to rc.local on oki1:

ifconfig eth0:0 195.50.89.225 up
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/eth0/hidden

Switched over to direct routing and everything works like a dream.  I am beginning to think this all has something to do with us being behind a firewall. NAT just doesn't want to work properly.  What ports does piranha/LVS make use of?  I noticed some traffic going to port 539 with tcpdump.
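
(Port 539 matches the heartbeat_port setting in the lvs.cf above, so that traffic is the pulse heartbeat. For completeness, the director-side difference when moving from NAT to direct routing is just the forwarding method on the real-server entry - a sketch of the hand-typed equivalent, with <oki1-address> as a placeholder since oki1's IP isn't shown above:)

    ipvsadm -A -t 195.50.89.225:80 -s wlc -p 300
    ipvsadm -a -t 195.50.89.225:80 -r <oki1-address> -g -w 100   # -g = gatewaying/direct routing instead of -m (masq)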




Comment 4 Red Hat Bugzilla 2000-09-18 20:42:13 EDT
You posted on the mailing list that this is now working. Can we close this
bugzilla entry?
