From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031015 Firebird/0.7

Description of problem:
Installed RHEL 3 ES with the latest errata and Cluster Suite on two Dell PowerEdge 1750 servers, each with 2 Xeon CPUs and 2 GB of memory. LVS was configured with Piranha, and pulse was started on both nodes. Everything seems fine for about 5 minutes. After 5 minutes, the master node gets a kernel panic and the backup node automatically takes over the network traffic. 5 minutes later, the backup node gets a kernel panic too.

Version-Release number of selected component (if applicable):
ipvsadm-1.21-9
kernel-2.4.21-4.0.1.EL

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL 3 on two PE 1750 servers
2. Install Cluster Suite and configure lvs.cf
3. Run pulse on each node, and wait

Actual Results:
Kernel panic

Expected Results:
Nothing (no panic)

Additional info:
Reverted the installation at the customer site to RHEL 2.1 AS, which is running without any problem.
This is not a problem with ipvsadm, but with the ip_vs patch in the kernel. ip_vs is simply broken with the RHEL 3 SMP kernel. The kernel gets stuck when ip_vs tries to expire a connection (this is why it takes a few minutes).

WORKAROUND: use the UP (uniprocessor) kernel.

Here is a quick reproducer requiring only two machines, one ipvs router and one client.

Set up Machine A as an ipvs router:
% IP=192.168.xx.xx   # the IP address of Machine A
% RS=192.168.xx.xx   # some other address (doesn't need to reach anything)
% ipvsadm -A -t $IP:80 -s rr
% ipvsadm -a -t $IP:80 -r $RS:80 -g -w 1
% ipvsadm --set 30 30 30   # optional - shortens the timeout

Use Machine B to send a single request to the router:
% IP=192.168.xx.xx   # the IP address of Machine A
% echo 'GET /' | nc -nvw 3 $IP 80
This bug was fixed in kernel-2.4.21-4.9.EL, and the fix will appear in Update 1. Essentially, del_timer_sync() was called from the timer expiry routine itself, so it waited for its own handler to finish and blocked on itself. The call was changed to: if (timer_pending()) del_timer()
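To illustrate why the original call blocks on itself and why the replacement does not, here is a small userspace model in C. It is only a sketch, not the actual ip_vs or kernel code: struct sim_timer and every sim_* and expire_* name are hypothetical stand-ins, and the bounded loop replaces what would be an endless wait in the real kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a kernel timer; not the real struct timer_list. */
struct sim_timer {
    bool pending;                       /* queued and waiting to fire */
    bool running;                       /* expiry routine is executing right now */
    int (*handler)(struct sim_timer *); /* expiry routine */
};

/* Model of timer_pending(): is the timer still queued? */
static bool sim_timer_pending(const struct sim_timer *t)
{
    return t->pending;
}

/* Model of del_timer(): dequeue without waiting. Returns 1 if it was queued. */
static int sim_del_timer(struct sim_timer *t)
{
    int was_pending = t->pending ? 1 : 0;
    t->pending = false;
    return was_pending;
}

/*
 * Model of del_timer_sync(): dequeue, then wait until the expiry routine
 * has finished. A bounded loop stands in for the wait and returns -1 when
 * the routine never finishes (assumption: the real call would wait forever).
 */
static int sim_del_timer_sync(struct sim_timer *t)
{
    int ret = sim_del_timer(t);
    for (int spins = 0; t->running; spins++) {
        if (spins == 1000)
            return -1; /* modelled hang */
    }
    return ret;
}

/* Buggy expiry routine: waits for itself to finish. */
static int expire_buggy(struct sim_timer *t)
{
    return sim_del_timer_sync(t);
}

/* Fixed expiry routine: only dequeues if the timer was queued again. */
static int expire_fixed(struct sim_timer *t)
{
    if (sim_timer_pending(t))
        return sim_del_timer(t);
    return 0;
}

/* Fire the timer roughly as a timer core would: dequeue, mark running, call. */
static int sim_fire(struct sim_timer *t)
{
    assert(t->handler != NULL);
    t->pending = false;
    t->running = true;
    int ret = t->handler(t);
    t->running = false;
    return ret;
}
```

In this model, firing a timer whose handler is expire_buggy returns -1 (the modelled hang), while the same timer with expire_fixed returns 0, because the non-waiting delete never looks at the running flag.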
One of our clients has reported this issue. The problem is also present in the RHEL 2.1 SMP kernel. WORKAROUND: use the UP kernel there too. If you require more information, please contact me.
What kernel version?
Jeffrey, you can ignore comment #5; I got the OS version wrong. Many apologies.
This is fixed by the kernel erratum in RHEL 3 Update 1.