From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.2-2 i686)

Description of problem:
I want to use the Linux multipath feature to load balance all traffic over 2 Ethernet connections. For this I use the iproute2 package. I declare the route as follows (addresses changed):

ip route add default scope global equalize nexthop via 192.168.1.1 dev eth0 nexthop via 192.168.0.1 dev eth1

"ip route" gives the following output:

192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.128
192.168.0.0/24 dev eth1 proto kernel scope link src 192.168.0.128
127.0.0.0/8 dev lo scope link
default equalize
    nexthop via 192.168.1.1 dev eth0 weight 1
    nexthop via 192.168.0.1 dev eth1 weight 1

So far so good. The problem, however, is that if I start connections, they all go out through eth0, yet half of the connections carry the source address of eth1. In a simulated environment (internal network) I receive half of the packets back on eth1, but they are not sent out through eth1 (of course this is a problem when using different ISPs).

This is probably a bug, an unimplemented feature or just my own stupidity... but it gets even funnier. If I do the following (set up masquerading; eth2 is 192.168.10.1):

iptables -t nat -A POSTROUTING -s 192.168.10.0/24 -o eth0 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 192.168.10.0/24 -o eth1 -j MASQUERADE

then all connections from the masqueraded machines are load balanced correctly AND go out through the right network card (a 50/50 split per connection, with both the source address and the network card right; this actually works when tested with a cable and an ADSL provider). So in this case netfilter gets it right, but locally generated packets still get it wrong.

How reproducible:
Always

Steps to Reproduce:
1. Add the routes with ip route.
2. Set up TCP/IP connections (telnet, ping, ...).

Actual Results:
Connections always go out through eth0. However, the source address changes on a 50/50 basis.
Expected Results:
The connections should be load balanced over the two Ethernet cards.

Additional info:
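The report's expectation is that a packet leaving through a given nexthop carries the address of that nexthop's interface. As a minimal sketch of that pairing (the helper names `parse_routes` and `expected_source` are hypothetical, and the script only parses the quoted `ip route` text; it does not query or change the kernel), each nexthop of the multipath default route can be matched with the `src` of the connected route on the same device:

```python
import ipaddress

# `ip route` output quoted in the report (addresses changed by the reporter).
IP_ROUTE_OUTPUT = """\
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.128
192.168.0.0/24 dev eth1 proto kernel scope link src 192.168.0.128
127.0.0.0/8 dev lo scope link
default equalize
    nexthop via 192.168.1.1 dev eth0 weight 1
    nexthop via 192.168.0.1 dev eth1 weight 1
"""


def parse_routes(text):
    """Return ({dev: (subnet, src)}, [(gateway, dev, weight), ...])."""
    connected = {}
    nexthops = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "nexthop":
            # "nexthop via GW dev DEV weight W" -> keyword/value pairs
            opts = dict(zip(fields[1::2], fields[2::2]))
            nexthops.append(
                (ipaddress.ip_address(opts["via"]), opts["dev"], int(opts["weight"]))
            )
        elif "dev" in fields and "src" in fields:
            # Connected route: "NET dev DEV ... src ADDR"
            dev = fields[fields.index("dev") + 1]
            src = ipaddress.ip_address(fields[fields.index("src") + 1])
            connected[dev] = (ipaddress.ip_network(fields[0]), src)
    return connected, nexthops


def expected_source(connected, nexthops):
    """Pair each nexthop with the source address of its device's connected route."""
    pairs = []
    for gateway, dev, _weight in nexthops:
        subnet, src = connected[dev]
        if gateway not in subnet:
            raise ValueError(f"gateway {gateway} is not on {dev} ({subnet})")
        pairs.append((dev, gateway, src))
    return pairs


if __name__ == "__main__":
    connected, nexthops = parse_routes(IP_ROUTE_OUTPUT)
    for dev, gateway, src in expected_source(connected, nexthops):
        print(f"via {gateway} dev {dev} -> source should be {src}")
```

Run against the quoted table, it pairs eth0 with 192.168.1.128 and eth1 with 192.168.0.128; the symptom described above is locally generated traffic that breaks this pairing.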
I can only guess here that this might actually be a kernel bug. From the very good report you have given, I'd assume that the new iproute implementation simply doesn't handle the 'easy' case of direct load balancing with the ip command correctly. In contrast, the more complex iptables masquerading might do it the right way and might not use the same redirection mechanism in the kernel as the ip command and the normal routing tables. You could give a very recent 2.4 kernel a try, like the newest one from rawhide, and check whether the problem still exists. If it does, I can forward this to our kernel group and see if they know of this problem.

Thanks again for the excellent bug report ;)

Read ya, Phil
Hi,

I have some more information for you. I have tried kernel 2.4.6 from rawhide, but it doesn't work.

Some more info. First I checked that I'm really running 2.4.6:

uname -a
Linux localhost.localdomain 2.4.6-2 #1 Tue Jul 10 18:14:02 EDT 2001 i686 unknown

After the "ip route" command I checked that all the routes are all right:

164.35.198.0/24 dev eth0 proto kernel scope link src 164.35.198.52
10.20.14.0/24 dev eth1 proto kernel scope link src 10.20.14.5
127.0.0.0/8 dev lo scope link
default equalize
    nexthop via 164.35.198.1 dev eth0 weight 1
    nexthop via 10.20.14.1 dev eth1 weight 1

I do two pings: one should go out through eth0, the other through eth1. First I look at eth0 (source address 164.35.198.52):

ping 10.201.0.201
64 bytes from 10.201.0.201: icmp_seq=0 ttl=254 time=3.474 msec
64 bytes from 10.201.0.201: icmp_seq=1 ttl=254 time=1.364 msec
64 bytes from 10.201.0.201: icmp_seq=2 ttl=254 time=1.392 msec

That seems to be OK. From tcpdump:

07:57:47.656928 > 164.35.198.52 > 10.201.0.201: icmp: echo request (DF)
07:57:47.660367 < 10.201.0.201 > 164.35.198.52: icmp: echo reply (DF)
07:57:48.656706 > 164.35.198.52 > 10.201.0.201: icmp: echo request (DF)
07:57:48.658044 < 10.201.0.201 > 164.35.198.52: icmp: echo reply (DF)
07:57:49.656715 > 164.35.198.52 > 10.201.0.201: icmp: echo request (DF)
07:57:49.658077 < 10.201.0.201 > 164.35.198.52: icmp: echo reply (DF)

So this case is OK. Now look at what happens when it decides to go over eth1 (source address 10.20.14.5):

ping 10.201.0.1
64 bytes from 10.201.0.1: icmp_seq=83 ttl=27 time=123.683 msec
64 bytes from 10.201.0.1: icmp_seq=84 ttl=27 time=122.493 msec
64 bytes from 10.201.0.1: icmp_seq=85 ttl=27 time=122.322 msec

This seems to be OK... but then I look at tcpdump.
For eth1 I only see this:

07:53:24.519727 < 10.201.0.1 > 10.20.14.5: icmp: echo reply
07:53:25.518546 < 10.201.0.1 > 10.20.14.5: icmp: echo reply
07:53:26.520365 < 10.201.0.1 > 10.20.14.5: icmp: echo reply

So no requests? Wait, let's look at the tcpdump of eth0:

07:53:24.396781 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)
07:53:25.396715 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)
07:53:26.396714 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)

So all my requests go out through eth0. In this case everything comes back through eth1 because of a friendly router configuration, but with two different ISPs on the two Ethernet cards it does not work, because the providers check source addresses.

I also tried a 2.2.18 kernel (on a test machine running SuSE 7.1), and that one works perfectly (I only tested locally generated packets, because it doesn't run iptables).

I also have to add that at one point during testing (last week, kernel 2.4.2) I got it working the reverse way (eth0 packets going out through eth1 and eth1 packets going out through eth0). This seems quite odd, because everything in the configuration was the same as in the other test cases. I only got this once and I can't reproduce it.

I hope this information is useful to you.
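The check made by eye above, comparing each outgoing packet's source address with the interface it was captured on, can be sketched in Python. This assumes the old Linux tcpdump layout shown in the captures, where the second field is `>` for outgoing and `<` for incoming packets; the names `SUBNETS`, `CAPTURES` and `misrouted` are hypothetical, and the data is copied from the report:

```python
import ipaddress

# Connected subnets from the `ip route` output in the follow-up report.
SUBNETS = {
    "eth0": ipaddress.ip_network("164.35.198.0/24"),
    "eth1": ipaddress.ip_network("10.20.14.0/24"),
}

# tcpdump lines from the report, grouped by the interface they were captured on.
CAPTURES = {
    "eth0": [
        "07:57:47.656928 > 164.35.198.52 > 10.201.0.201: icmp: echo request (DF)",
        "07:57:47.660367 < 10.201.0.201 > 164.35.198.52: icmp: echo reply (DF)",
        "07:53:24.396781 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)",
        "07:53:25.396715 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)",
        "07:53:26.396714 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)",
    ],
    "eth1": [
        "07:53:24.519727 < 10.201.0.1 > 10.20.14.5: icmp: echo reply",
        "07:53:25.518546 < 10.201.0.1 > 10.20.14.5: icmp: echo reply",
        "07:53:26.520365 < 10.201.0.1 > 10.20.14.5: icmp: echo reply",
    ],
}


def misrouted(subnets, captures):
    """Yield (iface, src, line) for outgoing packets whose source address
    does not belong to the subnet of the interface they were captured on."""
    for iface, lines in captures.items():
        for line in lines:
            # Assumed layout: TIMESTAMP DIRECTION SRC > DST: ...
            fields = line.split()
            direction, src = fields[1], ipaddress.ip_address(fields[2])
            if direction == ">" and src not in subnets[iface]:
                yield iface, src, line


if __name__ == "__main__":
    for iface, src, line in misrouted(SUBNETS, CAPTURES):
        print(f"{iface}: outgoing packet with foreign source {src}: {line}")
```

On this data it flags exactly the three echo requests with source 10.20.14.5 that left through eth0; the 164.35.198.52 request on eth0 and the incoming replies on eth1 pass.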
OK, it's been a while, but could you give the latest kernel of RH 7.3 a try? It's based on 2.4.18, so it might actually solve your problem. If it does, I think we can safely assume that it has been fixed in recent kernels.

Read ya, Phil
Luckily I just installed RH 7.3 yesterday :-) I tried it again, but it still behaves strangely. Locally generated packets still find their way out through the wrong adapter. The symptoms are a little different: now everything goes out on adapter 2 (eth1) with the source address of adapter 1 (eth0), without any attempt at balancing.
OK, this has taken quite some time, but the latest version of iproute in rawhide now contains a fix for problems resembling the ones you described here. So if you are still reading this bug after such a long time, could you give that version a shot and see if it fixes your problem?

Thanks,

Read ya, Phil