From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.2-2 i686)
Description of problem:
I want to use the Linux multipath feature to load-balance all traffic over
two Ethernet connections.
For this I use the iproute2 package. I declare the route as follows:
"ip route add default scope global equalize
nexthop via 192.168.1.1 dev eth0
nexthop via 192.168.0.1 dev eth1"
"ip route" gives the following output:
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.128
192.168.0.0/24 dev eth1 proto kernel scope link src 192.168.0.128
127.0.0.0/8 dev lo scope link
nexthop via 192.168.1.1 dev eth0 weight 1
nexthop via 192.168.0.1 dev eth1 weight 1
So far so good.....
The problem, however, is that when I start connections, they all go out through
eth0, yet half of the connections carry the source address of eth1. In a
simulated environment (internal network) I receive half of the packets back
on eth1, but they are not sent through eth1 (of course this is a problem
when using different ISPs).
This is probably a bug, an unimplemented feature or just my own
stupidity.... but it gets even funnier:
If I do the following (set up masquerading; eth2 is 192.168.10.1):
iptables -t nat -A POSTROUTING -s 192.168.10.0/24 -i eth0 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 192.168.10.0/24 -i eth1 -j MASQUERADE
all connections from masqueraded machines do get load balanced
correctly AND go out through the right network card (50/50, connection based;
source address and network card match. It actually works when tested with a
cable + ADSL provider).
So in this case netfilter does get it right, but locally generated packets
still get it wrong.
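To check the mismatch described above for a single source address, one option is to ask the kernel directly with "ip route get <destination> from <source>". The following is only a sketch and not part of the original report: the helper names egress_dev and check_source are hypothetical, and it assumes an iproute2 version whose "ip route get" output contains a "dev <name>" pair.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from the original report.

# egress_dev: read `ip route get` output on stdin and print the interface
# name that follows the first "dev" keyword.
egress_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# check_source SRC DST EXPECTED_DEV
# Ask the kernel which route it would use for a packet from SRC to DST and
# report whether the chosen interface is EXPECTED_DEV.
check_source() {
    dev=$(ip route get "$2" from "$1" | egress_dev)
    if [ "$dev" = "$3" ]; then
        echo "ok: $1 -> $2 leaves via $dev"
    else
        echo "MISMATCH: $1 -> $2 leaves via ${dev:-unknown}, expected $3"
        return 1
    fi
}
```

With the setup above this could be called as "check_source 192.168.0.128 <destination> eth1". The answer reflects a single route lookup, so it may not match what an already established connection uses.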
Steps to Reproduce:
1. Add the routes with ip route
2. Set up TCP/IP connections (telnet, ping, ...)
Actual Results: Connections always go out through eth0. However, the source
address alternates on a 50/50 basis.
Expected Results: the connections should load balance over the two
I can only guess here that this might actually be a kernel bug/problem.
From the very good report you have given I'd assume that the new iproute
implementation simply doesn't get it right for the 'easy' case of direct load
balancing with the ip command. In contrast, the more complex iptables
masquerading might do it the right way and might not use the same redirection
mechanism in the kernel as the ip command and normal routing tables.
You could give a very recent 2.4 kernel a try, like the newest one from rawhide,
and check if the problem still exists. If it does, I can forward this to our
kernel group and see if they know of this problem.
Thanks again for the excellent bug report ;)
Read ya, Phil
Hi, I have some more information for you.
I have tried to use kernel 2.4.6 from rawhide, but it doesn't work.
Some more info:
First I check that I'm really running 2.4.6:
Linux localhost.localdomain 2.4.6-2 #1 Tue Jul 10 18:14:02 EDT 2001 i686
After the "ip route" command I check that all the routes are all right:
126.96.36.199/24 dev eth0 proto kernel scope link src 188.8.131.52
10.20.14.0/24 dev eth1 proto kernel scope link src 10.20.14.5
127.0.0.0/8 dev lo scope link
nexthop via 184.108.40.206 dev eth0 weight 1
nexthop via 10.20.14.1 dev eth1 weight 1
I do two pings; one should go out through eth0, the other through eth1.
First I look at eth0 (source address 220.127.116.11):
64 bytes from 10.201.0.201: icmp_seq=0 ttl=254 time=3.474 msec
64 bytes from 10.201.0.201: icmp_seq=1 ttl=254 time=1.364 msec
64 bytes from 10.201.0.201: icmp_seq=2 ttl=254 time=1.392 msec
that seems to be ok.....
07:57:47.656928 > 18.104.22.168 > 10.201.0.201: icmp: echo request (DF)
07:57:47.660367 < 10.201.0.201 > 22.214.171.124: icmp: echo reply (DF)
07:57:48.656706 > 126.96.36.199 > 10.201.0.201: icmp: echo request (DF)
07:57:48.658044 < 10.201.0.201 > 188.8.131.52: icmp: echo reply (DF)
07:57:49.656715 > 184.108.40.206 > 10.201.0.201: icmp: echo request (DF)
07:57:49.658077 < 10.201.0.201 > 220.127.116.11: icmp: echo reply (DF)
So this case is ok.
Now look at what happens when it decides to go over eth1 (source address
64 bytes from 10.201.0.1: icmp_seq=83 ttl=27 time=123.683 msec
64 bytes from 10.201.0.1: icmp_seq=84 ttl=27 time=122.493 msec
64 bytes from 10.201.0.1: icmp_seq=85 ttl=27 time=122.322 msec
This seems to be ok.....but then I look at tcpdump.
For eth1 I only see this:
07:53:24.519727 < 10.201.0.1 > 10.20.14.5: icmp: echo reply
07:53:25.518546 < 10.201.0.1 > 10.20.14.5: icmp: echo reply
07:53:26.520365 < 10.201.0.1 > 10.20.14.5: icmp: echo reply
so no request?....wait....let's look at the tcpdump of eth0:
07:53:24.396781 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)
07:53:25.396715 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)
07:53:26.396714 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)
So all my requests go out through eth0. In this case everything comes
back through eth1 because of a friendly router configuration, but with a
different ISP on each Ethernet card it does not work because of source-address
checking at the providers.
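The manual comparison of the two captures can be automated. The following is a rough sketch, assuming capture lines in the format shown in this report (timestamp, direction marker ">" or "<", source address, ">", destination); other tcpdump versions may print lines differently. The function name mismatched_sources is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for illustration; assumes the tcpdump line format
# quoted in this report: TIME DIRECTION SRC > DST: ...

# mismatched_sources IFACE_ADDR
# Read capture lines for one interface on stdin and print each distinct
# source address of an outbound packet (direction ">") that differs from
# IFACE_ADDR, the address assigned to that interface.
mismatched_sources() {
    awk -v local_addr="$1" '
        $2 == ">" && $3 != local_addr { print $3 }
    ' | sort -u
}
```

It could be fed from a bounded capture, for example "tcpdump -n -c 20 -i eth0 icmp | mismatched_sources <address of eth0>". For the eth0 capture above it would print 10.20.14.5, the address that belongs to eth1.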
I also tried a 2.2.18 kernel (on a test machine running SuSE 7.1), and that one
works perfectly (only tested the local packets, because it doesn't run
I should also add that at one point during testing (last week, kernel 2.4.2)
I got it working in a reversed way (eth0 packets going out through eth1 and eth1
packets going out through eth0...). This seems quite odd because everything in
the configuration was the same as in the other test cases. I only got this once and I
can't reproduce it.
I hope this information is useful to you.
OK, it's been a while, but could you give the latest kernel of RH 7.3 a try?
It's based on 2.4.18, so it might actually solve your problem.
If it does, I think we can safely assume that it has been fixed in recent kernels...
Read ya, Phil
Luckily I just installed RH7.3 yesterday :-)
I tried it again, but it still behaves strangely. Local packets still find their
way out of the wrong adapter.
The symptoms are a little bit different. Now everything goes out on adapter
nr2 (eth1) with the source address of adapter nr1 (eth0). (without any attempt of
OK, this has taken quite some time, but the latest version of iproute
in rawhide now contains a fix that resembles the problems you
So if you still read this bug after such a long time could you give
that version a shot and see if it fixes your problem?
Read ya, Phil