Bug 48099 - problem when using multiple default gateways
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: iproute
Version: 7.1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Phil Knirsch
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2001-07-09 18:45 UTC by Alain Wenmaekers
Modified: 2015-03-05 01:09 UTC (History)

Fixed In Version: 2.4.7-14
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-04-21 15:03:29 UTC
Embargoed:


Description Alain Wenmaekers 2001-07-09 18:45:20 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.2-2 i686)

Description of problem:
I want to use the Linux multipath feature to load-balance all traffic over
2 Ethernet connections.
For this I use the iproute2 package. I declare the route as follows
(addresses changed):
"ip route add default scope global equalize
	nexthop via 192.168.1.1 dev eth0
	nexthop via 192.168.0.1 dev eth1"

"ip route" gives the following output:
192.168.1.0/24 dev eth0  proto kernel  scope link  src 192.168.1.128
192.168.0.0/24 dev eth1  proto kernel  scope link  src 192.168.0.128
127.0.0.0/8 dev lo  scope link
default equalize
        nexthop via 192.168.1.1  dev eth0 weight 1
        nexthop via 192.168.0.1  dev eth1 weight 1

So far so good.....
The problem, however, is that when I start connections, they all go out through
eth0, yet half of the connections carry the source address of eth1. In a
simulated environment (internal network) I receive half of the packets back
on eth1, but they are not sent out through eth1 (of course this is a problem
when using different ISPs).
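
For reference, the multipath route declared above can be assembled from a list of gateways. This is only a minimal sketch: the helper name multipath_cmd and the "gateway:device" argument format are hypothetical, and the function prints the command instead of running it, so it needs neither root nor iproute2. The `equalize` keyword is copied from the report as-is; whether a given iproute2 version accepts it is not checked here.

```shell
#!/bin/sh
# Print (do not run) an "ip route add default" multipath command built
# from "gateway:device" pairs. multipath_cmd is a hypothetical helper,
# not part of iproute2.
multipath_cmd() {
    cmd="ip route add default scope global equalize"
    for pair in "$@"; do
        gw=${pair%%:*}    # text before the first colon: gateway address
        dev=${pair#*:}    # text after the first colon: output device
        cmd="$cmd nexthop via $gw dev $dev"
    done
    printf '%s\n' "$cmd"
}

# The two gateways from the report:
multipath_cmd 192.168.1.1:eth0 192.168.0.1:eth1
```

Printing the command first makes it easy to review the nexthop list before applying it as root.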

This is probably a bug, an unimplemented feature or just my own
stupidity.... but it gets even funnier:

If I do the following (setup masquerading.... eth2 is 192.168.10.1):
iptables -t nat -A POSTROUTING -s 192.168.10.0/24 -o eth0 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 192.168.10.0/24 -o eth1 -j MASQUERADE
all connections from machines that are masqueraded do get load-balanced
correctly AND go out through the right network card (50/50, connection based;
source address and network card both match.... it actually works when tested
with a cable + ADSL provider).


So in this case netfilter does get it right.... but locally generated packets
still get it wrong.

How reproducible:
Always

Steps to Reproduce:
1. Add the routes with ip route
2. Set up TCP/IP connections (telnet, ping, ...)
3.
	

Actual Results:  Connections always go out through eth0. However, the source
address alternates on a 50/50 basis.

Expected Results:  The connections should be load-balanced over the two
Ethernet cards.

Additional info:
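
One way to compare the expected and actual output device for a given source address might be to ask the kernel directly with `ip route get`. The sketch below only extracts the device name from that command's output; route_dev is a hypothetical helper, 203.0.113.10 is a placeholder destination, and whether `ip route get` reflects the faulty decision described here has not been verified.

```shell
#!/bin/sh
# Read `ip route get` output on stdin and print the token that follows
# the word "dev", i.e. the output device the kernel selected.
# route_dev is a hypothetical helper, not part of iproute2.
route_dev() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Usage on the affected host (needs iproute2; the source addresses are
# the ones from the report, the destination is a placeholder):
#   ip route get 203.0.113.10 from 192.168.1.128 | route_dev
#   ip route get 203.0.113.10 from 192.168.0.128 | route_dev
```

If the routing matched the expected results, the first call would name eth0 and the second eth1.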

Comment 1 Phil Knirsch 2001-07-16 12:53:33 UTC
I can only guess here that this might actually be a kernel bug/problem.

From the very good report you have given, I'd assume that the new iproute
implementation simply doesn't do it right for the 'easy' case of direct load
balancing with the ip command. In contrast, the more complex iptables
masquerading might do it the right way and might not use the same redirection
mechanism in the kernel as the ip command and the normal routing tables.

You could give a very recent 2.4 kernel a try, like the newest one from rawhide,
and check if the problem still exists. If it does, I can forward this to our
kernel group and see if they know of this problem.

Thanks again for the excellent bug report ;)

Read ya, Phil

Comment 2 Alain Wenmaekers 2001-07-17 06:33:18 UTC
Hi, I have some more information for you.

I have tried to use kernel 2.4.6 from rawhide, but it doesn't work.

Some more info:

First I check that I'm really running 2.4.6:
uname -a
Linux localhost.localdomain 2.4.6-2 #1 Tue Jul 10 18:14:02 EDT 2001 i686 unknown

After the "ip route" command I check that all the routes are all right:
164.35.198.0/24 dev eth0  proto kernel  scope link  src 164.35.198.52
10.20.14.0/24 dev eth1  proto kernel  scope link  src 10.20.14.5
127.0.0.0/8 dev lo  scope link
default equalize
        nexthop via 164.35.198.1  dev eth0 weight 1
        nexthop via 10.20.14.1  dev eth1 weight 1

I do two pings: one should go out through eth0, the other through eth1.

First I look at eth0 (source address 164.35.198.52):
ping 10.201.0.201
64 bytes from 10.201.0.201: icmp_seq=0 ttl=254 time=3.474 msec
64 bytes from 10.201.0.201: icmp_seq=1 ttl=254 time=1.364 msec
64 bytes from 10.201.0.201: icmp_seq=2 ttl=254 time=1.392 msec
that seems to be ok.....
From tcpdump:
07:57:47.656928 > 164.35.198.52 > 10.201.0.201: icmp: echo request (DF)
07:57:47.660367 < 10.201.0.201 > 164.35.198.52: icmp: echo reply (DF)
07:57:48.656706 > 164.35.198.52 > 10.201.0.201: icmp: echo request (DF)
07:57:48.658044 < 10.201.0.201 > 164.35.198.52: icmp: echo reply (DF)
07:57:49.656715 > 164.35.198.52 > 10.201.0.201: icmp: echo request (DF)
07:57:49.658077 < 10.201.0.201 > 164.35.198.52: icmp: echo reply (DF)

So this case is ok.

Now look at what happens when it decides to go over eth1 (source address
10.20.14.5):
ping 10.201.0.1
64 bytes from 10.201.0.1: icmp_seq=83 ttl=27 time=123.683 msec
64 bytes from 10.201.0.1: icmp_seq=84 ttl=27 time=122.493 msec
64 bytes from 10.201.0.1: icmp_seq=85 ttl=27 time=122.322 msec

This seems to be ok.....but then I look at tcpdump.
For eth1 I only see this:
07:53:24.519727 < 10.201.0.1 > 10.20.14.5: icmp: echo reply
07:53:25.518546 < 10.201.0.1 > 10.20.14.5: icmp: echo reply
07:53:26.520365 < 10.201.0.1 > 10.20.14.5: icmp: echo reply
so no request?....wait....let's look at the tcpdump of eth0:
07:53:24.396781 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)
07:53:25.396715 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)
07:53:26.396714 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)

So all my requests go out through eth0.... In this case everything is coming
back through eth1 because of a friendly router configuration, but with two
different ISPs on the two Ethernet cards it does not work because of
source-address checking at the providers.
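
The check done by eye on the tcpdump output above could be automated roughly as follows. This is a sketch under assumptions: check_egress_source is a hypothetical helper, and it only understands the capture line format quoted in this comment (time, direction marker, source, ">", destination), where a ">" direction marker means the packet left the host.

```shell
#!/bin/sh
# Read capture lines on stdin and report every outgoing packet whose
# source address differs from the address of the captured interface.
# $1 = address configured on the interface the capture was taken on.
# Returns 1 if at least one mismatch was found, 0 otherwise.
# check_egress_source is a hypothetical helper, not a tcpdump feature.
check_egress_source() {
    awk -v local_addr="$1" '
        # Field 2 is the direction marker, field 3 the source address.
        $2 == ">" && $3 != local_addr {
            printf "MISMATCH src=%s expected=%s\n", $3, local_addr
            bad++
        }
        END { if (bad) exit 1 }
    '
}

# Usage with one of the eth0 lines quoted above (eth0 is 164.35.198.52):
#   printf '%s\n' '07:53:24.396781 > 10.20.14.5 > 10.201.0.1: icmp: echo request (DF)' \
#       | check_egress_source 164.35.198.52
```

Fed the eth0 capture from the second ping, the helper reports the eth1 source address on every request; fed the capture from the first ping, it reports nothing.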

I also tried a 2.2.18 kernel (on a test machine running SuSE 7.1), and that one
works perfectly (I only tested the local packets, because it doesn't run
iptables)...

I also have to add that at a certain moment in testing (last week, kernel 2.4.2)
I got it working in a reversed way (eth0 packets going out through eth1 and eth1
packets going out through eth0...). This seems quite odd because everything in
the configuration was the same as in the other test cases. I only got this once
and I can't reproduce it.

I hope this information is useful to you.


Comment 3 Phil Knirsch 2002-05-27 11:21:59 UTC
OK, it's been a while, but could you give the latest kernel of RH 7.3 a try?
It's based on 2.4.18, so it might actually solve your problem.

If it does, I think we can safely assume that it has been fixed in recent kernels...

Read ya, Phil

Comment 4 Alain Wenmaekers 2002-05-27 14:58:22 UTC
Lucky I just installed RH7.3 yesterday :-)   
   
I tried it again, but it still behaves strangely. Local packets still find their
way out of the wrong adapter.

The symptoms are a little bit different. Now everything goes out on adapter
nr2 (eth1) with the source address of adapter nr1 (eth0), without any attempt at
balancing.
   


Comment 5 Phil Knirsch 2004-04-21 15:03:29 UTC
OK, this has taken quite some time, but the latest version of iproute
in rawhide now contains a fix for problems that resemble the ones you
described here.

So if you still read this bug after such a long time, could you give
that version a shot and see if it fixes your problem?

Thanks,

Read ya, Phil

