Bug 23052 - pump fails on dual NIC machine
Summary: pump fails on dual NIC machine
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: pump
Version: 7.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Elliot Lee
QA Contact: David Lawrence
URL:
Whiteboard:
Duplicates: 23477 27492 53700 77802
Depends On:
Blocks:
Reported: 2000-12-31 05:10 UTC by Need Real Name
Modified: 2007-04-18 16:30 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-06-28 16:13:31 UTC
Embargoed:


Attachments
patch to fix 0.0.0.0 source adress issue with multiple NICs (663 bytes, patch)
2001-06-29 20:41 UTC, Olivier Baudron

Description Need Real Name 2000-12-31 05:10:43 UTC
I'm having the following problem on a RH7.0 installation with two NICs.

Both NICs are 3Com 3c509b cards and I am using the stock 3c509 module
that comes with RH7.0.

The first NIC (eth0) is connected to my internal network and has the address 192.168.1.1

The second NIC (eth1) is connected to my cablemodem and should be getting its address via DHCP.

eth1 takes a long time (62 seconds) to start at boot time.

If I type:
/etc/rc.d/init.d/network stop
pump -i eth1

pump reports "Operation failed." after about 60 seconds and tcpdump shows:

16:58:43.090510 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x2ec86818 ether 0:60:97:18:99:55 [|bootp]
16:58:43.096178 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:43.560696 eth1 > arp who-has 192.168.1.254 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:43.560987 eth1 < arp reply 192.168.1.254 is-at 0:50:ba:43:3b:f1 (0:60:97:18:99:55)
16:58:43.561018 eth1 > 192.168.1.1.ftp > 192.168.1.254.61144: FP 3046472654:3046472722(68) ack 642693321 win 32120 <nop,nop,timestamp 69557 124641066> (DF)
16:58:44.090631 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:45.090626 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:46.760721 eth1 > 192.168.1.1.ftp > 192.168.1.254.61144: FP 0:68(68) ack 1 win 32120 <nop,nop,timestamp 69877 124641066> (DF)
16:58:47.090719 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x2ec86818 ether 0:60:97:18:99:55 [|bootp]
16:58:48.100787 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:49.100632 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:50.100625 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:53.110700 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:53.160671 eth1 > 192.168.1.1.ftp > 192.168.1.254.61144: FP 0:68(68) ack 1 win 32120 <nop,nop,timestamp 70517 124641066> (DF)
16:58:54.090677 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x2ec86818 ether 0:60:97:18:99:55 [|bootp]
16:58:54.110636 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:55.110626 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:58.120682 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:59.120627 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:00.120625 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:03.091507 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:04.090628 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:05.090625 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:05.960674 eth1 > 192.168.1.1.ftp > 192.168.1.254.61144: FP 0:68(68) ack 1 win 32120 <nop,nop,timestamp 71797 124641066> (DF)
16:59:07.090681 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x2ec86818 ether 0:60:97:18:99:55 [|bootp]
16:59:08.100704 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:09.100628 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:10.100624 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:13.090678 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x2ec86818 ether 0:60:97:18:99:55 [|bootp]
16:59:13.091586 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x2ec86818 ether 0:60:97:18:99:55 [|bootp]
16:59:13.101816 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x4cc86818 ether 0:60:97:18:99:55 [|bootp]
16:59:13.117353 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:14.110639 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:15.110625 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:17.100724 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x4cc86818 ether 0:60:97:18:99:55 [|bootp]
16:59:18.120821 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:19.120631 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:20.120625 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:23.131566 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:24.100691 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x4cc86818 ether 0:60:97:18:99:55 [|bootp]
16:59:24.130637 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:25.130625 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:28.140758 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:29.140629 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:30.140625 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:31.560725 eth1 > arp who-has 192.168.1.254 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:31.560977 eth1 < arp reply 192.168.1.254 is-at 0:50:ba:43:3b:f1 (0:60:97:18:99:55)
16:59:31.561009 eth1 > 192.168.1.1.ftp > 192.168.1.254.61144: FP 0:68(68) ack 1 win 32120 <nop,nop,timestamp 74357 124641066> (DF)
16:59:33.150683 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:34.150629 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:35.150625 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:37.100678 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x4cc86818 ether 0:60:97:18:99:55 [|bootp]
16:59:38.160696 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:39.160627 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:40.160625 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:43.100681 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x4cc86818 ether 0:60:97:18:99:55 [|bootp]
16:59:43.101578 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x4cc86818 ether 0:60:97:18:99:55 [|bootp]

If I instead type:
/etc/rc.d/init.d/network stop
rmmod 3c509
pump -i eth1

eth1 comes up very quickly (about 1/2 second) and tcpdump shows the following:

17:01:44.133756 eth1 > 0.0.0.0.bootpc > 255.255.255.255.bootps: xid:0xf5c96818 ether 0:60:97:18:99:55 [|bootp]
17:01:44.180377 eth1 <

As you can see, the DHCP requests are being sent as if eth1 had been assigned the address
192.168.1.1, but that is the address of eth0.

When I remove the module after running the network stop script, the DHCP request is sent out with a source address of 0.0.0.0 and the interface comes up fine.

Please feel free to contact me if you need further information.

Thanks,
Mike Cencula
mike.com

Comment 1 Jonathan Larmour 2001-06-06 03:38:17 UTC
I can confirm this problem is still in RHL 7.1 (and current rawhide). In the
relatively common firewall type configuration of a dual-homed machine with one
public and one private addr, with the public one set by DHCP, if the private
interface already has an addr configured, pump will send out its request on the
public interface with the private addr as the source address.

Given that these private addresses are frequently 10.*.*.* or 172.16.*.*
addresses, some DHCP servers will refuse to listen to these requests. When
dhcpcd kicks in after pump fails, it all works because it correctly uses a source
address of 0.0.0.0.



Comment 2 Olivier Baudron 2001-06-29 16:41:09 UTC
The problem in the code is the following:

dhcp.c: 1009
    memset(&clientAddr.sin_addr, 0, sizeof(&clientAddr.sin_addr));
    clientAddr.sin_family = AF_INET;
    clientAddr.sin_port = htons(BOOTP_CLIENT_PORT);	/* bootp client */

    if (bind(s, (struct sockaddr *) &clientAddr, sizeof(clientAddr))) {

This does *not* set the IP source address to 0.0.0.0.
Indeed, as Stevens says (page 92), when binding the socket we can
specify a wildcard that lets the kernel choose the address. This wildcard is the
constant INADDR_ANY. And unfortunately, this constant is 0.0.0.0.

So the kernel chooses the IP source address on its own, and I suppose it is that of the
first interface it finds. If there is no bug in the kernel in doing this, the
only solution to fix the problem is to use a raw socket instead, and reimplement
UDP/IP, as is done in dhcpcd.
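
The wildcard behaviour described above can be observed with a small test program. This is only a sketch, not pump code: show_wildcard_bind() is a hypothetical helper, and connect() is used here purely as a way to make the kernel commit to a source address so that getsockname() can report it (no packet is sent). It assumes a Linux host with the loopback interface up.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a UDP socket to INADDR_ANY, then connect it to
 * dest_ip:port and report the local address before and after the connect.
 * 'before' and 'after' must each hold at least INET_ADDRSTRLEN bytes.
 * Returns 0 on success, -1 on any failure.
 */
int show_wildcard_bind(const char *dest_ip, unsigned short port,
                       char *before, char *after)
{
    struct sockaddr_in local, remote, name;
    socklen_t len;
    int rc = -1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s < 0)
        return -1;

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY); /* numerically 0.0.0.0 */
    local.sin_port = 0;                        /* let the kernel pick a port */
    if (bind(s, (struct sockaddr *)&local, sizeof(local)) < 0)
        goto out;

    /* After bind() the socket only holds the wildcard, not a real source. */
    len = sizeof(name);
    if (getsockname(s, (struct sockaddr *)&name, &len) < 0)
        goto out;
    if (!inet_ntop(AF_INET, &name.sin_addr, before, INET_ADDRSTRLEN))
        goto out;

    memset(&remote, 0, sizeof(remote));
    remote.sin_family = AF_INET;
    remote.sin_port = htons(port);
    if (inet_pton(AF_INET, dest_ip, &remote.sin_addr) != 1)
        goto out;

    /* connect() on a UDP socket sends nothing; it makes the kernel choose
     * the source address it would use to reach this destination. */
    if (connect(s, (struct sockaddr *)&remote, sizeof(remote)) < 0)
        goto out;

    len = sizeof(name);
    if (getsockname(s, (struct sockaddr *)&name, &len) < 0)
        goto out;
    if (!inet_ntop(AF_INET, &name.sin_addr, after, INET_ADDRSTRLEN))
        goto out;

    rc = 0;
out:
    close(s);
    return rc;
}
```

Called with a loopback destination, 'before' holds the wildcard and 'after' holds whatever address the kernel selected, which is the substitution this comment describes.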

Comment 3 Olivier Baudron 2001-06-29 20:41:01 UTC
Created attachment 22246
patch to fix 0.0.0.0 source adress issue with multiple NICs

Comment 4 Olivier Baudron 2001-06-29 20:42:43 UTC
First I have a question: in dhcp.c:pumpDhcpRun(), why is the socket not created
with createSocket()? Is there any good reason for this?

Next, I have noticed that in its own socket setup code,
pumpDhcpRun() does not set SO_BINDTODEVICE before binding the socket to the
0.0.0.0 address. So it is probable that, if other interfaces are already running,
the kernel sets one of their IP addresses as the source. The attached patch (above)
should fix the problem.

Comment 5 Olivier Baudron 2001-06-30 11:22:01 UTC
Argh, I just got a 2nd NIC, and tried my patch .... it does not work!
Well, there might be a problem with the kernel: it is possible to send a UDP
packet with a 0.0.0.0 source address only if all interfaces are down.

Comment 6 Jonathan Larmour 2001-07-01 15:19:37 UTC
I tried the patch and I'm afraid it doesn't work. I investigated a bit closer in
the kernel to see what it does. Fundamentally it all depends on the function
inet_select_addr(). And I can't see any way to force an address of 0.0.0.0. But
I'm not a kernel expert and may have traced it through incorrectly. There's a
lot of hairy stuff here.


Comment 7 Olivier Baudron 2001-07-01 16:44:20 UTC
Well... I'm afraid raw sockets are necessary at this point.
Unless a kernel expert can tell us a trick for this issue? ;)
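
To make the raw-socket idea concrete, here is a rough sketch of building the IPv4 and UDP headers by hand so that the source address really is 0.0.0.0. It is not taken from dhcpcd or pump; build_dhcp_datagram() and inet_checksum() are hypothetical names, the payload is treated as opaque bytes, and actually transmitting the buffer (which needs a raw or packet socket and root privileges) is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IP_HDR_LEN  20  /* IPv4 header without options */
#define UDP_HDR_LEN 8

/* Internet checksum (RFC 1071): one's-complement sum of 16-bit words. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;   /* pad odd trailing byte */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);    /* fold carries */
    return (uint16_t)~sum;
}

/*
 * Write an IPv4 + UDP datagram into 'out':
 *   source 0.0.0.0 port 68  ->  destination 255.255.255.255 port 67.
 * Returns the total length, or 0 if the buffer is too small.
 */
size_t build_dhcp_datagram(uint8_t *out, size_t cap,
                           const uint8_t *payload, size_t payload_len)
{
    size_t udp_len = UDP_HDR_LEN + payload_len;
    size_t total = IP_HDR_LEN + udp_len;
    uint16_t csum;

    if (total > cap || total > 0xffff)
        return 0;

    memset(out, 0, IP_HDR_LEN + UDP_HDR_LEN);

    /* IPv4 header */
    out[0] = 0x45;                      /* version 4, header length 5 words */
    out[2] = (uint8_t)(total >> 8);     /* total length, big-endian */
    out[3] = (uint8_t)(total & 0xff);
    out[8] = 64;                        /* TTL */
    out[9] = 17;                        /* protocol: UDP */
    /* out[12..15]: source address, left as 0.0.0.0 on purpose */
    memset(out + 16, 0xff, 4);          /* destination 255.255.255.255 */
    csum = inet_checksum(out, IP_HDR_LEN);
    out[10] = (uint8_t)(csum >> 8);
    out[11] = (uint8_t)(csum & 0xff);

    /* UDP header */
    out[21] = 68;                       /* source port: bootpc */
    out[23] = 67;                       /* destination port: bootps */
    out[24] = (uint8_t)(udp_len >> 8);
    out[25] = (uint8_t)(udp_len & 0xff);
    /* out[26..27]: UDP checksum 0 means "not computed", legal over IPv4 */

    memcpy(out + IP_HDR_LEN + UDP_HDR_LEN, payload, payload_len);
    return total;
}
```

Because the program fills in the source field itself, the kernel's address selection never gets a chance to substitute another interface's address.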

Comment 8 Need Real Name 2001-07-23 03:06:19 UTC
I am the original poster of this bug, but my e-mail address has changed to 
mike.  Although I'm not currently a programmer, I am learning.  If there is 
something I can assist with, please feel free to contact me.


Comment 9 Elliot Lee 2001-08-08 04:19:01 UTC
In rawhide, pump is obsoleted by dhcpcd, which I know does its own packet 
construction, and I'm pretty sure that it is more RFC-compliant. :)

pump is still used in the installer though, so if this problem occurs at 
install time, maybe a solution is possible after all with the info you have 
given.

Comment 10 Elliot Lee 2001-08-08 04:23:24 UTC
*** Bug 23477 has been marked as a duplicate of this bug. ***

Comment 11 Elliot Lee 2001-08-08 05:04:15 UTC
*** Bug 27492 has been marked as a duplicate of this bug. ***

Comment 12 Olivier Baudron 2001-08-10 14:30:08 UTC
Well... I must have missed something.
Mike opened a bug report and said "pump fails with a dual nic PC, and the reason
may be this one". Then several people made some tests, and finally the
conclusion was: "Mike, you're right, there is a bug in the pump code at lines
there and there"

Then come redhat people... who said "fine guys, if you don't use pump anymore,
it should work. So let's close this bug and be happy".

So, why doesn't redhat leave this bug open???
Else, what more info do you *need* ???

Comment 13 Need Real Name 2001-09-15 07:08:04 UTC
The problem with dhcpcd is that it is 401096 bytes in size.  Since 
pump weighs in at 46256 bytes, it fits much better on the bootable floppy I 
created for routing / masq / packet filtering.

Comment 14 Need Real Name 2001-09-15 07:48:27 UTC
Please forgive me if this is blatantly wrong (after all, I am new to 
programming)...But couldn't bind() be called before calling connect() in order 
to set the source address to 0.0.0.0?

One thing that might be a problem is where Stevens says (bottom of p.91), "A 
process can bind a specific IP address to its socket.  The ip address must 
belong to an interface on the host.  For a TCP client, this assigns the source 
IP address that will be used for IP datagrams sent on the socket."

So, the question is: Since the interface is down, does it have an IP address 
at all?  If not, how do you assign 0.0.0.0 to a socket when 0.0.0.0 doesn't 
belong to any interface on the host?

Another thought: Could it be possible to bring up the interface initially with 
0.0.0.0 as the address?  That way, this address would belong to an interface 
on the host and could then be used by bind() as a source address for the 
socket?

Just my $.02


Comment 15 Olivier Baudron 2001-09-15 20:38:38 UTC
* There is no need to call connect() before sending data on a UDP socket.
* The interface is not down. At the beginning, pump activates the interface in
pumpPrepareInterface(); in particular, it sets its IP address to 0.0.0.0 (at
least, it tries) and sets some flags (one of which marks the interface as up).
Also, I traced the program and noticed that setting the IP address to 0.0.0.0
has no effect: after the ioctl() call, ifconfig shows that no IP address is
assigned to the interface. This is again a problem with the special value 0,
since replacing it with anything else works (as ifconfig shows).

=> So the main problem is that ioctl(s, SIOCSIFADDR, &req) does not work with
the IP address 0.0.0.0.


Comment 16 Mario Lorenz 2001-09-17 18:49:41 UTC
*** Bug 53700 has been marked as a duplicate of this bug. ***

Comment 17 Mika Länsirinne 2002-11-13 21:17:58 UTC
*** Bug 77802 has been marked as a duplicate of this bug. ***

