I'm having the following problem on a RH7.0 installation with two NICs. Both NICs are 3Com 3c509b cards and I am using the stock 3c509 module that comes with RH7.0. The first NIC (eth0) is connected to my internal network and has the address 192.168.1.1. The second NIC (eth1) is connected to my cablemodem and should be getting its address via DHCP. eth1 takes a long time (62 seconds) to start at boot time.

If I type:

/etc/rc.d/init.d/network stop
pump -i eth1

pump reports "Operation failed." after about 60 seconds and tcpdump shows:

16:58:43.090510 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x2ec86818 ether 0:60:97:18:99:55 [|bootp]
16:58:43.096178 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:43.560696 eth1 > arp who-has 192.168.1.254 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:43.560987 eth1 < arp reply 192.168.1.254 is-at 0:50:ba:43:3b:f1 (0:60:97:18:99:55)
16:58:43.561018 eth1 > 192.168.1.1.ftp > 192.168.1.254.61144: FP 3046472654:3046472722(68) ack 642693321 win 32120 <nop,nop,timestamp 69557 124641066> (DF)
16:58:44.090631 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:45.090626 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:46.760721 eth1 > 192.168.1.1.ftp > 192.168.1.254.61144: FP 0:68(68) ack 1 win 32120 <nop,nop,timestamp 69877 124641066> (DF)
16:58:47.090719 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x2ec86818 ether 0:60:97:18:99:55 [|bootp]
16:58:48.100787 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:49.100632 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:50.100625 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:53.110700 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:53.160671 eth1 > 192.168.1.1.ftp > 192.168.1.254.61144: FP 0:68(68) ack 1 win 32120 <nop,nop,timestamp 70517 124641066> (DF)
16:58:54.090677 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x2ec86818 ether 0:60:97:18:99:55 [|bootp]
16:58:54.110636 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:55.110626 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:58.120682 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:58:59.120627 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:00.120625 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:03.091507 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:04.090628 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:05.090625 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:05.960674 eth1 > 192.168.1.1.ftp > 192.168.1.254.61144: FP 0:68(68) ack 1 win 32120 <nop,nop,timestamp 71797 124641066> (DF)
16:59:07.090681 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x2ec86818 ether 0:60:97:18:99:55 [|bootp]
16:59:08.100704 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:09.100628 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:10.100624 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:13.090678 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x2ec86818 ether 0:60:97:18:99:55 [|bootp]
16:59:13.091586 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x2ec86818 ether 0:60:97:18:99:55 [|bootp]
16:59:13.101816 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x4cc86818 ether 0:60:97:18:99:55 [|bootp]
16:59:13.117353 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:14.110639 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:15.110625 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:17.100724 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x4cc86818 ether 0:60:97:18:99:55 [|bootp]
16:59:18.120821 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:19.120631 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:20.120625 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:23.131566 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:24.100691 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x4cc86818 ether 0:60:97:18:99:55 [|bootp]
16:59:24.130637 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:25.130625 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:28.140758 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:29.140629 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:30.140625 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:31.560725 eth1 > arp who-has 192.168.1.254 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:31.560977 eth1 < arp reply 192.168.1.254 is-at 0:50:ba:43:3b:f1 (0:60:97:18:99:55)
16:59:31.561009 eth1 > 192.168.1.1.ftp > 192.168.1.254.61144: FP 0:68(68) ack 1 win 32120 <nop,nop,timestamp 74357 124641066> (DF)
16:59:33.150683 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:34.150629 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:35.150625 eth1 > arp who-has 65.24.0.166 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:37.100678 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x4cc86818 ether 0:60:97:18:99:55 [|bootp]
16:59:38.160696 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:39.160627 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:40.160625 eth1 > arp who-has 65.24.0.167 tell 192.168.1.1 (0:60:97:18:99:55)
16:59:43.100681 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x4cc86818 ether 0:60:97:18:99:55 [|bootp]
16:59:43.101578 eth1 > 192.168.1.1.bootpc > 255.255.255.255.bootps: xid:0x4cc86818 ether 0:60:97:18:99:55 [|bootp]

If I instead type:

/etc/rc.d/init.d/network stop
rmmod 3c509
pump -i eth1

eth1 comes up very quickly (about 1/2 second) and tcpdump shows the following:

17:01:44.133756 eth1 > 0.0.0.0.bootpc > 255.255.255.255.bootps: xid:0xf5c96818 ether 0:60:97:18:99:55 [|bootp]
17:01:44.180377 eth1 <

As you can see, it seems that the DHCP requests are being sent as if eth1 has been assigned the address 192.168.1.1 ...but this is the address of eth0. When I remove the module after running the network stop script, the DHCP request is being sent out with a 0.0.0.0 address and the interface comes up fine.

Please feel free to contact me if you need further information.

Thanks,
Mike Cencula
mike.com
I can confirm this problem is still in RHL 7.1 (and current rawhide). In the relatively common firewall type configuration of a dual-homed machine with one public and one private addr, with the public one set by DHCP, if the private interface already has an addr configured, pump will send out its request on the public interface with the private addr as the source address. Given that these private addresses are frequently 10.*.*.* or 172.16.*.* addresses, some DHCP servers will refuse to listen to these requests. When dhcpcd kicks in when pump fails, it all works because it correctly uses a source address of 0.0.0.0.
The problem in the code is the following:

dhcp.c: 1009
memset(&clientAddr.sin_addr, 0, sizeof(&clientAddr.sin_addr));
clientAddr.sin_family = AF_INET;
clientAddr.sin_port = htons(BOOTP_CLIENT_PORT); /* bootp client */
if (bind(s, (struct sockaddr *) &clientAddr, sizeof(clientAddr))) {

This does *not* set the IP source address to 0.0.0.0. Indeed, as Stevens says (page 92), when binding the socket we can specify a wildcard that lets the kernel choose the address. This wildcard is the constant INADDR_ANY and, unfortunately, this constant is 0.0.0.0. So the kernel chooses the IP source address on its own, and I suppose it picks the first interface it finds. If there is no bug in the kernel in doing this, the only way to fix the problem is to use a raw socket instead and reimplement UDP/IP, as is done in dhcpcd.
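To make the wildcard point concrete, here is a minimal C sketch (not pump's code; the function names are made up for illustration). The first helper checks that a zero-filled sin_addr is the same value as INADDR_ANY. The second binds a UDP socket to 0.0.0.0, then uses connect() purely as a way to ask the kernel which source address it would pick for a given destination, without sending anything. Note that it zeroes the structure with sizeof of the object itself, whereas the quoted line takes sizeof of a pointer.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if a zero-filled sin_addr equals the INADDR_ANY wildcard. */
int zeroed_addr_is_wildcard(void)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    return sa.sin_addr.s_addr == htonl(INADDR_ANY);
}

/*
 * Hypothetical helper: bind a UDP socket to 0.0.0.0, connect() it to
 * dest_ip, and write the source address the kernel selected into out.
 * Returns 0 on success, -1 on any failure.
 */
int kernel_chosen_source(const char *dest_ip, char *out, socklen_t out_len)
{
    struct sockaddr_in local, remote, chosen;
    socklen_t len = sizeof(chosen);
    int rc = -1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s < 0)
        return -1;

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY); /* wildcard, i.e. 0.0.0.0 */
    local.sin_port = htons(0);                 /* any free port */

    memset(&remote, 0, sizeof(remote));
    remote.sin_family = AF_INET;
    remote.sin_port = htons(67);               /* bootps, only as an example */
    if (inet_pton(AF_INET, dest_ip, &remote.sin_addr) != 1)
        goto done;

    if (bind(s, (struct sockaddr *)&local, sizeof(local)) != 0)
        goto done;
    /* On a UDP socket connect() sends nothing; it only fixes the route. */
    if (connect(s, (struct sockaddr *)&remote, sizeof(remote)) != 0)
        goto done;
    if (getsockname(s, (struct sockaddr *)&chosen, &len) != 0)
        goto done;
    if (inet_ntop(AF_INET, &chosen.sin_addr, out, out_len) == NULL)
        goto done;
    rc = 0;
done:
    close(s);
    return rc;
}
```

Calling kernel_chosen_source() with a destination reachable through an already configured interface should report that interface's address rather than 0.0.0.0, which is the behaviour described above.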
Created attachment 22246
patch to fix 0.0.0.0 source address issue with multiple NICs
First, a question: in dhcp.c:pumpDhcpRun(), why is the socket not created with createSocket()? Is there a good reason for this? Next, I noticed that pumpDhcpRun(), in its own socket setup code, does not set SO_BINDTODEVICE before binding the socket to the 0.0.0.0 address. So it is likely that, if other interfaces are already running, the kernel uses one of their IP addresses as the source. The attached patch (above) should fix the problem.
Argh, I just got a 2nd NIC and tried my patch... it does not work! Well, there might be a problem with the kernel: it is possible to send a UDP packet with a 0.0.0.0 source address only if all interfaces are down.
I tried the patch and I'm afraid it doesn't work. I investigated a bit closer in the kernel to see what it does. Fundamentally it all depends on the function inet_select_addr(). And I can't see any way to force an address of 0.0.0.0. But I'm not a kernel expert and may have traced it through incorrectly. There's a lot of hairy stuff here.
Well... I'm afraid raw sockets are necessary at this point. Unless a kernel expert can tell us a trick for this issue? ;)
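To give an idea of what "doing it by hand" involves, here is a small C sketch that fills in an IPv4 and UDP header with an explicit 0.0.0.0 source address. It is not dhcpcd's code; the function names, the TTL of 64 and the zero (uncomputed) UDP checksum are assumptions made for illustration. Actually sending such a buffer would need a raw or packet socket and root privileges, which the sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IP_HDR_LEN  20
#define UDP_HDR_LEN 8

/* Standard Internet checksum (one's complement sum of 16-bit words). */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: write IPv4 + UDP headers for a DHCP client
 * broadcast (0.0.0.0:68 -> 255.255.255.255:67) at the start of buf.
 * payload_len is the size of the BOOTP message that will follow.
 * Returns the number of header bytes written, or 0 on error.
 */
size_t build_dhcp_headers(uint8_t *buf, size_t buf_len, size_t payload_len)
{
    size_t total = IP_HDR_LEN + UDP_HDR_LEN + payload_len;
    size_t udp_len = UDP_HDR_LEN + payload_len;
    uint16_t csum;

    if (buf_len < IP_HDR_LEN + UDP_HDR_LEN || total > 0xffff)
        return 0;

    memset(buf, 0, IP_HDR_LEN + UDP_HDR_LEN);
    buf[0] = 0x45;                     /* IPv4, 5 x 32-bit header words */
    buf[2] = (uint8_t)(total >> 8);    /* total length, big-endian */
    buf[3] = (uint8_t)(total & 0xff);
    buf[8] = 64;                       /* TTL (arbitrary choice) */
    buf[9] = 17;                       /* protocol: UDP */
    /* bytes 12..15: source address, left as 0.0.0.0 */
    memset(buf + 16, 0xff, 4);         /* destination 255.255.255.255 */
    csum = inet_checksum(buf, IP_HDR_LEN);
    buf[10] = (uint8_t)(csum >> 8);
    buf[11] = (uint8_t)(csum & 0xff);

    buf[21] = 68;                      /* UDP source port: bootpc */
    buf[23] = 67;                      /* UDP destination port: bootps */
    buf[24] = (uint8_t)(udp_len >> 8);
    buf[25] = (uint8_t)(udp_len & 0xff);
    /* bytes 26..27: UDP checksum left at 0, meaning "not computed" */

    return IP_HDR_LEN + UDP_HDR_LEN;
}
```

Because the program writes the source field itself, the kernel's address selection is no longer involved for this packet.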
I am the original poster of this bug, but my e-mail address has changed to mike. Although I'm not currently a programmer, I am learning. If there is something I can assist with, please feel free to contact me.
In rawhide, pump is obsoleted by dhcpcd, which I know does its own packet construction, and I'm pretty sure that it is more RFC-compliant. :) pump is still used in the installer though, so if this problem occurs at install time, maybe a solution is possible after all with the info you have given.
*** Bug 23477 has been marked as a duplicate of this bug. ***
*** Bug 27492 has been marked as a duplicate of this bug. ***
Well... I must have missed something. Mike opened a bug report and said "pump fails with a dual nic PC, and the reason may be this one". Then several people made some tests, and finally the conclusion was: "Mike, you're right, there is a bug in the pump code at lines there and there" Then come redhat people... who said "fine guys, if you don't use pump anymore, it should work. So let's close this bug and be happy". So, why doesn't redhat leave this bug open??? Else, what more info do you *need* ???
The problem with dhcpcd is that it is 401096 bytes in size. Since pump weighs in at 46256 bytes, it fits much better on the bootable floppy I created for routing / masq / packet filtering.
Please forgive me if this is blatantly wrong (after all, I am new to programming)... But couldn't bind() be called before calling connect() in order to set the source address to 0.0.0.0? One thing that might be a problem is where Stevens says (bottom of p.91), "A process can bind a specific IP address to its socket. The ip address must belong to an interface on the host. For a TCP client, this assigns the source IP address that will be used for IP datagrams sent on the socket." So, the question is: since the interface is down, does it have an IP address at all? If not, how do you assign 0.0.0.0 to a socket when 0.0.0.0 doesn't belong to any interface on the host? Another thought: could it be possible to bring up the interface initially with 0.0.0.0 as the address? That way, this address would belong to an interface on the host and could then be used by bind() as a source address for the socket. Just my $.02
* There is no need to call connect() before sending data on a UDP socket.
* The interface is not down. At the beginning, pump activates the interface in pumpPrepareInterface(); in particular, it sets its IP address to 0.0.0.0 (or at least tries to) and sets some flags (one of which marks the interface as up).

Also, I traced through the program and noticed that setting the IP address to 0.0.0.0 has no effect: after the ioctl() call, ifconfig shows that no IP address is assigned to the interface. This is again a problem with the special value 0, since replacing it with anything else works (as ifconfig shows).

=> So the main problem is that ioctl(s, SIOCSIFADDR, &req) does not work with the IP address 0.0.0.0.
*** Bug 53700 has been marked as a duplicate of this bug. ***
*** Bug 77802 has been marked as a duplicate of this bug. ***