Description of problem: /var/lib/dhclient/dhclient-*-eth0.lease; Huge numbers mysteriously appeared: option dhcp-lease-time 4294967295; option dhcp-renewal-time 4294967295; option dhcp-rebinding-time 4294967295; dhclient choked on them; From /var/log/messages: Dec 10 00:19:15 localhost dhclient[1516]: DHCPREQUEST on eth0 to 192.168.0.1 port 67 Dec 10 00:19:15 localhost dhclient[1516]: DHCPACK from 192.168.0.1 Dec 10 00:19:15 localhost dhclient[1516]: Unable to set up timer: out of range Dec 10 00:19:15 localhost dhclient[1516]: exiting. Dec 10 00:19:15 localhost NetworkManager[1235]: <info> (eth0): DHCPv4 client pid 1516 exited with status 1 Dec 10 00:19:15 localhost NetworkManager[1235]: <info> (eth0): device state change: 8 -> 9 (reason 6) Dec 10 00:19:15 localhost NetworkManager[1235]: <warn> Activation (eth0) failed. Dec 10 00:19:15 localhost NetworkManager[1235]: <info> (eth0): device state change: 9 -> 3 (reason 0) Dec 10 00:19:15 localhost NetworkManager[1235]: <info> (eth0): deactivating device (reason: 0). Dec 10 00:19:15 localhost avahi-daemon[1253]: Withdrawing address record for 192.168.0.181 on eth0. The system repeatedly tried to renew the lease, but simply kept failing. The network connection, eth0, had been operating normally, without incident, until during one renewal, it failed due to the huge numbers. Version-Release number of selected component (if applicable): dhclient-4.2.0-14.P1.fc14.x86_64 NetworkManager-0.8.1-10.git20100831.fc14.x86_64 avahi-0.6.27-2.fc14.x86_64 How reproducible: I presume if unsuitable huge numbers are put in, it will fail every time. Steps to Reproduce: 1. 2. 3. Actual results: Network bombs; user tears hair. Expected results: Network just works, just as it has done for quite some time. Additional info: I presume a bug is responsible for putting in the huge numbers, but it would be a more robust design for dhclient to substitute acceptable numbers, rather than to just exit with status 1. For example the default numbers: lease { interface "eth0"; fixed-address 192.168.0.181; option subnet-mask 255.255.255.0; option routers 192.168.0.1; option dhcp-lease-time 86400; option dhcp-message-type 5; option domain-name-servers 192.168.0.1; option dhcp-server-identifier 192.168.0.1; option dhcp-renewal-time 43200; option dhcp-rebinding-time 75600; renew 6 2010/12/11 07:07:12; rebind 6 2010/12/11 17:46:35; expire 6 2010/12/11 20:46:35; } My thanks to everyone involved.
First we need to know where these huge numbers come from 1) install wireshark-gnome, run wireshark 2) in menu Capture->Options..., select your interface (i.e. eth0), press Start 3) kill dhclient. NetworkManager should spot it and start dhclient again 4) after few seconds stop the capturing 5) save the packet capture (Save as...) and attach it here, thanks
Created attachment 468545 [details] wireshark report as per instructions -- dhclient killed with automatic restart
Well, the packet dump doesn't show anything strange. Renewal and Rebinding times sent by server are 12 hours and 21 hours. Please try the dhclient-4.2.0-16.P2.fc14 from updates-testing repo. It contains upstream (better than mine) fix for bug #628258 (seems similar to this bug, so maybe it's somehow related). 1) su -c 'yum --enablerepo=updates-testing update dhclient' 2) service NetworkManager stop 3) kill <dhclient pid> 4) rm /var/lib/dhclient/* 5) start packet capturing in wireshark 6) service NetworkManager start; wait till interface is configured 7) stop packet capture 8) check lease file and if it contains those huge numbers, attach the packet dump Thank you
(In reply to comment #0) > How reproducible: > I presume if unsuitable huge numbers are put in, it will fail every time. I tried to put option dhcp-lease-time 4294967295; option dhcp-renewal-time 4294967295; option dhcp-rebinding-time 4294967295; into lease file, but the lease was renewed (dhclient-4.2.0-14.P1.fc14.x86_64) without any problem.
Is there any news ? Did you updated to dhclient-4.2.0-16.P2.fc14 ? Have you seen the problem since then ?
Seeing similar problems with dhcp from rawhide with the Wi-Fi at Residencia Rialta in A Coruña where the GUADEC accommodation is now. dhclient-4.2.4-6.fc18.x86_64 This patch works around the issue for me: diff --git a/common/dispatch.c b/common/dispatch.c index 4f377c5..58ccc66 100644 --- a/common/dispatch.c +++ b/common/dispatch.c @@ -324,8 +324,18 @@ void add_timeout (when, where, what, ref, unref) q->next = timeouts; timeouts = q; + log_error ("About to check timer, sec is %ld usec is %d (max sec %u)", sec, usec, DHCP_SEC_MAX); + isc_interval_set(&interval, sec & DHCP_SEC_MAX, usec * 1000); + log_error ("Interval: %u sec %u nano", interval.seconds, interval.nanoseconds); status = isc_time_nowplusinterval(&expires, &interval); + + if (status == ISC_R_RANGE) { + expires.seconds = interval.seconds; + expires.nanoseconds = interval.nanoseconds; + status = ISC_R_SUCCESS; + } + if (status != ISC_R_SUCCESS) { /* * The system time function isn't happy or returned This is the output in the messages logs: Jul 25 20:01:01 sirocco dhclient[6365]: DHCPREQUEST on wlan0 to 255.255.255.255 port 67 (xid=0x60d11a10) Jul 25 20:01:01 sirocco dhclient[6365]: About to check timer, sec is 6 usec is 660394 (max sec 4294967295) Jul 25 20:01:01 sirocco dhclient[6365]: Interval: 6 sec 660394000 nano Jul 25 20:01:01 sirocco dhclient[6365]: DHCPACK from 172.20.0.2 (xid=0x60d11a10) Jul 25 20:01:01 sirocco dhclient[6365]: About to check timer, sec is 4294967295 usec is 260546 (max sec 4294967295) Jul 25 20:01:01 sirocco dhclient[6365]: Interval: 4294967295 sec 260546000 nano Jul 25 20:01:01 sirocco dhclient[6365]: bound to 172.20.9.134 -- renewal in 4294967295 seconds. I'm wondering whether there's a 64-bit issue here. I can try to capture a successful association if that's useful, or get details about the APs used.
Using wireshark shows the DHCP server replying with "4294967295" seconds in length which Wireshark interprets as infinity. The patch above does something similar within the data structures' limits.
This is correct: See related NetworkManager bug report: https://bugzilla.gnome.org/show_bug.cgi?id=680596
Copying the most important part from there: If you look at the following options: option dhcp-renewal-time 4294967295; option dhcp-rebinding-time 4294967295; And if you use e.g. python to count the correct values: >>> datetime.datetime.now() + datetime.timedelta(0, 4294967295) datetime.datetime(2148, 9, 1, 2, 5, 56, 531976) You'll see that the computed value is perfectly correct. And the lifetime value is actually 0xffffffff. >>> 0xffffffff 4294967295 See http://tools.ietf.org/html/rfc2131#section-3.3: A client acquires a lease for a network address for a fixed period of time (which may be infinite). Throughout the protocol, times are to be represented in units of seconds. The time value of 0xffffffff is reserved to represent "infinity". My feeling is that dhclient doesn't properly handle the 'infinity' value.
From NetworkManager bugzilla: > Andreas Proschofsky 2012-07-25 18:41:51 UTC > > Just wanted to report: The wireless seems to work just fine for me with > NetworkManager 0.9.4.0 and dhcpcd (so no dhclient)
Clearly this code in bind's libisc-export.so (which both NTP and dhclient use) is completely wrong for handling any kind of lease stuff. isc_time_nowplusinterval() comes from bind-9.9.1-P1/lib/isc/unix/time.c in the bind package. struct isc_interval { unsigned int seconds; unsigned int nanoseconds; }; struct isc_time { unsigned int seconds; unsigned int nanoseconds; }; isc_result_t isc_time_nowplusinterval(isc_time_t *t, const isc_interval_t *i) { ... if (gettimeofday(&tv, NULL) == -1) { ... if ((tv.tv_sec > INT_MAX || i->seconds > INT_MAX) && ((long long)tv.tv_sec + i->seconds > UINT_MAX)) return (ISC_R_RANGE); First, the "i->seconds > INT_MAX" check is bogus, because values of 4294967295 (close to UINT_MAX) are clearly allowed for the lease interval. Second, the "tv.tv_sec + i->seconds" check is also bogus because "NOW + lease time of UINT_MAX" is always > UINT_MAX... Second, since isc_interval's seconds member is an unsigned in, how is it useful to *ever* check it against INT_MAX? If it was supposed to be limited to INT_MAX, then the type should probably have been just 'int' to start with.
Some nice comments in dispatch.c from DHCP: * The ISC timer library doesn't seem to like negative values * and can't accept any values above 4G-1 seconds so we limit * the values to 0 <= value < 4G-1. We do it before except the code doesn't actually do that (it still allows setting the seconds to 4G/DHCP_SEC_MAX/0xffffffff). Plus even if the value was clamped to 4G-1, you'd still run afoul of the "i->seconds > INT_MAX" check in the timer code.
Warning: I cannot reproduce it against ISC DHCP server because the server terminates (I've just filed bug 843185). Therefore the client doesn't recieve his response and won't fail!
OK, I have built a custom build of dhcp for my experimental dhcp server (for the patch and build see again bug 843185). But, even though the server now keeps up and responds properly to any requests, dhclient configures the network normally. dhclient-4.2.4-0.4.rc1.fc17.x86_64 The DHCP response with 0xffffffff lease time is here (from tcpdump): 23:36:35.875318 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328) 192.168.25.1.bootps > 192.168.25.11.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0xcf258717, Flags [none] (0x0000) Your-IP 192.168.25.11 Client-Ethernet-Address 52:54:00:eb:e9:fb Vendor-rfc1048 Extensions Magic Cookie 0x63825363 DHCP-Message Option 53, length 1: ACK Server-ID Option 54, length 4: 192.168.25.1 Lease-Time Option 51, length 4: 4294967295 Subnet-Mask Option 1, length 4: 255.255.255.0 Domain-Name Option 15, length 11: "example.org" Domain-Name-Server Option 6, length 4: 8.8.8.8 Default-Gateway Option 3, length 4: 192.168.25.1 dhclient log: Jul 25 23:36:35 station dhclient[29205]: DHCPREQUEST on eth1 to 255.255.255.255 port 67 (xid=0x178725cf) Jul 25 23:36:35 station dhclient[29205]: DHCPACK from 192.168.25.1 (xid=0x178725cf) Jul 25 23:36:38 station NET[29280]: /usr/sbin/dhclient-script : updated /etc/resolv.conf Jul 25 23:36:38 station dhclient[29205]: bound to 192.168.25.11 -- renewal in 2147483645 seconds. Jul 25 23:40:10 station yum[29324]: Updated: 12:dhclient-4.2.4-2.fc17.x86_64 Jul 25 23:40:59 station dhclient[29339]: DHCPREQUEST on eth1 to 255.255.255.255 port 67 (xid=0x28d9d45e) Jul 25 23:40:59 station dhclient[29339]: DHCPACK from 192.168.25.1 (xid=0x28d9d45e) Jul 25 23:41:01 station NET[29414]: /usr/sbin/dhclient-script : updated /etc/resolv.conf Jul 25 23:41:01 station dhclient[29339]: bound to 192.168.25.11 -- renewal in 2147483646 seconds. dhclient lease file: lease { interface "eth1"; fixed-address 192.168.25.12; option subnet-mask 255.255.255.0; option routers 192.168.25.1; option dhcp-lease-time 4294967295; option dhcp-message-type 5; option domain-name-servers 8.8.8.8; option dhcp-server-identifier 192.168.25.1; option domain-name "example.org"; renew 2 2080/08/13 00:57:12; rebind 2 2131/08/28 09:22:41; expire 0 2148/09/01 04:11:19; } dhclient return value: 0 It works even with NetworkManager.
Retried with dhclient-4.2.4-2.fc17.x86_64 with the same result.
renew 2 2080/08/13 00:57:12; rebind 2 2131/08/28 09:22:41; expire 0 2148/09/01 04:11:19; Those are not the same time, as they are in the Rialta case.
Seems like bug #789601 at first glance. It's marked as security-related but it's most likely not.
Can you test this-scratch build ? It's build with patch from bug #789601 which hopefully is the same issue. http://koji.fedoraproject.org/koji/taskinfo?taskID=4330107
(In reply to comment #18) > Can you test this-scratch build ? > It's build with patch from bug #789601 which hopefully is the same issue. > > http://koji.fedoraproject.org/koji/taskinfo?taskID=4330107 I can't see how that patch would help since nowplustime() still checks interval->seconds against INT_MAX (and interval->seconds is UINT_MAX) and then checks current time plus interval against UINT_MAX which it will clearly overrun since interval seconds is already UINT_MAX. Unless I'm completely mistaken? The patches check for > DHCP_SEC_MAX doesn't trigger because when->tv_sec is the max, not greater than it. basically nowplustime() is just broken for large leases and the function needs a rework to deal with intervals larger then UINT_MAX / 2. But that gets complicated because maybe BIND/DHCP/NTP still need to run on stuff that still has 32 bit Unix time, I don't know. But my read of the patch is that it doesn't fix this issue.
One approach is to always limit the lease time to INT_MAX and just eat the renew when the lease is half done. We're talking about large enough values here that I don't think it matters whether its 4G or 2G seconds. That solves the problem without meaningfully impacting behavior. That's a lease time of *68* years and if your internet link is still working then I'd be really surprised.
(In reply to comment #16) > renew 2 2080/08/13 00:57:12; > rebind 2 2131/08/28 09:22:41; > expire 0 2148/09/01 04:11:19; > > Those are not the same time, as they are in the Rialta case. Those are just derived values from what comes in the DHCP datagram. Please can anyone do 'tcpdump --vv' on the DHCP datagrams from server in Rialta to compare?
Created attachment 600475 [details] dhcp-packets-rialta gzipped pcapng wireshark capture. I figured out how to export specific packets from Wireshark, so here's the dhcp request and ack.
Bastien, can you please test the build from comment #18 even when Dan thinks it can't work ? :-)
Created attachment 600483 [details] patch (In reply to comment #22) > Created attachment 600475 [details] > dhcp-packets-rialta lease-time, renewal-time and rebinding-time are all infinite (0xffffffff). To be able to reproduce this with ISC dhcpd I added the following lines to dhcpd.conf option dhcp-renewal-time 0xffffffff; option dhcp-rebinding-time 0xffffffff; default-lease-time infinite; max-lease-time infinite; min-lease-time infinite; and also used the attached patch (otherwise dhcpd doesn't send renewal-time and rebinding-time at all). But it still doesn't seem to be the trigger.
(In reply to comment #24) > But it still doesn't seem to be the trigger. Actually it *is*. With this configuration on server and patch (from previous comment) I see "Unable to set up timer: out of range" and dhclient exits after receiving the ACK from server. When I patch the client with the patch from bug #789601 I see "Timeout requested too large reducing to 2^^32-10 bound to 192.168.1.9 -- renewal in 4294967293 seconds." and dhclient continues.
So can somebody please confirm/disprove that the build from comment #18 fixes the problem in real life ? Thanks.
(In reply to comment #26) > So can somebody please confirm/disprove that the build from comment #18 > fixes the problem in real life ? Thanks. The linked build does fix the problem for me.
Thanks Jiri and Bastien, now it's pretty clear that I was using: default-lease-time infinite; but the reproducer is: default-lease-time infinite; option dhcp-renewal-time 0xffffffff; option dhcp-rebinding-time 0xffffffff; I will re-test it tomorrow at work. Time for a critical path bugfix update?
Created attachment 600669 [details] Condensed testcase of original code (without patch from #789601) This testcase is condensed from the actual dhclient and libisc sources and should indicate the problems in a much clearer manner.
The infinite timeout problem only happens on 64-bit hosts, due to the definition of time_t, which is 4-bytes on i686 and 8-bytes on x64. The actual problem on x64 is the overflow that occurs in dhcpack(): client -> new -> expiry += cur_time; /* Lease lengths can never be negative. */ if (client -> new -> expiry < cur_time) client -> new -> expiry = TIME_MAX; if 'expiry' is already UINT_MAX, then adding cur_time (which is an alias for tv.tv_sec from gettimeofday()) will cause it to wrap over, and the if() check will reset expiry to TIME_MAX, which is 2147483647, or INT_MAX. So then everything works. On 64-bit hosts, however, expiry does *not* wrap, and thus the if() does not trigger, and expiry remains larger than UINT_MAX. Thus when this line in add_timeout() is run, expiry is usually reset back to UINT_MAX: sec = when->tv_sec - cur_tv.tv_sec; because cur_tv.tv_sec has not changed yet. Thus, the code passes UINT_MAX into isc_time_nowplusinterval(), which is clearly not 64-bit safe and thus we fail the checks here: if ((tv.tv_sec > INT_MAX || i->seconds > INT_MAX) && ((long long)tv.tv_sec + i->seconds > UINT_MAX)) return -3; and cause the lease to fail, since i->seconds is > INT_MAX and tv.tv_sec (from gettimeofday()) + i->seconds (which is UINT_MAX) is clearly larger than UINT_MAX.
So with the patch, this block is what fixes the issue on x64: /* * We need to reduce (to 2^^32-1) the absolute time from an epoch * (i.e. value of when->tv_sec) and not the relative time (value of * sec variable). * In other words, we have to make sure that once the * isc_time_nowplusinterval() adds current time to the given relative * time the result will be less than 2^^32-1. */ if (when->tv_sec > DHCP_SEC_MAX) { log_error("Timeout requested too large " "reducing to 2^^32-10"); /* * HACK: 9 is some magic number of seconds * because some time goes by between the last call of gettimeofday() * and the one in isc_time_nowplusinterval() * I'm sure the ISC guys will figure out something better ;-) */ when->tv_sec = DHCP_SEC_MAX - 9; } because on 64-bit hosts, when->tv_sec will be UINT_MAX + gettimeofday() and thus larger than DHCP_SEC_MAX. So that clearly fixes the issue. But I guess my objection is that it doesn't really give any indication of *what* the issue is, and *why* it fixes the problem. Plus, it's never a problem on i686, and the patch doesn't say anything about the time_t size issues that are the root cause of the problem. Second, I think it would be clearer to fix this issue at the root, and add a comment to say what's actually going on, something like the attached patch.
Created attachment 600672 [details] Alternate fix for fixing 64-bit interval calculations
dhcp-4.2.4-9.P1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/FEDORA-2012-11079/dhcp-4.2.4-9.P1.fc17
dhcp-4.2.3-11.P2.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/FEDORA-2012-11110/dhcp-4.2.3-11.P2.fc16
Still can't use ISC dhcp server to reproduce, see bug 843802.
Finally reproduced with slightly different values: https://bugzilla.redhat.com/show_bug.cgi?id=843802#c5 And confirmed the fix at least this way.
<snip> Aug 1 02:47:55 sirocco yum[28598]: Updated: 12:dhclient-4.2.4-10.P1.fc18.x86_64 <snip> Aug 1 02:50:03 sirocco dhclient[28791]: bound to 172.20.9.134 -- renewal in 4294967295 seconds. Works for me.
Hello, This fixes an issue with some routers I have access to: 08:16:33.178639 IP (tos 0x10, ttl 16, id 0, offset 0, flags [none], proto UDP (17), length 328) 172.16.53.1.bootps > 172.16.53.55.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0x5bd54350, Flags [none] (0x0000) Your-IP 172.16.53.55 Client-Ethernet-Address 10:1f:74:xx:yy:zz Vendor-rfc1048 Extensions Magic Cookie 0x63825363 DHCP-Message Option 53, length 1: ACK Server-ID Option 54, length 4: 172.16.53.1 Lease-Time Option 51, length 4: 4294967295 Subnet-Mask Option 1, length 4: 255.255.255.0 Domain-Name-Server Option 6, length 12: 172.16.53.14,... Default-Gateway Option 3, length 4: 172.16.53.1 RN Option 58, length 4: 4294967295 RB Option 59, length 4: 4294967295 END Option 255, length 0 PAD Option 0, length 0, occurs 6 # dhclient -4 -d em1 Internet Systems Consortium DHCP Client 4.2.4 Copyright 2004-2012 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Listening on LPF/em1/10:1f:74:xx:yy:zz Sending on LPF/em1/10:1f:74:xx:yy:zz Sending on Socket/fallback DHCPREQUEST on em1 to 255.255.255.255 port 67 (xid=0x31420b71) DHCPACK from 172.16.53.1 (xid=0x31420b71) Unable to set up timer: out of range And now it works with the above update. Thanks!
> This fixes an issue with some routers I have access to: Do you know what these routers are?
dhcp-4.2.4-9.P1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
dhcp-4.2.3-11.P2.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.