Bug 789601

Summary: dhcpd fails with "Unable to set up timer: out of range"
Product: [Fedora] Fedora
Component: dhcp
Version: 16
Hardware: x86_64
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: John Levon <levon>
Assignee: Jiri Popelka <jpopelka>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bnocera, dcbw, jpopelka, ovasik, psimerda
Doc Type: Bug Fix
Last Closed: 2012-08-06 03:57:36 EDT
Bug Blocks: 796459
Attachments:
Description | Flags
Provisional patch | none
wireshark capture of dhcp discover packet | none
John's packet dump (DHCP messages only). | none
final patch | none
Simpler alternate patch for 64-bit interval calculation bug (see also #662254) | none

Description John Levon 2012-02-11 12:23:27 EST
dhcpd fails after a while with:

Feb 11 17:19:18 pent dhcpd: Timeout requested too large reducing to 2^^32-1
Feb 11 17:19:18 pent dhcpd: Unable to set up timer: out of range
Feb 11 17:19:18 pent dhcpd[29451]: Timeout requested too large reducing to 2^^32-1
Feb 11 17:19:18 pent dhcpd:
Feb 11 17:19:18 pent dhcpd[29451]: Unable to set up timer: out of range

dhcp-4.2.3-6.P2.fc16.x86_64

Removing or modifying lease options below seems to make no difference. I don't know of a work-around.

dhcpd.conf is:

ddns-update-style interim;
authoritative;
ignore client-updates;

subnet 192.168.0.0 netmask 255.255.255.0 {
}

subnet 192.168.1.0 netmask 255.255.255.0 {

# --- default gateway
	option routers			192.168.1.1;
	option subnet-mask		255.255.255.0;

	option nis-domain		"localdomain";
	option domain-name		"localdomain";
	option domain-name-servers	8.8.8.8;

#	option time-offset		-18000;	# Eastern Standard Time

#	option ip-forwarding off;

	default-lease-time infinite;
	max-lease-time infinite;

	host rent {
#		hardware ethernet 0:c0:9f:66:fa:fd;
#		hardware ethernet 0:0b:6b:4c:40:52;
		hardware ethernet 0:1a:6b:6a:21:5b;
#		hardware ethernet 00:1b:77:5a:50:7b;
		fixed-address 192.168.1.3;
		option host-name "rent";
	}

	host argument {
		hardware ethernet 00:12:3f:eb:7f:8f;
		fixed-address 192.168.1.4;
		option host-name "argument";
	}

	host sent {
		hardware ethernet 00:1c:bf:42:fb:8a;
		fixed-address 192.168.1.8;
		option host-name "sent";
	}

	host went {
		hardware ethernet 00:0f:b5:9f:c3:78;
		fixed-address 192.168.1.100;
		option host-name "went";
	}

	host parent {
		hardware ethernet b8:ff:61:11:cc:34;
		fixed-address 192.168.1.5;
		option host-name "parent";
	}

	range 192.168.1.9 192.168.1.90;
}
Comment 1 John Levon 2012-02-11 12:49:20 EST
It looks like the client with the request is an Android HTC phone.
Comment 2 John Levon 2012-02-13 12:45:16 EST
In fact, if this really is dhcpd dying due to a bad request, this is a big security hole.
Comment 3 Jiri Popelka 2012-02-13 12:53:22 EST
OK, marking as "Security Sensitive" for now. I'll look at the problem tomorrow.
Comment 4 Jiri Popelka 2012-02-14 03:28:55 EST
Good news first: the problem is in code (common/dispatch.c:add_timeout()) that was newly added in 4.2.0, so no RHEL version is affected.
Comment 5 Jiri Popelka 2012-02-14 03:41:30 EST
We have already had a problem once (bug #628258) with this part of the code.
I reported the fix upstream, but they chose a different solution, which obviously wasn't perfect.
Their fix was released in 4.2.1b1 with this comment in the changelog:
- Limit the timeout period allowed in the dispatch code to 2^^32-1 seconds.
  Thanks to a report from Jiri Popelka at Red Hat.
  [ISC-Bugs #22033], [Red Hat Bug #628258]
Comment 6 Jiri Popelka 2012-02-14 04:06:26 EST
John,

thank you for the report. It took me some time to get to it because I hadn't realized from the description that the server crashes.

Anyway, this really looks like a security problem, and a packet dump is crucial to narrow it down (we need to see the message that came from the client just before the server crashes). Are you able to get one? You can use wireshark-gnome or tcpdump. Thanks again.
Comment 7 John Levon 2012-02-14 08:46:11 EST
I can, though it will probably be tomorrow at the earliest.

In the meantime I'm using dnsmasq as a workaround.
Comment 8 Jiri Popelka 2012-02-15 09:31:35 EST
I've added some debugging output to the dhcpd code, so it should write out the values of some important variables before the crash.

Download it from http://jpopelka.fedorapeople.org/789601/dhcpd
and make it executable (chmod +x dhcpd).

You can either replace /usr/sbin/dhcpd with it (and restore the SELinux context with 'restorecon -Fvv /usr/sbin/dhcpd') or simply leave it wherever you want and run it (as root) with /path/to/dhcpd -d [<interface>]
Comment 9 John Levon 2012-02-15 20:54:56 EST
Feb 16 01:54:06 pent dhcpd: Timeout requested too large reducing to 2^^32-1
Feb 16 01:54:06 pent dhcpd: when->tv_sec: 0x4f466e6c (5624983148)
Feb 16 01:54:06 pent dhcpd: when->tv_usec: 0x0 (0)
Feb 16 01:54:06 pent dhcpd: cur_tv.tv_sec: 0x4f3c61be (1329357246)
Feb 16 01:54:06 pent dhcpd: cur_tv.tv_usec: 0x32e58 (208472)
Feb 16 01:54:06 pent dhcpd: sec: 0xffffffff (4294967295)
Feb 16 01:54:06 pent dhcpd: sec & 0xFFFFFFFF: 0xffffffff (4294967295)
Feb 16 01:54:06 pent dhcpd: usec: 0x0 (0)
Feb 16 01:54:06 pent dhcpd: interval.seconds: 0xffffffff (4294967295)
Feb 16 01:54:06 pent dhcpd: interval.nanoseconds: 0x0 (0)
Feb 16 01:54:06 pent dhcpd: expires.seconds: 0xccec5a98 (3438041752)
Feb 16 01:54:06 pent dhcpd: expires.nanoseconds: 0x7f0c (32524)
Feb 16 01:54:06 pent dhcpd: Unable to set up timer: out of range
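The numbers in this debug output can be checked by hand. The requested expiry (when->tv_sec, 5624983148) lies 4295625902 seconds after the current time (1329357246), which exceeds 2^32-1, so the interval is capped to 4294967295. Adding the capped interval back to the current time gives 5624324541, which still does not fit in 32 unsigned bits; that is presumably why the timer setup reports "out of range". (The hex value printed for when->tv_sec, 0x4f466e6c, is just the low 32 bits of the decimal value.) A minimal C sketch of that arithmetic follows; the function names and SEC_MAX are hypothetical and are not taken from the dhcpd or libisc sources.

```c
#include <assert.h>
#include <stdint.h>

/* 2^32-1, the cap named in the "reducing to 2^^32-1" log message. */
#define SEC_MAX INT64_C(4294967295)

/*
 * Relative interval as the unpatched code appears to compute it:
 * subtract first, then cap the difference (hypothetical helper).
 */
static int64_t capped_interval(int64_t when_sec, int64_t cur_sec)
{
	assert(when_sec >= cur_sec); /* this sketch only covers future timeouts */
	int64_t sec = when_sec - cur_sec;
	return sec > SEC_MAX ? SEC_MAX : sec;
}

/* Nonzero if "current time + interval" still fits in 32 unsigned bits. */
static int expiry_fits_32bit(int64_t cur_sec, int64_t interval_sec)
{
	return cur_sec + interval_sec <= SEC_MAX;
}
```

With the values from the log, capped_interval(5624983148, 1329357246) yields 4294967295, and expiry_fits_32bit(1329357246, 4294967295) yields 0, i.e. capping only the relative interval is not enough.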
Comment 10 Jiri Popelka 2012-02-16 08:34:12 EST
Created attachment 562491 [details]
Provisional patch

I have hopefully localized the problem and have a patch.
I built the packages locally but don't know whether it's safe to upload them (not the srpm, of course) to a public site like http://jpopelka.fedorapeople.org so John can test them.
Meanwhile, there's only the patched dhcpd binary: http://jpopelka.fedorapeople.org/789601/dhcpd

The patch contains a comment explaining what I think is going on.
The core fix, with that comment, should be:

dhcp-4.2.3-P2/common/dispatch.c
@@ -246,26 +246,40 @@ void add_timeout (when, where, what, ref
 	 * the working code use the same values.
 	 */
 
+	/*
+	 * We need to reduce (to 2^^32-1) the absolute time from an epoch
+	 * (i.e. value of when->tv_sec) and not the relative time (value of
+	 * sec variable).
+	 * In other words, we have to make sure that once the
+	 * isc_time_nowplusinterval() adds current time to the given relative
+	 * time the result will be less than 2^^32-1.
+	 */
+	if (when->tv_sec > DHCP_SEC_MAX) {
+		log_error("Timeout requested too large "
+			  "reducing to 2^^32-10");
+		/*
+		 * HACK: 9 is some magic number of seconds
+		 *       because some time goes by between the last call of gettimeofday()
+		 *       and the one in isc_time_nowplusinterval()
+		 *       I'm sure the ISC guys will figure out something better ;-)
+		 */
+		when->tv_sec = DHCP_SEC_MAX - 9;
+	}
 	sec  = when->tv_sec - cur_tv.tv_sec;
 	usec = when->tv_usec - cur_tv.tv_usec;
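
To illustrate the difference the patch makes, here is a minimal, self-contained C sketch of the patched ordering: cap the absolute time first, then subtract. It is not the dhcpd code; patched_interval and SLACK are hypothetical names, DHCP_SEC_MAX is assumed to be 2^32-1 as the log message suggests, and clamping negative intervals to zero is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: 2^32-1, as the log message suggests. */
#define DHCP_SEC_MAX INT64_C(4294967295)
/* The "magic" 9 seconds of slack from the patch (hypothetical name). */
#define SLACK INT64_C(9)

/*
 * Cap the absolute expiry time before computing the relative interval,
 * so that current time + interval cannot exceed DHCP_SEC_MAX.
 */
static int64_t patched_interval(int64_t when_sec, int64_t cur_sec)
{
	assert(cur_sec >= 0);
	if (when_sec > DHCP_SEC_MAX)
		when_sec = DHCP_SEC_MAX - SLACK;
	int64_t sec = when_sec - cur_sec;
	/* Assumption of this sketch: deadlines in the past become zero. */
	return sec < 0 ? 0 : sec;
}
```

For the values logged in comment 9 (when->tv_sec 5624983148, current time 1329357246) this gives an interval of 2965610040 seconds, so current time plus interval is 4294967286, which is below 2^32-1.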
Comment 11 Jiri Popelka 2012-02-16 08:37:08 EST
Anyway, the packet dump (as requested in comment #6) would still be very useful, so that I can try to reproduce it here.
Comment 12 John Levon 2012-02-17 14:46:25 EST
Created attachment 563976 [details]
wireshark capture of dhcp discover packet
Comment 13 Jiri Popelka 2012-02-18 02:19:48 EST
Thanks. However, I can't see anything wrong in the packet, and my server (even with your configuration) correctly answers with a DHCPOFFER.
Maybe it was some other packet? You can send the whole packet dump (not just the one message) directly to my email.

It would also be great if you could test the patched dhcpd
http://jpopelka.fedorapeople.org/789601/dhcpd
Comment 14 John Levon 2012-02-18 06:56:40 EST
The patched binary works for me.
Comment 15 Jiri Popelka 2012-02-19 11:40:19 EST
Well, even the Discover, Offer, Request, and ACK messages in the packet capture you sent me look good. So when does the server exit? After it sends the ACK?

(In reply to comment #14)
> The patched binary works for me.

Great news, thanks.
Comment 16 John Levon 2012-02-19 12:23:37 EST
I'm not sure how to answer your question; I don't know when the server exits beyond "very soon after".
Comment 17 Jiri Popelka 2012-02-20 10:07:22 EST
Created attachment 564453 [details]
John's packet dump (DHCP messages only).

I went through the packet dump once more but haven't been able to find anything strange (I was hoping to see some big values in some time option or something like that).

I set up a server/client to be as similar as possible to John's server/client.
I tried to reproduce the problem here, but with no luck.

So although we don't have a reproducer, we know where in the code the problem is, that it's a security problem (DoS) and we have a patch that John confirmed as fixing it.

To security response team:
What's the next step ? Can I report this upstream (security-officer@isc.org) ?
Comment 18 Jiri Popelka 2012-02-20 10:09:26 EST
Created attachment 564454 [details]
final patch

I just removed the debug messages, otherwise it's the same as the patch from comment #10.
Comment 19 Tomas Hoger 2012-02-21 05:37:07 EST
(In reply to comment #17)
> What's the next step ? Can I report this upstream (security-officer@isc.org) ?

Yes, you can notify upstream.  Please CC s-r-t@ on your report.  TY!
Comment 20 Jiri Popelka 2012-02-23 06:33:37 EST
So, we have a reproducer. In my case the trick is to set
 default-lease-time infinite;
 max-lease-time infinite;
and then
1) let some client get a lease
2) restart the server
After that, the first client request brings the server down.

So it's probably not a security problem after all.

However, upstream is asking for some more info.
I have already answered with what I know and will re-send the questions directly to John,
because I don't think we need to use this ticket as an intermediary.
Comment 21 Jiri Popelka 2012-07-25 21:27:57 EDT
To security-response-team:

Please remove the security flag (I'm not able to do that), see bug #796459, comment #8.
Comment 22 Pavel Šimerda (pavlix) 2012-07-26 03:47:49 EDT
*** Bug 843185 has been marked as a duplicate of this bug. ***
Comment 23 Pavel Šimerda (pavlix) 2012-07-26 03:50:41 EDT
I confirm that the DHCP server now DHCPACKs the lease and continues to run, logging:

Jul 26 09:43:02 router dhcpd: Timeout requested too large reducing to 2^^32-10
Jul 26 09:43:02 router dhcpd: DHCPREQUEST for 192.168.25.10 from 52:54:00:eb:e9:fb (station) via eth1
Jul 26 09:43:02 router dhcpd: DHCPACK on 192.168.25.10 to 52:54:00:eb:e9:fb (station) via eth1

The next dhclient run logs just:

Jul 26 09:44:29 router dhcpd: DHCPREQUEST for 192.168.25.10 from 52:54:00:eb:e9:fb (station) via eth1
Jul 26 09:44:29 router dhcpd: DHCPACK on 192.168.25.10 to 52:54:00:eb:e9:fb (station) via eth1

Using http://koji.fedoraproject.org/koji/taskinfo?taskID=4330107 suggested by jpopelka.
Comment 24 Dan Williams 2012-07-26 22:41:28 EDT
Created attachment 600673 [details]
Simpler alternate patch for 64-bit interval calculation bug (see also #662254)
Comment 25 Fedora Update System 2012-07-27 04:45:20 EDT
dhcp-4.2.4-9.P1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/FEDORA-2012-11079/dhcp-4.2.4-9.P1.fc17
Comment 26 Fedora Update System 2012-07-27 04:48:31 EDT
dhcp-4.2.3-11.P2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/FEDORA-2012-11110/dhcp-4.2.3-11.P2.fc16
Comment 27 Fedora Update System 2012-08-01 14:28:51 EDT
dhcp-4.2.4-9.P1.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 28 Fedora Update System 2012-08-06 03:51:02 EDT
dhcp-4.2.3-11.P2.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.