Bug 587070 - dhclient up to 4 seconds faster

Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	dhcp
Version:	13
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Jiri Popelka
QA Contact:	Fedora Extras Quality Assurance

Reported:	2010-04-28 19:38 UTC by MarcH
Modified:	2010-05-22 10:26 UTC
CC List:	3 users
Last Closed:	2010-05-22 10:26:07 UTC

Attachments
gets rid of a useless wait (584 bytes, patch), 2010-04-28 19:38 UTC, MarcH

Description MarcH 2010-04-28 19:38:25 UTC

Created attachment 409949
gets rid of a useless wait

Description of problem:

Before trying to get a lease, dhclient waits between 0 and 4 seconds for no good reason. In June 2006 the ISC promised to get rid of this wait but never did:
https://lists.isc.org/mailman/htdig/dhcp-users/2006-June/thread.html#928

The one-line patch attached simply gets rid of this wait.


Version-Release number of selected component (if applicable):
At least versions 3 and 4.


Steps to Reproduce:
1. Connect your network cable
2. Be patient
3. Be more patient

Comment 1 MarcH 2010-04-28 21:54:47 UTC

FWIW this patch has been filed in ISC's (hidden?) bug tracker: [ISC-Bugs #21219]

Comment 2 Jiri Popelka 2010-04-29 14:53:03 UTC

Thanks.
I know about that delay.
I thought it was reasonable,
but when Ted Lemon and David W. Hankins say it can be safely removed why not.
It really looks like a way to save some booting time :-)

RFC 3315 (DHCP for IPv6) defines delay before sending first Solicit, Confirm, Information-request message.
These delays are set to 1 second,
so compromise can be to use 1 second delay also for dhclient for IPv4.

Comment 3 Fedora Update System 2010-04-29 15:33:53 UTC

dhcp-4.1.1-20.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/dhcp-4.1.1-20.fc13

Comment 4 MarcH 2010-04-29 22:26:49 UTC

(In reply to comment #2)
> but when Ted Lemon and David W. Hankins say it can be safely
> removed why not.  It really looks like a way to save some
> booting time :-)

I think the only case where clients would really start running
dhclient at the same time is the case where they are all directly
linked to the same switch/router, and this switch (not the
clients) has a power outage. So this would never be more than a
few hundred clients potentially synchronized.

> RFC 3315 (DHCP for IPv6) defines delay before sending first
> Solicit, Confirm, Information-request message.

Thanks for the reference.

> These delays are set to 1 second,

In the reference I read: "delayed by a RANDOM amount of time
BETWEEN 0 and 1 second".

> so compromise can be to use 1 second delay also for dhclient
> for IPv4.

Your current patch is either 0 or 1 second delay. And never in
between.


I just had a better look at the cur_time macro and realized it is
rounding to the previous second. So cur_time + random() % 1 in
your patch will make half the clients start as soon as they can,
and the other half start all synchronized on the next second.
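
The rounding effect described above can be illustrated with a small sketch. It assumes, as stated in this comment, that cur_time is the real time truncated to the whole second. The helper names usec_until and effective_wait_usec are hypothetical and are not part of dhclient.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helper, not dhclient code: microseconds until a timer set
 * for the whole second `target_sec` fires, as seen from the real time
 * `now`. A target that is already in the past fires immediately (0). */
static long long usec_until(struct timeval now, long long target_sec)
{
    assert(now.tv_usec >= 0 && now.tv_usec < 1000000);
    long long wait = (target_sec - (long long)now.tv_sec) * 1000000LL
                     - now.tv_usec;
    return wait > 0 ? wait : 0;
}

/* Effective wait when the timer is set to cur_time + extra_sec, where
 * cur_time is ASSUMED to be `now` truncated to the whole second. */
static long long effective_wait_usec(struct timeval now, int extra_sec)
{
    long long cur_time = (long long)now.tv_sec; /* truncation drops tv_usec */
    return usec_until(now, cur_time + extra_sec);
}
```

With a real time of 100.25 s, an extra delay of 0 seconds gives no wait at all, while an extra delay of 1 second gives a wait of only 0.75 s, ending exactly on the next second boundary. Every client that draws the 1 second delay within the same second therefore fires at the same instant.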

What about this:

  tv.tv_sec  = cur_time + 1; /* convert lower rounding to upper rounding */
  tv.tv_usec = random() % 1000000; /* stagger clients in case of

This would add a random delay between 0 and 2 seconds: 0-1
seconds of (unfortunate) rounding + 0-1 seconds to de-synchronize
clients.

The "exact" RFC solution would be this:
  tv.tv_sec  = cur_tv.tv_sec;
  tv.tv_usec = cur_tv.tv_usec + random() % 1000000;
  if (tv.tv_usec >= 1000000) {
      tv.tv_usec -= 1000000;
      tv.tv_sec++;
  }
But this would bypass the cur_time macro, which is here for
some reason I guess (see RELNOTES)
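
For reference, the sub-second variant can be written as a self-contained sketch. This is only an illustration under stated assumptions: add_jitter and first_send_time are hypothetical names, not dhclient functions, and random() is the POSIX function from stdlib.h, which returns a non-negative long.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>

/* Hypothetical helper: return `now` plus `jitter_usec`, carrying any
 * overflow from tv_usec into tv_sec.
 * Requires 0 <= jitter_usec < 1000000. */
static struct timeval add_jitter(struct timeval now, long jitter_usec)
{
    assert(jitter_usec >= 0 && jitter_usec < 1000000);
    struct timeval tv = now;
    tv.tv_usec += jitter_usec;
    if (tv.tv_usec >= 1000000) {
        tv.tv_usec -= 1000000;
        tv.tv_sec += 1;
    }
    return tv;
}

/* Random delay in [0, 1 s): the modulo keeps the jitter below one
 * second (the slight modulo bias is ignored in this sketch). */
static struct timeval first_send_time(struct timeval now)
{
    return add_jitter(now, random() % 1000000);
}
```

Taking the jitter modulo 1000000 matters: multiplying the return value of random() by 1000000 would not yield a fraction of a second, since random() returns an integer rather than a value between 0 and 1.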

Comment 5 Jiri Popelka 2010-04-30 10:07:18 UTC

(In reply to comment #4)
> (In reply to comment #2)
> > but when Ted Lemon and David W. Hankins say it can be safely
> > removed why not.  It really looks like a way to save some
> > booting time :-)
> 
> I think the only case where clients would really start running
> dhclient at the same time is the case where they are all directly
> linked to the same switch/router, and this switch (not the
> clients) has a power outage. So this would never be more than a
> few hundred clients potentially synchronized.
> 
When the switch/router has a power outage,
the clients do not (re)start their dhclient.
A client needs to contact the server (switch/router) only when
the client itself is (re)starting or when it needs to renew
its lease. So when the server (switch/router) has a power outage,
only clients that (re)start or need to renew their lease
at that moment notice it.

> > RFC 3315 (DHCP for IPv6) defines delay before sending first
> > Solicit, Confirm, Information-request message.
> 
> Thanks for the reference.
> 
> > These delays are set to 1 second,
> 
> In the reference I read: "delayed by a RANDOM amount of time
> BETWEEN 0 and 1 second".
> 
Sorry, I meant *max* delay(s).

> The "exact" RFC solution would be this:
>   tv.tv_sec  = cur_tv.tv_sec;
>   tv.tv_usec = cur_tv.tv_usec + random() % 1000000;
>   if (tv.tv_usec >= 1000000) {
>       tv.tv_usec -= 1000000;
>       tv.tv_sec++;
>   }
> But this would bypass the cur_time macro, which is here for
> some reason I guess (see RELNOTES)    

If we want to keep some delay,
we can do it exactly the way it is done in dhc6.c:
  tv.tv_sec = cur_tv.tv_sec;
  tv.tv_usec = cur_tv.tv_usec;
  tv.tv_usec += (random() % 100) * 10000;
  if (tv.tv_usec >= 1000000) {
      tv.tv_sec += 1;
      tv.tv_usec -= 1000000;
  } 

But I think there is already so much randomness
that we can remove the delay altogether.
So the patch will look like:
-    tv.tv_sec = cur_time + random() % 5;
+    tv.tv_sec = cur_time;

Comment 6 MarcH 2010-04-30 23:20:07 UTC

(In reply to comment #5)
> When the switch/router has a power outage,
> the clients do not (re)start their dhclient.

When the switch/router has a power outage, all clients running NetworkManager will restart their dhclient (at the same time). This is (unfortunately) the default behaviour with at least Fedora 10, 11 and 12. It's not dhclient's fault, but this is what happens.

More about this here:
http://thread.gmane.org/gmane.linux.network.networkmanager.devel/15570/

Comment 7 Derek Atkins 2010-05-05 12:42:35 UTC

Yeah, but that's a bug in NM and should be fixed in NM.  We shouldn't try to work around their bug here in dhclient.

Comment 8 MarcH 2010-05-10 08:39:05 UTC

(In reply to comment #7)
> Yeah, but that's a bug in NM and should be fixed in NM.  We shouldn't try to
> work around their bug here in dhclient.    

I do not think this is a bug but a feature that should be optional.

Suppose a switch has a power failure for a long time, longer than DHCP leases. Wouldn't you be happy that all nodes linked to it are back in business as soon as the power is back on? (possibly hammering the DHCP server a bit).

You can easily find other use cases where this is a desired feature, obviously starting with a laptop. Granted, the "brief switch outage" is not one of them. Even there it is not so bad since NM's off/on will not harm existing connections (as opposed to Windows but I digress).

I guess any DHCP server would be able to support a few hundred simultaneous requests without problem anyway, so this discussion is probably just for the record.

Comment 9 Fedora Update System 2010-05-22 01:50:00 UTC

dhcp-4.1.1-21.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 10 Jiri Popelka 2010-05-22 10:26:07 UTC

https://admin.fedoraproject.org/updates/dhcp-4.1.1-21.fc13
