Bug 916116 - dhclient should use monotonic time
Summary: dhclient should use monotonic time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: dhcp
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pavel Zhukov
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1155719 1479593
Depends On:
Blocks: ARMTracker IoT 1732883
Reported: 2013-02-27 10:46 UTC by Fabian Deutsch
Modified: 2019-09-14 01:54 UTC

Fixed In Version: dhcp-4.3.6-36.fc30 dhcp-4.3.6-34.fc29
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1732883 (view as bug list)
Environment:
Last Closed: 2019-08-15 18:09:04 UTC
Type: Bug
Embargoed:



Description Fabian Deutsch 2013-02-27 10:46:31 UTC
Description of problem:
The dhclient-script first fetches an IP from a server, sets the IP, and takes care of the lease.
Then, if the DHCP server provides it, the script also updates the TZ/time. This can lead to a situation where the script first fetches and sets the IP, then sets the timezone/time (massively different from the current one), drops the IP because the lease seems to have expired (because the TZ/time changed), and dhclient then fetches another IP.

Comment 1 Jiri Popelka 2013-03-01 15:00:33 UTC
(In reply to comment #0)
> Then - if the dhcp server provides it - the script is also updating the  TZ/time.

Only if it's enabled with a setting in an ifcfg- file 
or the /etc/sysconfig/network file:  DHCP_TIME_OFFSET_SETS_TIMEZONE=yes

> This can lead to situation where the script is first fetching and
> setting the ip, sets the timezone/time (massively different from current),
> drops the ip because the lease seems to be expired (b/c the tz/time
> changed), dhclient fetches another ip.

Yes, something similar was once discussed in bug #631521. It is marked as private, so I'll post some snippets in the next comment. Changing dhclient/dhcpd internals to use monotonic time is quite an invasive change in my opinion, but I'll leave this request open. Does the behaviour you describe have any consequences/side effects? (It seems quite harmless to me.)

Comment 2 Jiri Popelka 2013-03-01 15:01:51 UTC
some snippets from (private) bug #631521:

 Dan Williams 2010-09-20 21:14:29 CEST
Looking at the code, it seems most timeouts are handled with the "add_timeout()" function from common/dispatch.c.  That appears to make heavy use of gettimeofday() to determine whether timeouts have elapsed.  And AFAIK gettimeofday() does depend on the timezone and current system clock.  That explains why it gives up the lease when the timezone changes.  One suggestion was to use clock_gettime(CLOCK_MONOTONIC) instead, but there are a lot of gettimeofday() calls sprinkled around dhclient.

 Dan Williams 2010-10-18 19:48:46 CEST
The problem is that dhclient uses gettimeofday() as its timekeeping mechanism.  That call depends on the current timezone and system clock time.  So, if dhclient gets a lease, it internally registers a timeout using the value of gettimeofday().  Now, when the system clock changes because the user changed the timezone or advanced the date, the value of gettimeofday() will advance by whatever amount the timezone or user change is for.  That is often an hour or more.

dhclient periodically calls gettimeofday() and loops through the internal timeouts to see if any have expired.  Of course, since the return value of gettimeofday() is now a few hours after the value it returned a second or two ago (due to the timezone change), the timeout is past-due and the lease is considered "expired".  That's despite the fact that maybe only a minute or less has actually passed since the lease was acquired.

The problem (IMHO) is that dhclient does not use a monotonic (ie, immutable since system boot) clock to determine timeouts.  It's highly unlikely that the DHCP server changes its timezone or clock at the same time the client does, so the server still thinks the client has a valid lease for the next hour or whatever.  But the dhclient, because it's using gettimeofday(), thinks the lease has expired even though it has not.
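
To make the failure mode described above concrete, here is a minimal sketch of a timeout list whose expiry times are stored as absolute wall-clock seconds. The struct and function names are hypothetical and are not taken from dhclient's common/dispatch.c; the "current time" is passed in as a parameter so that a clock step can be simulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical timeout record; not dhclient's real data structure. */
struct timeout {
    time_t when;   /* absolute expiry in wall-clock seconds */
    bool fired;    /* set once the timeout has been handled */
};

/* Fire every pending timeout whose expiry is at or before `now`.
 * Returns the number of timeouts that fired on this pass. */
size_t run_expired(struct timeout *list, size_t n, time_t now)
{
    size_t count = 0;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!list[i].fired && list[i].when <= now) {
            list[i].fired = true;
            count++;
        }
    }
    return count;
}
```

With a one-hour lease registered at wall time 1000, calling run_expired() with now = 1060 fires nothing. If the wall clock is then stepped forward by two hours and the next pass sees now = 8260, the same timeout fires although only about a minute of real time has passed.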

 Dan Williams 2010-10-18 19:51:05 CEST
The point here being that I believe dhclient should track lease times as "absolute elapsed seconds since lease was acquired" without taking the timezone or system clock time into account, which is what gettimeofday() does.  And the way to do that is via clock_gettime(CLOCK_MONOTONIC) instead of gettimeofday().
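
A rough sketch of the "elapsed seconds since the lease was acquired" idea, assuming a POSIX system that provides clock_gettime(CLOCK_MONOTONIC). The lease struct and helper names are made up for illustration and do not correspond to dhclient code. For context: gettimeofday() reports seconds since the Epoch and follows every step of the system clock, whereas CLOCK_MONOTONIC cannot be set. On Linux, CLOCK_MONOTONIC does not advance while the machine is suspended, which is one reason CLOCK_BOOTTIME can be the better fit for lease tracking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Seconds on the monotonic clock; not affected by date or timezone changes. */
time_t monotonic_now(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return ts.tv_sec;
}

/* Hypothetical lease record: when it was acquired and how long it is valid. */
struct lease {
    time_t acquired;   /* monotonic seconds at acquisition */
    time_t duration;   /* lease length in seconds */
};

/* Record a new lease starting now (on the monotonic clock). */
struct lease lease_start(time_t duration_s)
{
    struct lease l = { .acquired = monotonic_now(), .duration = duration_s };
    return l;
}

/* A lease has expired once `duration` monotonic seconds have elapsed. */
bool lease_expired(const struct lease *l, time_t now)
{
    return now - l->acquired >= l->duration;
}
```

A caller would check lease_expired(&l, monotonic_now()) periodically; stepping the wall clock in either direction does not change the result.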

 Jiri Popelka 2010-10-19 12:30:37 CEST
Yes, I had been thinking about
gettimeofday() vs. clock_gettime(CLOCK_MONOTONIC)
before I added the comment.

Yes, it's not good to use a non-monotonic clock, but I don't think
that it's a big deal that client could (in this specific situation)
consider the lease "expired" when it's actually not.
The client just moves into INIT state, sends new DHCPDISCOVER
and the server *should* tell the client how the things really are
(i.e. give the client "new" lease).
Yes, the client should be using monotonic time, but to me it seems like a large change that could break a lot of things anywhere else and I doubt that we are able to completely test that this change doesn't break any other mechanism in dhcpd/dhclient.

 Dan Williams 2010-10-21 00:27:50 CEST
Yeah, fair enough.  Seems like we should somehow figure out why that's happening here first before trying to change dhclient's code.

Comment 3 Fabian Deutsch 2013-03-01 15:15:07 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > Then - if the dhcp server provides it - the script is also updating the  TZ/time.
> 
> Only if it's enabled with a setting in an ifcfg- file 
> or the /etc/sysconfig/network file:  DHCP_TIME_OFFSET_SETS_TIMEZONE=yes
> 
> > This can lead to situation where the script is first fetching and
> > setting the ip, sets the timezone/time (massively different from current),
> > drops the ip because the lease seems to be expired (b/c the tz/time
> > changed), dhclient fetches another ip.
> 
> Yes, something similar was once discussed in bug #631521 - marked as private
> so I'll post some snippets in next comment. Changing dhclient/dhcpd
> internals to use monotonic time is quite invasive change in my opinion, but
> I'll leave this request open. Does the behaviour you describe have any
> consequences/side effects ? (it seems quite harmless to me)

Thanks for the reference to the previous bug.

There were some consequences in our case. A subsequent connection to another host got interrupted because the IP changed. For now we are working around this by waiting blindly (sleep $N) for a couple of seconds before we continue, to give dhclient some time to settle (settle reminds me of lvm ...).

I actually don't know how to handle the problem inside dhclient; I just wanted to raise this issue and see it discussed.
I see your point that changing the dhclient code base could have a wide effect on other tools.

Comment 4 Fedora End Of Life 2013-12-21 11:44:43 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Charles R. Anderson 2014-09-15 22:02:29 UTC
Has anyone tested what happens if the clock goes backwards?  Will dhclient (or dhcpd) treat the timeout as immediately expired?

I ask because it is a common bug for networking software to incorrectly use a time-of-day clock rather than a monotonic clock to calculate timeouts.  Many times this has led to infinite loops, flooding the network or DHCP server with hundreds of packets per second because there is no delay between the packets sent.  dhcpcd had this issue at one point [1], and Plex Media Server had an issue as well in relation to UPnP discovery packets [2][3][4].  I just don't want this issue swept under the rug because "everything is working fine now" when we know that dhclient's use of a time-of-day clock DOES cause some issues.

[1] http://forums.gentoo.org/viewtopic-t-700220.html
[2] https://github.com/RasPlex/RasPlex/issues/95
[3] https://forums.plex.tv/index.php/topic/120434-plex-freenas-plugin-taking-down-network-via-udp-flood/
[4] https://forums.plex.tv/index.php/topic/108362-plex-flooding-32414-and-32412/

Comment 6 Charles R. Anderson 2014-11-13 14:37:57 UTC
*** Bug 1155719 has been marked as a duplicate of this bug. ***

Comment 7 Charles R. Anderson 2014-11-13 15:08:15 UTC
Bug filed with ISC:

ISC-Bugs #37797: dhclient should use monotonic clock

Comment 8 Charles R. Anderson 2015-02-05 13:25:35 UTC
(In reply to Charles R. Anderson from comment #5)
> Has anyone tested what happens if the clock goes backwards?  Will dhclient
> (or dhcpd) treat the timeout as immediately expired?

Basically, yes.

Related: bug #1093803:

"7. roll back the system time by 2 days
8. observe no further dhcp requests, notice ipv4 address removal after ~2 minutes, disconnecting any active ssh sessions"
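
As a back-of-the-envelope model only (not a trace of dhclient), the arithmetic below shows how long a timer armed against the wall clock really waits when the clock is stepped right after arming. The function name and the 1800-second renewal delay are assumptions chosen for illustration. The model accounts for the "no further dhcp requests" part of the quoted report; it says nothing about why the address was removed.

```c
#include <assert.h>

/* Real seconds a wall-clock timer still has to wait when the clock is
 * stepped by `step_s` seconds immediately after the timer was armed with
 * a delay of `delay_s` seconds. A negative step means the clock was
 * rolled back. The result is clamped at zero: a timer that is already
 * past due fires on the next pass. */
long real_wait_after_step(long delay_s, long step_s)
{
    long remaining;

    assert(delay_s >= 0);
    remaining = delay_s - step_s;
    return remaining > 0 ? remaining : 0;
}
```

With a renewal due in 1800 seconds, rolling the clock back by two days (a step of -172800) gives a real wait of 174600 seconds, far longer than a typical lease. A forward step of two hours (7200) gives a wait of 0, so the timer fires immediately.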

Comment 9 Jan Kurik 2015-07-15 14:50:56 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could affect also pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Comment 10 Fedora End Of Life 2016-11-24 10:57:00 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 11 Fedora End Of Life 2016-12-20 12:36:18 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 12 Fedora End Of Life 2017-02-28 09:33:39 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 13 Fedora Admin XMLRPC Client 2017-04-04 12:33:00 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 14 Pavel Zhukov 2017-08-09 05:37:13 UTC
*** Bug 1479593 has been marked as a duplicate of this bug. ***

Comment 15 Ognian Tschakalov 2018-02-20 12:42:17 UTC
Take a look at https://paste.fedoraproject.org/paste/CYHzemxngoSNGzukQlJzag where you can see what happens on a Raspberry Pi (there is no RTC!): the DHCP lease is requested and established BEFORE the system time is set, which leads to immediate lease expiration and network connection interruption...
Any chance to get this fixed sooner rather than later?
Thanks
Ognian

Comment 16 Tomáš Hozza 2018-02-20 13:16:35 UTC
(In reply to Ognian Tschakalov from comment #15)
> Take a look at https://paste.fedoraproject.org/paste/CYHzemxngoSNGzukQlJzag
> where you can see what happens on a raspberry pi (there is no RTC !!), DHCP
> lease is requested and established BEFORE system time is set; wich leads to
> immediate lease expiration and network connection interruption...
> Any chance to get this sooner than later fixed?
> Thanks
> Ognian

FYI, you can use chronyd with the "-s" option in order to have monotonic time even without an RTC. You can add it to /etc/sysconfig/chronyd.

From chronyd man page:
This option will set the system clock from the computer’s real-time clock (RTC) or to the last modification time of the file specified by the driftfile directive. Real-time clocks are supported only on Linux.

If used in conjunction with the -r flag, chronyd will attempt to preserve the old samples after setting the system clock from the RTC. This can be used to allow chronyd to perform long term averaging of the gain or loss rate across system reboots, and is useful for systems with intermittent access to network that are shut down when not in use. For this to work well, it relies on chronyd having been able to determine accurate statistics for the difference between the RTC and system clock last time the computer was on.

If the last modification time of the drift file is later than both the current time and the RTC time, the system time will be set to it to restore the time when chronyd was previously stopped. This is useful on computers that have no RTC or the RTC is broken (e.g. it has no battery).

Comment 17 Pavel Zhukov 2019-07-12 07:15:30 UTC
Hello,

The issue with backward jump and lost IP should be fixed in dhcp-4.4.1-14.fc31 which is in rawhide now. Testing and feedback are more than welcomed!

dhclient (actually isclib) tries to do its best to detect a *backward* time jump, using either a monotonic clock (CLOCK_BOOTTIME) if available, or gettimeofday() compared against a saved timestamp (if CLOCK_BOOTTIME is not defined, as on some old Unix systems that upstream supports), and then sends a request message to renew the lease. NOTE: the issue with a forward jump and a new IP being acquired is not addressed, as I have not found a way to recalculate the lease inside the client without too many dirty hacks and global variables.
As Jiri mentioned, switching to clock_gettime() completely is too invasive, and upstream will not accept this as ISC DHCP is mostly in maintenance mode now. Basically, it requires reverting from isclib timers back to the ones implemented in dhcp.
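
One way to picture the backward-jump detection described above is the sketch below. It is not the actual patch: the struct, the function names and the tolerance parameter are hypothetical. It compares how far the wall clock moved between two samples with how far a non-settable clock moved over the same interval. When CLOCK_BOOTTIME is not defined, the wall-only check corresponds to the saved-timestamp fallback mentioned in the comment.

```c
#define _GNU_SOURCE   /* harmless if CLOCK_BOOTTIME is exposed anyway */
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Paired readings of the wall clock and of a clock that cannot be set. */
struct clock_sample {
    time_t wall;     /* CLOCK_REALTIME seconds */
    time_t steady;   /* CLOCK_BOOTTIME seconds where available */
};

/* Take both readings back to back. Falling back to CLOCK_MONOTONIC is an
 * assumption of this sketch, not something the comment above describes. */
struct clock_sample sample_clocks(void)
{
    struct timespec w, s;
    int rc = clock_gettime(CLOCK_REALTIME, &w);

    assert(rc == 0);
#ifdef CLOCK_BOOTTIME
    rc = clock_gettime(CLOCK_BOOTTIME, &s);
#else
    rc = clock_gettime(CLOCK_MONOTONIC, &s);
#endif
    assert(rc == 0);
    (void)rc;
    return (struct clock_sample){ .wall = w.tv_sec, .steady = s.tv_sec };
}

/* True if the wall clock advanced less than the steady clock by more than
 * `tolerance_s` seconds, i.e. it was stepped backwards in between. */
bool wall_clock_jumped_back(struct clock_sample prev, struct clock_sample cur,
                            time_t tolerance_s)
{
    time_t wall_delta = cur.wall - prev.wall;
    time_t steady_delta = cur.steady - prev.steady;

    return wall_delta + tolerance_s < steady_delta;
}

/* Fallback without a steady clock: the wall clock is behind a saved value. */
bool wall_only_jumped_back(time_t saved_wall, time_t current_wall)
{
    return current_wall < saved_wall;
}
```

A client could take a sample when a timer is armed and another on each dispatch pass, and trigger a renewal request when wall_clock_jumped_back() returns true. A forward step makes the wall delta larger than the steady delta and is deliberately not reported, matching the limitation noted in the comment.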

Comment 18 Petr Menšík 2019-07-25 14:19:39 UTC
This change would introduce a dependency on modified bind headers. For f30 and f29, it would now require the most recent headers to build. Previous versions of bind-export-libs do not provide ISC_R_TIMESHIFTED.

Comment 19 Fedora Update System 2019-07-31 16:46:53 UTC
FEDORA-2019-578f65f444 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-578f65f444

Comment 20 Fedora Update System 2019-07-31 16:48:27 UTC
FEDORA-2019-5da166a4ce has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-5da166a4ce

Comment 21 Fedora Update System 2019-08-01 03:28:43 UTC
bind-9.11.9-1.fc30, dhcp-4.3.6-36.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-578f65f444

Comment 22 Fedora Update System 2019-08-01 05:33:48 UTC
bind-9.11.9-1.fc29, dhcp-4.3.6-33.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-5da166a4ce

Comment 23 Fedora Update System 2019-08-15 18:09:04 UTC
bind-9.11.9-1.fc30, dhcp-4.3.6-36.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2019-08-28 21:25:15 UTC
FEDORA-2019-d04f66e595 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-d04f66e595

Comment 25 Fedora Update System 2019-08-30 00:25:45 UTC
bind-9.11.10-1.fc29, bind-dyndb-ldap-11.1-19.fc29, dhcp-4.3.6-34.fc29, dnsperf-2.3.2-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-d04f66e595

Comment 26 Fedora Update System 2019-09-14 01:54:14 UTC
bind-9.11.10-1.fc29, bind-dyndb-ldap-11.1-19.fc29, dhcp-4.3.6-34.fc29, dnsperf-2.3.2-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

