Bug 1732883

Summary: dhclient should use monotonic time
Product: [Fedora] Fedora
Reporter: Petr Menšík <pemensik>
Component: bind
Assignee: Petr Menšík <pemensik>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: anon.amish, barry.scott, cra, deeptik, extras-qa, fdeutsch, jpopelka, lagarcia, mruprich, msehnout, mvermaes, ognian.tschakalov, pbrobinson, pemensik, psppsn96, pzhukov, thozza, vonsch, zdohnal
Target Milestone: ---
Keywords: Reopened, Tracking
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: bind-9.11.9-1.fc30 bind-9.11.10-1.fc29
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 916116
Environment:
Last Closed: 2019-08-15 18:09:09 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 916116
Bug Blocks: 245418, 1269538

Description Petr Menšík 2019-07-24 15:23:44 UTC
The dhcp package on releases f29 and f30 needs monotonic time support in bind-export-libs. Enable monotonic time in them.
+++ This bug was initially created as a clone of Bug #916116 +++

Description of problem:
The dhclient-script first fetches an IP from a server, sets the IP, and takes care of the lease.
Then, if the DHCP server provides it, the script also updates the TZ/time. This can lead to a situation where the script first fetches and sets the IP, then sets the timezone/time (massively different from the current one), and drops the IP because the lease seems to be expired (because the TZ/time changed); dhclient then fetches another IP.

--- Additional comment from Jiri Popelka on 2013-03-01 16:00:33 CET ---

(In reply to comment #0)
> Then - if the dhcp server provides it - the script is also updating the  TZ/time.

Only if it's enabled with a setting in an ifcfg- file 
or the /etc/sysconfig/network file:  DHCP_TIME_OFFSET_SETS_TIMEZONE=yes

> This can lead to situation where the script is first fetching and
> setting the ip, sets the timezone/time (massively different from current),
> drops the ip because the lease seems to be expired (b/c the tz/time
> changed), dhclient fetches another ip.

Yes, something similar was once discussed in bug #631521 - marked as private so I'll post some snippets in next comment. Changing dhclient/dhcpd internals to use monotonic time is quite invasive change in my opinion, but I'll leave this request open. Does the behaviour you describe have any consequences/side effects ? (it seems quite harmless to me)

--- Additional comment from Jiri Popelka on 2013-03-01 16:01:51 CET ---

some snippets from (private) bug #631521:

 Dan Williams 2010-09-20 21:14:29 CEST
Looking at the code, it seems most timeouts are handled with the "add_timeout()" function from common/dispatch.c.  That appears to make heavy use of gettimeofday() to determine whether timeouts have elapsed.  And AFAIK gettimeofday() does depend on the timezone and current system clock.  That explains why it gives up the lease when the timezone changes.  One suggestion was to use clock_gettime(CLOCK_MONOTONIC) instead, but there's a lot of gettimeofday() calls sprinkled around dhclient.

 Dan Williams 2010-10-18 19:48:46 CEST
The problem is that dhclient uses gettimeofday() as its timekeeping mechanism.  That call depends on the current timezone and system clock time.  So, if dhclient gets a lease, it internally registers a timeout using the value of gettimeofday().  Now, when the system clock changes because the user changed the timezone or advanced the date, the value of gettimeofday() will advance by whatever amount the timezone or user change is for.  That is often an hour or more.

dhclient periodically calls gettimeofday() and loops through the internal timeouts to see if any have expired.  Of course, since the return value of gettimeofday() is now a few hours after the value it returned a second or two ago (due to the timezone change), the timeout is past-due and the lease is considered "expired".  That's despite the fact that maybe only a minute or less has actually passed since the lease was acquired.

The problem (IMHO) is that dhclient does not use a monotonic (ie, immutable since system boot) clock to determine timeouts.  It's highly unlikely that the DHCP server changes its timezone or clock at the same time the client does, so the server still thinks the client has a valid lease for the next hour or whatever.  But the dhclient, because it's using gettimeofday(), thinks the lease has expired even though it has not.
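
The failure mode described above can be modelled in a few lines of C. This is a minimal sketch and not dhclient's code: struct wall_timeout, arm_timeout() and timeout_due() are hypothetical names, and the only assumption is that a timer is stored as an absolute wall-clock deadline and compared against the current wall-clock time.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical timer entry: an absolute wall-clock deadline in seconds. */
struct wall_timeout {
    time_t deadline;
};

/* Arm a timer that should fire delay_sec seconds after wall_now. */
struct wall_timeout arm_timeout(time_t wall_now, long delay_sec)
{
    assert(delay_sec >= 0);
    struct wall_timeout t = { .deadline = wall_now + delay_sec };
    return t;
}

/* The timer is due as soon as the wall clock reads at or past the deadline,
 * regardless of how much real time has actually elapsed. */
bool timeout_due(struct wall_timeout t, time_t wall_now)
{
    return wall_now >= t.deadline;
}
```

With a timer armed for 3600 seconds, timeout_due() is false one minute later. If the wall clock is stepped forward by two hours within that minute, the same call returns true even though only a minute has passed. A step backwards moves the deadline further away by the size of the step.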

 Dan Williams 2010-10-18 19:51:05 CEST
The point here being that I believe dhclient should track lease times as "absolute elapsed seconds since lease was acquired" without taking the timezone or system clock time into account, which is what gettimeofday() does.  And the way to do that is via clock_gettime(CLOCK_MONOTONIC) instead of gettimeofday().
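
As a rough illustration of the "elapsed seconds since the lease was acquired" idea, here is a minimal sketch built on clock_gettime(CLOCK_MONOTONIC). It is not a patch for dhclient: struct lease_timer and the lease_*() helpers are hypothetical names, and treating a clock read failure as "expired" is an assumption made only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical lease record; not dhclient's actual data structure. */
struct lease_timer {
    struct timespec acquired;  /* CLOCK_MONOTONIC reading at acquisition */
    long duration_sec;         /* lease length granted by the server */
};

/* Whole seconds elapsed between two readings of the same clock. */
long elapsed_sec(struct timespec start, struct timespec now)
{
    long sec = (long)(now.tv_sec - start.tv_sec);
    if (now.tv_nsec < start.tv_nsec)
        sec -= 1;  /* borrow one second for the nanosecond part */
    return sec;
}

/* Pure check against an explicit "now", so it can be tested without waiting. */
bool lease_expired_at(const struct lease_timer *l, struct timespec now)
{
    return elapsed_sec(l->acquired, now) >= l->duration_sec;
}

/* Record the acquisition time; returns 0 on success, -1 on failure. */
int lease_start(struct lease_timer *l, long duration_sec)
{
    assert(duration_sec > 0);
    l->duration_sec = duration_sec;
    return clock_gettime(CLOCK_MONOTONIC, &l->acquired);
}

/* Check the lease against the monotonic clock, which is unaffected by
 * changes to the system date, time, or timezone. */
bool lease_expired(const struct lease_timer *l)
{
    struct timespec now;
    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        return true;  /* assumption: on clock failure, let the caller renew */
    return lease_expired_at(l, now);
}
```

A caller would invoke lease_start() when the lease is acquired and poll lease_expired() from its event loop. Because both readings come from the same monotonic clock, stepping the system clock in either direction does not change the result.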

 Jiri Popelka 2010-10-19 12:30:37 CEST
Yes, I had been thinking about
gettimeofday() vs. clock_gettime(CLOCK_MONOTONIC)
before I added the comment.

Yes, it's not good to use a non-monotonic clock, but I don't think
that it's a big deal that the client could (in this specific situation)
consider the lease "expired" when it's actually not.
The client just moves into INIT state, sends new DHCPDISCOVER
and the server *should* tell the client how the things really are
(i.e. give the client "new" lease).
Yes, the client should be using monotonic time, but to me it seems like a large change that could break a lot of things anywhere else and I doubt that we are able to completely test that this change doesn't break any other mechanism in dhcpd/dhclient.

 Dan Williams 2010-10-21 00:27:50 CEST
Yeah, fair enough.  Seems like we should somehow figure out why that's happening here first before trying to change dhclient's code.

--- Additional comment from Fabian Deutsch on 2013-03-01 16:15:07 CET ---

(In reply to comment #1)
> (In reply to comment #0)
> > Then - if the dhcp server provides it - the script is also updating the  TZ/time.
> 
> Only if it's enabled with a setting in an ifcfg- file 
> or the /etc/sysconfig/network file:  DHCP_TIME_OFFSET_SETS_TIMEZONE=yes
> 
> > This can lead to situation where the script is first fetching and
> > setting the ip, sets the timezone/time (massively different from current),
> > drops the ip because the lease seems to be expired (b/c the tz/time
> > changed), dhclient fetches another ip.
> 
> Yes, something similar was once discussed in bug #631521 - marked as private
> so I'll post some snippets in next comment. Changing dhclient/dhcpd
> internals to use monotonic time is quite invasive change in my opinion, but
> I'll leave this request open. Does the behaviour you describe have any
> consequences/side effects ? (it seems quite harmless to me)

Thanks for the reference to the previous bug.

There were some consequences in our case. A subsequent connection to another host got interrupted because the IP changed. For now we are working around this by waiting blindly (sleep $N) for a couple of seconds before we continue, to give dhcpc some time to settle (settle reminds me of lvm ..).

I actually don't know how to handle the problem inside dhcpclient, I just wanted to raise this issue and see it discussed.
I see your point, that changing dhcpc code-base could have a wide effect on other tools.

--- Additional comment from Fedora End Of Life on 2013-12-21 12:44:43 CET ---

This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

--- Additional comment from Charles R. Anderson on 2014-09-16 00:02:29 CEST ---

Has anyone tested what happens if the clock goes backwards?  Will dhclient (or dhcpd) treat the timeout as immediately expired?

I ask because it is a common bug in various networking software that incorrectly uses a time-of-day clock rather than a monotonic clock to calculate timeouts.  Many times this has led to infinite loops, causing flooding of the network or DHCP server with hundreds of packets per second due to the lack of any delay between the sending of packets.  dhcpcd had this issue at one point [1], and Plex Media server had an issue as well in relation to UPnP discovery packets [2][3][4].  I just don't want this issue swept under the rug because "everything is working fine now" when we know that dhclient's use of a time-of-day clock DOES cause some issues.

[1] http://forums.gentoo.org/viewtopic-t-700220.html
[2] https://github.com/RasPlex/RasPlex/issues/95
[3] https://forums.plex.tv/index.php/topic/120434-plex-freenas-plugin-taking-down-network-via-udp-flood/
[4] https://forums.plex.tv/index.php/topic/108362-plex-flooding-32414-and-32412/

--- Additional comment from Charles R. Anderson on 2014-11-13 15:37:57 CET ---



--- Additional comment from Charles R. Anderson on 2014-11-13 16:08:15 CET ---

Bug filed with ISC:

ISC-Bugs #37797: dhclient should use monotonic clock

--- Additional comment from Charles R. Anderson on 2015-02-05 14:25:35 CET ---

(In reply to Charles R. Anderson from comment #5)
> Has anyone tested what happens if the clock goes backwards?  Will dhclient
> (or dhcpd) treat the timeout as immediately expired?

Basically, yes.

Related: bug #1093803:

"7. roll back the system time by 2 days
8. observe no further dhcp requests, notice ipv4 address removal after ~2 minutes, disconnecting any active ssh sessions"

--- Additional comment from Jan Kurik on 2015-07-15 16:50:56 CEST ---

This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could affect also pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

--- Additional comment from Fedora End Of Life on 2016-11-24 11:57:00 CET ---

This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

--- Additional comment from Fedora End Of Life on 2016-12-20 13:36:18 CET ---

Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

--- Additional comment from Fedora End Of Life on 2017-02-28 10:33:39 CET ---

This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

--- Additional comment from Fedora Admin XMLRPC Client on 2017-04-04 14:33:00 CEST ---

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

--- Additional comment from Pavel Zhukov on 2017-08-09 07:37:13 CEST ---



--- Additional comment from Ognian Tschakalov on 2018-02-20 13:42:17 CET ---

Take a look at https://paste.fedoraproject.org/paste/CYHzemxngoSNGzukQlJzag where you can see what happens on a raspberry pi (there is no RTC !!), DHCP lease is requested and established BEFORE system time is set; which leads to immediate lease expiration and network connection interruption...
Any chance to get this fixed sooner rather than later?
Thanks
Ognian

--- Additional comment from Tomáš Hozza 🤓 on 2018-02-20 14:16:35 CET ---

(In reply to Ognian Tschakalov from comment #15)
> Take a look at https://paste.fedoraproject.org/paste/CYHzemxngoSNGzukQlJzag
> where you can see what happens on a raspberry pi (there is no RTC !!), DHCP
> lease is requested and established BEFORE system time is set; which leads to
> immediate lease expiration and network connection interruption...
> Any chance to get this sooner than later fixed?
> Thanks
> Ognian

FYI, you can use chrony with the "-s" option in order to have monotonic time even without an RTC. You can add it to /etc/sysconfig/chronyd.

From chronyd man page:
This option will set the system clock from the computer’s real-time clock (RTC) or to the last modification time of the file specified by the driftfile directive. Real-time clocks are supported only on Linux.

If used in conjunction with the -r flag, chronyd will attempt to preserve the old samples after setting the system clock from the RTC. This can be used to allow chronyd to perform long term averaging of the gain or loss rate across system reboots, and is useful for systems with intermittent access to network that are shut down when not in use. For this to work well, it relies on chronyd having been able to determine accurate statistics for the difference between the RTC and system clock last time the computer was on.

If the last modification time of the drift file is later than both the current time and the RTC time, the system time will be set to it to restore the time when chronyd was previously stopped. This is useful on computers that have no RTC or the RTC is broken (e.g. it has no battery).

--- Additional comment from Pavel Zhukov on 2019-07-12 09:15:30 CEST ---

Hello,

The issue with backward jump and lost IP should be fixed in dhcp-4.4.1-14.fc31, which is in rawhide now. Testing and feedback are more than welcome!

Dhclient (isclib actually) tries to do its best, using either a monotonic clock (CLOCK_BOOTTIME) if available, or gettimeofday() compared against a saved timestamp (if CLOCK_BOOTTIME is not defined, as on some old Unix systems that upstream supports), to detect a *backward* time jump, and then sends a request message to renew the lease. NOTE: the issue with a forward jump and a new IP being acquired is not addressed, as I have not found a way to recalculate the lease inside the client without too many dirty hacks and global variables.
As Jiri mentioned, switching to clock_gettime() completely is too invasive, and upstream will not accept this as ISC DHCP is mostly in maintenance mode now. Basically it requires reverting back from isclib timers to the ones implemented in dhcp.
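
The backward-jump detection described in this comment might look roughly like the following. This is a sketch under stated assumptions, not the code shipped in the dhcp or bind packages: struct clock_sample, the function names, and the tolerance parameter are all hypothetical. It compares how far the wall clock and a boot-time clock advanced between two samples, and falls back to comparing against a saved wall-clock timestamp when no such clock is available.

```c
#define _GNU_SOURCE  /* CLOCK_BOOTTIME is Linux-specific */
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical pair of readings taken at the same moment. */
struct clock_sample {
    time_t wall;  /* CLOCK_REALTIME seconds (can be stepped by the admin) */
    time_t boot;  /* seconds since boot (cannot be stepped) */
};

/* A backward jump shows up as the wall clock advancing less than the
 * boot-time clock did over the same interval. Forward jumps are
 * deliberately not flagged here. */
bool wall_clock_went_back(struct clock_sample prev, struct clock_sample now,
                          long tolerance_sec)
{
    assert(tolerance_sec >= 0);
    long wall_delta = (long)(now.wall - prev.wall);
    long boot_delta = (long)(now.boot - prev.boot);
    return wall_delta + tolerance_sec < boot_delta;
}

/* Fallback without a boot-time clock: the wall clock reads earlier than a
 * previously saved timestamp. */
bool wall_clock_went_back_fallback(time_t saved_wall, time_t wall_now)
{
    return wall_now < saved_wall;
}

/* Take one sample; returns 0 on success, -1 on failure. */
int take_sample(struct clock_sample *out)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1;
    out->wall = ts.tv_sec;
#ifdef CLOCK_BOOTTIME
    if (clock_gettime(CLOCK_BOOTTIME, &ts) != 0)
        return -1;
#else
    /* Assumption for this sketch: use CLOCK_MONOTONIC where
     * CLOCK_BOOTTIME is not defined. */
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
#endif
    out->boot = ts.tv_sec;
    return 0;
}
```

A client could call take_sample() on each pass through its event loop and, when wall_clock_went_back() reports a jump, send a request to renew the lease. A forward jump makes the wall clock advance more than the boot-time clock, so this check does not flag it, which is consistent with the limitation noted above.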

Comment 1 Fedora Update System 2019-07-31 16:46:54 UTC
FEDORA-2019-578f65f444 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-578f65f444

Comment 2 Fedora Update System 2019-07-31 16:48:25 UTC
FEDORA-2019-5da166a4ce has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-5da166a4ce

Comment 3 Fedora Update System 2019-08-01 03:28:44 UTC
bind-9.11.9-1.fc30, dhcp-4.3.6-36.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-578f65f444

Comment 4 Fedora Update System 2019-08-01 05:33:47 UTC
bind-9.11.9-1.fc29, dhcp-4.3.6-33.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-5da166a4ce

Comment 5 Fedora Update System 2019-08-15 18:09:09 UTC
bind-9.11.9-1.fc30, dhcp-4.3.6-36.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 6 Fedora Update System 2019-08-28 21:25:14 UTC
FEDORA-2019-d04f66e595 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-d04f66e595

Comment 7 Fedora Update System 2019-08-30 00:25:44 UTC
bind-9.11.10-1.fc29, bind-dyndb-ldap-11.1-19.fc29, dhcp-4.3.6-34.fc29, dnsperf-2.3.2-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-d04f66e595

Comment 8 Fedora Update System 2019-09-14 01:54:11 UTC
bind-9.11.10-1.fc29, bind-dyndb-ldap-11.1-19.fc29, dhcp-4.3.6-34.fc29, dnsperf-2.3.2-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.