Bug 753642 - Strange RTC/UTC time manipulations at boot, seen symptom is first-born DHCP IP address sacrificed
Strange RTC/UTC time manipulations at boot, seen symptom is first-born DHCP I...
Status: CLOSED DUPLICATE of bug 816752
Product: Fedora
Classification: Fedora
Component: systemd (Show other bugs)
rawhide
All Unspecified
unspecified Severity medium
: ---
: ---
Assigned To: systemd-maint
Fedora Extras Quality Assurance
: FutureFeature
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-13 17:45 EST by Thomas L. Shinnick
Modified: 2013-01-14 17:14 EST (History)
13 users (show)

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-01-14 17:14:06 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Thomas L. Shinnick 2011-11-13 17:45:01 EST
Description of problem:

Noticed symptom was that the initial DHCP lease IP address was being 
thrown away after every reboot, within 15 minutes of booting.  Every 
ssh session that was too quick would get disconnected and hosts file 
would have to be updated for new IP address.  No further IP address 
changes would occur until after next reboot.

After a couple days of characterizing this, I feel this is _not_ a 
DHCP fault, but rather something wonky about how the clock is set up 
by initial installation.  

The workaround is non-obvious.  If one runs system-config-date and
changes the timezone, such as from CST to EST and then back again,
any further reboots do *not* have the problem.  Something is being
done by the config app that is not being done by install.

After all the discussion below, I come to the conclusion:
  I think install is not correctly implementing "System clock uses UTC".


Version-Release number of selected component (if applicable):
Fedora 16 fresh installation


How reproducible:
Every time (5+ installs now)


Steps to Reproduce:
1. Install new F16, selecting a western timezone (EST,CST), 
   and leaving the defaults set to "System clock uses UTC"
2. On first and subsequent boots, logged time is seen to 
   change by several hours-worth during boot.  This causes 
   dhclient (and user) grief.
3. Run config date/time app and change timezones, such as
   CST->EST, or EST->CST, or even CST->EST->CST
4. Any further reboots do not demonstrate problem.  Initial
   logged time is correct and does not change drastically.
  

Actual results:
Every reboot causes a new DHCP lease to be acquired, discarding
any older leases, withing 15 minutes after rebooting.  IP address 
changes once for every reboot.  Quite irritating.


Expected results:
DHCP lease is _renewed_ during each reboot so that IP address 
does not change.  This is the behavior seen with F15.  This is
the behavior seen after the workaround of manually setting 
timezone using system-config-date.


Additional info:

Running under VMware Workstation 6.5 with Windows host.  (Hmm, 
such flexibility has cost me two+ days, given the nice ability
to keep retrying and characterizing - maybe not worth it....?)

On first boot, DHCP writes lease information into file
    /var/lib/dhclient/dhclient-5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03-eth0.lease
including timestamps of 'when' the lease information might or 
should expire, e.g. lines like:
      fixed-address 192.168.57.166;
      option dhcp-lease-time 1800;
      renew 0 2011/11/13 14:05:39;
      rebind 0 2011/11/13 14:19:15;
      expire 0 2011/11/13 14:23:00;
Later and on subsequent reboots it reads these timestamps and 
apparently gives priority to this info over and above whether the 
DHCP server says an IP address is okay.  If an expiration time
has "gone past" then the lease is thrown away and new one requested.

The problem arises in that dhclient first writes/reads the timestamps
*before* the clock is drastically changed.  With initial apparent 
boot time of 08:53, it will write out timestamps such as the above.

At a later point in boot processing chronyd suddenly decides the 
clock is off by hours and leaps to another time, e.g.
    Nov 13 08:54:44 tlsf16c chronyd[963]: chronyd version 1.26-20110831gitb088b7 starting
    Nov 13 08:54:44 tlsf16c chronyd[963]: Linux kernel major=3 minor=1 patch=0
    Nov 13 08:54:44 tlsf16c chronyd[963]: hz=100 shift_hz=7 freq_scale=1.00000000 nominal_tick=10000 slew_delta_tick=833 max_tick_bias=1000
    Nov 13 08:54:45 tlsf16c systemd[1]: Reloading.
    Nov 13 08:54:49 tlsf16c chronyd[963]: Selected source 67.18.187.111
    Nov 13 08:54:49 tlsf16c chronyd[963]: System clock wrong by 21598.511303 seconds, adjustment started
    Nov 13 14:54:48 tlsf16c chronyd[963]: System clock was stepped by 21598.511 seconds
    Nov 13 14:54:52 tlsf16c firstboot[787]: error reading information on service smolt: No such file or directory

After boot is done and after some more minutes, say 10 minutes, 
dhclient checks on the state of DHCP leases.  It finds that the 
lease is now 'stale' as the expiration time is long past 
(2011/11/13 14:23:00 vs current time of 15:05) and discards the
old IP address and requests a new one.
    Nov 13 15:05:38 tlsf16c NetworkManager[703]: <info> (eth0): DHCPv4 state changed bound -> expire
    Nov 13 15:05:38 tlsf16c NetworkManager[703]: <info> (eth0): DHCPv4 state changed expire -> preinit
    Nov 13 15:05:45 tlsf16c NetworkManager[703]: <info> (eth0): DHCPv4 state changed preinit -> bound
    Nov 13 15:05:45 tlsf16c NetworkManager[703]: bound to 192.168.57.167 -- renewal in 832 seconds.
    Nov 13 15:05:45 tlsf16c avahi-daemon[705]: Withdrawing address record for 192.168.57.166 on eth0.
    Nov 13 15:05:45 tlsf16c avahi-daemon[705]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.57.166.
    Nov 13 15:05:45 tlsf16c avahi-daemon[705]: Interface eth0.IPv4 no longer relevant for mDNS.
    Nov 13 15:05:45 tlsf16c avahi-daemon[705]: Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.57.167.
    Nov 13 15:05:45 tlsf16c avahi-daemon[705]: New relevant interface eth0.IPv4 for mDNS.
    Nov 13 15:05:45 tlsf16c avahi-daemon[705]: Registering new address record for 192.168.57.167 on eth0.IPv4.

It is the drastic time change between initial lease info file access and the second access that confuses dhclient.

I have various captured snippets of /var/log/messages from the
various test installations.  One bit stood out finally, the
boot-time log messages noting RTC clock values:
    Nov 13 08:52:53 tlsf16c kernel: [    0.161296] RTC time: 13:52:30, date: 11/13/11
    Nov 13 09:35:13 tlsf16c kernel: [    0.145052] RTC time: 14:34:37, date: 11/13/11
    Nov 13 15:09:53 tlsf16c kernel: [    0.081239] RTC time: 21:09:29, date: 11/13/11
The first is during initial boot.  The second during first reboot.
The third is the second reboot after I had used system-config-date 
to manually change timezones. 

In the first and second, the RTC is set to _correct_ local time, 
and logged times are... a back-formation assuming RTC is UTC time?

In the third, after timezone set manually, the RTC is apparently
set to UTC time, and the logged time is correct local time.

Also note that in first and second, installation-set timezone was
EST -5, and so logged time was RTC-5 hours, yet RTC was correct
local time for *CST*.  Only in the third instance do we see the 
correct time and delta for CST.  Very strange.

I think install is not correctly implementing "System clock uses UTC".


Possibly related is 
    https://bugzilla.redhat.com/show_bug.cgi?id=749516
    "kcm_clock (kcontrol/dateandtime) doesn't set persistent date"
which is impressive for the 45 comments and the arguing back and forth
about what config apps should do, libs should do, install should do, 
yeesh!  And again muttering about UTC....

At one point in those comments the assertion is made that the 
system-config-date app is doing the correct thing and that that should 
be the reference point for correctness.  If so, then initial installation 
for F16 is *not* doing the "correct thing".  (Whatever that is)
Comment 1 Miroslav Lichvar 2011-11-14 07:11:07 EST
This looks like system-config-date didn't call hwclock --systohc after the installation, possibly because NTP wasn't synced yet.

Can you please verify that calling "hwclock --systohc" manually fixes the problem?

In F15 and before it used to be called on shutdown, but it's not now and it needs to be called whenever the offset between the local and UTC time changes, see bug #750883.
Comment 2 Thomas L. Shinnick 2011-11-14 12:57:36 EST
Thank you for the suggested test.  It does change the situation to the desired bahavior.  

The change where someone dropped the setting of RTC at shutdown time is apparently the source of my problem.  And it *is* a problem.  Drastic 6 hour changes in clock time during boot processing certainly massively confuses components needing/comparing timestamps during boot.


Installed a new F16 selecting timezone Chicago (CST).  

  On first boot,
    Nov 14 04:46:25 tls16d kernel: imklog 5.8.5, log source = /proc/kmsg started.
    Nov 14 04:46:25 tls16d rsyslogd: [origin software="rsyslogd" swVersion="5.8.5" x-pid="719" x-info="http://www.rsyslog.com"] sta
    received new DHCP lease  .169
    start out as 04:52, jumps to 10:52
    Nov 14 04:52:24 tls16d chronyd[955]: chronyd version 1.26-20110831gitb088b7 starting
    Nov 14 04:52:24 tls16d chronyd[955]: Linux kernel major=3 minor=1 patch=1
    Nov 14 04:52:24 tls16d chronyd[955]: hz=100 shift_hz=7 freq_scale=1.00000000 nominal_tick=10000 slew_delta_tick=833 max_tick_bi
    Nov 14 04:52:30 tls16d chronyd[955]: Selected source 67.228.223.170
    Nov 14 04:52:30 tls16d chronyd[955]: System clock wrong by 21600.682082 seconds, adjustment started
    Nov 14 10:52:31 tls16d chronyd[955]: System clock was stepped by 21600.682 seconds
    still .169 after 12 minutes
    changes to .170 after 14 minutes
    sudo hwclock --show  
    Mon 14 Nov 2011 05:03:14 AM CST  -1.047933 seconds
>>> sudo hwclock --systohc
    date
    Mon Nov 14 11:03:50 CST 2011
    sudo hwclock --show
    Mon 14 Nov 2011 11:03:58 AM CST  -1.044210 seconds
    this is the correct current localtime
    restarted

  On second boot
    Nov 14 11:07:08 tls16d kernel: imklog 5.8.5, log source = /proc/kmsg started.
    Nov 14 11:07:08 tls16d rsyslogd: [origin software="rsyslogd" swVersion="5.8.5" x-pid="716" x-info="http://www.rsyslog.com"] sta
    keeps previous DHCP lease of .170
    Nov 14 11:07:23 tls16d chronyd[747]: Selected source 208.94.240.2
    Nov 14 11:07:23 tls16d chronyd[747]: System clock wrong by -0.739594 seconds, adjustment started
    date
    Mon Nov 14 11:08:10 CST 2011
    sudo hwclock --show
    Mon 14 Nov 2011 11:08:22 AM CST  -1.046785 seconds
    this is the correct localtime
    no sudden jumps later
    no throwing away DHCP leases  (bliss)


Again, the selections I made during installation were
    selecting timezone Chicago and defaulting to 
          "System clock uses UTC"
    when install moves the second phase, I select
          "Synchronize date and time over the network"
    defaulting to the x.fedora.pool.ntp.org domain names

If there is nothing wrong with me making those selections, then there
is something wrong with the installation, as the RTC is left unset,
and my problems with DHCP and the other problems noted by other 
people then occur after reboots.

I can't speak to the many scenarios cited in those other bug reports,
such as dual-boots and time zone daylight/standard changes.  This is
a much simpler case, I think, but which is not handled correctly.
Comment 3 Albert Strasheim 2011-11-25 07:59:26 EST
To some extent, this has been caused by systemd dropping hwclock-save.service as described here:

http://lists.freedesktop.org/archives/systemd-devel/2011-May/002526.html

If this had still been in place, the problems described here would at least disappear with a reboot. But that does start us down a slippery slope...
Comment 4 Steve Tyler 2012-10-06 09:32:52 EDT
(In reply to comment #3)
> To some extent, this has been caused by systemd dropping
> hwclock-save.service as described here:
> 
> http://lists.freedesktop.org/archives/systemd-devel/2011-May/002526.html
> 
> If this had still been in place, the problems described here would at least
> disappear with a reboot. But that does start us down a slippery slope...

Bug 816752, Comment 50 proposes adding an hwclock.service:
Bug 816752 - systemd v28 changes indirectly break date and ntpdate
Comment 5 Steve Tyler 2012-12-01 09:38:17 EST
hwclock reads /etc/adjtime, so what this does is ambiguous without knowing what is in /etc/adjtime:

# hwclock --systohc

The --noadjfile option disables reading /etc/adjtime:

# hwclock --systohc --noadjfile ---utc
# hwclock --systohc --noadjfile --localtime

The RTC can be read with:
$ cat /proc/driver/rtc
Comment 6 Nils Philippsen 2012-12-10 10:50:10 EST
This doesn't seem to be an issue with system-config-date (it calls hwclock), right? I'll change the component to systemd on account of that this needs to be dealt with at one place or another and systemd seems to be it right now (cf. comment #3, comment #4).
Comment 7 Steve Tyler 2012-12-10 12:20:30 EST
So when does the RTC get _initialized_ then?
Comment 8 Thomas L. Shinnick 2012-12-10 13:07:42 EST
(In reply to comment #6)
> This doesn't seem to be an issue with system-config-date (it calls hwclock),
> right? I'll change the component to systemd on account of that this needs to
> be dealt with at one place or another and systemd seems to be it right now
> (cf. comment #3, comment #4).

Perhaps it would be best to stay in sync with Bug #816752 , which is currently set to "Component: distribution" after my hissy fit there.

Is there a way to link this report to 816752, as that one seems to be productive, addresses the same problem, and sorta supersedes this report?
Comment 9 Steve Tyler 2012-12-10 13:58:56 EST
You can close this bug as a duplicate of Bug 816752, but I'm not sure that this is indeed a duplicate. I haven't been able to reproduce the problems you are reporting with F16 in a VM with the VM RTC set to UTC or local time. Here is what I already had written up:

(In reply to comment #0)
...
> Running under VMware Workstation 6.5 with Windows host.  (Hmm, 
...

Do you have your VM configured to start with the RTC set to UTC or local time?

With qemu, there is an option to set the RTC at startup:
-rtc base=localtime

Also, what hwclock shows depends on what is in /etc/adjtime.
This shows the RTC without any time zone offsets to confuse matters:
$ cat /proc/driver/rtc

BTW, what "System clock uses UTC" configures is the time zone of the RTC. That is recorded in /etc/adjtime as "UTC" or "LOCAL". The wording in system-config-date is wrong. The system clock is not the same as the RTC (aka hwclock). See "man adjtimex".

Last, in F18 there are problems when the RTC is on local time:

Bug 870535 - Regression in handling of localtime RTC: timezone offset applied twice 
(This appears to be fixed with systemd-195-10.fc18.)

Bug 881403 - Anaconda doesn't correctly infer if system clock shows UTC or local time
Comment 10 Lennart Poettering 2013-01-14 17:14:06 EST

*** This bug has been marked as a duplicate of bug 816752 ***

Note You need to log in before you can comment on or make changes to this bug.