Bug 1780165 - VM clocks do not resync quickly after being suspended
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.5.0
Assignee: Sohan Kunkerkar
QA Contact: Micah Abbott
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-12-05 14:22 UTC by Ryan Phillips
Modified: 2020-07-13 17:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:12:38 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1800901 | 0 | medium | CLOSED | Make chrony use NTP settings from DHCP | 2023-09-14 05:52:09 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:13:05 UTC

Description Ryan Phillips 2019-12-05 14:22:21 UTC
Description of problem:
I've been tracking down a bug, and it has led me to a code path where timestamps are being sorted. The use case I'm looking at is a suspended cluster.

chronyd takes a long time to resync: the VM was suspended for 20 minutes, and 1.5 hours later it had still not caught back up to real time :/
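
For a rough sense of scale, the sketch below estimates how long a pure slew would take. It assumes chronyd corrects the whole offset by slewing at the default `maxslewrate` given in the chrony documentation (83333.333 ppm, i.e. one twelfth). The `slew_hours` helper is a hypothetical name used only for illustration, and a real chronyd may behave differently.

```shell
# Estimate the hours needed to slew away an offset (in seconds)
# at a given maximum slew rate (in ppm; the default here is an assumption).
slew_hours() {
    awk -v off="$1" -v ppm="${2:-83333.333}" \
        'BEGIN { printf "%.1f\n", off / (ppm / 1000000) / 3600 }'
}

slew_hours 1200   # 20-minute offset at the assumed default rate
```

Under these assumptions a 20-minute offset needs roughly 4 hours of slewing, which would be consistent with the clock still lagging after 1.5 hours.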

By design, chronyd normally corrects the clock by speeding it up or slowing it down rather than jumping it. I suggest we reconfigure chronyd with `makestep 1 -1` so that VMs step to the correct time immediately.

Ref: 
https://chrony.tuxfamily.org/faq.html#_is_code_chronyd_code_allowed_to_step_the_system_clock
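
To make the effect of the second `makestep` argument concrete, here is a simplified model of the rule as the chrony documentation describes it: step when the offset exceeds the threshold, but only within the first `limit` clock updates, with a negative limit meaning "always". The `would_step` function and its arguments are hypothetical and are not chronyd code. The `1.0 3` values are what I understand the stock RHEL chrony.conf to ship, which is an assumption here.

```shell
# Simplified model of "makestep <threshold> <limit>":
# step when |offset| exceeds the threshold and either the limit is negative
# or no more than <limit> clock updates have happened since chronyd started.
would_step() {
    # $1 = offset (s), $2 = threshold (s), $3 = limit, $4 = clock updates so far
    awk -v off="$1" -v thr="$2" -v lim="$3" -v n="$4" 'BEGIN {
        if (off < 0) off = -off
        if (off > thr && (lim < 0 || n <= lim)) print "step"; else print "slew"
    }'
}

would_step 1200 1.0 3 500    # assumed stock setting, long after boot
would_step 1200 1.0 -1 500   # proposed setting
```

In this model the first call reports a slew and the second a step, which is the difference that matters for a VM resumed long after chronyd started.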

Version-Release number of selected component (if applicable):
4.x

How reproducible:
Every time

Steps to Reproduce:
1. Install a VM based cluster (AWS, or similar)
2. Pause the VMs for a specific duration (20 minutes)
3. Resume the cluster

Actual results:
The time on the VM takes a significant amount of time to get back to the current time.


Expected results:
Bringing the VM's time up to date immediately would likely be better for the system. Many subsystems rely on an accurate date and time, including SSL certificate authentication and metrics.

Additional info:

Comment 1 Colin Walters 2019-12-05 14:40:36 UTC
I'd prefer discussing this in https://github.com/openshift/machine-config-operator/issues/629 actually but it's also OK by me to keep this open if we want to target this more to "should RHCOS special case chrony for VMs".

Comment 3 Micah Abbott 2020-02-26 20:09:55 UTC
Interestingly, there is another request to change the `chrony` settings to use `makestep 1 -1` in BZ#1800901

It might be worthwhile to deviate from the default RHEL settings here.  Re-targeting for 4.5.

Comment 4 Colin Walters 2020-02-28 15:35:01 UTC
> It might be worthwhile to deviate from the default RHEL settings here.

Let's please not jump straight to "let's just patch RHCOS" because it undercuts the message that RHCOS *is* RHEL.  Further, if we don't consider doing this in FCOS first too, that increases the drift.

I think this is much more of a "should chrony behave differently in VMs" question.  Or actually, what is the rationale for the current chrony default at all?

Comment 5 Colin Walters 2020-02-28 15:42:50 UTC
Or even more broadly, does it make sense for us to run chrony *at all* in virtualized cases where the hypervisor is providing us an accurate wall clock?  This gets into platform specifics, but when I investigated this a bit it looked like kvm-clock gives us an accurate wall clock for example.

Comment 6 Miroslav Lichvar 2020-03-02 07:41:47 UTC
As I understand it, KVM guests need an NTP client or something else running to synchronize their clock to the host or to other clocks on the network. (The most accurate synchronization is possible with the ptp_kvm module, which provides a virtual PTP clock and which can be used as a reference clock by chronyd.)

If suspending and resuming a guest causes a significant step in the offset of the clock, an infinite makestep may need to be allowed. The reason why it is not enabled by default is that backward steps may break some applications and there are also some security implications (MITM attackers can step the clock for the whole time the system is running and not just the boot).

How large is the offset after resume? Any chance it is caused by incorrect LOCAL/UTC setting of the RTC? I think fixing that would be preferred over enabling makestep.
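
One way to collect both answers is sketched below. It assumes the usual "Label : value" layout printed by `chronyc tracking` and `timedatectl`. The `report_field` helper is a hypothetical name, and the sample lines are made up for illustration, not taken from an affected node.

```shell
# Print the value of the first "Label : value" line whose text matches a pattern.
report_field() {
    awk -F': *' -v key="$1" '$0 ~ key { print $2; exit }'
}

# Made-up sample lines in the assumed output layout of the two tools.
printf '%s\n' 'System time     : 0.000000086 seconds slow of NTP time' \
    | report_field '^System time'
printf '%s\n' '      RTC in local TZ: no' \
    | report_field 'RTC in local TZ'
```

On a resumed guest the same helper could be fed from `chronyc tracking` and `timedatectl` directly. A "yes" for the RTC line would point at the LOCAL/UTC question rather than at makestep.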

Comment 9 Colin Walters 2020-05-13 18:18:04 UTC
We're updating the chrony config to use platform-specific providers here:
https://github.com/coreos/fedora-coreos-config/pull/393

If we're going to do any override of the config, it could make sense to do there too.

> The reason why it is not enabled by default is that backward steps may break some applications and there are also some security implications (MITM attackers can step the clock for the whole time the system is running and not just the boot).

One thing related to the above is that if we're using the link-local NTP server, there can't be any MITM.  And similarly on Azure, there's a virtual hardware clock.

Maybe chrony should gain some sort of "trusted server" flag that enables `makestep 1 -1`?  But in the short term we could consider doing it there by default.

This doesn't help libvirt though.

Comment 10 Colin Walters 2020-05-18 17:32:24 UTC
Hi Miroslav, (ab)using this bug as a "tangentially related" conversation forum:

> (The most accurate synchronization is possible with the ptp_kvm module, which provides a virtual PTP clock and which can be used as a reference clock by chronyd.)

I did some searching on that:
https://github.com/coreos/fedora-coreos-config/pull/393#issuecomment-630311876

I guess my question is - are there any downsides to that?  I can't think of any myself.  Why wouldn't we just exclusively use ptp_kvm if we detect ConditionVirtualization=kvm (in systemd terms)?  Are there e.g. known bugs in it?  Could it be turned off or not available on e.g. OpenStack?

Going forward, should the https://github.com/coreos/fedora-coreos-config/blob/ebbc3833c52f9392298b6f24c34a6dfc8f4222a1/overlay.d/20platform-chrony/usr/lib/systemd/system-generators/coreos-platform-chrony file become a "best practices for chrony configuration" or would it make sense for some of this to live in chrony itself?

Comment 11 Miroslav Lichvar 2020-05-19 07:29:58 UTC
I don't think the ptp_kvm module is guaranteed to work. It's currently specific to the x86 arch and it works only if the host is using the tsc clocksource. I'm not sure how exactly it works in OpenStack.

There are a few downsides to using the PHC refclock for guest synchronization. One is that it relies on the host being properly synchronized. If you manage both, that's OK. There is also an issue that the NTP error estimates (root delay and root dispersion) are lost in the NTP->PHC->refclock conversion, so the guests underestimate the maximum error of their clock, which might be a problem in some applications.
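
As a sketch of how a guest could opt in only when the virtual PTP clock is actually present, the function below scans sysfs for a PTP device and prints a chrony `refclock PHC` line for it. The clock name string "KVM virtual PTP", the `poll 2` option, and the `kvm_ptp_refclock` name are assumptions for illustration. It does not check the host clocksource or the caveats above.

```shell
# Print a chrony refclock line for the first PTP device that reports itself
# as the KVM virtual PTP clock (name string assumed); return 1 if none is found.
kvm_ptp_refclock() {
    sysroot="${1:-/sys/class/ptp}"
    for dev in "$sysroot"/ptp*; do
        [ -r "$dev/clock_name" ] || continue
        if [ "$(cat "$dev/clock_name")" = "KVM virtual PTP" ]; then
            echo "refclock PHC /dev/$(basename "$dev") poll 2"
            return 0
        fi
    done
    return 1
}

kvm_ptp_refclock || echo "no KVM PTP clock found"
```

The optional argument exists only so the lookup can be pointed at a test directory instead of the real sysfs tree.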

Comment 15 Micah Abbott 2020-05-27 20:19:16 UTC
https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/966

Merged today.  Will appear in 4.5 builds soon.

Comment 20 Micah Abbott 2020-06-18 18:28:19 UTC
Verified with 4.5.0-rc1

```
$ oc get clusterversion                                                 
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS                                                                                                                                                                                                                             
version   4.5.0-rc.1   True        False         25m     Cluster version is 4.5.0-rc.1                                                        
$ oc get nodes                                                          
NAME                                         STATUS   ROLES    AGE   VERSION                                                                  
ip-10-0-150-181.us-west-2.compute.internal   Ready    master   45m   v1.18.3+a637491                                                          
ip-10-0-151-102.us-west-2.compute.internal   Ready    worker   34m   v1.18.3+a637491                                                                                                                                                                                                        
ip-10-0-167-40.us-west-2.compute.internal    Ready    worker   35m   v1.18.3+a637491                                                                                                                                                                                                        
ip-10-0-188-101.us-west-2.compute.internal   Ready    master   45m   v1.18.3+a637491            
ip-10-0-207-15.us-west-2.compute.internal    Ready    worker   35m   v1.18.3+a637491                                                          
ip-10-0-210-125.us-west-2.compute.internal   Ready    master   45m   v1.18.3+a637491                                                          
```

Confirm `makestep` change in chrony generator:

```
$ oc debug node/ip-10-0-151-102.us-west-2.compute.internal              
Starting pod/ip-10-0-151-102us-west-2computeinternal-debug ...                                                                                
To use host binaries, run `chroot /host`                                                                                                      
Pod IP: 10.0.151.102                                                                                                                          
If you don't see a command prompt, try pressing enter.                                                                                        
sh-4.2# chroot /host                                                       
sh-4.4# cat /usr/lib/systemd/system-generators/coreos-platform-chrony | grep -C 5 makestep
    echo "$self: /etc/chrony.conf is modified; not changing the default"
    exit 0
fi

(echo "# Generated by $self - do not edit directly" 
 sed -e s,'^makestep,#makestep,' -e s,'^pool,#pool,' < /etc/chrony.conf 
cat <<EOF

# Allow the system clock step on any clock update. 
# It will avoid the time resynchronization issue when VMs are resumed from suspend.
# See https://bugzilla.redhat.com/show_bug.cgi?id=1780165 for more information.
makestep 1.0 -1

EOF
) > "${confpath}"
case "${platform}" in
    azure) 

sh-4.4# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample                
===============================================================================
^* 169.254.169.123               3   4   377     5  -2674ns[-4030ns] +/-  402us
sh-4.4# date
Thu Jun 18 17:01:43 UTC 2020
```
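
To illustrate what the generator's `sed` expressions do, the sketch below applies the same two substitutions to a small sample config and then appends the new directive. The sample lines and the `rewrite_chrony_conf` name are hypothetical. The real generator writes to `${confpath}` and adds platform-specific sources, which are omitted here.

```shell
# Comment out existing makestep/pool lines from stdin, then append the new makestep.
rewrite_chrony_conf() {
    sed -e 's,^makestep,#makestep,' -e 's,^pool,#pool,'
    printf '\n%s\n' 'makestep 1.0 -1'
}

# Hypothetical sample input, not the actual RHCOS /etc/chrony.conf.
printf '%s\n' \
    'pool 2.rhel.pool.ntp.org iburst' \
    'driftfile /var/lib/chrony/drift' \
    'makestep 1.0 3' \
    'rtcsync' | rewrite_chrony_conf
```

The result keeps the original `makestep` and `pool` lines as comments and leaves exactly one active `makestep` directive.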

Paused that node via the AWS console (in practice a stop, not a true pause) for 20 minutes:

```
$ oc get nodes
NAME                                         STATUS     ROLES    AGE    VERSION
ip-10-0-150-181.us-west-2.compute.internal   Ready      master   114m   v1.18.3+a637491
ip-10-0-151-102.us-west-2.compute.internal   NotReady   worker   104m   v1.18.3+a637491
ip-10-0-167-40.us-west-2.compute.internal    Ready      worker   104m   v1.18.3+a637491
ip-10-0-188-101.us-west-2.compute.internal   Ready      master   114m   v1.18.3+a637491
ip-10-0-207-15.us-west-2.compute.internal    Ready      worker   104m   v1.18.3+a637491
ip-10-0-210-125.us-west-2.compute.internal   Ready      master   114m   v1.18.3+a637491
```

Restarted node from AWS console and logged in as soon as it was Ready again:


```
$ oc debug node/ip-10-0-151-102.us-west-2.compute.internal
Starting pod/ip-10-0-151-102us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.151.102
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# date
Thu Jun 18 18:23:08 UTC 2020
sh-4.4# chronyc sources 
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^* 169.254.169.123               3   4    37     6    -86ns[ +362us] +/-  410us

sh-4.4# journalctl -b -u chronyd   
-- Logs begin at Thu 2020-06-18 16:15:02 UTC, end at Thu 2020-06-18 18:27:27 UTC. --
Jun 18 18:22:48 localhost systemd[1]: Starting NTP client/server...
Jun 18 18:22:48 localhost chronyd[1237]: chronyd version 3.5 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Jun 18 18:22:48 localhost chronyd[1237]: Frequency -21.749 +/- 0.135 ppm read from /var/lib/chrony/drift
Jun 18 18:22:48 localhost chronyd[1237]: Using right/UTC timezone to obtain leap second data
Jun 18 18:22:48 localhost systemd[1]: Started NTP client/server.
Jun 18 18:22:55 ip-10-0-151-102 chronyd[1237]: Selected source 169.254.169.123
Jun 18 18:22:55 ip-10-0-151-102 chronyd[1237]: System clock TAI offset set to 37 seconds

```
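
A note on reading the `Reach` column in the two `chronyc sources` outputs: per the chronyc documentation it is an 8-bit register printed in octal, with one bit per recent poll. The hypothetical `reach_bits` helper below expands it to binary so the difference between 377 (before the stop) and 37 (after the restart) is easier to see.

```shell
# Expand a chronyc "Reach" value (octal) into its 8 bits, newest poll on the right.
reach_bits() {
    val=$((0$1))   # leading 0 makes the shell parse the digits as octal
    bits=""
    i=7
    while [ "$i" -ge 0 ]; do
        bits="$bits$(( (val >> i) & 1 ))"
        i=$((i - 1))
    done
    echo "$bits"
}

reach_bits 377
reach_bits 37
```

A value of 377 means all of the last eight polls got a valid reply, while 37 shows only five polls so far, which fits a chronyd that had just restarted and selected its source within seconds.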

By all accounts, it appears the VM gets its time synced nearly immediately.

Comment 22 errata-xmlrpc 2020-07-13 17:12:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

