Description of problem:
I've been tracking down a bug, and it has led me to a code path where timestamps are being sorted. The use case I'm looking at is a suspended cluster. chronyd takes quite a bit of time to resync (the VM was suspended for 20 minutes, and it had not caught back up to real time after 1.5 hours). By design, chronyd speeds up or slows down the clock rather than stepping it. I suggest we reconfigure chronyd with `makestep 1 -1` to cause VMs to resync the time instantaneously.

Ref: https://chrony.tuxfamily.org/faq.html#_is_code_chronyd_code_allowed_to_step_the_system_clock

Version-Release number of selected component (if applicable):
4.x

How reproducible:
Every time

Steps to Reproduce:
1. Install a VM-based cluster (AWS or similar)
2. Pause the VMs for a specific duration (e.g. 20 minutes)
3. Resume the cluster

Actual results:
The time on the VM takes a significant amount of time to get back to the current time.

Expected results:
Bringing the VM's time up to date instantaneously would likely be better for the system. Many subsystems rely on an accurate date/time: SSL certificate auth, metrics, and many others.

Additional info:
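For context, the proposed change amounts to a single directive in `/etc/chrony.conf` (a sketch based on the chrony FAQ linked above; the 1-second threshold is the value suggested there):

```
# Step the system clock if its offset exceeds 1 second,
# on any update (-1 = no limit on the number of steps).
makestep 1 -1
```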
I'd prefer discussing this in https://github.com/openshift/machine-config-operator/issues/629 actually but it's also OK by me to keep this open if we want to target this more to "should RHCOS special case chrony for VMs".
Interestingly, there is another request to change the `chrony` settings to use `makestep 1 -1` in BZ#1800901. It might be worthwhile to deviate from the default RHEL settings here. Re-targeting for 4.5.
> It might be worthwhile to deviate from the default RHEL settings here.

Let's please not jump straight to "let's just patch RHCOS", because it undercuts the message that RHCOS *is* RHEL. Further, if we don't consider doing this in FCOS first too, that increases the drift. I think this is much more of a "should chrony behave differently in VMs" question. Or actually, what is the rationale for the current chrony default at all?
Or even more broadly, does it make sense for us to run chrony *at all* in virtualized cases where the hypervisor is providing us an accurate wall clock? This gets into platform specifics, but when I investigated this a bit it looked like kvm-clock gives us an accurate wall clock for example.
As I understand it, KVM guests need an NTP client or something else running to synchronize their clock to the host or to other clocks on the network. (The most accurate synchronization is possible with the ptp_kvm module, which provides a virtual PTP clock that can be used as a reference clock by chronyd.)

If suspending and resuming a guest causes a significant step in the offset of the clock, an unlimited makestep may need to be allowed. The reason it is not enabled by default is that backward steps may break some applications, and there are also some security implications (MITM attackers can step the clock for the whole time the system is running, not just at boot).

How large is the offset after resume? Any chance it is caused by an incorrect LOCAL/UTC setting of the RTC? I think fixing that would be preferable to enabling makestep.
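The ptp_kvm path mentioned above maps to a chrony refclock directive (a sketch; it assumes the ptp_kvm kernel module is loaded and the virtual clock appears as /dev/ptp0, which can vary):

```
# In /etc/chrony.conf: use the KVM virtual PTP clock as a reference clock.
# Requires the ptp_kvm kernel module (e.g. `modprobe ptp_kvm`).
refclock PHC /dev/ptp0 poll 2
```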
We're updating the chrony config to use platform-specific providers here: https://github.com/coreos/fedora-coreos-config/pull/393 If we're going to do any override of the config, it could make sense to do it there too.

> The reason why it is not enabled by default is that backward steps may break some applications and there are also some security implications (MITM attackers can step the clock for the whole time the system is running and not just the boot).

One thing related to the above is that if we're using the link-local NTP server, there can't be any MITM. And similarly on Azure, there's a virtual hardware clock. Maybe chrony should gain some sort of "trusted server" flag that enables `makestep 1 -1`? But in the short term we could consider doing it there by default. This doesn't help libvirt, though.
Hi Miroslav, (ab)using this bug as a "tangentially related" conversation forum:

> (The most accurate synchronization is possible with the ptp_kvm module, which provides a virtual PTP clock and which can be used as a reference clock by chronyd.)

I did some searching on that: https://github.com/coreos/fedora-coreos-config/pull/393#issuecomment-630311876

I guess my question is: are there any downsides to that? I can't think of any myself. Why wouldn't we just exclusively use ptp_kvm if we detect ConditionVirtualization=kvm (in systemd terms)? Are there e.g. known bugs in it? Could it be turned off or not available on e.g. OpenStack?

Going forward, should https://github.com/coreos/fedora-coreos-config/blob/ebbc3833c52f9392298b6f24c34a6dfc8f4222a1/overlay.d/20platform-chrony/usr/lib/systemd/system-generators/coreos-platform-chrony become a "best practices for chrony configuration", or would it make sense for some of this to live in chrony itself?
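The KVM detection floated above could be sketched with `systemd-detect-virt` (an illustrative fragment, not the actual coreos-platform-chrony generator logic; the output path and pool name are placeholders):

```shell
#!/bin/bash
# Illustrative sketch only -- not the real generator code.
# Prefer the KVM virtual PTP clock when running under KVM and the
# device exists; otherwise fall back to an ordinary NTP pool.
conf="${TMPDIR:-/tmp}/chrony-generated.conf"   # hypothetical output path

if [ "$(systemd-detect-virt 2>/dev/null)" = "kvm" ] && [ -e /dev/ptp0 ]; then
    echo "refclock PHC /dev/ptp0 poll 2" > "$conf"
else
    echo "pool 2.fedora.pool.ntp.org iburst" > "$conf"
fi
```

On non-KVM hosts (or where `systemd-detect-virt` is missing) this falls through to the pool branch, which is also where a per-platform `makestep` override could be appended.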
I don't think the ptp_kvm module is guaranteed to work. It's currently specific to the x86 arch, and it works only if the host is using the tsc clocksource. I'm not sure how exactly it works in OpenStack.

There are a few downsides to using the PHC refclock for guest synchronization. One is that it relies on the host being properly synchronized; if you manage both, that's OK. There is also the issue that the NTP error estimates (root delay and root dispersion) are lost in the NTP->PHC->refclock conversion, so the guests underestimate the maximum error of their clock, which might be a problem for some applications.
https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/966 Merged today. Will appear in 4.5 builds soon.
Verified with 4.5.0-rc1

```
$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-rc.1   True        False         25m     Cluster version is 4.5.0-rc.1

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-150-181.us-west-2.compute.internal   Ready    master   45m   v1.18.3+a637491
ip-10-0-151-102.us-west-2.compute.internal   Ready    worker   34m   v1.18.3+a637491
ip-10-0-167-40.us-west-2.compute.internal    Ready    worker   35m   v1.18.3+a637491
ip-10-0-188-101.us-west-2.compute.internal   Ready    master   45m   v1.18.3+a637491
ip-10-0-207-15.us-west-2.compute.internal    Ready    worker   35m   v1.18.3+a637491
ip-10-0-210-125.us-west-2.compute.internal   Ready    master   45m   v1.18.3+a637491
```

Confirm `makestep` change in chrony generator:

```
$ oc debug node/ip-10-0-151-102.us-west-2.compute.internal
Starting pod/ip-10-0-151-102us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.151.102
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cat /usr/lib/systemd/system-generators/coreos-platform-chrony | grep -C 5 makestep
    echo "$self: /etc/chrony.conf is modified; not changing the default"
    exit 0
fi
(echo "# Generated by $self - do not edit directly"
 sed -e s,'^makestep,#makestep,' -e s,'^pool,#pool,' < /etc/chrony.conf
 cat <<EOF
# Allow the system clock step on any clock update.
# It will avoid the time resynchronization issue when VMs are resumed from suspend.
# See https://bugzilla.redhat.com/show_bug.cgi?id=1780165 for more information.
makestep 1.0 -1
EOF
) > "${confpath}"
case "${platform}" in
    azure)
sh-4.4# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 169.254.169.123               3   4   377     5  -2674ns[-4030ns] +/-  402us
sh-4.4# date
Thu Jun 18 17:01:43 UTC 2020
```

Paused that node via AWS console (really stopped) for 20m:

```
$ oc get nodes
NAME                                         STATUS     ROLES    AGE    VERSION
ip-10-0-150-181.us-west-2.compute.internal   Ready      master   114m   v1.18.3+a637491
ip-10-0-151-102.us-west-2.compute.internal   NotReady   worker   104m   v1.18.3+a637491
ip-10-0-167-40.us-west-2.compute.internal    Ready      worker   104m   v1.18.3+a637491
ip-10-0-188-101.us-west-2.compute.internal   Ready      master   114m   v1.18.3+a637491
ip-10-0-207-15.us-west-2.compute.internal    Ready      worker   104m   v1.18.3+a637491
ip-10-0-210-125.us-west-2.compute.internal   Ready      master   114m   v1.18.3+a637491
```

Restarted node from AWS console and logged in as soon as it was Ready again:

```
$ oc debug node/ip-10-0-151-102.us-west-2.compute.internal
Starting pod/ip-10-0-151-102us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.151.102
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# date
Thu Jun 18 18:23:08 UTC 2020
sh-4.4# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 169.254.169.123               3   4    37     6    -86ns[ +362us] +/-  410us
sh-4.4# journalctl -b -u chronyd
-- Logs begin at Thu 2020-06-18 16:15:02 UTC, end at Thu 2020-06-18 18:27:27 UTC. --
Jun 18 18:22:48 localhost systemd[1]: Starting NTP client/server...
Jun 18 18:22:48 localhost chronyd[1237]: chronyd version 3.5 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Jun 18 18:22:48 localhost chronyd[1237]: Frequency -21.749 +/- 0.135 ppm read from /var/lib/chrony/drift
Jun 18 18:22:48 localhost chronyd[1237]: Using right/UTC timezone to obtain leap second data
Jun 18 18:22:48 localhost systemd[1]: Started NTP client/server.
Jun 18 18:22:55 ip-10-0-151-102 chronyd[1237]: Selected source 169.254.169.123
Jun 18 18:22:55 ip-10-0-151-102 chronyd[1237]: System clock TAI offset set to 37 seconds
```

By all accounts, it appears the VM gets its time synced nearly immediately.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409