Bug 1837039
| Summary: | bootimage: [Azure] Virtual machines are unable to reach public NTP servers | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Colin Walters <walters> |
| Component: | RHCOS | Assignee: | Colin Walters <walters> |
| Status: | CLOSED ERRATA | QA Contact: | Micah Abbott <miabbott> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.5 | CC: | amurdaca, bbreard, cmarches, dornelas, dustymabe, gpei, hongli, imcleod, jlebon, jligon, mharri, miabbott, mnguyen, nils, nstielau, skumari, smilner, stwalter, walters |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1765609 | Environment: | |
| Last Closed: | 2020-07-13 17:39:55 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1765609 | | |
| Bug Blocks: | 1186913, 1828342, 1834565 | | |
|
Comment 1
Colin Walters
2020-05-18 17:52:53 UTC
The linked PR updates our bootimage to `45.81.202005181029-0`.
Booting a single node of that version of RHCOS, we can see that a generator is in place for configuring NTP support on AWS, Azure, and GCP:
```
[core@cosa-devsh ~]$ rpm-ostree status -b
State: idle
AutomaticUpdates: disabled
BootedDeployment:
* ostree://58a3f5500230bc9ec1048490b26b42aa5df349678edc22746265d2671aa6d00a
Version: 45.81.202005181029-0 (2020-05-18T10:33:37Z)
[core@cosa-devsh ~]$ ls -l /usr/lib/systemd/system-generators/coreos-platform-chrony
-rw-r--r--. 2 root root 2729 Jan 1 1970 /usr/lib/systemd/system-generators/coreos-platform-chrony
[core@cosa-devsh ~]$ cat /usr/lib/systemd/system-generators/coreos-platform-chrony
#!/bin/bash
set -euo pipefail
# Configuring the timeserver for the platform is often handled
# by pre-baking a config into a particular image for a platform, but
# that doesn't work for us because we have a single update stream. Hence
# this generator dynamically inspects the platform and reconfigures chrony.
#
# AWS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html
# Azure: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync
# GCP: https://cloud.google.com/compute/docs/instances/managing-instances#configure-ntp
#
# Originally spawned from discussion in https://github.com/openshift/installer/pull/3513
# Generators don't have logging right now
# https://github.com/systemd/systemd/issues/15638
exec 1>/dev/kmsg; exec 2>&1
self=$(basename $0)
confpath=/run/coreos-platform-chrony.conf
# Yeah this isn't a completely accurate kernel argument parser but
# we don't have one shared across shell services at the moment.
platform="$(grep -Eo ' ignition.platform.id=[a-z]+' /proc/cmdline | cut -f 2 -d =)"
case "${platform}" in
azure|aws|gcp) ;; # OK, this is a platform we know how to support
*) exit 0 ;;
esac
if ! cmp {/usr,}/etc/chrony.conf >/dev/null; then
echo "$self: /etc/chrony.conf is modified; not changing the default"
exit 0
fi
echo "# Generated by $self - do not edit directly" > "${confpath}"
case "${platform}" in
azure)
(echo '# See also https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync'
sed -e s,'^pool,#pool,' < /etc/chrony.conf
echo 'refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0'
) >> "${confpath}" ;;
aws)
(echo '# See also https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html'
sed -e s,'^pool,#pool,' < /etc/chrony.conf
echo 'server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4'
) >> "${confpath}" ;;
gcp)
(echo '# See also https://cloud.google.com/compute/docs/instances/managing-instances#configure-ntp'
echo '# and https://cloud.google.com/compute/docs/images/configuring-imported-images'
sed -e s,'^pool,#pool,' < /etc/chrony.conf
echo 'server metadata.google.internal prefer iburst'
) >> "${confpath}" ;;
*) echo "should not be reached" 1>&2; exit 1 ;;
esac
# Policy doesn't allow chronyd to read run_t
chcon --reference=/etc/chrony.conf "${confpath}"
UNIT_DIR="${1:-/tmp}"
unitconfpath="${UNIT_DIR}/chronyd.service.d/coreos-platform-chrony.conf"
mkdir -p $(dirname "${unitconfpath}")
cat >"${unitconfpath}" << EOF
[Service]
ExecStart=
ExecStart=/usr/sbin/chronyd -f ${confpath} \$OPTIONS
EOF
echo "$self: Updated chrony to use ${platform} configuration ${confpath}"
```
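The platform detection in the generator can be tried in isolation. The following is a minimal sketch, not part of RHCOS: the function name `platform_from_cmdline` and the sample command line are made up for illustration, and the function takes the command line as an argument rather than reading `/proc/cmdline`. It reuses the generator's `grep`/`cut` pipeline and adds `|| true` so that a missing argument yields an empty string instead of a failed pipeline under `set -euo pipefail`.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not shipped in RHCOS): extract the Ignition platform ID
# from a kernel command line passed as $1 instead of reading /proc/cmdline.
platform_from_cmdline() {
    local cmdline="$1"
    # Same grep/cut pipeline as the generator. The leading space in the
    # pattern means the argument only matches when it is not the first token.
    # `|| true` keeps a non-match from aborting the script under `set -e`.
    printf '%s\n' "${cmdline}" \
        | grep -Eo ' ignition.platform.id=[a-z]+' \
        | cut -f 2 -d = || true
}

# Made-up sample command line, for illustration only.
sample='BOOT_IMAGE=/vmlinuz rw ignition.platform.id=azure console=tty0'
platform_from_cmdline "${sample}"
```

Because the pattern begins with a space, an `ignition.platform.id=` argument that happens to be the very first token on the command line would not be matched. That may be one of the limitations the generator's own comment about not being a completely accurate parser alludes to.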
Additionally, using clusterbot to launch an Azure cluster with the latest 4.5 is successful and shows that the generator is in place:
```
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.5.0-rc.1 True False 17m Cluster version is 4.5.0-rc.1
[miabbott@toolbox (container) ~/openshift-cluster-installs ]$ oc get nodes
NAME STATUS ROLES AGE VERSION
ci-ln-fr7fdz2-002ac-qjrdw-master-0 Ready master 40m v1.18.3+a637491
ci-ln-fr7fdz2-002ac-qjrdw-master-1 Ready master 39m v1.18.3+a637491
ci-ln-fr7fdz2-002ac-qjrdw-master-2 Ready master 40m v1.18.3+a637491
ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl Ready worker 25m v1.18.3+a637491
ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus2-g74lk Ready worker 25m v1.18.3+a637491
ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus3-bfrzd Ready worker 25m v1.18.3+a637491
[miabbott@toolbox (container) ~/openshift-cluster-installs ]$ oc debug node/ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl
Starting pod/ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.32.4
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7c379ae47f0ba149124e44d298a74b22dfa6bb676a35af0146133a6e2ebb62f1
CustomOrigin: Managed by machine-config-operator
Version: 45.81.202006051300-0 (2020-06-05T13:03:53Z)
ostree://1e46236673938a570029e54117fff0c1a1eedb4a5e0ad12373d1a27407cfed3a
Version: 45.81.202005200134-0 (2020-05-20T01:37:25Z)
sh-4.4# ls -l /usr/lib/systemd/system-generators/coreos-platform-chrony
-rwxr-xr-x. 2 root root 2897 Jan 1 1970 /usr/lib/systemd/system-generators/coreos-platform-chrony
sh-4.4# cat /usr/lib/systemd/system-generators/coreos-platform-chrony
#!/bin/bash
set -euo pipefail
# Configuring the timeserver for the platform is often handled
# by pre-baking a config into a particular image for a platform, but
# that doesn't work for us because we have a single update stream. Hence
# this generator dynamically inspects the platform and reconfigures chrony.
#
# AWS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html
# Azure: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync
# GCP: https://cloud.google.com/compute/docs/instances/managing-instances#configure-ntp
#
# Originally spawned from discussion in https://github.com/openshift/installer/pull/3513
# Generators don't have logging right now
# https://github.com/systemd/systemd/issues/15638
exec 1>/dev/kmsg; exec 2>&1
self=$(basename $0)
confpath=/run/coreos-platform-chrony.conf
# Yeah this isn't a completely accurate kernel argument parser but
# we don't have one shared across shell services at the moment.
platform="$(grep -Eo ' ignition.platform.id=[a-z]+' /proc/cmdline | cut -f 2 -d =)"
case "${platform}" in
azure|aws|gcp) ;; # OK, this is a platform we know how to support
*) exit 0 ;;
esac
if ! cmp {/usr,}/etc/chrony.conf >/dev/null; then
echo "$self: /etc/chrony.conf is modified; not changing the default"
exit 0
fi
(echo "# Generated by $self - do not edit directly"
sed -e s,'^makestep,#makestep,' -e s,'^pool,#pool,' < /etc/chrony.conf
cat <<EOF
# Allow the system clock step on any clock update.
# It will avoid the time resynchronization issue when VMs are resumed from suspend.
# See https://bugzilla.redhat.com/show_bug.cgi?id=1780165 for more information.
makestep 1.0 -1
EOF
) > "${confpath}"
case "${platform}" in
azure)
(echo '# See also https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync'
echo 'refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0'
) >> "${confpath}" ;;
aws)
(echo '# See also https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html'
echo 'server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4'
) >> "${confpath}" ;;
gcp)
(echo '# See also https://cloud.google.com/compute/docs/instances/managing-instances#configure-ntp'
echo '# and https://cloud.google.com/compute/docs/images/configuring-imported-images'
echo 'server metadata.google.internal prefer iburst'
) >> "${confpath}" ;;
*) echo "should not be reached" 1>&2; exit 1 ;;
esac
# Policy doesn't allow chronyd to read run_t
chcon --reference=/etc/chrony.conf "${confpath}"
UNIT_DIR="${1:-/tmp}"
unitconfpath="${UNIT_DIR}/chronyd.service.d/coreos-platform-chrony.conf"
mkdir -p $(dirname "${unitconfpath}")
cat >"${unitconfpath}" << EOF
[Service]
ExecStart=
ExecStart=/usr/sbin/chronyd -f ${confpath} \$OPTIONS
EOF
echo "$self: Updated chrony to use ${platform} configuration ${confpath}"
sh-4.4# systemctl status chronyd
● chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
Drop-In: /run/systemd/generator/chronyd.service.d
└─coreos-platform-chrony.conf
Active: active (running) since Wed 2020-06-17 17:36:30 UTC; 30min ago
Docs: man:chronyd(8)
man:chrony.conf(5)
Process: 1299 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
Process: 1287 ExecStart=/usr/sbin/chronyd -f /run/coreos-platform-chrony.conf $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 1295 (chronyd)
Tasks: 1
Memory: 2.0M
CPU: 328ms
CGroup: /system.slice/chronyd.service
└─1295 /usr/sbin/chronyd -f /run/coreos-platform-chrony.conf
Jun 17 17:36:30 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl systemd[1]: Starting NTP client/server...
Jun 17 17:36:30 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl chronyd[1295]: chronyd version 3.5 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Jun 17 17:36:30 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl chronyd[1295]: Frequency -39.139 +/- 0.583 ppm read from /var/lib/chrony/drift
Jun 17 17:36:30 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl chronyd[1295]: Using right/UTC timezone to obtain leap second data
Jun 17 17:36:30 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl systemd[1]: Started NTP client/server.
Jun 17 17:36:54 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl chronyd[1295]: Selected source PHC0
Jun 17 17:36:54 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl chronyd[1295]: System clock TAI offset set to 37 seconds
sh-4.4# tail /run/coreos-platform-chrony.conf
# Select which information is logged.
#log measurements statistics tracking
# Allow the system clock step on any clock update.
# It will avoid the time resynchronization issue when VMs are resumed from suspend.
# See https://bugzilla.redhat.com/show_bug.cgi?id=1780165 for more information.
makestep 1.0 -1
# See also https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync
refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0
sh-4.4# chronyc sources
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
#* PHC0 0 3 377 5 +10us[ +10us] +/- 9651ns
```
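To see how the configuration shown by `tail` comes about, the Azure path of the second generator listing can be approximated on sample input. This is a sketch under assumptions: `render_azure_conf` is a hypothetical name, the input lines are a made-up stand-in for `/etc/chrony.conf`, and the explanatory comment lines that the real generator writes are omitted.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: read a chrony config on stdin and print the
# Azure variant on stdout, roughly mirroring the generator's steps.
render_azure_conf() {
    echo '# Generated by sketch - do not edit directly'
    # Comment out the stock makestep and pool directives.
    sed -e s,'^makestep,#makestep,' -e s,'^pool,#pool,'
    # Allow the clock to step on any update, as in the generator.
    echo 'makestep 1.0 -1'
    # Use the PTP hardware clock as the time source on Azure.
    echo 'refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0'
}

# Made-up stand-in for /etc/chrony.conf, for illustration only.
printf '%s\n' \
    'pool 2.rhel.pool.ntp.org iburst' \
    'driftfile /var/lib/chrony/drift' \
    'makestep 1.0 3' \
    | render_azure_conf
```

With the public `pool` line commented out, the PTP refclock is the only configured source, which would be consistent with `chronyc sources` above listing a single `PHC0` entry.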
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2409