Bug 1837039 - bootimage: [Azure] Virtual machines are unable to reach public NTP servers
Summary: bootimage: [Azure] Virtual machines are unable to reach public NTP servers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.5
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Colin Walters
QA Contact: Micah Abbott
URL:
Whiteboard:
Depends On: 1765609
Blocks: 1186913 1828342 1834565
Reported: 2020-05-18 17:52 UTC by Colin Walters
Modified: 2020-07-13 17:40 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1765609
Environment:
Last Closed: 2020-07-13 17:39:55 UTC
Target Upstream Version:
Embargoed:


Links

| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github | openshift installer pull 3613 | 0 | None | closed | Bug 1837039: rhcos: Bump to 45.81.202005181029-0 | 2021-02-08 22:18:55 UTC |
| Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:40:17 UTC |

Comment 1 Colin Walters 2020-05-18 17:52:53 UTC
We shipped the fix in machine-os-content, but we still need to update the bootimage.

Comment 4 Micah Abbott 2020-06-17 18:08:33 UTC
The linked PR updates our bootimage to `45.81.202005181029-0`.

Booting a single node of that version of RHCOS shows that a generator is in place to configure NTP for AWS, Azure, and GCP:

```
[core@cosa-devsh ~]$ rpm-ostree status -b                               
State: idle
AutomaticUpdates: disabled
BootedDeployment:                                   
* ostree://58a3f5500230bc9ec1048490b26b42aa5df349678edc22746265d2671aa6d00a
                   Version: 45.81.202005181029-0 (2020-05-18T10:33:37Z)
[core@cosa-devsh ~]$ ls -l /usr/lib/systemd/system-generators/coreos-platform-chrony 
-rw-r--r--. 2 root root 2729 Jan  1  1970 /usr/lib/systemd/system-generators/coreos-platform-chrony
[core@cosa-devsh ~]$ cat /usr/lib/systemd/system-generators/coreos-platform-chrony                                                             
#!/bin/bash
set -euo pipefail
# Configuring the timeserver for the platform is often handled
# by pre-baking a config into a particular image for a platform, but
# that doesn't work for us because we have a single update stream.  Hence
# this generator dynamically inspects the platform and reconfigures chrony.
#
# AWS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html
# Azure: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync
# GCP: https://cloud.google.com/compute/docs/instances/managing-instances#configure-ntp
#
# Originally spawned from discussion in https://github.com/openshift/installer/pull/3513

# Generators don't have logging right now
# https://github.com/systemd/systemd/issues/15638
exec 1>/dev/kmsg; exec 2>&1

self=$(basename $0)
confpath=/run/coreos-platform-chrony.conf

# Yeah this isn't a completely accurate kernel argument parser but
# we don't have one shared across shell services at the moment.
platform="$(grep -Eo ' ignition.platform.id=[a-z]+' /proc/cmdline | cut -f 2 -d =)"
case "${platform}" in
    azure|aws|gcp) ;;  # OK, this is a platform we know how to support
    *) exit 0 ;;
esac

if ! cmp {/usr,}/etc/chrony.conf >/dev/null; then
    echo "$self: /etc/chrony.conf is modified; not changing the default"
    exit 0
fi

echo "# Generated by $self - do not edit directly" > "${confpath}"
case "${platform}" in
    azure) 
        (echo '# See also https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync'
         sed -e s,'^pool,#pool,' < /etc/chrony.conf
         echo 'refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0'
        ) >> "${confpath}" ;;
    aws)
        (echo '# See also https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html'
         sed -e s,'^pool,#pool,' < /etc/chrony.conf
         echo 'server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4'
        ) >> "${confpath}" ;;
    gcp)
        (echo '# See also https://cloud.google.com/compute/docs/instances/managing-instances#configure-ntp'
         echo '# and https://cloud.google.com/compute/docs/images/configuring-imported-images'
         sed -e s,'^pool,#pool,' < /etc/chrony.conf
         echo 'server metadata.google.internal prefer iburst'
        ) >> "${confpath}" ;;
    *) echo "should not be reached" 1>&2; exit 1 ;;
esac
# Policy doesn't allow chronyd to read run_t
chcon --reference=/etc/chrony.conf "${confpath}"

UNIT_DIR="${1:-/tmp}"

unitconfpath="${UNIT_DIR}/chronyd.service.d/coreos-platform-chrony.conf"
mkdir -p $(dirname "${unitconfpath}")
cat >"${unitconfpath}" << EOF
[Service]
ExecStart=
ExecStart=/usr/sbin/chronyd -f ${confpath} \$OPTIONS
EOF

echo "$self: Updated chrony to use ${platform} configuration ${confpath}"
```
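As a rough illustration of what the generator above does, the sketch below isolates its two core steps: reading `ignition.platform.id` from a kernel command line, and commenting out the default `pool` lines before appending the platform time source. The function names (`platform_from_cmdline`, `platform_source_line`, `render_platform_conf`), the sample command line, and the sample `chrony.conf` content are assumptions made for this example; they are not part of RHCOS. The sketch works on strings and stdin rather than `/proc/cmdline` and `/etc/chrony.conf`, so it can be tried on any machine.

```shell
#!/bin/bash
# Illustrative sketch only: helper names and sample inputs are hypothetical.

# Print the value of ignition.platform.id found in a kernel command line
# string, or nothing if the argument is absent.
platform_from_cmdline() {
    # A leading space is added so the generator's pattern also matches
    # when the argument is the first word on the line.
    local cmdline=" $1"
    { grep -Eo ' ignition\.platform\.id=[a-z]+' <<<"${cmdline}" || true; } \
        | cut -f 2 -d =
}

# Print the chrony source line the generator appends for a platform.
# Returns non-zero for platforms the generator leaves untouched.
platform_source_line() {
    case "$1" in
        azure) echo 'refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0' ;;
        aws)   echo 'server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4' ;;
        gcp)   echo 'server metadata.google.internal prefer iburst' ;;
        *)     return 1 ;;
    esac
}

# Read a chrony.conf on stdin, comment out its pool lines, and append the
# platform source line.
render_platform_conf() {
    local platform="$1" line
    line="$(platform_source_line "${platform}")" || return 1
    sed -e 's,^pool,#pool,'
    echo "${line}"
}

# Made-up sample inputs, for demonstration only.
sample_cmdline='BOOT_IMAGE=/vmlinuz rw ignition.platform.id=azure console=ttyS0'
platform="$(platform_from_cmdline "${sample_cmdline}")"
printf 'pool 2.rhel.pool.ntp.org iburst\ndriftfile /var/lib/chrony/drift\n' \
    | render_platform_conf "${platform}"
```

With the sample inputs, the `pool` line comes out commented and the Azure `refclock` line is appended last, which mirrors the shape of the generated `/run/coreos-platform-chrony.conf`.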

Additionally, launching an Azure cluster with the latest 4.5 via clusterbot succeeds and shows that the generator is in place:

```
$ oc get clusterversion                                                           
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS                                                                               
version   4.5.0-rc.1   True        False         17m     Cluster version is 4.5.0-rc.1
[miabbott@toolbox (container) ~/openshift-cluster-installs ]$ oc get nodes
NAME                                                STATUS   ROLES    AGE   VERSION
ci-ln-fr7fdz2-002ac-qjrdw-master-0                  Ready    master   40m   v1.18.3+a637491                                                   
ci-ln-fr7fdz2-002ac-qjrdw-master-1                  Ready    master   39m   v1.18.3+a637491
ci-ln-fr7fdz2-002ac-qjrdw-master-2                  Ready    master   40m   v1.18.3+a637491
ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl   Ready    worker   25m   v1.18.3+a637491
ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus2-g74lk   Ready    worker   25m   v1.18.3+a637491       
ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus3-bfrzd   Ready    worker   25m   v1.18.3+a637491
[miabbott@toolbox (container) ~/openshift-cluster-installs ]$ oc debug node/ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl
Starting pod/ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl-debug ...
To use host binaries, run `chroot /host`                                                                                                      
Pod IP: 10.0.32.4                                                                                                                             
If you don't see a command prompt, try pressing enter.       
sh-4.2# chroot /host
sh-4.4# rpm-ostree status                                                                                                                     
State: idle                                                                                                                                   
AutomaticUpdates: disabled                                   
Deployments:                 
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7c379ae47f0ba149124e44d298a74b22dfa6bb676a35af0146133a6e2ebb62f1
              CustomOrigin: Managed by machine-config-operator
                   Version: 45.81.202006051300-0 (2020-06-05T13:03:53Z)
                                                                       
  ostree://1e46236673938a570029e54117fff0c1a1eedb4a5e0ad12373d1a27407cfed3a
                   Version: 45.81.202005200134-0 (2020-05-20T01:37:25Z)       
sh-4.4# ls -l /usr/lib/systemd/system-generators/coreos-platform-chrony  
-rwxr-xr-x. 2 root root 2897 Jan  1  1970 /usr/lib/systemd/system-generators/coreos-platform-chrony
sh-4.4# cat /usr/lib/systemd/system-generators/coreos-platform-chrony
#!/bin/bash
set -euo pipefail
# Configuring the timeserver for the platform is often handled
# by pre-baking a config into a particular image for a platform, but
# that doesn't work for us because we have a single update stream.  Hence
# this generator dynamically inspects the platform and reconfigures chrony.
#
# AWS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html
# Azure: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync
# GCP: https://cloud.google.com/compute/docs/instances/managing-instances#configure-ntp
#
# Originally spawned from discussion in https://github.com/openshift/installer/pull/3513

# Generators don't have logging right now
# https://github.com/systemd/systemd/issues/15638
exec 1>/dev/kmsg; exec 2>&1

self=$(basename $0)
confpath=/run/coreos-platform-chrony.conf

# Yeah this isn't a completely accurate kernel argument parser but
# we don't have one shared across shell services at the moment.
platform="$(grep -Eo ' ignition.platform.id=[a-z]+' /proc/cmdline | cut -f 2 -d =)"
case "${platform}" in
    azure|aws|gcp) ;;  # OK, this is a platform we know how to support
    *) exit 0 ;;
esac
                                                                                                                                              
if ! cmp {/usr,}/etc/chrony.conf >/dev/null; then
    echo "$self: /etc/chrony.conf is modified; not changing the default"
    exit 0
fi

(echo "# Generated by $self - do not edit directly" 
 sed -e s,'^makestep,#makestep,' -e s,'^pool,#pool,' < /etc/chrony.conf 
cat <<EOF

# Allow the system clock step on any clock update. 
# It will avoid the time resynchronization issue when VMs are resumed from suspend.
# See https://bugzilla.redhat.com/show_bug.cgi?id=1780165 for more information.
makestep 1.0 -1

EOF
) > "${confpath}"
case "${platform}" in
    azure) 
        (echo '# See also https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync'
         echo 'refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0'
        ) >> "${confpath}" ;;
    aws)
        (echo '# See also https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html'
         echo 'server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4'
        ) >> "${confpath}" ;;
    gcp)
        (echo '# See also https://cloud.google.com/compute/docs/instances/managing-instances#configure-ntp'
         echo '# and https://cloud.google.com/compute/docs/images/configuring-imported-images'
         echo 'server metadata.google.internal prefer iburst'
        ) >> "${confpath}" ;;
    *) echo "should not be reached" 1>&2; exit 1 ;;
esac
# Policy doesn't allow chronyd to read run_t
chcon --reference=/etc/chrony.conf "${confpath}"

UNIT_DIR="${1:-/tmp}"

unitconfpath="${UNIT_DIR}/chronyd.service.d/coreos-platform-chrony.conf"
mkdir -p $(dirname "${unitconfpath}")
cat >"${unitconfpath}" << EOF
[Service]
ExecStart=
ExecStart=/usr/sbin/chronyd -f ${confpath} \$OPTIONS
EOF

echo "$self: Updated chrony to use ${platform} configuration ${confpath}"
sh-4.4# systemctl status chronyd
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/chronyd.service.d
           └─coreos-platform-chrony.conf
   Active: active (running) since Wed 2020-06-17 17:36:30 UTC; 30min ago
     Docs: man:chronyd(8)
           man:chrony.conf(5)
  Process: 1299 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 1287 ExecStart=/usr/sbin/chronyd -f /run/coreos-platform-chrony.conf $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1295 (chronyd)
    Tasks: 1
   Memory: 2.0M
      CPU: 328ms
   CGroup: /system.slice/chronyd.service
           └─1295 /usr/sbin/chronyd -f /run/coreos-platform-chrony.conf

Jun 17 17:36:30 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl systemd[1]: Starting NTP client/server...
Jun 17 17:36:30 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl chronyd[1295]: chronyd version 3.5 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Jun 17 17:36:30 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl chronyd[1295]: Frequency -39.139 +/- 0.583 ppm read from /var/lib/chrony/drift
Jun 17 17:36:30 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl chronyd[1295]: Using right/UTC timezone to obtain leap second data
Jun 17 17:36:30 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl systemd[1]: Started NTP client/server.
Jun 17 17:36:54 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl chronyd[1295]: Selected source PHC0
Jun 17 17:36:54 ci-ln-fr7fdz2-002ac-qjrdw-worker-centralus1-46jbl chronyd[1295]: System clock TAI offset set to 37 seconds
sh-4.4# tail /run/coreos-platform-chrony.conf
# Select which information is logged.
#log measurements statistics tracking

# Allow the system clock step on any clock update. 
# It will avoid the time resynchronization issue when VMs are resumed from suspend.
# See https://bugzilla.redhat.com/show_bug.cgi?id=1780165 for more information.
makestep 1.0 -1

# See also https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync
refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0

sh-4.4# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
#* PHC0                          0   3   377     5    +10us[  +10us] +/- 9651ns
```
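The last step of the verification above, confirming that chronyd synchronises to the PTP hardware clock rather than a public pool, could be scripted roughly as follows. This is a minimal sketch: `selected_source` is a hypothetical helper, and it relies only on the `chronyc sources` convention that the second character of the first column is `*` for the currently selected source. The sample input is the output shown in this comment.

```shell
#!/bin/bash
# Illustrative sketch only: selected_source is a hypothetical helper.

# Read `chronyc sources` output on stdin and print the name of the source
# whose state character is "*" (the source chronyd is synchronised to).
# Returns non-zero if no source is selected.
selected_source() {
    awk 'substr($1, 2, 1) == "*" { print $2; found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Sample input copied from the transcript above; prints the selected source.
selected_source <<'EOF'
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PHC0                          0   3   377     5    +10us[  +10us] +/- 9651ns
EOF

# On a node, the same check would be: chronyc sources | selected_source
```

On an Azure node with the generator active, the expected name is `PHC0`, matching the `Selected source PHC0` line in the chronyd journal.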

Comment 5 errata-xmlrpc 2020-07-13 17:39:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

