Bug 2060586 - [4.10.z] [RFE] use /dev/ptp_hyperv on Azure/AzureStack
Summary: [4.10.z] [RFE] use /dev/ptp_hyperv on Azure/AzureStack
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.9
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.z
Assignee: Aashish Radhakrishnan
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 2037841
Blocks:
 
Reported: 2022-03-03 19:38 UTC by Micah Abbott
Modified: 2022-12-19 14:59 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2037841
Environment:
Last Closed: 2022-04-21 13:16:01 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift os pull 742 0 None open Bug 2060586: [4.10.0] [RFE] use /dev/ptp_hyperv on Azure/AzureStack 2022-03-11 19:39:18 UTC
Red Hat Product Errata RHSA-2022:1356 0 None None None 2022-04-21 13:16:19 UTC

Description Micah Abbott 2022-03-03 19:38:04 UTC
+++ This bug was initially created as a clone of Bug #2037841 +++

OCP Version at Install Time: 4.9
Platform: AzureStack, Azure
Architecture: x86_64


What are you trying to do? What is your use case?
configure Chrony as documented
https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync

What happened? What went wrong or what did you expect?
We currently need to define the following through a Butane MachineConfig:
- the udev rule to create /dev/ptp_hyperv (https://bugzilla.redhat.com/show_bug.cgi?id=1991834)
- the proper chrony configuration using /dev/ptp_hyperv and makestep
I expect it to work out of the box without any configuration needed.
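
For illustration, the two pieces of configuration listed above can be sketched as a small Python helper that renders the file contents. This is a sketch under assumptions, not the shipped RHCOS implementation: the udev rule follows the Microsoft time-sync guidance linked above, the chrony lines are the ones shown later in this bug, and both target paths and the name `render_files` are hypothetical.

```python
# Sketch only: target paths and the function name are hypothetical.

# udev rule that creates the /dev/ptp_hyperv symlink for the Hyper-V PTP clock
# (taken from the Microsoft time-sync guidance referenced in this bug).
UDEV_RULE = 'SUBSYSTEM=="ptp", ATTR{clock_name}=="hyperv", SYMLINK += "ptp_hyperv"\n'

# chrony directives as they appear in the verified configuration in this bug.
CHRONY_LINES = [
    "makestep 1.0 -1",
    "refclock PHC /dev/ptp_hyperv poll 3 dpoll -2 offset 0",
]


def render_files():
    """Return a mapping of (assumed) target path to file contents."""
    return {
        # Hypothetical rule file name; any name under rules.d would do.
        "/etc/udev/rules.d/99-ptp-hyperv.rules": UDEV_RULE,
        # Assumed location; these lines are meant to be merged into chrony.conf.
        "/etc/chrony.conf": "\n".join(CHRONY_LINES) + "\n",
    }


if __name__ == "__main__":
    for path, contents in render_files().items():
        print(f"--- {path}")
        print(contents, end="")
```

The rendered contents could then be embedded in a Butane config under storage.files; that step is left out here because it depends on the cluster's MachineConfig layout.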


What are the steps to reproduce your issue? Please try to reduce these steps to something that can be reproduced with a single RHCOS node.
- boot a rhcos VM on AzureStack
- chronyc tracking: shows NTP being used instead of the hardware clock.

(side note: there is no documentation on how to upgrade a "single RHCOS node")

Implemented on FCOS through
https://github.com/coreos/fedora-coreos-config/pull/1355
https://github.com/coreos/fedora-coreos-config/pull/1390

--- Additional comment from Micah Abbott on 2022-01-27 15:15:59 UTC ---

We are blocked on access to the AzureStack environment, so it is unlikely that this change will land by code freeze for OCP 4.10.

--- Additional comment from Eric Paris on 2022-01-28 21:04:35 UTC ---

This bug sets Target Release equal to a z-stream but has no bug in the 'Depends On' field. As such this is not a valid bug state and the target release is being unset.

Any bug targeting 4.1.z must have a bug targeting 4.2 in 'Depends On.'
Similarly, any bug targeting 4.2.z must have a bug with Target Release of 4.3 in 'Depends On.'

--- Additional comment from Micah Abbott on 2022-03-03 19:29:23 UTC ---

The fix for this in RHCOS 4.11 landed as part of https://github.com/openshift/os/commit/f21cc4b8d4e61c9b92c8f77bf2e0284fb3991c4d

Moving to MODIFIED

Comment 3 Michael Nguyen 2022-03-15 16:16:49 UTC
Verified on Azure with OCP 4.10.0-0.nightly-2022-03-15-033337. The symlink for ptp_hyperv is created to the correct ptp device and chrony is configured automatically.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-03-15-033337   True        False         35s     Cluster version is 4.10.0-0.nightly-2022-03-15-033337
$ oc get nodes
NAME                                                STATUS   ROLES    AGE   VERSION
ci-ln-xrgcflk-1d09d-r6l42-master-0                  Ready    master   20m   v1.23.3+e419edf
ci-ln-xrgcflk-1d09d-r6l42-master-1                  Ready    master   19m   v1.23.3+e419edf
ci-ln-xrgcflk-1d09d-r6l42-master-2                  Ready    master   19m   v1.23.3+e419edf
ci-ln-xrgcflk-1d09d-r6l42-worker-centralus1-rmtk6   Ready    worker   13m   v1.23.3+e419edf
ci-ln-xrgcflk-1d09d-r6l42-worker-centralus2-c82s7   Ready    worker   13m   v1.23.3+e419edf
ci-ln-xrgcflk-1d09d-r6l42-worker-centralus3-p7hcl   Ready    worker   14m   v1.23.3+e419edf
$ oc debug node/ci-ln-xrgcflk-1d09d-r6l42-worker-centralus1-rmtk6
Starting pod/ci-ln-xrgcflk-1d09d-r6l42-worker-centralus1-rmtk6-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
#* PHC0                          0   3   377     9  -8147ns[-5385ns] +/- 7057ns
sh-4.4# cat /run/coreos-platform-chrony.conf
# Generated by coreos-platform-chrony - do not edit directly
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#pool 2.rhel.pool.ntp.org iburst

# Record the rate at which the system clock gains/losses time.
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 1 second.
#makestep 1.0 3

# Enable kernel synchronization of the real-time clock (RTC).
rtcsync

# Enable hardware timestamping on all interfaces that support it.
#hwtimestamp *

# Increase the minimum number of selectable sources required to adjust
# the system clock.
#minsources 2

# Allow NTP client access from local network.
#allow 192.168.0.0/16

# Serve time even if not synchronized to a time source.
#local stratum 10

# Specify file containing keys for NTP authentication.
keyfile /etc/chrony.keys

# Get TAI-UTC offset and leap seconds from the system tz database.
#leapsectz right/UTC

# Specify directory for log files.
logdir /var/log/chrony

# Select which information is logged.
#log measurements statistics tracking

# Allow the system clock step on any clock update.
# It will avoid the time resynchronization issue when VMs are resumed from suspend.
# See https://bugzilla.redhat.com/show_bug.cgi?id=1780165 for more information.
makestep 1.0 -1

# See also https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync
refclock PHC /dev/ptp_hyperv poll 3 dpoll -2 offset 0
leapsectz right/UTC
sh-4.4# ls -l /dev/ptp_hyperv 
lrwxrwxrwx. 1 root root 4 Mar 15 15:52 /dev/ptp_hyperv -> ptp0
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:983fe981b05b1504cacbac3b76ff00ceca4e3a6066df7fa4a6f676a14dd962eb
              CustomOrigin: Managed by machine-config-operator
                   Version: 410.84.202203141348-0 (2022-03-14T13:51:27Z)

  ostree://b1529f891c792557fd28e040870ab4b8220e65c5416427032701d21147815293
                   Version: 410.84.202201251210-0 (2022-01-25T12:13:24Z)
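
The manual check of the `chronyc sources` output above could be automated roughly as follows. This is a minimal sketch, not part of the fix: the function names are hypothetical, and it relies on the chronyc convention that the first column character is the source mode (`#` for a local reference clock, `^` for a server, `=` for a peer) and the second is the state (`*` for the currently selected source).

```python
def selected_source(chronyc_sources_output):
    """Return (mode, name) of the source marked '*', or None if none is selected."""
    for line in chronyc_sources_output.splitlines():
        # Source rows start with a mode character followed by a state character.
        if len(line) >= 2 and line[0] in "^=#" and line[1] == "*":
            fields = line[2:].split()
            if fields:
                return line[0], fields[0]
    return None


def uses_reference_clock(chronyc_sources_output):
    """True when the selected source is a local reference clock such as PHC0."""
    selected = selected_source(chronyc_sources_output)
    return selected is not None and selected[0] == "#"


# Output copied from the verification above.
SAMPLE = """\
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PHC0                          0   3   377     9  -8147ns[-5385ns] +/- 7057ns
"""

if __name__ == "__main__":
    print(selected_source(SAMPLE))
    print(uses_reference_clock(SAMPLE))
```

On a node that still syncs over NTP, the selected row would start with `^*` and `uses_reference_clock` would return False.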

Comment 8 Colin Walters 2022-03-24 15:20:26 UTC
I think a good SOP for bugs like this would be for us to extend the test suite in e.g. github.com/openshift/origin - that way this would automatically get tested when the regular OCP tests for Azure Stack are run.

https://github.com/openshift/origin/blob/master/test/extended/security/fips.go is an example bit of code that implements a "check node state".

(Obviously there's some overlap with kola tests here...perhaps in theory we could set things up so at least read-only kola tests are run as part of OCP tests?)
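
As a rough idea of what such a "check node state" test could assert, here is a sketch in Python rather than the Go used by openshift/origin. It is not an existing test: the function names are hypothetical, and the rule that no `pool` or `server` directive should remain active is an assumption based on the configuration shown in comment 3.

```python
def active_directives(conf_text):
    """Yield each non-empty, uncommented chrony.conf line split into words."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        # chrony treats lines starting with '!', ';', '#' or '%' as comments.
        if line and line[0] not in "!;#%":
            yield line.split()


def check_hyperv_chrony(conf_text, device="/dev/ptp_hyperv"):
    """Return a list of problems; an empty list means the config looks as expected."""
    problems = []
    directives = list(active_directives(conf_text))
    if not any(d[:3] == ["refclock", "PHC", device] for d in directives):
        problems.append(f"no active 'refclock PHC {device}' directive")
    # Assumption: on Azure/AzureStack the NTP pool lines should be commented out.
    if any(d[0] in ("pool", "server") for d in directives):
        problems.append("an NTP pool/server directive is still active")
    return problems


# Excerpt of /run/coreos-platform-chrony.conf as shown in this bug.
SAMPLE_CONF = """\
#pool 2.rhel.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 -1
refclock PHC /dev/ptp_hyperv poll 3 dpoll -2 offset 0
leapsectz right/UTC
"""

if __name__ == "__main__":
    print(check_hyperv_chrony(SAMPLE_CONF))
```

A real test would read the file from the node (for example through a debug pod) and would also confirm that /dev/ptp_hyperv resolves to a ptp device.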

Comment 12 Michael Nguyen 2022-04-10 16:06:08 UTC
Verified on RHCOS 410.84.202204081044-0, which is part of registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-04-09-185509, on AzureStack.

Because no AzureStack-specific artifact has been created since the boot image, I used the boot image 410.84.202112040202-0 and manually pivoted:

# oc adm release -a pull-secret.json info --image-for=machine-os-content registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-04-09-185509
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:22ff8ba92ed853da37e88cc5759d5497b23931ae2b388a2e6ab3c6dbf7991c43
# oc image extract -a pull-secret.json  --path /:/run/mco-machine-os-content/os-content  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:22ff8ba92ed853da37e88cc5759d5497b23931ae2b388a2e6ab3c6dbf7991c43
# rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content/srv/repo:2ce5174c097c1144f17b206eaa401f59ed27da5fce72f6ad6e11619c7ff28bc7 --custom-origin-url pivot://2ce5174c097c1144f17b206eaa401f59ed27da5fce72f6ad6e11619c7ff28bc7 --custom-origin-description "Managed by machine-config-operator"

After rebooting, I verified that the /dev/ptp_hyperv symlink points to ptp0 and that chrony is working.

# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
#* PHC0                          0   3   377    10  +7087ns[+7613ns] +/- 2827ns
[root@redhatuser64954-rhcos-test ~]# cat /run/coreos-platform-chrony.conf
# Generated by coreos-platform-chrony - do not edit directly
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#pool 2.rhel.pool.ntp.org iburst

# Record the rate at which the system clock gains/losses time.
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 1 second.
#makestep 1.0 3

# Enable kernel synchronization of the real-time clock (RTC).
rtcsync

# Enable hardware timestamping on all interfaces that support it.
#hwtimestamp *

# Increase the minimum number of selectable sources required to adjust
# the system clock.
#minsources 2

# Allow NTP client access from local network.
#allow 192.168.0.0/16

# Serve time even if not synchronized to a time source.
#local stratum 10

# Specify file containing keys for NTP authentication.
keyfile /etc/chrony.keys

# Get TAI-UTC offset and leap seconds from the system tz database.
#leapsectz right/UTC

# Specify directory for log files.
logdir /var/log/chrony

# Select which information is logged.
#log measurements statistics tracking

# Allow the system clock step on any clock update.
# It will avoid the time resynchronization issue when VMs are resumed from suspend.
# See https://bugzilla.redhat.com/show_bug.cgi?id=1780165 for more information.
makestep 1.0 -1

# See also https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync
refclock PHC /dev/ptp_hyperv poll 3 dpoll -2 offset 0
leapsectz right/UTC
[root@redhatuser64954-rhcos-test ~]# ls -l /dev/ptp_hyperv 
lrwxrwxrwx. 1 root root 4 Apr 10 15:22 /dev/ptp_hyperv -> ptp0
[root@redhatuser64954-rhcos-test ~]# rpm-ostree status
State: idle
Deployments:
● pivot://1bfd49b9612e3ce7d1a3ababf70e9fe571a0438a05902e7934f16fd698ccc05c
              CustomOrigin: Managed by machine-config-operator
                   Version: 410.84.202204081044-0 (2022-04-08T10:48:09Z)

  ostree://658b35d30d5da7226bf2abeb9c318a92c1521de2ea65486bc47632f2eee4e6c6
                   Version: 410.84.202112040202-0 (2021-12-04T02:05:40Z)

Comment 17 errata-xmlrpc 2022-04-21 13:16:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.10 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1356

Comment 19 Aashish Radhakrishnan 2022-12-19 14:59:29 UTC
We haven't decided on backporting it to OCP 4.9 yet.

