
Bug 1840239

Summary: PTP operator can't find valid NICs
Product: OpenShift Container Platform
Component: Documentation
Version: 4.4
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Reporter: Sergio G. <sgarciam>
Assignee: Jason Boxman <jboxman>
QA Contact: Xiaoli Tian <xtian>
Docs Contact: Vikram Goyal <vigoyal>
CC: aos-bugs, eparis, fpaoline, jboxman, jokerman, sscheink
Doc Type: If docs needed, set a value
Last Closed: 2020-11-23 20:12:47 UTC
Type: Bug

Description Sergio G. 2020-05-26 15:39:04 UTC
Description of problem:
The PTP operator is unable to find any valid NIC on Azure instances (default IPI installation). I'm not sure whether this is expected. I am aware that the operator is a Technology Preview feature, but if it is expected to fail on Azure, we should probably add a warning to the documentation or write a KCS article.


Version-Release number of selected component (if applicable):
4.4.4


How reproducible:
Always


Notes:
- According to the Microsoft documentation (https://docs.microsoft.com/en-us/azure/virtual-machines/linux/time-sync#check-for-ptp), the node should be PTP capable:
$ oc debug node/sgarcia-ocp444-4lld6-master-0
Starting pod/sgarcia-ocp444-4lld6-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.7
If you don't see a command prompt, try pressing enter.
sh-4.2#
sh-4.2# chroot /host
sh-4.4# lsmod | grep hv_utils
hv_utils               36864  0
hv_vmbus              110592  7 hv_balloon,hv_utils,hv_netvsc,hid_hyperv,hv_storvsc,hyperv_keyboard,hyperv_fb
sh-4.4# ps -ef | grep hv
root         712       2  0 13:24 ?        00:00:00 [hv_vmbus_con]
root         713       2  0 13:24 ?        00:00:00 [hv_pri_chan]
root         714       2  0 13:24 ?        00:00:00 [hv_sub_chan]
root        1068       2  0 13:24 ?        00:00:00 [hv_balloon]
root      279428  278889  0 15:23 ?        00:00:00 grep hv
sh-4.4# cat /sys/class/ptp/ptp0/clock_name
hyperv

- The operator deploys the pods but they're unable to find any capable NIC as seen in the logs:
[sgarcia@cloud clusters]$ oc logs linuxptp-daemon-7p9ff
I0526 14:56:44.638263  220911 main.go:43] resync period set to: 30 [s]
I0526 14:56:44.638712  220911 main.go:44] linuxptp profile path set to: /etc/linuxptp
I0526 14:56:44.638973  220911 main.go:51] successfully get kubeconfig
I0526 14:56:44.659082  220911 utils.go:67] grabbing NIC timestamp capability for br0
I0526 14:56:44.659914  220911 utils.go:33] cmd output for Time stamping parameters for br0:
Capabilities:
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
PTP Hardware Clock: none
Hardware Transmit Timestamp Modes: none
Hardware Receive Filter Modes: none
I0526 14:56:44.659942  220911 utils.go:67] grabbing NIC timestamp capability for eth0
I0526 14:56:44.660783  220911 utils.go:33] cmd output for Time stamping parameters for eth0:
Capabilities:
        software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
PTP Hardware Clock: none
Hardware Transmit Timestamp Modes: none
Hardware Receive Filter Modes: none
... more interfaces debugging information ...
I0526 14:56:44.720725  220911 ptpdev.go:15] PTP capable NICs: []
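
The `PTP Hardware Clock: none` lines in the log above are what set these interfaces apart from ones backed by a physical hardware clock. As a rough sketch only (the operator's actual detection logic is not shown in this bug, so this is an assumption about what matters), the same check can be approximated in shell by reading `ethtool -T` output. The function name `has_phc` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PTP operator.
# Reads `ethtool -T <iface>` output on stdin and succeeds (exit 0) only if
# the "PTP Hardware Clock:" field holds something other than "none".
has_phc() {
    awk -F': *' '
        /^PTP Hardware Clock:/ {
            sub(/[ \t]+$/, "", $2)      # drop trailing whitespace
            found = ($2 != "none")
        }
        END { exit !found }             # no matching line counts as "no PHC"
    '
}
```

On a node, it could be used as `ethtool -T eth0 | has_phc && echo "eth0 has a PHC"`. With the output shown above for br0 and eth0, the function would report no hardware clock for either interface.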

Comment 1 Sergio G. 2020-05-26 16:24:16 UTC
For what it's worth, since the node was apparently ready to use PTP, I created a MachineConfig resource to enable it in chrony, and it worked. So this is probably a bug in the operator and not in the instances:

$ cat << EOF | base64 -w0
refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF

$ cat << EOF | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-masters-chrony-configuration
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 2.2.0
    networkd: {}
    passwd: {}
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,<previous_base64_encoded_string>
          verification: {}
        filesystem: root
        mode: 420
        path: /etc/chrony.conf
  osImageURL: ""
EOF
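
The two manual steps above (encode the file, then paste the string in place of `<previous_base64_encoded_string>`) can be folded into one helper. This is a minimal sketch that assumes GNU coreutils `base64`, as used above; the function name `to_data_url` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads file contents on stdin and prints the complete
# data URL expected in the MachineConfig "source" field.
to_data_url() {
    printf 'data:text/plain;charset=utf-8;base64,%s\n' "$(base64 -w0)"
}
```

For example, `to_data_url < chrony.conf` would print the full value for `source:`, where `chrony.conf` is a local copy of the configuration shown in the first command.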

$ oc debug node/sgarcia-ocp444-4lld6-master-1 -- chroot /host chronyc sources
Starting pod/sgarcia-ocp444-4lld6-master-1-debug ...
To use host binaries, run `chroot /host`
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
#* PHC0                          0   3   377    12  -4276ns[-7050ns] +/-  629ns

Removing debug pod ...
$ oc debug node/sgarcia-ocp444-4lld6-master-1 -- chroot /host chronyc tracking
Starting pod/sgarcia-ocp444-4lld6-master-1-debug ...
To use host binaries, run `chroot /host`
Reference ID    : 50484330 (PHC0)
Stratum         : 1
Ref time (UTC)  : Tue May 26 16:15:11 2020
System time     : 0.000001258 seconds slow of NTP time
Last offset     : -0.000004220 seconds
RMS offset      : 0.000004212 seconds
Frequency       : 0.311 ppm slow
Residual freq   : -0.006 ppm
Skew            : 0.276 ppm
Root delay      : 0.000000001 seconds
Root dispersion : 0.000015134 seconds
Update interval : 8.0 seconds
Leap status     : Normal

Removing debug pod ...
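
In the `chronyc sources` output above, the `#` in the first column marks a local reference clock and the `*` marks the source chrony is currently synchronized to. A small sketch for checking this in a script follows; the function name `refclock_selected` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `chronyc sources` output on stdin and succeeds
# if the currently selected source ("*") is a local reference clock ("#").
refclock_selected() {
    grep -q '^#\*'
}
```

It could be run as `oc debug node/<node> -- chroot /host chronyc sources | refclock_selected`, where `<node>` is a placeholder for the node name.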

Comment 2 Sebastian Scheinkman 2020-05-26 16:50:59 UTC
Hi Sergio,

The PTP operator was designed for bare-metal environments.

The list of interfaces is empty because the PTP operator uses ptp4l and phc2sys, which work with physical interfaces and not with the virtual device that the hypervisor exposes inside the guest.

Do you recommend adding a note to the PTP page (https://docs.openshift.com/container-platform/4.3/networking/multiple_networks/configuring-ptp.html) saying that for virtual devices the user should use chrony, with the following link (https://docs.openshift.com/container-platform/4.4/installing/install_config/installing-customizing.html#installation-special-config-crony_installing-customizing)?

Comment 3 Sergio G. 2020-05-26 16:57:14 UTC
Hi Sebastian.
>> do you recommend to add a note in the ptp page (https://docs.openshift.com/container-platform/4.3/networking/multiple_networks/configuring-ptp.html) that for virtual devices the user should use chrony with the following link (https://docs.openshift.com/container-platform/4.4/installing/install_config/installing-customizing.html#installation-special-config-crony_installing-customizing)?


Yeah, absolutely. If the PTP operator is not meant to be used on virtual machines, I would add an example of a valid chrony.conf file that uses PTP (the one I added in my previous comment) so customers have that option whenever they configure chrony.

Comment 4 Eric Paris 2020-05-26 17:11:49 UTC
Why shouldn't the PTP operator just magically support machines on Azure? Can you please point me to the rationale and the design decision?

Comment 5 Sebastian Scheinkman 2020-05-26 17:16:05 UTC
(In reply to Eric Paris from comment #4)
> Why shouldn't the PTP operator just magically support machines on Azure? Can
> you please point me to the rationale and the design decision?

Hi Eric,

This is the proposal: https://github.com/openshift/enhancements/blob/master/enhancements/ptp-time-integration.md#non-goals


Thanks!
Sebastian

Comment 6 Federico Paolinelli 2020-06-10 10:50:34 UTC
Based upon the proposal linked by Sebastian and comment https://bugzilla.redhat.com/show_bug.cgi?id=1840239#c3 , I am moving this to documentation.

Comment 7 Jason Boxman 2020-11-06 20:18:20 UTC
I've created the following PR for this:

https://github.com/openshift/openshift-docs/pull/27167