Bug 2060492

Summary: Update PtpConfigSlave source-crs to use network_transport L2 instead of UDPv4
Product: OpenShift Container Platform
Reporter: Marius Cornea <mcornea>
Component: Networking
Assignee: Joseph Richard <josricha>
Networking sub component: ptp
QA Contact: obochan <obochan>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: unspecified
CC: achernet, imiller, keyoung, trozet, vgrinber
Version: 4.10
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
The network_transport option should be set to L2, not UDPv4, in all documented PTP configs (e.g. https://docs.openshift.com/container-platform/4.9/networking/using-ptp.html#configuring-linuxptp-services-as-boundary-clock_using-ptp). This applies to all versions, because PTP over UDPv4 is not supported.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-10 10:52:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: linuxptp-daemon.log (flags: none)

Description Marius Cornea 2022-03-03 15:42:46 UTC
Created attachment 1864022: linuxptp-daemon.log

Description of problem:

The linuxptp-daemon-container reports "SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)" errors on a DU node deployed via the ZTP process.

Version-Release number of selected component (if applicable):
4.10.0-rc.6
ptp-operator.4.10.0-202202222110

How reproducible:
100%

Steps to Reproduce:
1. Deploy a DU node via the ZTP process, with the PTP config set in:

http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/kni-qe-1-4.10/policygentemplates/group-du-sno-ranGen.yaml#L48-L57

2. Wait for the deployment and configuration to complete

3. Check linuxptp-daemon-container logs:

oc -n openshift-ptp logs linuxptp-daemon-cwtmw -c linuxptp-daemon-container -f

Actual results:

ptp4l[8376.097]: [ptp4l.0.config] port 1: FAULTY to LISTENING on INIT_COMPLETE
ptp4l[8376.137]: [ptp4l.0.config] port 1: new foreign master b47af1.fffe.7b20e2-1
ptp4l[8376.362]: [ptp4l.0.config] selected best master clock b47af1.fffe.7b20e2
ptp4l[8376.362]: [ptp4l.0.config] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[8376.367]: [ptp4l.0.config] master offset -25954588102 s2 freq -900000000 path delay   3119884
ptp4l[8376.371]: [ptp4l.0.config] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[8376.423]: [ptp4l.0.config] master offset -25905432639 s2 freq -900000000 path delay   3119884
ptp4l[8376.480]: [ptp4l.0.config] master offset -25847641674 s2 freq -900000000 path delay   1656547
ptp4l[8376.500]: [ptp4l.0.config] timed out while polling for tx timestamp
ptp4l[8376.500]: [ptp4l.0.config] increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[8376.500]: [ptp4l.0.config] port 1: send delay request failed
ptp4l[8376.500]: [ptp4l.0.config] port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
phc2sys[8377.136]: [ptp4l.0.config] port b49691.fffe.a57b06-1 changed state
phc2sys[8377.136]: [ptp4l.0.config] port b49691.fffe.a57b06-1 changed state
phc2sys[8377.136]: [ptp4l.0.config] port b49691.fffe.a57b06-1 changed state
phc2sys[8377.136]: [ptp4l.0.config] reconfiguring after port state change
phc2sys[8377.137]: [ptp4l.0.config] selecting ens2f2 for synchronization
phc2sys[8377.137]: [ptp4l.0.config] nothing to synchronize
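
For longer logs, the transitions into FAULTY can be picked out with a small script. The following is a minimal sketch, not part of the original report: it assumes the ptp4l line format shown above, and the names TRANSITION, find_fault_transitions and SAMPLE are made up for illustration. The sample lines are copied from the log excerpt above.

```python
import re

# Matches ptp4l port state transitions such as
# "port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)".
# Assumption: lines follow the format seen in the excerpt above.
TRANSITION = re.compile(
    r"^ptp4l\[(?P<ts>[\d.]+)\]: \[(?P<cfg>[^\]]+)\] "
    r"port (?P<port>\d+): (?P<old>\w+) to (?P<new>\w+) on (?P<event>.+)$"
)


def find_fault_transitions(log_text):
    """Return (timestamp, port, previous_state, event) for each transition into FAULTY."""
    faults = []
    for line in log_text.splitlines():
        match = TRANSITION.match(line.strip())
        if match and match.group("new") == "FAULTY":
            faults.append(
                (
                    float(match.group("ts")),
                    int(match.group("port")),
                    match.group("old"),
                    match.group("event"),
                )
            )
    return faults


# Lines copied from the log excerpt in this report.
SAMPLE = """\
ptp4l[8376.097]: [ptp4l.0.config] port 1: FAULTY to LISTENING on INIT_COMPLETE
ptp4l[8376.371]: [ptp4l.0.config] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[8376.500]: [ptp4l.0.config] port 1: send delay request failed
ptp4l[8376.500]: [ptp4l.0.config] port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
"""

if __name__ == "__main__":
    for ts, port, old, event in find_fault_transitions(SAMPLE):
        print(f"{ts}: port {port} left {old} on {event}")
```

The output of the oc logs command from step 3 could be saved to a file and passed to find_fault_transitions in the same way.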


Expected results:

No faults

Additional info:

nic info:

[root@sno core]# ethtool -i ens2f2
driver: ice
version: 4.18.0-305.34.2.rt7.107.el8_4.x
firmware-version: 2.10 0x8000433d 1.2789.0
expansion-rom-version: 
bus-info: 0000:b2:00.2
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

[root@sno core]# lspci -s b2:00.0 -v
b2:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-C for SFP (rev 02)
	Subsystem: Intel Corporation Ethernet Network Adapter E810-XXV-4
	Physical Slot: 2
	Flags: bus master, fast devsel, latency 0, IRQ 42, NUMA node 1, IOMMU group 93
	Memory at de000000 (64-bit, prefetchable) [size=32M]
	Memory at e6000000 (64-bit, prefetchable) [size=64K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=512 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [e0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [150] Device Serial Number b4-96-91-ff-ff-a5-7b-04
	Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
	Capabilities: [1a0] Transaction Processing Hints
	Capabilities: [1b0] Access Control Services
	Capabilities: [1d0] Secondary PCI Express
	Capabilities: [200] Data Link Feature <?>
	Capabilities: [210] Physical Layer 16.0 GT/s <?>
	Capabilities: [250] Lane Margining at the Receiver <?>
	Kernel driver in use: ice
	Kernel modules: ice

Comment 1 Ken Young 2022-03-07 19:23:23 UTC
Marius,

Have you provisioned the mitigation required for https://bugzilla.redhat.com/show_bug.cgi?id=1992173?  See https://bugzilla.redhat.com/show_bug.cgi?id=1992173#c19.

/KenY

Comment 2 Marius Cornea 2022-03-08 13:00:22 UTC
(In reply to Ken Young from comment #1)
> Marius,
> 
> Have you set the mitigation required for
> https://bugzilla.redhat.com/show_bug.cgi?id=1992173 provisioned?  See
> https://bugzilla.redhat.com/show_bug.cgi?id=1992173#c19.
> 
> /KenY

I haven't set the mitigation mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1992173#c19. I tried changing the priority (chrt -f -p 65 $pid) of the existing ice-ptp and ptp4l processes, but the linuxptp-daemon-container log shows the same error.

Nevertheless, I see the BZ mentions a more recent NIC firmware than what I have on my system, so I'll try updating the firmware and retry.

Comment 3 Marius Cornea 2022-03-09 13:22:58 UTC
After the firmware update and adjusting the priorities I can no longer see the faults in the ptp logs.

Ofer also noticed that the ptp config set on my machine was using `network_transport UDPv4` while it should be `network_transport L2`. This config comes from the ZTP source CRs:

https://github.com/openshift-kni/cnf-features-deploy/blob/release-4.10/ztp/source-crs/PtpConfigSlave.yaml#L99
https://github.com/openshift-kni/cnf-features-deploy/blob/release-4.10/ztp/source-crs/PtpConfigSlaveCvl.yaml#L100

@Ken, I updated this BZ to keep track of updating the ptp configs source CRs to use `network_transport L2` instead of `network_transport UDPv4` as I understand L2 is the supported mode currently.
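
The requested change amounts to rewriting one option in the ptp4lConf text. The following is a minimal sketch, not the actual patch applied to the source CRs: the function name set_network_transport and the SAMPLE_CONF fragment are hypothetical and only illustrate the "option value" line format of a ptp4l config.

```python
import re

# Matches a "network_transport <value>" line in ptp4l config text.
_TRANSPORT_LINE = re.compile(
    r"^([ \t]*network_transport[ \t]+)(\S+)", re.MULTILINE
)


def set_network_transport(conf_text, transport="L2"):
    """Return (new_text, count), where count is the number of
    network_transport lines whose value was rewritten to `transport`."""
    return _TRANSPORT_LINE.subn(lambda m: m.group(1) + transport, conf_text)


# Hypothetical config fragment, not copied from PtpConfigSlave.yaml.
SAMPLE_CONF = """\
[global]
twoStepFlag 1
network_transport UDPv4
delay_mechanism E2E
"""

if __name__ == "__main__":
    new_conf, count = set_network_transport(SAMPLE_CONF)
    print(f"rewrote {count} line(s)")
    print(new_conf)
```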

Comment 6 Vitaly Grinberg 2022-03-23 18:06:50 UTC
(In reply to Marius Cornea from comment #3)
> After the firmware update and adjusting the priorities I can no longer see
> the faults in the ptp logs.
> 
> Ofer also noticed that the ptp config set on my machine was using
> `network_transport UDPv4` while it should be `network_transport L2`. This
> config comes from the ZTP source CRs:
> 
> https://github.com/openshift-kni/cnf-features-deploy/blob/release-4.10/ztp/
> source-crs/PtpConfigSlave.yaml#L99
> https://github.com/openshift-kni/cnf-features-deploy/blob/release-4.10/ztp/
> source-crs/PtpConfigSlaveCvl.yaml#L100
> 
> @Ken, I updated this BZ to keep track of updating the ptp configs source CRs
> to use `network_transport L2` instead of `network_transport UDPv4` as I
> understand L2 is the supported mode currently.

While the transport is set to UDPv4 in the config file options, it is overridden by the command line options.
The ptp4l command line options select the IEEE 802.3 transport:
https://github.com/openshift-kni/cnf-features-deploy/blob/d521e22a7c1a8dcd0a76f2c4659da8736defec49/ztp/source-crs/PtpConfigSlave.yaml#L13

ptp4lOpts: "-2 -s --summary_interval -4"
The "-2" is for selecting the IEEE 802.3 transport, according to https://linux.die.net/man/8/ptp4l
It's therefore possible that the observed behavior is not related to the ptp4l configuration.
Having said that, it's probably a good idea to remove duplicate and seemingly conflicting settings from ptp4lOpts and ptp4lConf to reduce confusion, but this is not a functional / performance issue.
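
The precedence described in this comment can be sketched as follows. This is a simplified model under stated assumptions, not ptp4l's real option parser: it assumes every long option takes exactly one value, that -f, -i, -p and -l are the short options that take a value, and that -2, -4 and -6 select L2, UDPv4 and UDPv6. It ignores per-port config sections. All function names are made up for illustration. Note that in the ptp4lOpts string above, the trailing "-4" is the value of --summary_interval, not the UDPv4 transport flag.

```python
import shlex

# Assumption: short transport flags as documented in the ptp4l man page.
SHORT_TRANSPORT_FLAGS = {"-2": "L2", "-4": "UDPv4", "-6": "UDPv6"}
# Assumption: short options that consume the following token as their value.
SHORT_OPTS_WITH_VALUE = {"-f", "-i", "-p", "-l"}


def transport_from_opts(ptp4l_opts):
    """Return the transport selected on the command line, or None."""
    tokens = shlex.split(ptp4l_opts)
    transport = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            # Long options carry a value, either "--name=value" or "--name value".
            if "=" in tok:
                name, value = tok[2:].split("=", 1)
                i += 1
            else:
                name = tok[2:]
                value = tokens[i + 1] if i + 1 < len(tokens) else None
                i += 2
            if name == "network_transport":
                transport = value
            continue
        if tok in SHORT_OPTS_WITH_VALUE:
            i += 2  # skip the option and its value
            continue
        if tok in SHORT_TRANSPORT_FLAGS:
            transport = SHORT_TRANSPORT_FLAGS[tok]
        i += 1
    return transport


def transport_from_conf(conf_text):
    """Return the last network_transport value found in the config text, or None."""
    transport = None
    for line in conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "network_transport":
            transport = parts[1]
    return transport


def effective_transport(ptp4l_opts, conf_text):
    """Command line wins over config file; fall back to ptp4l's default, UDPv4."""
    return transport_from_opts(ptp4l_opts) or transport_from_conf(conf_text) or "UDPv4"


if __name__ == "__main__":
    conf = "[global]\nnetwork_transport UDPv4\n"
    print(effective_transport("-2 -s --summary_interval -4", conf))
```

Under this model, the source CR's combination of "-2" in ptp4lOpts and "network_transport UDPv4" in ptp4lConf resolves to L2, which matches the analysis in this comment.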

Comment 8 obochan 2022-05-08 07:39:06 UTC
The issue is validated: the PR changed the configuration from UDPv4 to L2.

Comment 10 errata-xmlrpc 2022-08-10 10:52:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069