Bug 1897897

Summary: ptp lose sync openshift 4.6
Product: OpenShift Container Platform Reporter: Sebastian Scheinkman <sscheink>
Component: NetworkingAssignee: Sebastian Scheinkman <sscheink>
Networking sub component: ptp QA Contact: Weibin Liang <weliang>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: urgent CC: aos-bugs, dosmith, eparis, fsimonce, huirwang, jokerman, keyoung, mlichvar, msivak, pvaanane, vgrinber, william.caban, zshi
Version: 4.6.z   
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:33:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1771572    
Attachments:
Description Flags
Behavior with RT Kernel none

Description Sebastian Scheinkman 2020-11-15 12:14:12 UTC
Description of problem:
ptp slaves lose synchronization


Version-Release number of selected component (if applicable):

Nic information

ethtool -i ens7f0
driver: i40e
version: 2.8.20-k
firmware-version: 7.10 0x800075e6 19.5.12
expansion-rom-version: 
bus-info: 0000:88:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

sofware version:
[root@cnfde4 /]# ptp4l -v
2.0
[root@cnfde4 /]# phc2sys -v
2.0

Kernel verison:

uname -a
Linux cnfde4.ptp.lab.eng.bos.redhat.com 4.18.0-193.28.1.rt13.77.el8_2.x86_64 #1 SMP PREEMPT RT Fri Oct 16 14:11:07 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
100%

We are using ptp on top of openshift cluster for time synchronization and I am able to see drops in the logs.

The topology is 2 nodes one grandmaster and one slave connected via a switch.


Logs from Grandmaster - 

phc2sys[242399.800]: [ens7f0] CLOCK_REALTIME rms   16 max   16 freq -84548 +/-   0 delay  1015 +/-   0
ptp4l[242400.376]: [ens7f0] master offset         28 s2 freq   +2196 path delay      1495
phc2sys[242400.800]: [ens7f0] CLOCK_REALTIME rms    6 max    6 freq -84554 +/-   0 delay  1009 +/-   0
ptp4l[242401.376]: [ens7f0] master offset        -21 s2 freq   +2156 path delay      1496
phc2sys[242401.800]: [ens7f0] CLOCK_REALTIME rms    1 max    1 freq -84557 +/-   0 delay  1017 +/-   0
phc2sys[242402.801]: [ens7f0] CLOCK_REALTIME rms    0 max   -0 freq -84557 +/-   0 delay  1016 +/-   0
phc2sys[242403.801]: [ens7f0] CLOCK_REALTIME rms   18 max   18 freq -84575 +/-   0 delay  1009 +/-   0
phc2sys[242404.801]: [ens7f0] CLOCK_REALTIME rms   20 max   20 freq -84583 +/-   0 delay  1013 +/-   0
phc2sys[242405.801]: [ens7f0] CLOCK_REALTIME rms    0 max   -0 freq -84569 +/-   0 delay  1017 +/-   0
phc2sys[242406.801]: [ens7f0] CLOCK_REALTIME rms    0 max   -0 freq -84569 +/-   0 delay  1012 +/-   0
phc2sys[242407.801]: [ens7f0] CLOCK_REALTIME rms    5 max    5 freq -84574 +/-   0 delay  1014 +/-   0
phc2sys[242408.801]: [ens7f0] CLOCK_REALTIME rms    9 max    9 freq -84579 +/-   0 delay  1019 +/-   0
ptp4l[242408.920]: [ens7f0] port 1: SLAVE to LISTENING on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[242408.920]: [ens7f0] selected local clock 40a6b7.fffe.0e2b30 as best master
phc2sys[242409.801]: [ens7f0] port 40a6b7.fffe.0e2b30-1 changed state
phc2sys[242409.802]: [ens7f0] reconfiguring after port state change
phc2sys[242409.802]: [ens7f0] selecting ens7f0 for synchronization
phc2sys[242409.802]: [ens7f0] nothing to synchronize
ptp4l[242415.343]: [ens7f0] selected local clock 40a6b7.fffe.0e2b30 as best master
I1115 12:03:14.216277 2863254 main.go:98] ticker pull
ptp4l[242423.145]: [ens7f0] selected local clock 40a6b7.fffe.0e2b30 as best master
ptp4l[242429.167]: [ens7f0] selected local clock 40a6b7.fffe.0e2b30 as best master
ptp4l[242432.964]: [ens7f0] selected best master clock 40a6b7.fffe.0e43d0
ptp4l[242432.964]: [ens7f0] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
phc2sys[242433.803]: [ens7f0] port 40a6b7.fffe.0e2b30-1 changed state
phc2sys[242433.803]: [ens7f0] reconfiguring after port state change
phc2sys[242433.803]: [ens7f0] master clock not ready, waiting...
ptp4l[242434.963]: [ens7f0] master offset        577 s2 freq   +2747 path delay      1492
ptp4l[242434.963]: [ens7f0] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
phc2sys[242435.804]: [ens7f0] port 40a6b7.fffe.0e2b30-1 changed state
phc2sys[242435.804]: [ens7f0] reconfiguring after port state change
phc2sys[242435.804]: [ens7f0] selecting CLOCK_REALTIME for synchronization
phc2sys[242435.804]: [ens7f0] selecting ens7f0 as the master clock
phc2sys[242435.804]: [ens7f0] CLOCK_REALTIME rms 1938345 max 1938345 freq -2022918 +/-   0 delay  1023 +/-   0
ptp4l[242435.963]: [ens7f0] master offset        -10 s2 freq   +2333 path delay      1492
phc2sys[242436.804]: [ens7f0] CLOCK_REALTIME rms 3472 max 3472 freq -669549 +/-   0 delay  1026 +/-   0
ptp4l[242436.965]: [ens7f0] master offset       -167 s2 freq   +2173 path delay      1492


Logs from slave:
phc2sys[242976.503]: [ens7f0] ens7f0 rms   16 max   16 freq  +1215 +/-   0 delay  1349 +/-   0
phc2sys[242977.503]: [ens7f0] ens7f0 rms   17 max   17 freq  +1186 +/-   0 delay  1354 +/-   0
phc2sys[242978.503]: [ens7f0] ens7f0 rms    9 max    9 freq  +1207 +/-   0 delay  1356 +/-   0
ptp4l[242979.485]: [ens7f0] port 1: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
phc2sys[242979.503]: [ens7f0] port 40a6b7.fffe.0e43d0-1 changed state
phc2sys[242979.613]: [ens7f0] reconfiguring after port state change
phc2sys[242979.613]: [ens7f0] selecting ens7f0 for synchronization
phc2sys[242979.613]: [ens7f0] selecting CLOCK_REALTIME as the master clock
phc2sys[242979.613]: [ens7f0] ens7f0 rms   11 max   11 freq  +1190 +/-   0 delay  1340 +/-   0
phc2sys[242980.613]: [ens7f0] ens7f0 rms   15 max   15 freq  +1213 +/-   0 delay  1362 +/-   0
phc2sys[242982.613]: [ens7f0] ens7f0 rms    1 max    1 freq  +1201 +/-   0 delay  1360 +/-   0
phc2sys[242983.614]: [ens7f0] ens7f0 rms   18 max   18 freq  +1184 +/-   0 delay  1344 +/-   0
phc2sys[242984.614]: [ens7f0] ens7f0 rms    5 max    5 freq  +1202 +/-   0 delay  1357 +/-   0
phc2sys[242985.614]: [ens7f0] ens7f0 rms   14 max   14 freq  +1212 +/-   0 delay  1365 +/-   0
phc2sys[242986.614]: [ens7f0] ens7f0 rms    1 max    1 freq  +1201 +/-   0 delay  1349 +/-   0
phc2sys[242987.614]: [ens7f0] ens7f0 rms   12 max   12 freq  +1214 +/-   0 delay  1448 +/-   0
phc2sys[242988.614]: [ens7f0] ens7f0 rms    7 max    7 freq  +1199 +/-   0 delay  1352 +/-   0
phc2sys[242989.615]: [ens7f0] ens7f0 rms   18 max   18 freq  +1185 +/-   0 delay  1348 +/-   0
phc2sys[242990.615]: [ens7f0] ens7f0 rms   12 max   12 freq  +1186 +/-   0 delay  1354 +/-   0
phc2sys[242991.615]: [ens7f0] ens7f0 rms    2 max    2 freq  +1196 +/-   0 delay  1366 +/-   0
phc2sys[242992.615]: [ens7f0] ens7f0 rms   22 max   22 freq  +1217 +/-   0 delay  1347 +/-   0
phc2sys[242993.615]: [ens7f0] ens7f0 rms   10 max   10 freq  +1192 +/-   0 delay  1354 +/-   0
phc2sys[242994.615]: [ens7f0] ens7f0 rms   14 max   14 freq  +1213 +/-   0 delay  1343 +/-   0
phc2sys[242995.616]: [ens7f0] ens7f0 rms    3 max    3 freq  +1200 +/-   0 delay  1347 +/-   0
phc2sys[242996.616]: [ens7f0] ens7f0 rms    3 max    3 freq  +1205 +/-   0 delay  1342 +/-   0
phc2sys[242997.616]: [ens7f0] ens7f0 rms   17 max   17 freq  +1186 +/-   0 delay  1351 +/-   0
phc2sys[242998.616]: [ens7f0] ens7f0 rms    5 max    5 freq  +1193 +/-   0 delay  1363 +/-   0
phc2sys[242999.616]: [ens7f0] ens7f0 rms    5 max    5 freq  +1201 +/-   0 delay  1344 +/-   0
phc2sys[243000.616]: [ens7f0] ens7f0 rms    1 max    1 freq  +1197 +/-   0 delay  1351 +/-   0
ptp4l[243000.635]: [ens7f0] port 1: FAULTY to LISTENING on INIT_COMPLETE
phc2sys[243001.617]: [ens7f0] ens7f0 rms   32 max   32 freq  +1229 +/-   0 delay  1344 +/-   0
phc2sys[243002.617]: [ens7f0] ens7f0 rms   20 max   20 freq  +1187 +/-   0 delay  1348 +/-   0
phc2sys[243003.617]: [ens7f0] ens7f0 rms    2 max    2 freq  +1199 +/-   0 delay  1341 +/-   0
phc2sys[243004.617]: [ens7f0] ens7f0 rms    5 max    5 freq  +1195 +/-   0 delay  1358 +/-   0
phc2sys[243005.617]: [ens7f0] ens7f0 rms    9 max    9 freq  +1190 +/-   0 delay  1375 +/-   0
phc2sys[243006.617]: [ens7f0] ens7f0 rms   33 max   33 freq  +1229 +/-   0 delay  1344 +/-   0
phc2sys[243007.618]: [ens7f0] ens7f0 rms    2 max    2 freq  +1208 +/-   0 delay  1344 +/-   0
ptp4l[243008.072]: [ens7f0] port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[243008.073]: [ens7f0] selected local clock 40a6b7.fffe.0e43d0 as best master
ptp4l[243008.073]: [ens7f0] assuming the grand master role
phc2sys[243008.618]: [ens7f0] port 40a6b7.fffe.0e43d0-1 changed state
phc2sys[243008.618]: [ens7f0] reconfiguring after port state change
phc2sys[243008.618]: [ens7f0] selecting ens7f0 for synchronization
phc2sys[243008.618]: [ens7f0] selecting CLOCK_REALTIME as the master clock
phc2sys[243008.618]: [ens7f0] ens7f0 rms   17 max   17 freq  +1208 +/-   0 delay  1357 +/-   0
phc2sys[243009.618]: [ens7f0] ens7f0 rms   29 max   29 freq  +1195 +/-   0 delay  1362 +/-   0
phc2sys[243010.618]: [ens7f0] ens7f0 rms   17 max   17 freq  +1178 +/-   0 delay  1353 +/-   0

Comment 1 Miroslav Lichvar 2020-11-23 14:15:51 UTC
I guess the grandmaster and slave logs are swapped? Can you post their configuration and command-line options?

The master's fault could be a missing transmit timestamp, but there should be an error message. Try setting logging_level to 7 and if that doesn't give an explanation for the fault, please try it again in strace.

Comment 2 Sebastian Scheinkman 2020-11-25 11:48:12 UTC
Hi Miroslav,

This is the config

GrandMaster:

/usr/sbin/phc2sys -a -r -r -n 24 -m -u 1 -z /var/run/ptp4l.0.socket -t [ens5f0]
/usr/sbin/ptp4l -f /var/run/ptp4l.0.config -i ens5f0 -2 --summary_interval 0 -m

[global]
#
# Default Data Set
#
twoStepFlag		1
slaveOnly		0
priority1		128
priority2		128
domainNumber		24
#utc_offset		37
clockClass		248
clockAccuracy		0xFE
offsetScaledLogVariance	0xFFFF
free_running		0
freq_est_interval	1
dscp_event		0
dscp_general		0
dataset_comparison	ieee1588
G.8275.defaultDS.localPriority	128
#
# Port Data Set
#
logAnnounceInterval	-3
logSyncInterval		-4
logMinDelayReqInterval	-4
logMinPdelayReqInterval	-4
announceReceiptTimeout	3
syncReceiptTimeout	0
delayAsymmetry		0
fault_reset_interval	4
neighborPropDelayThresh	20000000
masterOnly		0
G.8275.portDS.localPriority	128
#
# Run time options
#
assume_two_step		0
logging_level		7
path_trace_enabled	0
follow_up_info		0
hybrid_e2e		0
inhibit_multicast_service	0
net_sync_monitor	0
tc_spanning_tree	0
tx_timestamp_timeout	1
unicast_listen		0
unicast_master_table	0
unicast_req_duration	3600
use_syslog		1
verbose			0
summary_interval	0
kernel_leap		1
check_fup_sync		0
#
# Servo Options
#
pi_proportional_const	0.0
pi_integral_const	0.0
pi_proportional_scale	0.0
pi_proportional_exponent	-0.3
pi_proportional_norm_max	0.7
pi_integral_scale	0.0
pi_integral_exponent	0.4
pi_integral_norm_max	0.3
step_threshold		0.0
first_step_threshold	0.00002
max_frequency		900000000
clock_servo		pi
sanity_freq_limit	200000000
ntpshm_segment		0
#
# Transport options
#
transportSpecific	0x0
ptp_dst_mac		01:1B:19:00:00:00
p2p_dst_mac		01:80:C2:00:00:0E
udp_ttl			1
udp6_scope		0x0E
uds_address		/var/run/ptp4l
#
# Default interface options
#
clock_type		OC
network_transport	UDPv4
delay_mechanism		E2E
time_stamping		hardware
tsproc_mode		filter
delay_filter		moving_median
delay_filter_length	10
egressLatency		0
ingressLatency		0
boundary_clock_jbod	0
#
# Clock description
#
productDescription	;;
revisionData		;;
manufacturerIdentity	00:00:00
userDescription		;
timeSource		0xA0

uds_address /var/run/ptp4l.0.socket
message_tag [ens5f0]


Slave:

/usr/sbin/phc2sys -a -r -n 24 -m -u 1 -z /var/run/ptp4l.0.socket -t [ens5f0]
/usr/sbin/ptp4l -f /var/run/ptp4l.0.config -i ens5f0 -2 -s --summary_interval -4 -m

[global]
twoStepFlag     1
slaveOnly       0
priority1       128
priority2       90
domainNumber        24
#utc_offset     37
clockClass      248
clockAccuracy       0xFE
offsetScaledLogVariance 0xFFFF
free_running        0
freq_est_interval   1
dscp_event      0
dscp_general        0
dataset_comparison  ieee1588
G.8275.defaultDS.localPriority  200
#
# Port Data Set
#
logAnnounceInterval -3
logSyncInterval     -4
logMinDelayReqInterval  -4
logMinPdelayReqInterval -4
announceReceiptTimeout  3
syncReceiptTimeout  0
delayAsymmetry      0
fault_reset_interval    4
neighborPropDelayThresh 20000000
masterOnly      0
G.8275.portDS.localPriority 200
#
# Run time options
#
assume_two_step     0
logging_level       7
path_trace_enabled  0
follow_up_info      0
hybrid_e2e      0
inhibit_multicast_service   0
net_sync_monitor    0
tc_spanning_tree    0
tx_timestamp_timeout    1
unicast_listen      0
unicast_master_table    0
unicast_req_duration    3600
use_syslog      1
verbose         0
summary_interval    -4
kernel_leap     1
check_fup_sync      0
#
# Servo Options
#
pi_proportional_const   0.0
pi_integral_const   0.0
pi_proportional_scale   0.0
pi_proportional_exponent    -0.3
pi_proportional_norm_max    0.7
pi_integral_scale   0.0
pi_integral_exponent    0.4
pi_integral_norm_max    0.3
step_threshold      0.0
first_step_threshold    0.00002
max_frequency       900000000
clock_servo     pi
sanity_freq_limit   200000000
ntpshm_segment      0
#
# Transport options
#
transportSpecific   0x0
ptp_dst_mac     01:1B:19:00:00:00
p2p_dst_mac     01:80:C2:00:00:0E
udp_ttl         1
udp6_scope      0x0E
uds_address     /var/run/ptp4l
#
# Default interface options
#
clock_type      OC
#network_transport  UDPv4
network_transport   L2
delay_mechanism     E2E
time_stamping       hardware
tsproc_mode     filter
delay_filter        moving_median
delay_filter_length 10
egressLatency       0
ingressLatency      0
boundary_clock_jbod 0
#
# Clock description
#
productDescription  ;;
revisionData        ;;
manufacturerIdentity    00:00:00
userDescription     ;
timeSource      0xA0

uds_address /var/run/ptp4l.0.socket
message_tag [ens5f0]


I see this error from time to time on the slave

ptp4l[54659.120]: [ens5f0] delay   filtered        173   raw        173
ptp4l[54659.167]: [ens5f0] master offset         -9 s2 freq   -6845 path delay       173
phc2sys[54659.185]: [ens5f0] CLOCK_REALTIME rms   51 max   51 freq -86118 +/-   0 delay  1107 +/-   0
ptp4l[54659.230]: [ens5f0] master offset         -2 s2 freq   -6834 path delay       173
ptp4l[54659.241]: [ens5f0] port 1: delay timeout
ptp4l[54659.241]: [ens5f0] delay   filtered        173   raw        177
ptp4l[54659.280]: [ens5f0] port 1: delay timeout
ptp4l[54659.280]: [ens5f0] delay   filtered        174   raw        176
ptp4l[54659.292]: [ens5f0] master offset         -6 s2 freq   -6841 path delay       174
ptp4l[54659.355]: [ens5f0] master offset        -10 s2 freq   -6849 path delay       174
ptp4l[54659.370]: [ens5f0] port 1: delay timeout
ptp4l[54659.371]: [ens5f0] delay   filtered        173   raw        172
ptp4l[54659.404]: [ens5f0] port 1: delay timeout
ptp4l[54659.404]: [ens5f0] delay   filtered        173   raw        175
ptp4l[54659.417]: [ens5f0] port 1: delay timeout
ptp4l[54659.417]: [ens5f0] delay   filtered        174   raw        175
ptp4l[54659.417]: [ens5f0] master offset         -7 s2 freq   -6845 path delay       174
ptp4l[54659.480]: [ens5f0] master offset        -15 s2 freq   -6859 path delay       174
ptp4l[54659.504]: [ens5f0] port 1: delay timeout
ptp4l[54659.504]: [ens5f0] delay   filtered        174   raw        165
ptp4l[54659.542]: [ens5f0] master offset        -13 s2 freq   -6857 path delay       174
ptp4l[54659.605]: [ens5f0] master offset         -7 s2 freq   -6848 path delay       174
ptp4l[54659.613]: [ens5f0] port 1: delay timeout
ptp4l[54659.615]: [ens5f0] port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
phc2sys[54660.185]: [ens5f0] port 0c42a1.fffe.1df97e-1 changed state
ptp4l[54663.637]: [ens5f0] waiting 2^{4} seconds to clear fault on port 1
phc2sys[54664.188]: [ens5f0] reconfiguring after port state change
phc2sys[54664.188]: [ens5f0] selecting ens5f0 for synchronization
phc2sys[54664.188]: [ens5f0] nothing to synchronize
ptp4l[54679.637]: [ens5f0] clearing fault on port 1
ptp4l[54679.637]: [ens5f0] config item ens5f0.logMinDelayReqInterval is -4
ptp4l[54679.638]: [ens5f0] config item ens5f0.logAnnounceInterval is -3
ptp4l[54679.638]: [ens5f0] config item ens5f0.announceReceiptTimeout is 3
ptp4l[54679.638]: [ens5f0] config item ens5f0.syncReceiptTimeout is 0
ptp4l[54679.638]: [ens5f0] config item ens5f0.transportSpecific is 0
ptp4l[54679.638]: [ens5f0] config item ens5f0.ignore_transport_specific is 0
ptp4l[54679.638]: [ens5f0] config item ens5f0.masterOnly is 0
ptp4l[54679.638]: [ens5f0] config item ens5f0.G.8275.portDS.localPriority is 200
ptp4l[54679.638]: [ens5f0] config item ens5f0.logSyncInterval is -4
ptp4l[54679.638]: [ens5f0] config item ens5f0.logMinPdelayReqInterval is -4
ptp4l[54679.638]: [ens5f0] config item ens5f0.neighborPropDelayThresh is 20000000
ptp4l[54679.638]: [ens5f0] config item ens5f0.min_neighbor_prop_delay is -20000000
ptp4l[54679.638]: [ens5f0] config item ens5f0.ptp_dst_mac is '01:1B:19:00:00:00'
ptp4l[54679.638]: [ens5f0] config item ens5f0.p2p_dst_mac is '01:80:C2:00:00:0E'
ptp4l[54679.652]: [ens5f0] tx_type   1 not 1
ptp4l[54679.652]: [ens5f0] rx_filter 1 not 12
ptp4l[54679.652]: [ens5f0] port 1: FAULTY to LISTENING on INIT_COMPLETE
ptp4l[54679.652]: [ens5f0] port 1: ignoring message
ptp4l[54679.652]: [ens5f0] port 1: received link status notification
ptp4l[54679.652]: [ens5f0] interface index 4 is up
ptp4l[54679.669]: [ens5f0] port 1: setting asCapable
ptp4l[54679.669]: [ens5f0] port 1: new foreign master 0c42a1.fffe.1df99a-1
ptp4l[54679.919]: [ens5f0] selected best master clock 0c42a1.fffe.1df99a
ptp4l[54679.919]: [ens5f0] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[54679.926]: [ens5f0] master offset        265 s2 freq   -6384 path delay       174
ptp4l[54679.926]: [ens5f0] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[54679.988]: [ens5f0] master offset        219 s2 freq   -6437 path delay       174
ptp4l[54680.049]: [ens5f0] port 1: delay timeout
ptp4l[54680.049]: [ens5f0] delay   filtered        174   raw        174
ptp4l[54680.051]: [ens5f0] master offset        218 s2 freq   -6417 path delay       174
ptp4l[54680.080]: [ens5f0] port 1: delay timeout
ptp4l[54680.080]: [ens5f0] delay   filtered        175   raw        189
ptp4l[54680.113]: [ens5f0] master offset        181 s2 freq   -6458 path delay       175
ptp4l[54680.176]: [ens5f0] master offset        147 s2 freq   -6498 path delay       175



On the master the "ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES" is only when I start the process for the first time only so I think is fine. here are the logs

ptp4l[53298.892]: [ens5f0] config item (null).assume_two_step is 0
ptp4l[53298.892]: [ens5f0] config item (null).check_fup_sync is 0
ptp4l[53298.892]: [ens5f0] config item (null).tx_timestamp_timeout is 1
ptp4l[53298.892]: [ens5f0] config item (null).clock_servo is 0
ptp4l[53298.892]: [ens5f0] config item (null).clock_type is 32768
ptp4l[53298.892]: [ens5f0] config item (null).clock_servo is 0
ptp4l[53298.892]: [ens5f0] config item (null).clockClass is 248
ptp4l[53298.892]: [ens5f0] config item (null).clockAccuracy is 254
ptp4l[53298.892]: [ens5f0] config item (null).offsetScaledLogVariance is 65535
ptp4l[53298.892]: [ens5f0] config item (null).productDescription is ';;'
ptp4l[53298.892]: [ens5f0] config item (null).revisionData is ';;'
ptp4l[53298.892]: [ens5f0] config item (null).userDescription is ';'
ptp4l[53298.892]: [ens5f0] config item (null).manufacturerIdentity is '00:00:00'
ptp4l[53298.892]: [ens5f0] config item (null).domainNumber is 24
ptp4l[53298.892]: [ens5f0] config item (null).slaveOnly is 0
ptp4l[53298.892]: [ens5f0] config item (null).gmCapable is 1
ptp4l[53298.892]: [ens5f0] config item (null).gmCapable is 1
ptp4l[53298.892]: [ens5f0] config item (null).G.8275.defaultDS.localPriority is 128
ptp4l[53298.892]: [ens5f0] config item (null).time_stamping is 1
ptp4l[53298.892]: [ens5f0] config item (null).twoStepFlag is 1
ptp4l[53298.892]: [ens5f0] config item (null).twoStepFlag is 1
ptp4l[53298.893]: [ens5f0] config item (null).time_stamping is 1
ptp4l[53298.893]: [ens5f0] config item (null).priority1 is 128
ptp4l[53298.893]: [ens5f0] config item (null).priority2 is 128
phc2sys[53298.893]: [ens5f0] Waiting for ptp4l...
ptp4l[53298.893]: [ens5f0] interface index 4 is up
ptp4l[53298.893]: [ens5f0] config item (null).free_running is 0
ptp4l[53298.893]: [ens5f0] selected /dev/ptp2 as PTP clock
ptp4l[53298.893]: [ens5f0] config item (null).uds_address is '/var/run/ptp4l.0.socket'
ptp4l[53298.893]: [ens5f0] section item /var/run/ptp4l.0.socket.announceReceiptTimeout now 0
ptp4l[53298.893]: [ens5f0] section item /var/run/ptp4l.0.socket.delay_mechanism now 0
ptp4l[53298.893]: [ens5f0] section item /var/run/ptp4l.0.socket.network_transport now 0
ptp4l[53298.893]: [ens5f0] section item /var/run/ptp4l.0.socket.delay_filter_length now 1
ptp4l[53298.893]: [ens5f0] config item (null).free_running is 0
ptp4l[53298.893]: [ens5f0] config item (null).freq_est_interval is 1
ptp4l[53298.893]: [ens5f0] config item (null).gmCapable is 1
ptp4l[53298.893]: [ens5f0] config item (null).kernel_leap is 1
ptp4l[53298.893]: [ens5f0] config item (null).utc_offset is 37
ptp4l[53298.893]: [ens5f0] config item (null).timeSource is 160
ptp4l[53298.893]: [ens5f0] config item (null).pi_proportional_const is 0.000000
ptp4l[53298.893]: [ens5f0] config item (null).pi_integral_const is 0.000000
ptp4l[53298.893]: [ens5f0] config item (null).pi_proportional_scale is 0.000000
ptp4l[53298.893]: [ens5f0] config item (null).pi_proportional_exponent is -0.300000
ptp4l[53298.893]: [ens5f0] config item (null).pi_proportional_norm_max is 0.700000
ptp4l[53298.893]: [ens5f0] config item (null).pi_integral_scale is 0.000000
ptp4l[53298.893]: [ens5f0] config item (null).pi_integral_exponent is 0.400000
ptp4l[53298.893]: [ens5f0] config item (null).pi_integral_norm_max is 0.300000
ptp4l[53298.893]: [ens5f0] config item (null).step_threshold is 0.000000
ptp4l[53298.893]: [ens5f0] config item (null).first_step_threshold is 0.000020
ptp4l[53298.893]: [ens5f0] config item (null).max_frequency is 900000000
ptp4l[53298.893]: [ens5f0] config item (null).dataset_comparison is 0
ptp4l[53298.893]: [ens5f0] config item (null).delay_filter_length is 10
ptp4l[53298.893]: [ens5f0] config item (null).delay_filter is 1
ptp4l[53298.893]: [ens5f0] config item (null).tsproc_mode is 0
ptp4l[53298.893]: [ens5f0] config item (null).initial_delay is 0
ptp4l[53298.893]: [ens5f0] config item (null).summary_interval is 0
ptp4l[53298.893]: [ens5f0] config item (null).sanity_freq_limit is 200000000
ptp4l[53298.893]: [ens5f0] PI servo: sync interval 1.000 kp 0.700 ki 0.300000
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.boundary_clock_jbod is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.network_transport is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.delayAsymmetry is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.follow_up_info is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.freq_est_interval is 1
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.net_sync_monitor is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.path_trace_enabled is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.tc_spanning_tree is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.ingressLatency is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.egressLatency is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.delay_mechanism is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.hybrid_e2e is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.fault_badpeernet_interval is 16
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.fault_reset_interval is 4
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.delay_filter_length is 1
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.delay_filter is 1
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.tsproc_mode is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.boundary_clock_jbod is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.network_transport is 3
ptp4l[53298.893]: [ens5f0] config item ens5f0.delayAsymmetry is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.follow_up_info is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.freq_est_interval is 1
ptp4l[53298.893]: [ens5f0] config item ens5f0.net_sync_monitor is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.path_trace_enabled is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.tc_spanning_tree is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.ingressLatency is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.egressLatency is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.delay_mechanism is 1
ptp4l[53298.893]: [ens5f0] config item ens5f0.unicast_master_table is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.unicast_listen is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.hybrid_e2e is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.fault_badpeernet_interval is 16
ptp4l[53298.894]: [ens5f0] config item ens5f0.fault_reset_interval is 4
ptp4l[53298.894]: [ens5f0] config item ens5f0.delay_filter_length is 10
ptp4l[53298.894]: [ens5f0] config item ens5f0.delay_filter is 1
ptp4l[53298.894]: [ens5f0] config item ens5f0.tsproc_mode is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.logMinDelayReqInterval is -4
ptp4l[53298.894]: [ens5f0] config item ens5f0.logAnnounceInterval is -3
ptp4l[53298.894]: [ens5f0] config item ens5f0.announceReceiptTimeout is 3
ptp4l[53298.894]: [ens5f0] config item ens5f0.syncReceiptTimeout is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.transportSpecific is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.ignore_transport_specific is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.masterOnly is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.G.8275.portDS.localPriority is 128
ptp4l[53298.894]: [ens5f0] config item ens5f0.logSyncInterval is -4
ptp4l[53298.894]: [ens5f0] config item ens5f0.logMinPdelayReqInterval is -4
ptp4l[53298.894]: [ens5f0] config item ens5f0.neighborPropDelayThresh is 20000000
ptp4l[53298.894]: [ens5f0] config item ens5f0.min_neighbor_prop_delay is -20000000
ptp4l[53298.894]: [ens5f0] config item ens5f0.ptp_dst_mac is '01:1B:19:00:00:00'
ptp4l[53298.894]: [ens5f0] config item ens5f0.p2p_dst_mac is '01:80:C2:00:00:0E'
ptp4l[53298.902]: [ens5f0] tx_type   1 not 1
ptp4l[53298.902]: [ens5f0] rx_filter 1 not 12
ptp4l[53298.902]: [ens5f0] port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.logMinDelayReqInterval is -4
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.logAnnounceInterval is -3
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.announceReceiptTimeout is 0
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.syncReceiptTimeout is 0
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.transportSpecific is 0
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.ignore_transport_specific is 0
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.masterOnly is 0
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.G.8275.portDS.localPriority is 128
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.logSyncInterval is -4
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.logMinPdelayReqInterval is -4
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.neighborPropDelayThresh is 20000000
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.min_neighbor_prop_delay is -20000000
ptp4l[53298.902]: [ens5f0] config item (null).uds_address is '/var/run/ptp4l.0.socket'
ptp4l[53298.902]: [ens5f0] port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[53298.902]: [ens5f0] port 1: received link status notification
ptp4l[53298.902]: [ens5f0] interface index 4 is up
ptp4l[53299.400]: [ens5f0] port 1: announce timeout
ptp4l[53299.400]: [ens5f0] port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[53299.400]: [ens5f0] selected local clock 0c42a1.fffe.1df99a as best master
ptp4l[53299.400]: [ens5f0] assuming the grand master role
ptp4l[53299.401]: [ens5f0] port 1: master tx announce timeout
ptp4l[53299.401]: [ens5f0] port 1: setting asCapable
ptp4l[53299.462]: [ens5f0] port 1: master sync timeout
ptp4l[53299.525]: [ens5f0] port 1: master sync timeout
ptp4l[53299.526]: [ens5f0] port 1: master tx announce timeout
ptp4l[53299.588]: [ens5f0] port 1: master sync timeout
ptp4l[53299.650]: [ens5f0] port 1: master sync timeout
ptp4l[53299.651]: [ens5f0] port 1: master tx announce timeout
ptp4l[53299.713]: [ens5f0] port 1: master sync timeout
ptp4l[53299.775]: [ens5f0] port 1: master sync timeout
ptp4l[53299.776]: [ens5f0] port 1: master tx announce timeout
ptp4l[53299.838]: [ens5f0] port 1: master sync timeout
phc2sys[53299.894]: [ens5f0] Waiting for ptp4l...
ptp4l[53299.894]: [ens5f0] port 0: setting asCapable
ptp4l[53299.900]: [ens5f0] port 1: master sync timeout
ptp4l[53299.901]: [ens5f0] port 1: master tx announce timeout
ptp4l[53299.963]: [ens5f0] port 1: master sync timeout
ptp4l[53300.025]: [ens5f0] port 1: master sync timeout
ptp4l[53300.026]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.088]: [ens5f0] port 1: master sync timeout
ptp4l[53300.150]: [ens5f0] port 1: master sync timeout
ptp4l[53300.151]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.213]: [ens5f0] port 1: master sync timeout
ptp4l[53300.275]: [ens5f0] port 1: master sync timeout
ptp4l[53300.276]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.338]: [ens5f0] port 1: master sync timeout
ptp4l[53300.400]: [ens5f0] port 1: master sync timeout
ptp4l[53300.401]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.463]: [ens5f0] port 1: master sync timeout
ptp4l[53300.526]: [ens5f0] port 1: master sync timeout
ptp4l[53300.526]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.588]: [ens5f0] port 1: master sync timeout
ptp4l[53300.651]: [ens5f0] port 1: master sync timeout
ptp4l[53300.651]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.713]: [ens5f0] port 1: master sync timeout
ptp4l[53300.776]: [ens5f0] port 1: master sync timeout
ptp4l[53300.776]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.838]: [ens5f0] port 1: master sync timeout
phc2sys[53300.894]: [ens5f0] reconfiguring after port state change
phc2sys[53300.894]: [ens5f0] selecting ens5f0 for synchronization
phc2sys[53300.894]: [ens5f0] selecting CLOCK_REALTIME as the master clock
phc2sys[53300.894]: [ens5f0] ens5f0 rms  275 max  275 freq  -2008 +/-   0 delay  1402 +/-   0
ptp4l[53300.901]: [ens5f0] port 1: master sync timeout
ptp4l[53300.901]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.963]: [ens5f0] port 1: master sync timeout
ptp4l[53301.026]: [ens5f0] port 1: master sync timeout
ptp4l[53301.026]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.088]: [ens5f0] port 1: master sync timeout
ptp4l[53301.151]: [ens5f0] port 1: master sync timeout
ptp4l[53301.151]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.213]: [ens5f0] port 1: master sync timeout
ptp4l[53301.276]: [ens5f0] port 1: master sync timeout
ptp4l[53301.276]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.338]: [ens5f0] port 1: master sync timeout
ptp4l[53301.401]: [ens5f0] port 1: master sync timeout
ptp4l[53301.401]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.463]: [ens5f0] port 1: master sync timeout
ptp4l[53301.526]: [ens5f0] port 1: master sync timeout
ptp4l[53301.526]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.588]: [ens5f0] port 1: master sync timeout
ptp4l[53301.651]: [ens5f0] port 1: master sync timeout
ptp4l[53301.651]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.713]: [ens5f0] port 1: master sync timeout
ptp4l[53301.776]: [ens5f0] port 1: master sync timeout
ptp4l[53301.776]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.839]: [ens5f0] port 1: master sync timeout
phc2sys[53301.894]: [ens5f0] ens5f0 rms  274 max  274 freq  -2007 +/-   0 delay  1395 +/-   0

Comment 3 Pasi Vaananen 2020-11-25 13:10:23 UTC
Sebastian, what are the (presumably Fortville) NICs involved and at what link rate are you running at ? What is the switch model in between, and has it been properly configured as G.8275.1 T-BC ?

Comment 4 Miroslav Lichvar 2020-11-26 09:25:59 UTC
I have troubles finding a code path that would give this strange output with no specific error message. I assume there are no issues with memory allocation.

Can you please run ptp4l in strace and post the output around that "SLAVE/MASTER to FAULTY" message? Can you please also capture all received and transmitted PTP packets around that point?

Comment 5 Ken Young 2020-11-26 23:25:00 UTC
Miroslav,

We are getting delay timeout messages:

ptp4l[73182.878]: [ens5f0] port 1: delay timeout
ptp4l[73182.888]: [ens5f0] port 1: delay timeout
ptp4l[73182.966]: [ens5f0] port 1: delay timeout
ptp4l[73183.073]: [ens5f0] port 1: delay timeout
ptp4l[73183.148]: [ens5f0] port 1: delay timeout
ptp4l[73183.161]: [ens5f0] port 1: delay timeout
ptp4l[73183.249]: [ens5f0] port 1: delay timeout
ptp4l[73183.343]: [ens5f0] port 1: delay timeout
ptp4l[73183.345]: [ens5f0] port 1: delay timeout
ptp4l[73183.380]: [ens5f0] port 1: delay timeout
ptp4l[73183.496]: [ens5f0] port 1: delay timeout
ptp4l[73183.529]: [ens5f0] port 1: delay timeout
ptp4l[73183.585]: [ens5f0] port 1: delay timeout
ptp4l[73183.605]: [ens5f0] port 1: delay timeout
ptp4l[73183.705]: [ens5f0] port 1: delay timeout
ptp4l[73183.819]: [ens5f0] port 1: delay timeout

What could cause this?

Comment 6 Miroslav Lichvar 2020-11-27 09:05:17 UTC
Those are expected debug messages, indicating timeouts for ptp4l sending delay requests.

Comment 7 Ken Young 2020-11-27 22:45:15 UTC
Created attachment 1734255 [details]
Behavior with RT Kernel

In this file, I think interesting occurrence might give us a clue on the issues we are having with the Kernel: 4.18.0-193.24.1.rt13.74.el8_2.dt1.x86_64.  You see regular delay timeouts; approximately every 63ms.  Then, ptp4l spends 4.5 seconds consuming delay responses.  We consume 72 packets without handling a delay timeout.  It says to me that we are struggling to schedule the timeout and handle packets.  It also says to me that we are not having "network" issues.  All the packets are there; the daemon is not processing them.

Comment 11 Sebastian Scheinkman 2020-11-29 18:53:43 UTC
Hi Pasi,

This is the nic

driver: i40e
version: 2.8.20-k
firmware-version: 7.10 0x800075e6 19.5.12
expansion-rom-version: 
bus-info: 0000:86:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
[root@cnfde4 /]# exit
[root@cnfde4 core]# lspci -vvv 0000:86:00.0


[root@cnfde4 core]# lspci -vvv -s 0000:86:00.0
86:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
	Subsystem: Intel Corporation Ethernet 25G 2P XXV710 Adapter
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 66
	NUMA node: 1
	Region 0: Memory at d8000000 (64-bit, prefetchable) [size=16M]
	Region 3: Memory at d9008000 (64-bit, prefetchable) [size=32K]
	Expansion ROM at d3800000 [disabled] [size=512K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
		Vector table: BAR=3 offset=00000000
		PBA: BAR=3 offset=00001000
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <16us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [e0] Vital Product Data
		Product Name: XXV710 25GbE Controller
		Read-only fields:
			[V0] Vendor specific: FFV19.5.12
			[PN] Part number: 00M95
			[MN] Manufacture ID: 1028
			[V1] Vendor specific: DSV1028VPDR.VER2.1
			[V3] Vendor specific: DTINIC
			[V4] Vendor specific: DCM1001FFFFFF2101FFFFFF1202FFFFFF2302FFFFFF1403FFFFFF2503FFFFFF1604FFFFFF2704FFFFFF1805FFFFFF2905FFFFFF1A06FFFFFF2B06FFFFFF1C07FFFFFF2D07FFFFFF1E08FFFFFF2F08FFFFFF
			[V5] Vendor specific: NPY2
			[V6] Vendor specific: PMTD
			[V7] Vendor specific: NMVIntel Corp
			[V8] Vendor specific: L1D0
			[RV] Reserved: checksum good, 1 byte(s) reserved
		Read/write fields:
			[Y1] System specific: CCF1
		End
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [140 v1] Device Serial Number 30-25-0e-ff-ff-b7-a6-40
	Capabilities: [1a0 v1] Transaction Processing Hints
		Device specific mode supported
		No steering table available
	Capabilities: [1b0 v1] Access Control Services
		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [1d0 v1] #19
	Kernel driver in use: i40e
	Kernel modules: i40e

[root@cnfde4 core]# toolbox 
Container 'toolbox-root' already exists. Trying to start...
(To remove the container and start with a fresh toolbox, run: sudo podman rm 'toolbox-root')
toolbox-root
Container started successfully. To exit, type 'exit'.
[root@cnfde4 /]# ethtool ens5f0
Settings for ens5f0:
	Supported ports: [ FIBRE ]
	Supported link modes:   25000baseCR/Full 
	Supported pause frame use: Symmetric
	Supports auto-negotiation: Yes
	Supported FEC modes: None BaseR RS
	Advertised link modes:  25000baseCR/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Advertised FEC modes: None BaseR RS
	Speed: 25000Mb/s
	Duplex: Full
	Port: Direct Attach Copper
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: d
	Wake-on: d
	Current message level: 0x00000007 (7)
			       drv probe link
	Link detected: yes


We enable ptp support on the switch

Dell EMC Networking OS10 Enterprise
Copyright (c) 1999-2020 by Dell Inc. All Rights Reserved.
OS Version: 10.5.1.4
Build Version: 10.5.1.4.249
Build Time: 2020-07-22T21:29:21+0000
System Type: S5248F-ON
Architecture: x86_64
Up Time: 4 weeks 3 days 21:50:48

Comment 12 Sebastian Scheinkman 2020-12-08 11:16:17 UTC
Adding a request resource to the linuxptp-daemon helped make the synchronization more stable, there are other kernel patches for the RT kernel that also improve the synchronization.

Comment 17 Vitaly Grinberg 2020-12-21 09:05:11 UTC
@sscheink 
I can run for several days without a single FAULT or UNCALIBRATED. However sometimes there are sync losses like:
2020-12-21T04:11:19.079286820+00:00 stdout F ptp4l[44415.855]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
There are not many of these. It looks something different, probably related to the network / switch. I think we can move this to "verified".

Comment 20 errata-xmlrpc 2021-02-24 15:33:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633