Bug 1897897
| Summary: | ptp lose sync openshift 4.6 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Sebastian Scheinkman <sscheink> | ||||
| Component: | Networking | Assignee: | Sebastian Scheinkman <sscheink> | ||||
| Networking sub component: | ptp | QA Contact: | Weibin Liang <weliang> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||
| Severity: | high | ||||||
| Priority: | urgent | CC: | aos-bugs, dosmith, eparis, fsimonce, huirwang, jokerman, keyoung, mlichvar, msivak, pvaanane, vgrinber, william.caban, zshi | ||||
| Version: | 4.6.z | ||||||
| Target Milestone: | --- | ||||||
| Target Release: | 4.7.0 | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | No Doc Update | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2021-02-24 15:33:27 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1771572 | ||||||
| Attachments: |
|
||||||
|
Description
Sebastian Scheinkman
2020-11-15 12:14:12 UTC
I guess the grandmaster and slave logs are swapped? Can you post their configuration and command-line options? The master's fault could be a missing transmit timestamp, but there should be an error message. Try setting logging_level to 7 and if that doesn't give an explanation for the fault, please try it again in strace. Hi Miroslav,
This is the config
GrandMaster:
/usr/sbin/phc2sys -a -r -r -n 24 -m -u 1 -z /var/run/ptp4l.0.socket -t [ens5f0]
/usr/sbin/ptp4l -f /var/run/ptp4l.0.config -i ens5f0 -2 --summary_interval 0 -m
[global]
#
# Default Data Set
#
twoStepFlag 1
slaveOnly 0
priority1 128
priority2 128
domainNumber 24
#utc_offset 37
clockClass 248
clockAccuracy 0xFE
offsetScaledLogVariance 0xFFFF
free_running 0
freq_est_interval 1
dscp_event 0
dscp_general 0
dataset_comparison ieee1588
G.8275.defaultDS.localPriority 128
#
# Port Data Set
#
logAnnounceInterval -3
logSyncInterval -4
logMinDelayReqInterval -4
logMinPdelayReqInterval -4
announceReceiptTimeout 3
syncReceiptTimeout 0
delayAsymmetry 0
fault_reset_interval 4
neighborPropDelayThresh 20000000
masterOnly 0
G.8275.portDS.localPriority 128
#
# Run time options
#
assume_two_step 0
logging_level 7
path_trace_enabled 0
follow_up_info 0
hybrid_e2e 0
inhibit_multicast_service 0
net_sync_monitor 0
tc_spanning_tree 0
tx_timestamp_timeout 1
unicast_listen 0
unicast_master_table 0
unicast_req_duration 3600
use_syslog 1
verbose 0
summary_interval 0
kernel_leap 1
check_fup_sync 0
#
# Servo Options
#
pi_proportional_const 0.0
pi_integral_const 0.0
pi_proportional_scale 0.0
pi_proportional_exponent -0.3
pi_proportional_norm_max 0.7
pi_integral_scale 0.0
pi_integral_exponent 0.4
pi_integral_norm_max 0.3
step_threshold 0.0
first_step_threshold 0.00002
max_frequency 900000000
clock_servo pi
sanity_freq_limit 200000000
ntpshm_segment 0
#
# Transport options
#
transportSpecific 0x0
ptp_dst_mac 01:1B:19:00:00:00
p2p_dst_mac 01:80:C2:00:00:0E
udp_ttl 1
udp6_scope 0x0E
uds_address /var/run/ptp4l
#
# Default interface options
#
clock_type OC
network_transport UDPv4
delay_mechanism E2E
time_stamping hardware
tsproc_mode filter
delay_filter moving_median
delay_filter_length 10
egressLatency 0
ingressLatency 0
boundary_clock_jbod 0
#
# Clock description
#
productDescription ;;
revisionData ;;
manufacturerIdentity 00:00:00
userDescription ;
timeSource 0xA0
uds_address /var/run/ptp4l.0.socket
message_tag [ens5f0]
Slave:
/usr/sbin/phc2sys -a -r -n 24 -m -u 1 -z /var/run/ptp4l.0.socket -t [ens5f0]
/usr/sbin/ptp4l -f /var/run/ptp4l.0.config -i ens5f0 -2 -s --summary_interval -4 -m
[global]
twoStepFlag 1
slaveOnly 0
priority1 128
priority2 90
domainNumber 24
#utc_offset 37
clockClass 248
clockAccuracy 0xFE
offsetScaledLogVariance 0xFFFF
free_running 0
freq_est_interval 1
dscp_event 0
dscp_general 0
dataset_comparison ieee1588
G.8275.defaultDS.localPriority 200
#
# Port Data Set
#
logAnnounceInterval -3
logSyncInterval -4
logMinDelayReqInterval -4
logMinPdelayReqInterval -4
announceReceiptTimeout 3
syncReceiptTimeout 0
delayAsymmetry 0
fault_reset_interval 4
neighborPropDelayThresh 20000000
masterOnly 0
G.8275.portDS.localPriority 200
#
# Run time options
#
assume_two_step 0
logging_level 7
path_trace_enabled 0
follow_up_info 0
hybrid_e2e 0
inhibit_multicast_service 0
net_sync_monitor 0
tc_spanning_tree 0
tx_timestamp_timeout 1
unicast_listen 0
unicast_master_table 0
unicast_req_duration 3600
use_syslog 1
verbose 0
summary_interval -4
kernel_leap 1
check_fup_sync 0
#
# Servo Options
#
pi_proportional_const 0.0
pi_integral_const 0.0
pi_proportional_scale 0.0
pi_proportional_exponent -0.3
pi_proportional_norm_max 0.7
pi_integral_scale 0.0
pi_integral_exponent 0.4
pi_integral_norm_max 0.3
step_threshold 0.0
first_step_threshold 0.00002
max_frequency 900000000
clock_servo pi
sanity_freq_limit 200000000
ntpshm_segment 0
#
# Transport options
#
transportSpecific 0x0
ptp_dst_mac 01:1B:19:00:00:00
p2p_dst_mac 01:80:C2:00:00:0E
udp_ttl 1
udp6_scope 0x0E
uds_address /var/run/ptp4l
#
# Default interface options
#
clock_type OC
#network_transport UDPv4
network_transport L2
delay_mechanism E2E
time_stamping hardware
tsproc_mode filter
delay_filter moving_median
delay_filter_length 10
egressLatency 0
ingressLatency 0
boundary_clock_jbod 0
#
# Clock description
#
productDescription ;;
revisionData ;;
manufacturerIdentity 00:00:00
userDescription ;
timeSource 0xA0
uds_address /var/run/ptp4l.0.socket
message_tag [ens5f0]
I see this error from time to time on the slave
ptp4l[54659.120]: [ens5f0] delay filtered 173 raw 173
ptp4l[54659.167]: [ens5f0] master offset -9 s2 freq -6845 path delay 173
phc2sys[54659.185]: [ens5f0] CLOCK_REALTIME rms 51 max 51 freq -86118 +/- 0 delay 1107 +/- 0
ptp4l[54659.230]: [ens5f0] master offset -2 s2 freq -6834 path delay 173
ptp4l[54659.241]: [ens5f0] port 1: delay timeout
ptp4l[54659.241]: [ens5f0] delay filtered 173 raw 177
ptp4l[54659.280]: [ens5f0] port 1: delay timeout
ptp4l[54659.280]: [ens5f0] delay filtered 174 raw 176
ptp4l[54659.292]: [ens5f0] master offset -6 s2 freq -6841 path delay 174
ptp4l[54659.355]: [ens5f0] master offset -10 s2 freq -6849 path delay 174
ptp4l[54659.370]: [ens5f0] port 1: delay timeout
ptp4l[54659.371]: [ens5f0] delay filtered 173 raw 172
ptp4l[54659.404]: [ens5f0] port 1: delay timeout
ptp4l[54659.404]: [ens5f0] delay filtered 173 raw 175
ptp4l[54659.417]: [ens5f0] port 1: delay timeout
ptp4l[54659.417]: [ens5f0] delay filtered 174 raw 175
ptp4l[54659.417]: [ens5f0] master offset -7 s2 freq -6845 path delay 174
ptp4l[54659.480]: [ens5f0] master offset -15 s2 freq -6859 path delay 174
ptp4l[54659.504]: [ens5f0] port 1: delay timeout
ptp4l[54659.504]: [ens5f0] delay filtered 174 raw 165
ptp4l[54659.542]: [ens5f0] master offset -13 s2 freq -6857 path delay 174
ptp4l[54659.605]: [ens5f0] master offset -7 s2 freq -6848 path delay 174
ptp4l[54659.613]: [ens5f0] port 1: delay timeout
ptp4l[54659.615]: [ens5f0] port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
phc2sys[54660.185]: [ens5f0] port 0c42a1.fffe.1df97e-1 changed state
ptp4l[54663.637]: [ens5f0] waiting 2^{4} seconds to clear fault on port 1
phc2sys[54664.188]: [ens5f0] reconfiguring after port state change
phc2sys[54664.188]: [ens5f0] selecting ens5f0 for synchronization
phc2sys[54664.188]: [ens5f0] nothing to synchronize
ptp4l[54679.637]: [ens5f0] clearing fault on port 1
ptp4l[54679.637]: [ens5f0] config item ens5f0.logMinDelayReqInterval is -4
ptp4l[54679.638]: [ens5f0] config item ens5f0.logAnnounceInterval is -3
ptp4l[54679.638]: [ens5f0] config item ens5f0.announceReceiptTimeout is 3
ptp4l[54679.638]: [ens5f0] config item ens5f0.syncReceiptTimeout is 0
ptp4l[54679.638]: [ens5f0] config item ens5f0.transportSpecific is 0
ptp4l[54679.638]: [ens5f0] config item ens5f0.ignore_transport_specific is 0
ptp4l[54679.638]: [ens5f0] config item ens5f0.masterOnly is 0
ptp4l[54679.638]: [ens5f0] config item ens5f0.G.8275.portDS.localPriority is 200
ptp4l[54679.638]: [ens5f0] config item ens5f0.logSyncInterval is -4
ptp4l[54679.638]: [ens5f0] config item ens5f0.logMinPdelayReqInterval is -4
ptp4l[54679.638]: [ens5f0] config item ens5f0.neighborPropDelayThresh is 20000000
ptp4l[54679.638]: [ens5f0] config item ens5f0.min_neighbor_prop_delay is -20000000
ptp4l[54679.638]: [ens5f0] config item ens5f0.ptp_dst_mac is '01:1B:19:00:00:00'
ptp4l[54679.638]: [ens5f0] config item ens5f0.p2p_dst_mac is '01:80:C2:00:00:0E'
ptp4l[54679.652]: [ens5f0] tx_type 1 not 1
ptp4l[54679.652]: [ens5f0] rx_filter 1 not 12
ptp4l[54679.652]: [ens5f0] port 1: FAULTY to LISTENING on INIT_COMPLETE
ptp4l[54679.652]: [ens5f0] port 1: ignoring message
ptp4l[54679.652]: [ens5f0] port 1: received link status notification
ptp4l[54679.652]: [ens5f0] interface index 4 is up
ptp4l[54679.669]: [ens5f0] port 1: setting asCapable
ptp4l[54679.669]: [ens5f0] port 1: new foreign master 0c42a1.fffe.1df99a-1
ptp4l[54679.919]: [ens5f0] selected best master clock 0c42a1.fffe.1df99a
ptp4l[54679.919]: [ens5f0] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[54679.926]: [ens5f0] master offset 265 s2 freq -6384 path delay 174
ptp4l[54679.926]: [ens5f0] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[54679.988]: [ens5f0] master offset 219 s2 freq -6437 path delay 174
ptp4l[54680.049]: [ens5f0] port 1: delay timeout
ptp4l[54680.049]: [ens5f0] delay filtered 174 raw 174
ptp4l[54680.051]: [ens5f0] master offset 218 s2 freq -6417 path delay 174
ptp4l[54680.080]: [ens5f0] port 1: delay timeout
ptp4l[54680.080]: [ens5f0] delay filtered 175 raw 189
ptp4l[54680.113]: [ens5f0] master offset 181 s2 freq -6458 path delay 175
ptp4l[54680.176]: [ens5f0] master offset 147 s2 freq -6498 path delay 175
On the master the "ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES" is only when I start the process for the first time only so I think is fine. here are the logs
ptp4l[53298.892]: [ens5f0] config item (null).assume_two_step is 0
ptp4l[53298.892]: [ens5f0] config item (null).check_fup_sync is 0
ptp4l[53298.892]: [ens5f0] config item (null).tx_timestamp_timeout is 1
ptp4l[53298.892]: [ens5f0] config item (null).clock_servo is 0
ptp4l[53298.892]: [ens5f0] config item (null).clock_type is 32768
ptp4l[53298.892]: [ens5f0] config item (null).clock_servo is 0
ptp4l[53298.892]: [ens5f0] config item (null).clockClass is 248
ptp4l[53298.892]: [ens5f0] config item (null).clockAccuracy is 254
ptp4l[53298.892]: [ens5f0] config item (null).offsetScaledLogVariance is 65535
ptp4l[53298.892]: [ens5f0] config item (null).productDescription is ';;'
ptp4l[53298.892]: [ens5f0] config item (null).revisionData is ';;'
ptp4l[53298.892]: [ens5f0] config item (null).userDescription is ';'
ptp4l[53298.892]: [ens5f0] config item (null).manufacturerIdentity is '00:00:00'
ptp4l[53298.892]: [ens5f0] config item (null).domainNumber is 24
ptp4l[53298.892]: [ens5f0] config item (null).slaveOnly is 0
ptp4l[53298.892]: [ens5f0] config item (null).gmCapable is 1
ptp4l[53298.892]: [ens5f0] config item (null).gmCapable is 1
ptp4l[53298.892]: [ens5f0] config item (null).G.8275.defaultDS.localPriority is 128
ptp4l[53298.892]: [ens5f0] config item (null).time_stamping is 1
ptp4l[53298.892]: [ens5f0] config item (null).twoStepFlag is 1
ptp4l[53298.892]: [ens5f0] config item (null).twoStepFlag is 1
ptp4l[53298.893]: [ens5f0] config item (null).time_stamping is 1
ptp4l[53298.893]: [ens5f0] config item (null).priority1 is 128
ptp4l[53298.893]: [ens5f0] config item (null).priority2 is 128
phc2sys[53298.893]: [ens5f0] Waiting for ptp4l...
ptp4l[53298.893]: [ens5f0] interface index 4 is up
ptp4l[53298.893]: [ens5f0] config item (null).free_running is 0
ptp4l[53298.893]: [ens5f0] selected /dev/ptp2 as PTP clock
ptp4l[53298.893]: [ens5f0] config item (null).uds_address is '/var/run/ptp4l.0.socket'
ptp4l[53298.893]: [ens5f0] section item /var/run/ptp4l.0.socket.announceReceiptTimeout now 0
ptp4l[53298.893]: [ens5f0] section item /var/run/ptp4l.0.socket.delay_mechanism now 0
ptp4l[53298.893]: [ens5f0] section item /var/run/ptp4l.0.socket.network_transport now 0
ptp4l[53298.893]: [ens5f0] section item /var/run/ptp4l.0.socket.delay_filter_length now 1
ptp4l[53298.893]: [ens5f0] config item (null).free_running is 0
ptp4l[53298.893]: [ens5f0] config item (null).freq_est_interval is 1
ptp4l[53298.893]: [ens5f0] config item (null).gmCapable is 1
ptp4l[53298.893]: [ens5f0] config item (null).kernel_leap is 1
ptp4l[53298.893]: [ens5f0] config item (null).utc_offset is 37
ptp4l[53298.893]: [ens5f0] config item (null).timeSource is 160
ptp4l[53298.893]: [ens5f0] config item (null).pi_proportional_const is 0.000000
ptp4l[53298.893]: [ens5f0] config item (null).pi_integral_const is 0.000000
ptp4l[53298.893]: [ens5f0] config item (null).pi_proportional_scale is 0.000000
ptp4l[53298.893]: [ens5f0] config item (null).pi_proportional_exponent is -0.300000
ptp4l[53298.893]: [ens5f0] config item (null).pi_proportional_norm_max is 0.700000
ptp4l[53298.893]: [ens5f0] config item (null).pi_integral_scale is 0.000000
ptp4l[53298.893]: [ens5f0] config item (null).pi_integral_exponent is 0.400000
ptp4l[53298.893]: [ens5f0] config item (null).pi_integral_norm_max is 0.300000
ptp4l[53298.893]: [ens5f0] config item (null).step_threshold is 0.000000
ptp4l[53298.893]: [ens5f0] config item (null).first_step_threshold is 0.000020
ptp4l[53298.893]: [ens5f0] config item (null).max_frequency is 900000000
ptp4l[53298.893]: [ens5f0] config item (null).dataset_comparison is 0
ptp4l[53298.893]: [ens5f0] config item (null).delay_filter_length is 10
ptp4l[53298.893]: [ens5f0] config item (null).delay_filter is 1
ptp4l[53298.893]: [ens5f0] config item (null).tsproc_mode is 0
ptp4l[53298.893]: [ens5f0] config item (null).initial_delay is 0
ptp4l[53298.893]: [ens5f0] config item (null).summary_interval is 0
ptp4l[53298.893]: [ens5f0] config item (null).sanity_freq_limit is 200000000
ptp4l[53298.893]: [ens5f0] PI servo: sync interval 1.000 kp 0.700 ki 0.300000
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.boundary_clock_jbod is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.network_transport is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.delayAsymmetry is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.follow_up_info is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.freq_est_interval is 1
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.net_sync_monitor is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.path_trace_enabled is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.tc_spanning_tree is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.ingressLatency is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.egressLatency is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.delay_mechanism is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.hybrid_e2e is 0
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.fault_badpeernet_interval is 16
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.fault_reset_interval is 4
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.delay_filter_length is 1
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.delay_filter is 1
ptp4l[53298.893]: [ens5f0] config item /var/run/ptp4l.0.socket.tsproc_mode is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.boundary_clock_jbod is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.network_transport is 3
ptp4l[53298.893]: [ens5f0] config item ens5f0.delayAsymmetry is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.follow_up_info is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.freq_est_interval is 1
ptp4l[53298.893]: [ens5f0] config item ens5f0.net_sync_monitor is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.path_trace_enabled is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.tc_spanning_tree is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.ingressLatency is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.egressLatency is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.delay_mechanism is 1
ptp4l[53298.893]: [ens5f0] config item ens5f0.unicast_master_table is 0
ptp4l[53298.893]: [ens5f0] config item ens5f0.unicast_listen is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.hybrid_e2e is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.fault_badpeernet_interval is 16
ptp4l[53298.894]: [ens5f0] config item ens5f0.fault_reset_interval is 4
ptp4l[53298.894]: [ens5f0] config item ens5f0.delay_filter_length is 10
ptp4l[53298.894]: [ens5f0] config item ens5f0.delay_filter is 1
ptp4l[53298.894]: [ens5f0] config item ens5f0.tsproc_mode is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.logMinDelayReqInterval is -4
ptp4l[53298.894]: [ens5f0] config item ens5f0.logAnnounceInterval is -3
ptp4l[53298.894]: [ens5f0] config item ens5f0.announceReceiptTimeout is 3
ptp4l[53298.894]: [ens5f0] config item ens5f0.syncReceiptTimeout is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.transportSpecific is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.ignore_transport_specific is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.masterOnly is 0
ptp4l[53298.894]: [ens5f0] config item ens5f0.G.8275.portDS.localPriority is 128
ptp4l[53298.894]: [ens5f0] config item ens5f0.logSyncInterval is -4
ptp4l[53298.894]: [ens5f0] config item ens5f0.logMinPdelayReqInterval is -4
ptp4l[53298.894]: [ens5f0] config item ens5f0.neighborPropDelayThresh is 20000000
ptp4l[53298.894]: [ens5f0] config item ens5f0.min_neighbor_prop_delay is -20000000
ptp4l[53298.894]: [ens5f0] config item ens5f0.ptp_dst_mac is '01:1B:19:00:00:00'
ptp4l[53298.894]: [ens5f0] config item ens5f0.p2p_dst_mac is '01:80:C2:00:00:0E'
ptp4l[53298.902]: [ens5f0] tx_type 1 not 1
ptp4l[53298.902]: [ens5f0] rx_filter 1 not 12
ptp4l[53298.902]: [ens5f0] port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.logMinDelayReqInterval is -4
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.logAnnounceInterval is -3
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.announceReceiptTimeout is 0
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.syncReceiptTimeout is 0
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.transportSpecific is 0
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.ignore_transport_specific is 0
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.masterOnly is 0
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.G.8275.portDS.localPriority is 128
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.logSyncInterval is -4
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.logMinPdelayReqInterval is -4
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.neighborPropDelayThresh is 20000000
ptp4l[53298.902]: [ens5f0] config item /var/run/ptp4l.0.socket.min_neighbor_prop_delay is -20000000
ptp4l[53298.902]: [ens5f0] config item (null).uds_address is '/var/run/ptp4l.0.socket'
ptp4l[53298.902]: [ens5f0] port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[53298.902]: [ens5f0] port 1: received link status notification
ptp4l[53298.902]: [ens5f0] interface index 4 is up
ptp4l[53299.400]: [ens5f0] port 1: announce timeout
ptp4l[53299.400]: [ens5f0] port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[53299.400]: [ens5f0] selected local clock 0c42a1.fffe.1df99a as best master
ptp4l[53299.400]: [ens5f0] assuming the grand master role
ptp4l[53299.401]: [ens5f0] port 1: master tx announce timeout
ptp4l[53299.401]: [ens5f0] port 1: setting asCapable
ptp4l[53299.462]: [ens5f0] port 1: master sync timeout
ptp4l[53299.525]: [ens5f0] port 1: master sync timeout
ptp4l[53299.526]: [ens5f0] port 1: master tx announce timeout
ptp4l[53299.588]: [ens5f0] port 1: master sync timeout
ptp4l[53299.650]: [ens5f0] port 1: master sync timeout
ptp4l[53299.651]: [ens5f0] port 1: master tx announce timeout
ptp4l[53299.713]: [ens5f0] port 1: master sync timeout
ptp4l[53299.775]: [ens5f0] port 1: master sync timeout
ptp4l[53299.776]: [ens5f0] port 1: master tx announce timeout
ptp4l[53299.838]: [ens5f0] port 1: master sync timeout
phc2sys[53299.894]: [ens5f0] Waiting for ptp4l...
ptp4l[53299.894]: [ens5f0] port 0: setting asCapable
ptp4l[53299.900]: [ens5f0] port 1: master sync timeout
ptp4l[53299.901]: [ens5f0] port 1: master tx announce timeout
ptp4l[53299.963]: [ens5f0] port 1: master sync timeout
ptp4l[53300.025]: [ens5f0] port 1: master sync timeout
ptp4l[53300.026]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.088]: [ens5f0] port 1: master sync timeout
ptp4l[53300.150]: [ens5f0] port 1: master sync timeout
ptp4l[53300.151]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.213]: [ens5f0] port 1: master sync timeout
ptp4l[53300.275]: [ens5f0] port 1: master sync timeout
ptp4l[53300.276]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.338]: [ens5f0] port 1: master sync timeout
ptp4l[53300.400]: [ens5f0] port 1: master sync timeout
ptp4l[53300.401]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.463]: [ens5f0] port 1: master sync timeout
ptp4l[53300.526]: [ens5f0] port 1: master sync timeout
ptp4l[53300.526]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.588]: [ens5f0] port 1: master sync timeout
ptp4l[53300.651]: [ens5f0] port 1: master sync timeout
ptp4l[53300.651]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.713]: [ens5f0] port 1: master sync timeout
ptp4l[53300.776]: [ens5f0] port 1: master sync timeout
ptp4l[53300.776]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.838]: [ens5f0] port 1: master sync timeout
phc2sys[53300.894]: [ens5f0] reconfiguring after port state change
phc2sys[53300.894]: [ens5f0] selecting ens5f0 for synchronization
phc2sys[53300.894]: [ens5f0] selecting CLOCK_REALTIME as the master clock
phc2sys[53300.894]: [ens5f0] ens5f0 rms 275 max 275 freq -2008 +/- 0 delay 1402 +/- 0
ptp4l[53300.901]: [ens5f0] port 1: master sync timeout
ptp4l[53300.901]: [ens5f0] port 1: master tx announce timeout
ptp4l[53300.963]: [ens5f0] port 1: master sync timeout
ptp4l[53301.026]: [ens5f0] port 1: master sync timeout
ptp4l[53301.026]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.088]: [ens5f0] port 1: master sync timeout
ptp4l[53301.151]: [ens5f0] port 1: master sync timeout
ptp4l[53301.151]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.213]: [ens5f0] port 1: master sync timeout
ptp4l[53301.276]: [ens5f0] port 1: master sync timeout
ptp4l[53301.276]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.338]: [ens5f0] port 1: master sync timeout
ptp4l[53301.401]: [ens5f0] port 1: master sync timeout
ptp4l[53301.401]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.463]: [ens5f0] port 1: master sync timeout
ptp4l[53301.526]: [ens5f0] port 1: master sync timeout
ptp4l[53301.526]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.588]: [ens5f0] port 1: master sync timeout
ptp4l[53301.651]: [ens5f0] port 1: master sync timeout
ptp4l[53301.651]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.713]: [ens5f0] port 1: master sync timeout
ptp4l[53301.776]: [ens5f0] port 1: master sync timeout
ptp4l[53301.776]: [ens5f0] port 1: master tx announce timeout
ptp4l[53301.839]: [ens5f0] port 1: master sync timeout
phc2sys[53301.894]: [ens5f0] ens5f0 rms 274 max 274 freq -2007 +/- 0 delay 1395 +/- 0
Sebastian, what are the (presumably Fortville) NICs involved and at what link rate are you running at ? What is the switch model in between, and has it been properly configured as G.8275.1 T-BC ? I have troubles finding a code path that would give this strange output with no specific error message. I assume there are no issues with memory allocation. Can you please run ptp4l in strace and post the output around that "SLAVE/MASTER to FAULTY" message? Can you please also capture all received and transmitted PTP packets around that point? Miroslav, We are getting delay timeout messages: ptp4l[73182.878]: [ens5f0] port 1: delay timeout ptp4l[73182.888]: [ens5f0] port 1: delay timeout ptp4l[73182.966]: [ens5f0] port 1: delay timeout ptp4l[73183.073]: [ens5f0] port 1: delay timeout ptp4l[73183.148]: [ens5f0] port 1: delay timeout ptp4l[73183.161]: [ens5f0] port 1: delay timeout ptp4l[73183.249]: [ens5f0] port 1: delay timeout ptp4l[73183.343]: [ens5f0] port 1: delay timeout ptp4l[73183.345]: [ens5f0] port 1: delay timeout ptp4l[73183.380]: [ens5f0] port 1: delay timeout ptp4l[73183.496]: [ens5f0] port 1: delay timeout ptp4l[73183.529]: [ens5f0] port 1: delay timeout ptp4l[73183.585]: [ens5f0] port 1: delay timeout ptp4l[73183.605]: [ens5f0] port 1: delay timeout ptp4l[73183.705]: [ens5f0] port 1: delay timeout ptp4l[73183.819]: [ens5f0] port 1: delay timeout What could cause this? Those are expected debug messages, indicating timeouts for ptp4l sending delay requests. Created attachment 1734255 [details]
Behavior with RT Kernel
In this file, I think interesting occurrence might give us a clue on the issues we are having with the Kernel: 4.18.0-193.24.1.rt13.74.el8_2.dt1.x86_64. You see regular delay timeouts; approximately every 63ms. Then, ptp4l spends 4.5 seconds consuming delay responses. We consume 72 packets without handling a delay timeout. It says to me that we are struggling to schedule the timeout and handle packets. It also says to me that we are not having "network" issues. All the packets are there; the daemon is not processing them.
Hi Pasi, This is the nic driver: i40e version: 2.8.20-k firmware-version: 7.10 0x800075e6 19.5.12 expansion-rom-version: bus-info: 0000:86:00.0 supports-statistics: yes supports-test: yes supports-eeprom-access: yes supports-register-dump: yes supports-priv-flags: yes [root@cnfde4 /]# exit [root@cnfde4 core]# lspci -vvv 0000:86:00.0 [root@cnfde4 core]# lspci -vvv -s 0000:86:00.0 86:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02) Subsystem: Intel Corporation Ethernet 25G 2P XXV710 Adapter Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 66 NUMA node: 1 Region 0: Memory at d8000000 (64-bit, prefetchable) [size=16M] Region 3: Memory at d9008000 (64-bit, prefetchable) [size=32K] Expansion ROM at d3800000 [disabled] [size=512K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME- Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ Address: 0000000000000000 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [70] MSI-X: Enable+ Count=129 Masked- Vector table: BAR=3 offset=00000000 PBA: BAR=3 offset=00001000 Capabilities: [a0] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+ RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset- MaxPayload 256 bytes, MaxReadReq 512 bytes DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend- LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <16us ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled AtomicOpsCtl: ReqEn- LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+ EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest- Capabilities: [e0] Vital Product Data Product Name: XXV710 25GbE Controller Read-only fields: [V0] Vendor specific: FFV19.5.12 [PN] Part number: 00M95 [MN] Manufacture ID: 1028 [V1] Vendor specific: DSV1028VPDR.VER2.1 [V3] Vendor specific: DTINIC [V4] Vendor specific: DCM1001FFFFFF2101FFFFFF1202FFFFFF2302FFFFFF1403FFFFFF2503FFFFFF1604FFFFFF2704FFFFFF1805FFFFFF2905FFFFFF1A06FFFFFF2B06FFFFFF1C07FFFFFF2D07FFFFFF1E08FFFFFF2F08FFFFFF [V5] Vendor specific: NPY2 [V6] Vendor specific: PMTD [V7] Vendor specific: NMVIntel Corp [V8] Vendor specific: L1D0 [RV] Reserved: checksum good, 1 byte(s) reserved Read/write fields: [Y1] System specific: CCF1 End Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+ MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [140 v1] Device Serial Number 30-25-0e-ff-ff-b7-a6-40 Capabilities: [1a0 v1] Transaction Processing Hints Device specific mode supported No steering table available Capabilities: [1b0 v1] Access Control Services ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans- ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans- Capabilities: [1d0 v1] #19 Kernel driver in use: i40e Kernel modules: i40e [root@cnfde4 core]# toolbox Container 'toolbox-root' already exists. Trying to start... (To remove the container and start with a fresh toolbox, run: sudo podman rm 'toolbox-root') toolbox-root Container started successfully. To exit, type 'exit'. [root@cnfde4 /]# ethtool ens5f0 Settings for ens5f0: Supported ports: [ FIBRE ] Supported link modes: 25000baseCR/Full Supported pause frame use: Symmetric Supports auto-negotiation: Yes Supported FEC modes: None BaseR RS Advertised link modes: 25000baseCR/Full Advertised pause frame use: No Advertised auto-negotiation: Yes Advertised FEC modes: None BaseR RS Speed: 25000Mb/s Duplex: Full Port: Direct Attach Copper PHYAD: 0 Transceiver: internal Auto-negotiation: on Supports Wake-on: d Wake-on: d Current message level: 0x00000007 (7) drv probe link Link detected: yes We enable ptp support on the switch Dell EMC Networking OS10 Enterprise Copyright (c) 1999-2020 by Dell Inc. All Rights Reserved. OS Version: 10.5.1.4 Build Version: 10.5.1.4.249 Build Time: 2020-07-22T21:29:21+0000 System Type: S5248F-ON Architecture: x86_64 Up Time: 4 weeks 3 days 21:50:48 Adding a request resource to the linuxptp-daemon helped make the synchronization more stable, there are other kernel patches for the RT kernel that also improve the synchronization. @sscheink I can run for several days without a single FAULT or UNCALIBRATED. However sometimes there are sync losses like: 2020-12-21T04:11:19.079286820+00:00 stdout F ptp4l[44415.855]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED) There are not many of these. It looks something different, probably related to the network / switch. I think we can move this to "verified". Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633 |