Bug 2109518

Summary: PTP Event| Events 4.10 - metric for status of phc2sys shows wrong information
Product: OpenShift Container Platform Reporter: obochan <obochan>
Component: Networking Assignee: Aneesh Puttur <aputtur>
Networking sub component: ptp QA Contact: obochan <obochan>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: keyoung
Version: 4.10   
Target Milestone: ---   
Target Release: 4.10.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-23 18:29:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2091599    
Bug Blocks:    

Description obochan 2022-07-21 13:18:06 UTC
Description of problem:

When the slave interface goes down, the ptp4l physical interface goes to FAULTY, and as a result two events happen:

1. ptp4l > HOLDOVER - timeout > FREERUN
2. phc2sys > FREERUN

In the metrics, however, phc2sys moves to HOLDOVER instead of FREERUN and stays in that state.
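
The expected behavior described above can be written down as a small sketch. The numeric values come from the HELP line of the openshift_ptp_clock_state metric quoted later in this report; the function name expected_states and its holdover_timed_out parameter are hypothetical and are not part of linuxptp-daemon or cloud-event-proxy.

```python
from enum import Enum


class ClockState(Enum):
    # Values as listed in the metric HELP text: 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER
    FREERUN = 0
    LOCKED = 1
    HOLDOVER = 2


def expected_states(holdover_timed_out):
    """Expected states once the slave interface is FAULTY, as the reporter describes them.

    Hypothetical helper for illustration only.
    """
    # ptp4l sits in HOLDOVER until the timeout expires, then drops to FREERUN.
    ptp4l = ClockState.FREERUN if holdover_timed_out else ClockState.HOLDOVER
    # phc2sys is expected to report FREERUN right away.
    return {"ptp4l": ptp4l, "phc2sys": ClockState.FREERUN}
```

Under this reading, the bug is that the phc2sys entry is reported as HOLDOVER rather than FREERUN.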
Version-Release number of selected component (if applicable):

[obochan@obochan ~]$ oc version 
Client Version: 4.10.18
Server Version: 4.10.23
Kubernetes Version: v1.23.5+012e945

How reproducible:
 always

Steps to Reproduce:
1. Shut down the slave interface.
2. Check the metric:
   oc exec -it linuxptp-daemon-8b6rz -n openshift-ptp -c cloud-event-proxy -- curl 127.0.0.1:9091/metrics
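
For repeated checks, the oc exec command from the steps above could be wrapped in a short script. This is a minimal sketch, assuming oc is on the PATH and already logged in to the cluster. The helper names fetch_metrics and clock_state_lines are hypothetical. The -it flags are dropped because no TTY is needed, and curl's -s flag is added to silence the progress output.

```python
import subprocess
from typing import List


def fetch_metrics(pod: str, namespace: str = "openshift-ptp") -> str:
    """Run the metrics query from the report inside the cloud-event-proxy container."""
    cmd = [
        "oc", "exec", pod, "-n", namespace, "-c", "cloud-event-proxy",
        "--", "curl", "-s", "127.0.0.1:9091/metrics",
    ]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def clock_state_lines(metrics_text: str) -> List[str]:
    """Keep only the openshift_ptp_clock_state samples, skipping HELP/TYPE comments."""
    return [
        line
        for line in metrics_text.splitlines()
        if line.startswith("openshift_ptp_clock_state{")
    ]


# Example use against the pod named in this report:
# for line in clock_state_lines(fetch_metrics("linuxptp-daemon-8b6rz")):
#     print(line)
```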

Actual results:
phc2sys shows HOLDOVER

Expected results:
phc2sys shows FREERUN



Additional info:
# HELP openshift_ptp_clock_state 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER
# TYPE openshift_ptp_clock_state gauge
openshift_ptp_clock_state{iface="",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 0
openshift_ptp_clock_state{iface="CLOCK_REALTIME",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 2
openshift_ptp_clock_state{iface="ens1f0",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 0
openshift_ptp_clock_state{iface="ens1fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 2
# HELP openshift_ptp_delay_ns 
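
The gauge values above are easier to read once mapped back to state names. The following is a rough sketch that decodes openshift_ptp_clock_state samples using the mapping from the HELP line. The regular expressions cover only the simple label format seen in this output, not the full Prometheus text format, and decode_clock_states is a hypothetical helper name.

```python
import re

# Mapping taken from the metric HELP line: 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER
STATE_NAMES = {0: "FREERUN", 1: "LOCKED", 2: "HOLDOVER"}

SAMPLE_RE = re.compile(
    r'^openshift_ptp_clock_state\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def decode_clock_states(metrics_text):
    """Return {(process, iface): state_name} for every clock_state sample."""
    states = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # HELP/TYPE comments and other metrics are ignored
        labels = dict(LABEL_RE.findall(match.group("labels")))
        value = int(float(match.group("value")))
        key = (labels.get("process"), labels.get("iface"))
        states[key] = STATE_NAMES.get(value, "UNKNOWN")
    return states
```

Applied to the four samples above, the entry for ("phc2sys", "CLOCK_REALTIME") decodes to HOLDOVER, which is the value this report flags as wrong.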

time="2022-07-21T10:09:18Z" level=info msg="update interface ens1f0 with portid 1 from role SLAVE to  role FAULTY"
time="2022-07-21T10:09:18Z" level=warning msg="amqp disabled,no action taken(can't send to a desitination): logging new event {\n    \"id\": \"93707d29-b4f8-408d-8858-bf163bf71191\",\n    \"type\": \"event.sync.ptp-status.ptp-state-change\",\n    \"source\": \"/cluster/cnfde4.ptp.lab.eng.bos.redhat.com/ptp/ens1fx/master\",\n    \"dataContentType\": \"application/json\",\n    \"time\": \"2022-07-21T10:09:18.530215966Z\",\n    \"data\": {\n      \"version\": \"v1\",\n      \"values\": [\n        {\n          \"resource\": \"/sync/sync-status/sync-state\",\n          \"dataType\": \"notification\",\n          \"valueType\": \"enumeration\",\n          \"value\": \"HOLDOVER\"\n        },\n        {\n          \"resource\": \"/sync/sync-status/sync-state\",\n          \"dataType\": \"metric\",\n          \"valueType\": \"decimal64.3\",\n          \"value\": \"-6\"\n        }\n      ]\n    }\n  }\n"
time="2022-07-21T10:09:18Z" level=debug msg="posting event status SUCCESS to publisher /cluster/node/cnfde4.ptp.lab.eng.bos.redhat.com/ptp"
time="2022-07-21T10:09:18Z" level=debug msg="event sent {\n    \"id\": \"fb487dd9-264f-4a4b-8fd6-af4bb7a73409\",\n    \"type\": \"event.sync.ptp-status.ptp-state-change\",\n    \"source\": \"/cluster/cnfde4.ptp.lab.eng.bos.redhat.com/ptp/ens1fx/master\",\n    \"dataContentType\": \"application/json\",\n    \"time\": \"2022-07-21T10:09:18.530215966Z\",\n    \"data\": {\n      \"version\": \"v1\",\n      \"values\": [\n        {\n          \"resource\": \"/sync/sync-status/sync-state\",\n          \"dataType\": \"notification\",\n          \"valueType\": \"enumeration\",\n          \"value\": \"HOLDOVER\"\n        },\n        {\n          \"resource\": \"/sync/sync-status/sync-state\",\n          \"dataType\": \"metric\",\n          \"valueType\": \"decimal64.3\",\n          \"value\": \"-6\"\n        }\n      ]\n    }\n  }"
time="2022-07-21T10:09:18Z" level=warning msg="amqp disabled,no action taken(can't send to a desitination): logging new event {\n    \"id\": \"9839050d-15cd-415f-80b4-a7d21f93c84a\",\n    \"type\": \"event.sync.ptp-status.ptp-state-change\",\n    \"source\": \"/cluster/cnfde4.ptp.lab.eng.bos.redhat.com/ptp/CLOCK_REALTIME\",\n    \"dataContentType\": \"application/json\",\n    \"time\": \"2022-07-21T10:09:18.536048668Z\",\n    \"data\": {\n      \"version\": \"v1\",\n      \"values\": [\n        {\n          \"resource\": \"/sync/sync-status/sync-state\",\n          \"dataType\": \"notification\",\n          \"valueType\": \"enumeration\",\n          \"value\": \"FREERUN\"\n        },\n        {\n          \"resource\": \"/sync/sync-status/sync-state\",\n          \"dataType\": \"metric\",\n          \"valueType\": \"decimal64.3\",\n          \"value\": \"2\"\n        }\n      ]\n    }\n  }\n"
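
To compare the logged events with the metric, the state carried by each event can be pulled out of its JSON payload. This is a minimal sketch, assuming the payload has already been extracted from the log line's msg field and unescaped. It relies only on the field names visible in the events above, and sync_state_from_event is a hypothetical helper name.

```python
import json


def sync_state_from_event(event_json):
    """Return (source, state) from a ptp-state-change event payload.

    The state is the value of the first entry whose dataType is "notification";
    None is returned for the state if no such entry exists.
    """
    event = json.loads(event_json)
    for entry in event["data"]["values"]:
        if entry.get("dataType") == "notification":
            return event["source"], entry["value"]
    return event["source"], None
```

For the last event logged above, this yields the CLOCK_REALTIME source with the state FREERUN, while the metric for the same clock reports 2 (HOLDOVER).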

Comment 10 obochan 2022-08-18 15:37:20 UTC
The issue is verified in:

Client Version: 4.10.18
Server Version: 4.10.0-0.nightly-2022-08-16-180211
Kubernetes Version: v1.23.5+012e945


cne_events_ack{status="failed",type="/cluster/node/cnfde4.ptp.lab.eng.bos.redhat.com/ptp"} 23
# HELP openshift_ptp_clock_state 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER
# TYPE openshift_ptp_clock_state gauge
openshift_ptp_clock_state{iface="CLOCK_REALTIME",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 0
openshift_ptp_clock_state{iface="ens1fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 0
# HELP openshift_ptp_delay_ns

Comment 12 errata-xmlrpc 2022-08-23 18:29:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.28 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6095