Bug 1914303 - linuxptp-daemon is not forwarding ptp4l stderr output to openshift logs
Summary: linuxptp-daemon is not forwarding ptp4l stderr output to openshift logs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 4.7.0
Assignee: Sebastian Scheinkman
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-01-08 15:02 UTC by Philippe HUET
Modified: 2021-02-24 15:51 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:51:25 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift linuxptp-daemon pull 31 0 None open Bug 1914303: Add cmd stderr redirection to os.Stderr 2021-01-17 13:23:40 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:51:51 UTC

Description Philippe HUET 2021-01-08 15:02:37 UTC
Description of problem:

I tried to troubleshoot ptp on Openshift for hours and I discovered that ptp4l is writing some errors on stderr and not on stdout and this errors were missing from openshift logs.
As ptp is not trivial to setup on different use-cases it could be helpful to get this stderr logs too in container logs to be able to troubleshoot with oc get logs.

Looking at https://github.com/openshift/linuxptp-daemon/blob/master/pkg/daemon/daemon.go#L239 it seems that Stdout is piped and output to container stdout (used to parsed metrics for Prometheus) but stderr is not configured and stays discarded.

Something like:

cmd.Stderr = os.Stderr

around line 246 could probably help.

Version-Release number of selected component (if applicable):

Openshift 4.6.x. But should be present in all versions.


How reproducible:

Always.

Steps to Reproduce:
1. Deploy ptp-operator
2. Create a buggy ptpConfig resource
3. check linuxptp-daemon-container logs in linuxptp-daemon pod to see stderr error message.

Actual results:

Nothing appears in oc logs output. Message can be seen by using strace connected to the running ptp4l process (write(2, ...))


Expected results:

Errors written to stderr by ptp4l (and by phc2sys too) output to oc logs.

Comment 1 Sebastian Scheinkman 2021-01-10 12:13:11 UTC
Hi Philippe,

thanks for the report!
is it possible to request a ptpConfig resource so I can test it?

Thanks!
Sebastian

Comment 2 Philippe HUET 2021-01-11 17:09:29 UTC
Hi Sebastien,

I have no ptp capable devices accessible right now. But what you could try using something like:
oc create -f-  <<_EOF_
apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: slave
  namespace: openshift-ptp
spec:
  profile:
    - name: "slave"
      interface: "eth1"
      ptp4lOpts: "-2 -s -S"
      phc2sysOpts: "-a -r -n 24"
      ptp4lConf: |
         [global]
         #
         # Default Data Set
         #
         slaveOnly               1
         priority1 128
         domainNumber 24
         boundary_clock_jbod 1
         logAnnounceInterval -3
         delay_mechanism E2E
         network_transport L2
         tx_timestamp_timeout 20
         dataset_comparison G.8275.x
         time_stamping hardware
         clockClass 255
         logSyncInterval -4
         logMinDelayReqInterval -4
         summary_interval 6
         priority2 128
         clock_servo linreg
         logging_level        6
  recommend:
    - profile: "slave"
      priority: 1
      match:
        - nodeLabel: "node-role.kubernetes.io/worker"
_EOF_

If you restart the pod on worker nodes, you should have something like:

```
I0111 16:51:42.642015 4165308 daemon.go:210] Starting ptp4l...
I0111 16:51:42.642077 4165308 daemon.go:211] ptp4l cmd: /usr/sbin/ptp4l -f /ptp4l-conf/slave.conf -i eth1 -2 -s -S -m --summary_interval 1
phc2sys[457366.272]: Waiting for ptp4l...
E0111 16:51:42.671987 4165308 daemon.go:245] cmdRun() error waiting for ptp4l: exit status 255
phc2sys[457367.274]: Waiting for ptp4l...
phc2sys[457368.275]: Waiting for ptp4l...
phc2sys[457369.278]: Waiting for ptp4l...
phc2sys[457370.279]: Waiting for ptp4l...
```

But you don't get the real error output of ptp4l which is:
```
oc exec -c linuxptp-daemon-container linuxptp-daemon-vn6ld  -- /usr/sbin/ptp4l -f /ptp4l-conf/slave.conf -i eth1 -2 -s -S -m --summary_interval 1
ptp4l[458329.321]: ioctl SIOCETHTOOL failed: No such device
ptp4l[458329.325]: ioctl SIOCGIFHWADDR failed: No such device
ptp4l[458329.325]: failed to generate a clock identity
failed to create a clock
command terminated with exit code 255
```

Most of the error messages produced by ptp4l are written on stderr and not stdout and linuxptp-daemon is only relaying stdout from ptp4l to container stdout.

Philippe

Comment 8 errata-xmlrpc 2021-02-24 15:51:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633


Note You need to log in before you can comment on or make changes to this bug.