Description of problem: I tried to troubleshoot ptp on Openshift for hours and I discovered that ptp4l is writing some errors on stderr and not on stdout and this errors were missing from openshift logs. As ptp is not trivial to setup on different use-cases it could be helpful to get this stderr logs too in container logs to be able to troubleshoot with oc get logs. Looking at https://github.com/openshift/linuxptp-daemon/blob/master/pkg/daemon/daemon.go#L239 it seems that Stdout is piped and output to container stdout (used to parsed metrics for Prometheus) but stderr is not configured and stays discarded. Something like: cmd.Stderr = os.Stderr around line 246 could probably help. Version-Release number of selected component (if applicable): Openshift 4.6.x. But should be present in all versions. How reproducible: Always. Steps to Reproduce: 1. Deploy ptp-operator 2. Create a buggy ptpConfig resource 3. check linuxptp-daemon-container logs in linuxptp-daemon pod to see stderr error message. Actual results: Nothing appears in oc logs output. Message can be seen by using strace connected to the running ptp4l process (write(2, ...)) Expected results: Errors written to stderr by ptp4l (and by phc2sys too) output to oc logs.
Hi Philippe, thanks for the report! is it possible to request a ptpConfig resource so I can test it? Thanks! Sebastian
Hi Sebastien, I have no ptp capable devices accessible right now. But what you could try using something like: oc create -f- <<_EOF_ apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: slave namespace: openshift-ptp spec: profile: - name: "slave" interface: "eth1" ptp4lOpts: "-2 -s -S" phc2sysOpts: "-a -r -n 24" ptp4lConf: | [global] # # Default Data Set # slaveOnly 1 priority1 128 domainNumber 24 boundary_clock_jbod 1 logAnnounceInterval -3 delay_mechanism E2E network_transport L2 tx_timestamp_timeout 20 dataset_comparison G.8275.x time_stamping hardware clockClass 255 logSyncInterval -4 logMinDelayReqInterval -4 summary_interval 6 priority2 128 clock_servo linreg logging_level 6 recommend: - profile: "slave" priority: 1 match: - nodeLabel: "node-role.kubernetes.io/worker" _EOF_ If you restart the pod on worker nodes, you should have something like: ``` I0111 16:51:42.642015 4165308 daemon.go:210] Starting ptp4l... I0111 16:51:42.642077 4165308 daemon.go:211] ptp4l cmd: /usr/sbin/ptp4l -f /ptp4l-conf/slave.conf -i eth1 -2 -s -S -m --summary_interval 1 phc2sys[457366.272]: Waiting for ptp4l... E0111 16:51:42.671987 4165308 daemon.go:245] cmdRun() error waiting for ptp4l: exit status 255 phc2sys[457367.274]: Waiting for ptp4l... phc2sys[457368.275]: Waiting for ptp4l... phc2sys[457369.278]: Waiting for ptp4l... phc2sys[457370.279]: Waiting for ptp4l... ``` But you don't get the real error output of ptp4l which is: ``` oc exec -c linuxptp-daemon-container linuxptp-daemon-vn6ld -- /usr/sbin/ptp4l -f /ptp4l-conf/slave.conf -i eth1 -2 -s -S -m --summary_interval 1 ptp4l[458329.321]: ioctl SIOCETHTOOL failed: No such device ptp4l[458329.325]: ioctl SIOCGIFHWADDR failed: No such device ptp4l[458329.325]: failed to generate a clock identity failed to create a clock command terminated with exit code 255 ``` Most of the error messages produced by ptp4l are written on stderr and not stdout and linuxptp-daemon is only relaying stdout from ptp4l to container stdout. Philippe
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633