Red Hat Bugzilla – Bug 1008902
ptp: phc2sys sys offset suddenly increasing very large
Last modified: 2014-03-16 21:48:34 EDT
Created attachment 915768 [details]
(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).
Could you attach the full ptp4l log from the slave?
Created attachment 799074 [details]
ptp4l and phc2sys logs from slave
In the phc2sys log the ~35 second offset appears around second 1926, but in the ptp4l log there doesn't seem to be anything interesting around that second. This looks like some other process may be setting the system clock.
Any chance there is ntpd running or is ntpdate/hwclock/rdate called periodically?
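One way to spot this kind of step is to compare each reported offset with the previous one. A minimal sketch, assuming the offsets have already been extracted from the phc2sys log as (time in seconds, offset in nanoseconds) pairs. The function name, the 1-second threshold, and every sample value except the ~35 second step near second 1926 are hypothetical.

```python
def find_offset_steps(samples, threshold_ns=1_000_000_000):
    """Return (time_s, previous_offset_ns, offset_ns) for each sudden step.

    samples: iterable of (time_s, offset_ns) pairs in log order.
    threshold_ns: assumed cut-off; a servo that is tracking normally
    changes its offset by far less than this between readings.
    """
    steps = []
    previous = None
    for time_s, offset_ns in samples:
        if previous is not None and abs(offset_ns - previous) >= threshold_ns:
            steps.append((time_s, previous, offset_ns))
        previous = offset_ns
    return steps


# Hypothetical readings; only the ~35 s step near second 1926 mirrors the report.
samples = [
    (1924.0, -40),
    (1925.0, 25),
    (1926.0, 35_000_000_000),
    (1927.0, 34_999_999_000),
]

for time_s, before, after in find_offset_steps(samples):
    print(f"step at {time_s}: {before} ns -> {after} ns")
```

A step that shows up here with nothing unusual in the ptp4l log at the same time points at something outside linuxptp adjusting the system clock.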
Ok, ntpd probably just stepped the clock after its stepout interval (900 seconds).
But there is a strange offset in the slave ptp4l log at 1438. Is the master running ntpd? Is the PHC synchronized by phc2sys from the system clock?
Also, any explanation why the master is dropping out? Was it restarted or blocked by the firewall?
(In reply to Miroslav Lichvar from comment #6)
> But there is a strange offset in the slave ptp4l log at 1438.
Looking at the three logs, it seems likely that at that time a Sync message was delayed by a switch by 1.7 ms, the subsequent Delay_Req and Delay_Resp messages were delayed by an unknown amount of time, and no further traffic got through until ~21 seconds later.
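To illustrate why a single delayed Sync produces a strange offset, here is a sketch of the standard PTP delay request-response arithmetic. It is not the ptp4l implementation. The 10 µs path delay and all timestamps are hypothetical; only the 1.7 ms Sync delay comes from the logs.

```python
def delay_and_offset(t1, t2, t3, t4):
    """Delay request-response arithmetic, all timestamps in nanoseconds.

    t1: Sync sent by the master      t2: Sync received by the slave
    t3: Delay_Req sent by the slave  t4: Delay_Req received by the master
    """
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
    offset_from_master = (t2 - t1) - mean_path_delay
    return mean_path_delay, offset_from_master


PATH_NS = 10_000          # hypothetical true one-way delay
SYNC_DELAY_NS = 1_700_000  # extra 1.7 ms added to the Sync by the switch

# Clocks in perfect agreement, symmetric path: the offset comes out as zero.
normal = delay_and_offset(0, PATH_NS, 50_000, 50_000 + PATH_NS)

# Same clocks, but the Sync alone is held back by 1.7 ms.
t2 = PATH_NS + SYNC_DELAY_NS
delayed = delay_and_offset(0, t2, t2 + 40_000, t2 + 40_000 + PATH_NS)

print(normal)   # (10000.0, 0.0)
print(delayed)  # (860000.0, 850000.0)
```

With the delay recomputed from the same exchange, half of the extra 1.7 ms appears as offset. If a previously measured path delay of 10 µs is reused instead, the whole 1.7 ms appears as offset, even though the clocks never moved.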
Are you doing anything with the communication path (switch, network cables, etc.) during the testing?
Stopping ntpd helped with one of the problems (the one described in comment 4).
The problem described in comment 7 still remains, but it's a separate one. Miroslav captured packets on both machines and the captures confirm my theory (see comment 7). This can be caused by the hardware at the master not sending the frames, the hardware at the slave not receiving the frames properly, or the switch discarding the frames.
I tried to find out which case (see previous comment) it is, but it seems the problem does not reproduce with a ping running in parallel.
There doesn't seem to be anything wrong with ptp4l/phc2sys. If this is a Linux issue, then the only thing that could be at fault is the NIC driver. I think a hardware problem is more likely, though.
Could you try with a different switch? Or with a master running on a different NIC?
Thanks for doing the testing. I'm very much inclined to say this was a problem with the switch and its handling of multicast packets. The only thing preventing me from saying for sure that this is not a RHEL bug is that Jimmy Pan reproduced the problem with igb cards and a Cisco Catalyst 3750 switch.
I'll try a few things with a modified linuxptp.
For the record, I cannot reproduce it anymore on the machines that showed the problem originally.
For the record, Jimmy Pan experienced the problem on the same machines.
*** Bug 1011356 has been marked as a duplicate of this bug. ***
*** Bug 1011367 has been marked as a duplicate of this bug. ***
*** Bug 1011363 has been marked as a duplicate of this bug. ***
*** Bug 1011368 has been marked as a duplicate of this bug. ***