Bug 1392797

Summary: chronyd crashes when performing server leap smear
Product: Red Hat Enterprise Linux 6
Reporter: Miroslav Lichvar <mlichvar>
Component: chrony
Assignee: Miroslav Lichvar <mlichvar>
Status: CLOSED CURRENTRELEASE
QA Contact: qe-baseos-daemons
Severity: high
Docs Contact:
Priority: urgent
Version: 6.8
CC: cww, jprokes, jreznik, mkolaja, mlichvar, qe-baseos-daemons, vanhoof
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: chrony-2.1.1-2.el6
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1392793
: 1401533
Environment:
Last Closed: 2017-05-24 14:31:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1269194, 1401533

Description Miroslav Lichvar 2016-11-08 09:39:03 UTC
+++ This bug was initially created as a clone of Bug #1392793 +++

Description of problem:
When chronyd is configured with the smoothtime directive and the smoothing process is updated with an extremely small offset, numerical errors in floating-point operations may prevent it from selecting a direction in which the offset needs to be smoothed out. This causes an assertion failure.

Normally, the offset is large enough to not hit this problem, but with the leaponly option (which can be used to perform a synchronized leap smear on multiple servers) the smoothing process is updated with zero offset after the leap second is inserted, which creates ideal conditions for hitting this bug. The chance of a crash during the whole leap smear depends on the update interval, which in turn depends on the polling interval. With a smoothtime wander of 0.001 ppm/s and a polling interval of 1024 seconds (the default maximum), the probability seems to be about 1%. With a 1-second polling interval it is about 50%.
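The failure mode described above can be sketched in a few lines of Python. This is a simplified, hypothetical model for illustration only: the names dir, l1 and l3 are borrowed from the assertion message, but the formulas, the function names and the sampling range are assumptions and are not taken from chrony's smooth.c. The sketch tries both directions and accepts the first one that yields non-negative lengths. When the exact result lies on the boundary (a length of zero), rounding may push both candidates slightly below zero, and no direction is accepted.

```python
import math
import random


def select_direction(offset, freq, wander):
    """Return (dir, l1, l3), or None when neither direction qualifies.

    Hypothetical model: the formulas are assumptions made for this
    illustration and are not copied from chrony.
    """
    s1 = offset / wander
    s2 = freq * freq / (2.0 * wander * wander)
    for direction in (-1, 1):
        s = direction * s1 + s2
        if s < 0.0:
            continue  # square root would be undefined for this direction
        l3 = math.sqrt(s)
        l1 = l3 - direction * freq / wander
        if l1 >= 0.0:
            return direction, l1, l3
    return None  # an unconditional assert on dir/l1/l3 would fail here


def count_rejections(samples=100000, wander=1e-9, seed=1):
    """Count boundary cases in which rounding rejects both directions."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(samples):
        freq = rng.uniform(1e-9, 1e-6)
        # Offset chosen so that, in exact arithmetic, one length is zero.
        offset = freq * freq / (2.0 * wander)
        if select_direction(offset, freq, wander) is None:
            rejected += 1
    return rejected
```

With a clearly non-zero offset, such as select_direction(1.0, 0.0, 1e-9), one direction is always accepted. Any non-zero result from count_rejections() would correspond to the situation in which both checks fail by a rounding error and the assertion is triggered.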

Version-Release number of selected component (if applicable):
chrony-2.1.1-3.el7

How reproducible:
Occasionally

Steps to Reproduce:
1. prepare an NTP server that will simulate a leap second

2. configure chronyd as a client of the server using a 1-second polling interval and performing a leap smear for its own clients, e.g.:

      server ntp.example.com minpoll 0 maxpoll 0
      leapsecmode slew
      maxslewrate 1000
      smoothtime 400 0.001 leaponly

3. wait for the simulated leap second and then wait until the leap smear is finished (the progress can be monitored with "chronyc smoothing")
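The monitoring in step 3 could be scripted along the following lines. This is a minimal sketch: it assumes that "chronyc smoothing" prints its report as "Name : value" lines, and the helper names parse_smoothing and read_smoothing are made up for this example. No particular field names are relied on.

```python
import subprocess


def parse_smoothing(text):
    """Turn 'Name : value' lines into a dict; other lines are ignored."""
    report = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            report[name.strip()] = value.strip()
    return report


def read_smoothing(run=subprocess.run):
    """Run `chronyc smoothing` once and return the parsed report.

    The `run` parameter exists only so that the command can be
    replaced by a stand-in when chronyd is not available.
    """
    result = run(["chronyc", "smoothing"],
                 capture_output=True, text=True, check=True)
    return parse_smoothing(result.stdout)
```

Calling read_smoothing() every few seconds and logging the returned dict gives a record of the smear's progress. Because of check=True, a non-zero exit status from chronyc raises subprocess.CalledProcessError, which would make the moment at which the command starts failing visible in the log.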

Actual results:
chronyd crashes with "smooth.c:164: update_stages: Assertion `dir <= 1 && l1 >= 0.0 && l3 >= 0.0' failed."

Expected results:
no crash

Additional info:
This bug was fixed upstream in commit https://git.tuxfamily.org/chrony/chrony.git/commit/?id=c0a8afdb68694a31045111c6d7481c2cced34a34