Bug 1188074

Summary: adjtimex fails with kernel-3.18.5 when ca
Product: [Fedora] Fedora Reporter: Stuart D Gathman <stuart>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 21CC: devin, gansalmon, itamar, john.stultz, jonathan, kernel-maint, madhu.chinakonda, mchehab, mlichvar, pbrobinson, sasha.levin, watanabe.yu
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-3.18.5-101.fc20 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-02-06 04:00:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Potential fix to avoid false positives on 32bit systems none

Description Stuart D Gathman 2015-02-02 00:31:11 UTC
Description of problem:
Fatal error : adjtimex failed for set_frequency, freq_ppm=-3.5554e+01 required_freq=3.5554e+01 required_tick=10000

Version-Release number of selected component (if applicable):
kernel-3.18.5-200.fc21.i686
chrony-1.31-1.fc21.i686

How reproducible:
always

Steps to Reproduce:
1. systemctl start chronyd, or reboot
2.
3.

Actual results:
adjtimex fails, causing chronyd to refuse to start.

Expected results:
chronyd and adjtimex work as in previous kernels

Additional info:
While chronyd is reporting the error - it only starting failing with this kernel.  Also, this is an updates-testing kernel.  I'm not sure if I should be reporting bugs against testing kernels - but how else will you know what is broken?  

Jan 31 08:33:17 elissa.gathman.org chronyd[722]: Frequency -35.554 +/- 1.838 ppm read from /var/lib/chrony/drift
Jan 31 08:33:17 elissa.gathman.org chronyd[722]: Fatal error : adjtimex failed for set_frequency, freq_ppm=-3.5554e+01 required_freq=3.5554e+01 required_tick=10000
Jan 31 08:33:17 elissa.gathman.org chronyd[718]: adjtimex failed for set_frequency, freq_ppm=-3.5554e+01 required_freq=3.5554e+01 required_tick=10000

Comment 1 Stuart D Gathman 2015-02-02 00:38:47 UTC
It is of course possible that chronyd has been broken all along, and previous kernels were just ignoring the garbage.

Comment 2 Josh Boyer 2015-02-02 13:58:03 UTC
(In reply to Stuart D Gathman from comment #1)
> It is of course possible that chronyd has been broken all along, and
> previous kernels were just ignoring the garbage.

I've emailed upstream.  They included:

commit e8e3b6ddd0a6afa267029141946918392b425cf4
Author: Sasha Levin <sasha.levin>
Date:   Wed Dec 3 19:25:05 2014 -0500

    time: adjtimex: Validate the ADJ_FREQUENCY values
    
    commit 5e5aeb4367b450a28f447f6d5ab57d8f2ab16a5f upstream.
    
    Verify that the frequency value from userspace is valid and makes sense.
    
    Unverified values can cause overflows later on.
    
    Cc: Thomas Gleixner <tglx>
    Cc: Ingo Molnar <mingo>
    Signed-off-by: Sasha Levin <sasha.levin>
    [jstultz: Fix up bug for negative values and drop redunent cap check]
    Signed-off-by: John Stultz <john.stultz>
    Signed-off-by: Greg Kroah-Hartman <gregkh>

in 3.18.5, which is very likely hitting this.  Your comment about it being broken all along might be true, but getting it unbroken is something we should figure out either way.

Comment 3 Miroslav Lichvar 2015-02-02 15:40:36 UTC
To me it looks like that commit should used MAXFREQ_SCALE instead of LONG_MAX/MIN. The frequency tolerance is 500 ppm, that's 500 * (1<<16) in the timex freq field.

--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -634,9 +634,9 @@ int ntp_validate_timex(struct timex *txc)
                return -EPERM;
 
        if (txc->modes & ADJ_FREQUENCY) {
-               if (LONG_MIN / PPM_SCALE > txc->freq)
+               if (-MAXFREQ_SCALED / PPM_SCALE > txc->freq)
                        return -EINVAL;
-               if (LONG_MAX / PPM_SCALE < txc->freq)
+               if (MAXFREQ_SCALED / PPM_SCALE < txc->freq)
                        return -EINVAL;
        }

Comment 4 Stuart D Gathman 2015-02-02 15:51:36 UTC
ntpd works ok, and I assume it also uses adjtimex.

Comment 5 Miroslav Lichvar 2015-02-02 15:59:45 UTC
ntpd sets the frequency directly only on start (after that the kernel PLL takes over) and it doesn't check for adjtimex errors.

Try ntptime -f 100, or any value between -500 and 500.

Comment 6 Sasha Levin 2015-02-02 16:09:54 UTC
(In reply to Stuart D Gathman from comment #0)
> Jan 31 08:33:17 elissa.gathman.org chronyd[722]: Frequency -35.554 +/- 1.838
> ppm read from /var/lib/chrony/drift
> Jan 31 08:33:17 elissa.gathman.org chronyd[722]: Fatal error : adjtimex
> failed for set_frequency, freq_ppm=-3.5554e+01 required_freq=3.5554e+01
> required_tick=10000
> Jan 31 08:33:17 elissa.gathman.org chronyd[718]: adjtimex failed for
> set_frequency, freq_ppm=-3.5554e+01 required_freq=3.5554e+01
> required_tick=10000

Could you share your /var/lib/chrony/drift please?

Comment 7 Stuart D Gathman 2015-02-02 16:33:44 UTC
# cat /var/lib/chrony/drift
          -35.554442             1.838092
# ntptime -f -35.554442
ntp_gettime() returns code 0 (OK)
  time d87a2511.ebc62160  Mon, Feb  2 2015 11:32:17.920, (.920992747),
  maximum error 82579 us, estimated error 25421 us, TAI offset 0
ntp_adjtime() call fails: Invalid argument

Comment 8 Stuart D Gathman 2015-02-02 17:20:19 UTC
Feb 02 10:44:33 localhost ntpd[9491]: ntpd 4.2.6p5 Fri Dec 19 19:02:22 UTC 2014 (1)
Feb 02 10:44:33 localhost ntpd[9492]: proto: precision = 1.467 usec
Feb 02 10:44:33 localhost ntpd[9492]: 0.0.0.0 c01d 0d kern kernel time sync enabled
...

Feb 02 10:44:34 localhost ntpd[9492]: 0.0.0.0 c016 06 restart
Feb 02 10:44:34 localhost ntpd[9492]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
Feb 02 10:44:34 localhost ntpd[9492]: 0.0.0.0 c011 01 freq_not_set
Feb 02 10:44:34 localhost ntpd[9492]: 0.0.0.0 c61c 0c clock_step +0.159777 s
Feb 02 10:44:34 localhost ntpd[9492]: 0.0.0.0 c614 04 freq_mode
Feb 02 10:44:34 localhost systemd[1443]: Time has been changed
Feb 02 10:44:34 localhost systemd[1321]: Time has been changed
Feb 02 10:44:35 localhost ntpd[9492]: 0.0.0.0 c618 08 no_sys_peer
Feb 02 11:00:16 localhost ntpd[9492]: 0.0.0.0 c612 02 freq_set kernel 38.555 PPM
Feb 02 11:00:16 localhost ntpd[9492]: 0.0.0.0 c615 05 clock_sync

Comment 9 John Stultz 2015-02-02 18:47:09 UTC
So I've also heard issues on arm. Since the report is against i686, just wanted to confirm that this hasn't been reported on x86_64?

thanks
-john

Comment 10 Josh Boyer 2015-02-02 18:50:45 UTC
Looking at the Fedora update filed, I think that is correct, John.  Both reports I've seen have been i686 thus far.  That being said, this particular kernel is still in updates-testing at the moment, so it isn't wide spread.

Comment 11 John Stultz 2015-02-02 18:54:31 UTC
Yea, looks like it should have been LLONG_MAX / PPM_SCALE.

Comment 12 John Stultz 2015-02-02 19:09:23 UTC
Created attachment 987224 [details]
Potential fix to avoid false positives on 32bit systems

Can folks try this patch to validate it resolves the issue for them?

Comment 13 Peter Robinson 2015-02-02 23:35:29 UTC
I'm seeing this on ARMv7 with 3.19rc7 too btw:

01:00 AM root@cubietruck:~ $ uname -a
Linux cubietruck.home.roving-it.com 3.19.0-0.rc7.git0.1.fc22.armv7hl+lpae #1 SMP Mon Feb 2 15:17:41 UTC 2015 armv7l armv7l armv7l GNU/Linux
01:00 AM root@cubietruck:~ $ systemctl status chronyd.service -l
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled)
   Active: failed (Result: exit-code) since Thu 1970-01-01 01:00:16 BST; 46s ago
  Process: 389 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=1/FAILURE)

Jan 01 01:00:16 cubietruck.home.roving-it.com systemd[1]: Starting NTP client/server...
Jan 01 01:00:16 cubietruck.home.roving-it.com chronyd[389]: adjtimex failed for set_frequency, freq_ppm=-7.0716e+00 required_freq=7.0716e+00 required_tick=10000
Jan 01 01:00:16 cubietruck.home.roving-it.com systemd[1]: chronyd.service: control process exited, code=exited status=1
Jan 01 01:00:16 cubietruck.home.roving-it.com systemd[1]: Failed to start NTP client/server.
Jan 01 01:00:16 cubietruck.home.roving-it.com systemd[1]: Unit chronyd.service entered failed state.
Jan 01 01:00:16 cubietruck.home.roving-it.com systemd[1]: chronyd.service failed.

Comment 14 John Stultz 2015-02-03 02:34:03 UTC
Peter: Yes, 32bit systems are affected, and the patch attached to this bug should resolve it. Let me know if it does not.

Comment 15 Fedora Update System 2015-02-03 03:49:11 UTC
kernel-3.18.5-201.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/kernel-3.18.5-201.fc21

Comment 16 Fedora Update System 2015-02-03 03:49:57 UTC
kernel-3.18.5-101.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.18.5-101.fc20

Comment 17 John Stultz 2015-02-03 04:11:08 UTC
So I assume this means the patch is working for you? I was hoping to get some tested-by lines to add to the submission.

Comment 18 Yu Watanabe 2015-02-03 12:22:44 UTC
The bug seems fixed on 3.18.5-201. Now chronyd works correctly for me on my i686 laptop.

Below is journal logs of chronyd.service on 3.18.5-200 and 3.18.5-201.

*3.18.5-200
%%%%%%%%%%%%%%%%%%%%%%%
Feb 03 21:12:42 vaiox.localdomain systemd[1]: Starting NTP client/server...
Feb 03 21:12:42 vaiox.localdomain chronyd[1690]: chronyd version 1.31 starting
Feb 03 21:12:42 vaiox.localdomain chronyd[1690]: Frequency 0.000 +/- 1000000.000 ppm read from /var/lib/chrony/drift
Feb 03 21:12:42 vaiox.localdomain systemd[1]: Started NTP client/server.
Feb 03 21:13:08 vaiox.localdomain chronyd[1690]: Selected source 157.7.154.23
Feb 03 21:13:08 vaiox.localdomain chronyd[1690]: Fatal error : adjtimex failed for set_frequency, freq_ppm=8.3333e+04 required_freq=-3.3333e+01 required_tick=9167
Feb 03 21:13:08 vaiox.localdomain systemd[1]: chronyd.service: main process exited, code=exited, status=1/FAILURE
Feb 03 21:13:08 vaiox.localdomain systemd[1]: Unit chronyd.service entered failed state.
Feb 03 21:13:08 vaiox.localdomain systemd[1]: chronyd.service failed.
%%%%%%%%%%%%%%%%%%%%%%%

*3.18.5-201
%%%%%%%%%%%%%%%%%%%%%%%
Feb 03 21:16:53 vaiox.localdomain systemd[1]: Starting NTP client/server...
Feb 03 21:16:53 vaiox.localdomain chronyd[1692]: chronyd version 1.31 starting
Feb 03 21:16:53 vaiox.localdomain chronyd[1692]: Frequency 0.000 +/- 1000000.000 ppm read from /var/lib/chrony/drift
Feb 03 21:16:53 vaiox.localdomain systemd[1]: Started NTP client/server.
Feb 03 21:17:09 vaiox.localdomain chronyd[1692]: Selected source 106.186.114.89
Feb 03 21:17:09 vaiox.localdomain chronyd[1692]: System clock wrong by 0.745615 seconds, adjustment started
%%%%%%%%%%%%%%%%%%%%%%%

Comment 19 Josh Boyer 2015-02-03 15:18:11 UTC
John, sorry for the miscommunication.  We looked at the patch and it looked correct so we added it to a build we were doing for another bug.  Happily, Yu has confirmed it does indeed work.

Comment 20 Fedora Update System 2015-02-04 08:01:10 UTC
Package kernel-3.18.5-201.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.18.5-201.fc21'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-1657/kernel-3.18.5-201.fc21
then log in and leave karma (feedback).

Comment 21 Fedora Update System 2015-02-06 04:00:08 UTC
kernel-3.18.5-201.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 22 Fedora Update System 2015-02-07 04:00:54 UTC
kernel-3.18.5-101.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.