Bug 1188074 - adjtimex fails with kernel-3.18.5 when ca
Summary: adjtimex fails with kernel-3.18.5 when ca
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 21
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-02-02 00:31 UTC by Stuart D Gathman
Modified: 2015-02-07 04:00 UTC (History)
12 users (show)

Fixed In Version: kernel-3.18.5-101.fc20
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-06 04:00:08 UTC
Type: Bug


Attachments (Terms of Use)
Potential fix to avoid false positives on 32bit systems (1.51 KB, application/mbox)
2015-02-02 19:09 UTC, John Stultz
no flags Details

Description Stuart D Gathman 2015-02-02 00:31:11 UTC
Description of problem:
Fatal error : adjtimex failed for set_frequency, freq_ppm=-3.5554e+01 required_freq=3.5554e+01 required_tick=10000

Version-Release number of selected component (if applicable):
kernel-3.18.5-200.fc21.i686
chrony-1.31-1.fc21.i686

How reproducible:
always

Steps to Reproduce:
1. systemctl start chronyd, or reboot
2.
3.

Actual results:
adjtimex fails, causing chronyd to refuse to start.

Expected results:
chronyd and adjtimex work as in previous kernels

Additional info:
While chronyd is reporting the error - it only starting failing with this kernel.  Also, this is an updates-testing kernel.  I'm not sure if I should be reporting bugs against testing kernels - but how else will you know what is broken?  

Jan 31 08:33:17 elissa.gathman.org chronyd[722]: Frequency -35.554 +/- 1.838 ppm read from /var/lib/chrony/drift
Jan 31 08:33:17 elissa.gathman.org chronyd[722]: Fatal error : adjtimex failed for set_frequency, freq_ppm=-3.5554e+01 required_freq=3.5554e+01 required_tick=10000
Jan 31 08:33:17 elissa.gathman.org chronyd[718]: adjtimex failed for set_frequency, freq_ppm=-3.5554e+01 required_freq=3.5554e+01 required_tick=10000

Comment 1 Stuart D Gathman 2015-02-02 00:38:47 UTC
It is of course possible that chronyd has been broken all along, and previous kernels were just ignoring the garbage.

Comment 2 Josh Boyer 2015-02-02 13:58:03 UTC
(In reply to Stuart D Gathman from comment #1)
> It is of course possible that chronyd has been broken all along, and
> previous kernels were just ignoring the garbage.

I've emailed upstream.  They included:

commit e8e3b6ddd0a6afa267029141946918392b425cf4
Author: Sasha Levin <sasha.levin@oracle.com>
Date:   Wed Dec 3 19:25:05 2014 -0500

    time: adjtimex: Validate the ADJ_FREQUENCY values
    
    commit 5e5aeb4367b450a28f447f6d5ab57d8f2ab16a5f upstream.
    
    Verify that the frequency value from userspace is valid and makes sense.
    
    Unverified values can cause overflows later on.
    
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    [jstultz: Fix up bug for negative values and drop redunent cap check]
    Signed-off-by: John Stultz <john.stultz@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

in 3.18.5, which is very likely hitting this.  Your comment about it being broken all along might be true, but getting it unbroken is something we should figure out either way.

Comment 3 Miroslav Lichvar 2015-02-02 15:40:36 UTC
To me it looks like that commit should used MAXFREQ_SCALE instead of LONG_MAX/MIN. The frequency tolerance is 500 ppm, that's 500 * (1<<16) in the timex freq field.

--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -634,9 +634,9 @@ int ntp_validate_timex(struct timex *txc)
                return -EPERM;
 
        if (txc->modes & ADJ_FREQUENCY) {
-               if (LONG_MIN / PPM_SCALE > txc->freq)
+               if (-MAXFREQ_SCALED / PPM_SCALE > txc->freq)
                        return -EINVAL;
-               if (LONG_MAX / PPM_SCALE < txc->freq)
+               if (MAXFREQ_SCALED / PPM_SCALE < txc->freq)
                        return -EINVAL;
        }

Comment 4 Stuart D Gathman 2015-02-02 15:51:36 UTC
ntpd works ok, and I assume it also uses adjtimex.

Comment 5 Miroslav Lichvar 2015-02-02 15:59:45 UTC
ntpd sets the frequency directly only on start (after that the kernel PLL takes over) and it doesn't check for adjtimex errors.

Try ntptime -f 100, or any value between -500 and 500.

Comment 6 Sasha Levin 2015-02-02 16:09:54 UTC
(In reply to Stuart D Gathman from comment #0)
> Jan 31 08:33:17 elissa.gathman.org chronyd[722]: Frequency -35.554 +/- 1.838
> ppm read from /var/lib/chrony/drift
> Jan 31 08:33:17 elissa.gathman.org chronyd[722]: Fatal error : adjtimex
> failed for set_frequency, freq_ppm=-3.5554e+01 required_freq=3.5554e+01
> required_tick=10000
> Jan 31 08:33:17 elissa.gathman.org chronyd[718]: adjtimex failed for
> set_frequency, freq_ppm=-3.5554e+01 required_freq=3.5554e+01
> required_tick=10000

Could you share your /var/lib/chrony/drift please?

Comment 7 Stuart D Gathman 2015-02-02 16:33:44 UTC
# cat /var/lib/chrony/drift
          -35.554442             1.838092
# ntptime -f -35.554442
ntp_gettime() returns code 0 (OK)
  time d87a2511.ebc62160  Mon, Feb  2 2015 11:32:17.920, (.920992747),
  maximum error 82579 us, estimated error 25421 us, TAI offset 0
ntp_adjtime() call fails: Invalid argument

Comment 8 Stuart D Gathman 2015-02-02 17:20:19 UTC
Feb 02 10:44:33 localhost ntpd[9491]: ntpd 4.2.6p5@1.2349-o Fri Dec 19 19:02:22 UTC 2014 (1)
Feb 02 10:44:33 localhost ntpd[9492]: proto: precision = 1.467 usec
Feb 02 10:44:33 localhost ntpd[9492]: 0.0.0.0 c01d 0d kern kernel time sync enabled
...

Feb 02 10:44:34 localhost ntpd[9492]: 0.0.0.0 c016 06 restart
Feb 02 10:44:34 localhost ntpd[9492]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
Feb 02 10:44:34 localhost ntpd[9492]: 0.0.0.0 c011 01 freq_not_set
Feb 02 10:44:34 localhost ntpd[9492]: 0.0.0.0 c61c 0c clock_step +0.159777 s
Feb 02 10:44:34 localhost ntpd[9492]: 0.0.0.0 c614 04 freq_mode
Feb 02 10:44:34 localhost systemd[1443]: Time has been changed
Feb 02 10:44:34 localhost systemd[1321]: Time has been changed
Feb 02 10:44:35 localhost ntpd[9492]: 0.0.0.0 c618 08 no_sys_peer
Feb 02 11:00:16 localhost ntpd[9492]: 0.0.0.0 c612 02 freq_set kernel 38.555 PPM
Feb 02 11:00:16 localhost ntpd[9492]: 0.0.0.0 c615 05 clock_sync

Comment 9 John Stultz 2015-02-02 18:47:09 UTC
So I've also heard issues on arm. Since the report is against i686, just wanted to confirm that this hasn't been reported on x86_64?

thanks
-john

Comment 10 Josh Boyer 2015-02-02 18:50:45 UTC
Looking at the Fedora update filed, I think that is correct, John.  Both reports I've seen have been i686 thus far.  That being said, this particular kernel is still in updates-testing at the moment, so it isn't wide spread.

Comment 11 John Stultz 2015-02-02 18:54:31 UTC
Yea, looks like it should have been LLONG_MAX / PPM_SCALE.

Comment 12 John Stultz 2015-02-02 19:09:23 UTC
Created attachment 987224 [details]
Potential fix to avoid false positives on 32bit systems

Can folks try this patch to validate it resolves the issue for them?

Comment 13 Peter Robinson 2015-02-02 23:35:29 UTC
I'm seeing this on ARMv7 with 3.19rc7 too btw:

01:00 AM root@cubietruck:~ $ uname -a
Linux cubietruck.home.roving-it.com 3.19.0-0.rc7.git0.1.fc22.armv7hl+lpae #1 SMP Mon Feb 2 15:17:41 UTC 2015 armv7l armv7l armv7l GNU/Linux
01:00 AM root@cubietruck:~ $ systemctl status chronyd.service -l
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled)
   Active: failed (Result: exit-code) since Thu 1970-01-01 01:00:16 BST; 46s ago
  Process: 389 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=1/FAILURE)

Jan 01 01:00:16 cubietruck.home.roving-it.com systemd[1]: Starting NTP client/server...
Jan 01 01:00:16 cubietruck.home.roving-it.com chronyd[389]: adjtimex failed for set_frequency, freq_ppm=-7.0716e+00 required_freq=7.0716e+00 required_tick=10000
Jan 01 01:00:16 cubietruck.home.roving-it.com systemd[1]: chronyd.service: control process exited, code=exited status=1
Jan 01 01:00:16 cubietruck.home.roving-it.com systemd[1]: Failed to start NTP client/server.
Jan 01 01:00:16 cubietruck.home.roving-it.com systemd[1]: Unit chronyd.service entered failed state.
Jan 01 01:00:16 cubietruck.home.roving-it.com systemd[1]: chronyd.service failed.

Comment 14 John Stultz 2015-02-03 02:34:03 UTC
Peter: Yes, 32bit systems are affected, and the patch attached to this bug should resolve it. Let me know if it does not.

Comment 15 Fedora Update System 2015-02-03 03:49:11 UTC
kernel-3.18.5-201.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/kernel-3.18.5-201.fc21

Comment 16 Fedora Update System 2015-02-03 03:49:57 UTC
kernel-3.18.5-101.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.18.5-101.fc20

Comment 17 John Stultz 2015-02-03 04:11:08 UTC
So I assume this means the patch is working for you? I was hoping to get some tested-by lines to add to the submission.

Comment 18 Yu Watanabe 2015-02-03 12:22:44 UTC
The bug seems fixed on 3.18.5-201. Now chronyd works correctly for me on my i686 laptop.

Below is journal logs of chronyd.service on 3.18.5-200 and 3.18.5-201.

*3.18.5-200
%%%%%%%%%%%%%%%%%%%%%%%
Feb 03 21:12:42 vaiox.localdomain systemd[1]: Starting NTP client/server...
Feb 03 21:12:42 vaiox.localdomain chronyd[1690]: chronyd version 1.31 starting
Feb 03 21:12:42 vaiox.localdomain chronyd[1690]: Frequency 0.000 +/- 1000000.000 ppm read from /var/lib/chrony/drift
Feb 03 21:12:42 vaiox.localdomain systemd[1]: Started NTP client/server.
Feb 03 21:13:08 vaiox.localdomain chronyd[1690]: Selected source 157.7.154.23
Feb 03 21:13:08 vaiox.localdomain chronyd[1690]: Fatal error : adjtimex failed for set_frequency, freq_ppm=8.3333e+04 required_freq=-3.3333e+01 required_tick=9167
Feb 03 21:13:08 vaiox.localdomain systemd[1]: chronyd.service: main process exited, code=exited, status=1/FAILURE
Feb 03 21:13:08 vaiox.localdomain systemd[1]: Unit chronyd.service entered failed state.
Feb 03 21:13:08 vaiox.localdomain systemd[1]: chronyd.service failed.
%%%%%%%%%%%%%%%%%%%%%%%

*3.18.5-201
%%%%%%%%%%%%%%%%%%%%%%%
Feb 03 21:16:53 vaiox.localdomain systemd[1]: Starting NTP client/server...
Feb 03 21:16:53 vaiox.localdomain chronyd[1692]: chronyd version 1.31 starting
Feb 03 21:16:53 vaiox.localdomain chronyd[1692]: Frequency 0.000 +/- 1000000.000 ppm read from /var/lib/chrony/drift
Feb 03 21:16:53 vaiox.localdomain systemd[1]: Started NTP client/server.
Feb 03 21:17:09 vaiox.localdomain chronyd[1692]: Selected source 106.186.114.89
Feb 03 21:17:09 vaiox.localdomain chronyd[1692]: System clock wrong by 0.745615 seconds, adjustment started
%%%%%%%%%%%%%%%%%%%%%%%

Comment 19 Josh Boyer 2015-02-03 15:18:11 UTC
John, sorry for the miscommunication.  We looked at the patch and it looked correct so we added it to a build we were doing for another bug.  Happily, Yu has confirmed it does indeed work.

Comment 20 Fedora Update System 2015-02-04 08:01:10 UTC
Package kernel-3.18.5-201.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.18.5-201.fc21'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-1657/kernel-3.18.5-201.fc21
then log in and leave karma (feedback).

Comment 21 Fedora Update System 2015-02-06 04:00:08 UTC
kernel-3.18.5-201.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 22 Fedora Update System 2015-02-07 04:00:54 UTC
kernel-3.18.5-101.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.