Bug 434947 - adjtime() doesn't work correctly on ppc
Summary: adjtime() doesn't work correctly on ppc
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6
Hardware: powerpc
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
: ---
Assignee: Ameet Paranjape
QA Contact: Martin Jenner
URL:
Whiteboard: rhts
Depends On:
Blocks: 431728 461297
 
Reported: 2008-02-26 14:19 UTC by Miroslav Lichvar
Modified: 2009-07-28 14:52 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-07-28 14:52:33 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Fix detection of need for time adjustment. (3.68 KB, text/plain)
2008-04-09 07:32 UTC, IBM Bug Proxy
Fix detection of need for time adjustment. v2 (3.56 KB, text/plain)
2008-04-09 07:32 UTC, IBM Bug Proxy


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 43828 0 None None None Never

Description Miroslav Lichvar 2008-02-26 14:19:38 UTC
Description of problem:
When adjtime() (i.e. adjtimex in SINGLESHOT mode) is called with offset smaller
than 1000us, the clock is not adjusted.

Version-Release number of selected component (if applicable):
kernel-2.6.9-55.EL.ppc64

How reproducible:
Always

Steps to Reproduce:
1. gcc -x c - <<EOF
#include <stdio.h>
#include <sys/time.h>
#include <sys/timex.h>
#include <unistd.h>
#include <string.h>

int main() {
        struct timeval tv;

        while (1) {
                struct timex t;

                gettimeofday(&tv, NULL);
                usleep(1000000 - tv.tv_usec % 1000000);

                memset(&t, 0, sizeof (t));

                t.modes = ADJ_OFFSET_SINGLESHOT;
                t.offset = 999;
                if (adjtimex(&t) >= 0)
                        printf("%ld\n", t.offset);
        }
        return 0;
}
EOF
2. ./a.out &
3. compare system time with a reference (e.g. ntpdate -q)
  
Actual results:
No adjustment.

Expected results:
Time adjusted.

Additional info:
This is blocking bug #431728, where ntpd in -x mode should be fixed to use only
adjtime() instead of switching between adjtimex() and adjtime().

Comment 2 IBM Bug Proxy 2008-04-07 15:56:51 UTC
------- Comment From mjwolf.com 2008-04-07 11:51 EDT-------
No, I haven't seen this error before. I have added some team members to the cc
list to see if they recognize this error.

------- Comment From mjwolf.com 2008-04-07 11:56 EDT-------
Also I'm a little confused here, the kernel version is listed as
kernel-2.6.9-55.EL.ppc64.  That is a rhel4u5 kernel.  I guess I expected that
the kernel version would have been -67 (rhel4u6) or a newer one for the upcoming
rhel4u7.  Can someone clarify if this problem is being seen on the newer kernels?

Comment 3 Miroslav Lichvar 2008-04-08 10:45:45 UTC
kernel-2.6.9-67.0.7.EL.ppc64 has the same problem.

Comment 4 IBM Bug Proxy 2008-04-09 07:32:39 UTC
Created attachment 301757 [details]
Fix detection of need for time adjustment.

When making the adjtimex() system call in ADJ_OFFSET_SINGLESHOT mode, we fail to
correctly detect that time needs to be adjusted.  This patch checks the current
time step at the beginning of a decrementer tick and adjusts the various time
parameters to cope with the requested change in time.  When we have
successfully adjusted time as requested, revert to the "stable" values.

Fixes both the 32-bit (compat) and 64-bit cases.

Comment 5 IBM Bug Proxy 2008-04-09 07:32:40 UTC
Created attachment 301758 [details]
Fix detection of need for time adjustment. v2

When making the adjtimex() system call in ADJ_OFFSET_SINGLESHOT mode, we fail to
correctly detect that time needs to be adjusted.  This patch checks the current
time step at the beginning of a decrementer tick and adjusts the various timing
parameters to cope with the requested change in time.  When we have
successfully adjusted time as requested, revert to the "stable" values.

Fixes both the 32-bit (compat) and 64-bit cases.

Previous version had a DEBUG #define enabled by mistake.
Can we get the original reporter to test?

Comment 6 Miroslav Lichvar 2008-04-09 16:39:28 UTC
With this patch applied it seems to work well.

Comment 7 IBM Bug Proxy 2008-04-09 23:48:31 UTC
------- Comment From tbreeds.com 2008-04-09 19:41 EDT-------
(In reply to comment #9)
> ------- Comment From mlichvar 2008-04-09 12:39 EST-------
> With this patch applied it seems to work well.

Okay I'll update this to submitted for inclusion by RHEL

Comment 8 Miroslav Lichvar 2008-04-10 10:16:43 UTC
Unfortunately, the patch breaks ntpd running without the -x option: the clock
frequency offset is adjusted, but the adjustment doesn't have any effect.

Comment 9 Peter Martuccelli 2008-04-22 19:58:52 UTC
We are past code freeze and I do not see a patch proposed to resolve this issue
as an exception.  Radek you may want to consider reverting the ntp changes for R4.7.

Comment 12 IBM Bug Proxy 2008-05-09 19:24:56 UTC
------- Comment From mjwolf.com 2008-05-09 15:22 EDT-------
Is Red Hat still looking for a patch for this problem?  I'm confused by the
comment above:

"------- Comment From peterm 2008-04-22 15:58 EST-------
We are past code freeze and I do not see a patch proposed to resolve this issue
as an exception.  Radek you may want to consider reverting the ntp changes for
R4.7."

Comment 13 IBM Bug Proxy 2008-07-01 19:48:50 UTC
------- Comment From mjwolf.com 2008-07-01 15:47 EDT-------
Reopening this bug and marking as NEEDINFO.  Would Red Hat like us to look at
this for RHEL4u8?

Comment 14 IBM Bug Proxy 2008-07-15 13:51:21 UTC
------- Comment From mjwolf.com 2008-07-15 09:49 EDT-------
Can IBM get a copy of the patched ntpd that showed the problem with adjtime()?

Comment 15 Miroslav Lichvar 2008-07-15 15:39:23 UTC
ntp packages showing the problem are available here:
http://people.redhat.com/mlichvar/tmp/ntpadjtime/

Comment 16 RHEL Program Management 2008-09-03 13:03:53 UTC
Updating PM score.

Comment 17 IBM Bug Proxy 2008-10-14 18:32:17 UTC
Hmm....from what I can see, the NTP along with adjtime are working as expected.

Here are my tests along with my results and why I think there is no problem here to be fixed.

I am running with the ntp package from Comment 20 and kernel
Linux version 2.6.9-78.EL (brewbuilder.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Wed Jul 9 15:37:15 EDT 2008

Make sure NTP server is up and running......
root> pgrep ntp
21838

Make sure drift file is set to 0.......
root> cat /var/lib/ntp/drift
0.000

Set the system clock to time from NTP server......
root> ntpdate -u 91.189.94.4  ;  date  ;  hwclock --show
14 Oct 13:05:04 ntpdate[23026]: step time server 91.189.94.4 offset 223005059.003048 sec
Tue Oct 14 13:05:04 CDT 2008
Thu 20 Sep 2001 04:14:06 PM CDT  -0.285046 seconds

Make sure clocks are still as expected
root> date  ;  hwclock --show
Tue Oct 14 13:05:17 CDT 2008
Thu 20 Sep 2001 04:14:19 PM CDT  -0.624382 seconds

Set hw clock to same as system clock.....
root> date  ;  hwclock --show  ;  hwclock --localtime --systohc   ;  date  ;  hwclock --show
Tue Oct 14 13:05:37 CDT 2008
Thu 20 Sep 2001 04:14:39 PM CDT  -0.384789 seconds
Tue Oct 14 13:05:40 CDT 2008
Tue 14 Oct 2008 01:05:41 PM CDT  -0.997783 seconds

Make sure clocks are still as expected....progressing together.....
root> date  ;  hwclock --show
Tue Oct 14 13:05:54 CDT 2008
Tue 14 Oct 2008 01:05:55 PM CDT  -0.010500 seconds

Set system clock to Sept 2001.....
root> date  ;  hwclock --show  ;  date 092011082001  ;  date  ;  hwclock --show
Tue Oct 14 13:06:00 CDT 2008
Tue 14 Oct 2008 01:06:01 PM CDT  -0.090070 seconds
Thu Sep 20 11:08:00 CDT 2001
Thu Sep 20 11:08:00 CDT 2001
Tue 14 Oct 2008 01:06:02 PM CDT  -0.996832 seconds

Make sure the two clocks are still as expected, and, are progressing.....
root> date  ;  hwclock --show
Thu Sep 20 11:08:08 CDT 2001
Tue 14 Oct 2008 01:06:10 PM CDT  -0.088859 seconds

Try to adjust the hw clock to be in sync with the system clock.....
root> date  ;  hwclock --show  ;  hwclock  --adjust  ;  date  ;  hwclock --show
Thu Sep 20 11:08:19 CDT 2001
Tue 14 Oct 2008 01:06:21 PM CDT  -0.631395 seconds
Thu Sep 20 11:08:21 CDT 2001
Tue 14 Oct 2008 01:06:23 PM CDT  -0.997863 seconds

Wait a few seconds then check times again......
root> date  ;  hwclock --show
Thu Sep 20 11:10:38 CDT 2001
Thu 20 Sep 2001 04:10:39 PM CDT  -0.996076 seconds

Comment 18 IBM Bug Proxy 2008-10-14 18:41:52 UTC
I ran some tests on another machine to help gather data points, and, noticed the same behavior there as well.

In particular, I ran some analysis tests on an x86 based Ubuntu system running with kernel.....
Linux version 2.6.20-17-generic (root@terranova) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #2 SMP Wed Aug 20 16:47:34 UTC 2008

And NTP .....  14 Oct 13:32:21 ntpdate[25739]: ntpdate 4.2.2p4 Wed Mar  7 20:43:31 UTC 2007 (1)

Some helpful links........

http://linux.die.net/man/8/ntpd
http://linux.die.net/man/2/settimeofday
http://linux.about.com/library/cmd/blcmdl8_hwclock.htm
http://ecorrado.us/scholarly/documents/chlug-time.pdf
http://linux.die.net/man/1/ntpd
http://linux.about.com/od/commands/l/blcmdl2_gettime.htm
http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/Server/2003_Server/Q_22945569.html
http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch24_:_The_NTP_Server

This bug, as defined, does not appear to be a problem. However, if there is something else (which is not being covered by this particular bug), please open another bugzilla and add an abundance of data to help debug and resolve the problem, thanks in advance!!

Comment 19 Miroslav Lichvar 2008-10-15 14:35:12 UTC
I can still reproduce the problem with kernel-2.6.9-78.EL.

- clock offset adjusted to about 0.3 sec (the machine has a slightly slower clock, if uncorrected the offset will increase)
- added "statistics loopstats" to ntp.conf
- ntpd started with the -x option to use adjtime() and -s /tmp/ to specify a stats dir

The loopstats file in the stats dir shows:

54754 37605.169 0.325754698 0.000000 0.162877349 0.000000 4
54754 37632.213 0.311736644 4.756724 0.141229953 38.053790 4
54754 37649.243 0.297691084 9.299129 0.122510179 49.057199 4
54754 37666.272 0.293997093 13.785169 0.106113003 55.614098 4
54754 37683.302 0.278460071 15.112970 0.092224329 49.320700 4
...
54754 40947.741 0.130068301 494.654183 0.000783042 30.638022 4
54754 40964.760 0.128114085 496.486870 0.001189372 30.314612 4
54754 40981.777 0.128193493 498.442948 0.001030792 30.563231 4
54754 40998.794 0.128413508 498.810342 0.000899444 26.631221 4
54754 41015.811 0.128723566 500.000000 0.000794219 27.907437 4
54754 41032.829 0.128131188 500.000000 0.000748877 28.788194 4
54754 41049.845 0.128259471 500.000000 0.000651710 25.855169 4
54754 41066.863 0.128458423 500.000000 0.000573097 27.336054 4


The third column is the offset. ntpd is unable to reach zero; the offset hovers around 128 ms, which is where the value ntpd passes to adjtime() crosses 1000 us when the poll interval is 16 s and the frequency offset has reached the 500 ppm maximum.

A plot of the offset is here:
http://people.redhat.com/mlichvar/tmp/ntpadjtime/offset.png


Probably the easiest way to reproduce the bug is to run adjtime(999 us) in a loop and compare the clock with a reference.

Comment 21 IBM Bug Proxy 2009-03-18 13:50:53 UTC
Tony Breeds writes:
"I've been working to try and reproduce this bug.  With the patch already
attached to this bug and the RPMS from RHEL 4.8 Beta, I do not see the
problems detailed in comment 31.

I'm not denying the existence of the problem but I'm unable to reproduce it
on my hardware."

I'm going to close this bug as unreproducible.

Comment 22 IBM Bug Proxy 2009-03-18 15:02:28 UTC
Hello Red Hat,
We are unable to reproduce this problem at this point using RHEL 4.8.
Please let us know if you are still able to recreate the problem on RHEL 4.8.

Thanks!

Comment 23 IBM Bug Proxy 2009-03-18 17:01:23 UTC
Reopening the bug.

Just to be clear: we can reproduce the original problem stated in the bug.  However, when we apply the
patch that is currently attached to this bug we no longer see errors.  We cannot reproduce the secondary
problem that was reported when the patch was tested at the distro.

Comment 25 Miroslav Lichvar 2009-03-31 20:25:24 UTC
I can still reproduce the secondary problem with kernel 2.6.9-86 and the "Fix detection of need for time adjustment. v2" patch, git commit 0b221e6. Changing the kernel frequency offset with ntptime -f doesn't change the real frequency.

However, since the ntp bug depending on this probably won't get a chance to be fixed in RHEL4, I think this bug is very low priority.

Comment 26 Ameet Paranjape 2009-04-02 15:04:39 UTC
Miroslav,

Thanks for the update.  Does this mean you are willing to accept IBM's patch for the original problem?  Reading through the bug maybe we should open a new Bugzilla for the secondary problem and I can reverse mirror that to IBM.  What do you suggest?

-Ameet

Comment 27 Miroslav Lichvar 2009-04-02 15:34:17 UTC
Ameet,

the secondary problem is much worse than the original one, it would break ntp badly. I'd suggest to leave it as it is for now or close it as WONTFIX.

Comment 28 IBM Bug Proxy 2009-05-12 20:10:57 UTC
------- Comment From kumarr.com 2009-05-12 16:00 EDT-------
(In reply to comment #43)
> Ameet,
>
> the secondary problem is much worse than the original one, it would break ntp
> badly. I'd suggest to leave it as it is for now or close it as WONTFIX.
>

I am closing this bug as WILL_NOT_FIX, since this is unlikely to be addressed in the near future.
If this becomes a critical issue, please reopen and raise the severity of this bug.

Comment 29 Ameet Paranjape 2009-07-28 14:52:33 UTC
Closing on the Red Hat side as well.

