Bug 434947 - adjtime() doesn't work correctly on ppc
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6
Hardware: powerpc Linux
Priority: low Severity: medium
: rc
: ---
Assigned To: Ameet Paranjape
Martin Jenner
rhts
: OtherQA
Depends On:
Blocks: 431728 461297
 
Reported: 2008-02-26 09:19 EST by Miroslav Lichvar
Modified: 2009-07-28 10:52 EDT (History)

Doc Type: Bug Fix
Last Closed: 2009-07-28 10:52:33 EDT


Attachments
Fix detection of need for time adjustment. (3.68 KB, text/plain)
2008-04-09 03:32 EDT, IBM Bug Proxy
Fix detection of need for time adjustment. v2 (3.56 KB, text/plain)
2008-04-09 03:32 EDT, IBM Bug Proxy


External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 43828 None None None Never

Description Miroslav Lichvar 2008-02-26 09:19:38 EST
Description of problem:
When adjtime() (i.e. adjtimex in SINGLESHOT mode) is called with offset smaller
than 1000us, the clock is not adjusted.

Version-Release number of selected component (if applicable):
kernel-2.6.9-55.EL.ppc64

How reproducible:
Always

Steps to Reproduce:
1. gcc -x c - <<EOF
#include <stdio.h>
#include <sys/time.h>
#include <sys/timex.h>
#include <unistd.h>
#include <string.h>

int main(void) {
        struct timeval tv;

        while (1) {
                struct timex t;

                /* Sleep until the start of the next second. */
                gettimeofday(&tv, NULL);
                usleep(1000000 - tv.tv_usec % 1000000);

                memset(&t, 0, sizeof (t));

                /* Request a one-shot slew just under 1000 us. */
                t.modes = ADJ_OFFSET_SINGLESHOT;
                t.offset = 999;
                if (adjtimex(&t) >= 0)
                        printf("%ld\n", t.offset); /* offset is a long */
        }
        return 0;
}
EOF
2. ./a.out &
3. compare system time with a reference (e.g. ntpdate -q)
  
Actual results:
No adjustment.

Expected results:
Time adjusted.

Additional info:
This is blocking bug #431728, where ntpd in -x mode should be fixed to use only
adjtime() instead of switching between adjtimex() and adjtime().
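
For comparison, the same request can be made through the glibc adjtime() wrapper instead of calling adjtimex() directly. The following is only a minimal sketch, not part of the original reproducer: the helper names usec_to_timeval and request_slew are hypothetical, and the call needs root privileges (CAP_SYS_TIME) to have any effect.

```c
#define _DEFAULT_SOURCE         /* expose adjtime() in glibc */
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helper: split a signed microsecond count into a
 * normalized struct timeval (tv_usec always in 0..999999). */
struct timeval usec_to_timeval(long usec)
{
        struct timeval tv;

        tv.tv_sec = usec / 1000000;
        tv.tv_usec = usec % 1000000;
        if (tv.tv_usec < 0) {
                tv.tv_usec += 1000000;
                tv.tv_sec -= 1;
        }
        assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
        return tv;
}

/* Hypothetical helper: ask the kernel to slew the clock by `usec`
 * microseconds. Returns 0 on success, -1 with errno set on failure.
 * On success, *pending_usec receives the adjustment that was still
 * outstanding from an earlier call. */
int request_slew(long usec, long *pending_usec)
{
        struct timeval delta = usec_to_timeval(usec);
        struct timeval old;

        if (adjtime(&delta, &old) < 0)
                return -1;
        *pending_usec = (long)old.tv_sec * 1000000 + (long)old.tv_usec;
        return 0;
}
```

Calling request_slew(999, &pending) once per second mirrors the loop in the steps above. According to this report, on the affected ppc kernels the call succeeds but the clock is not adjusted.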
Comment 2 IBM Bug Proxy 2008-04-07 11:56:51 EDT
------- Comment From mjwolf@us.ibm.com 2008-04-07 11:51 EDT-------
No, I haven't seen this error before. I have added some team members to the cc
list to see if they recognize this error.

------- Comment From mjwolf@us.ibm.com 2008-04-07 11:56 EDT-------
Also I'm a little confused here, the kernel version is listed as
kernel-2.6.9-55.EL.ppc64.  That is a rhel4u5 kernel.  I guess I expected that
the kernel version would have been -67 (rhel4u6) or a newer one for the upcoming
rhel4u7.  Can someone clarify if this problem is being seen on the newer kernels?
Comment 3 Miroslav Lichvar 2008-04-08 06:45:45 EDT
kernel-2.6.9-67.0.7.EL.ppc64 has the same problem.
Comment 4 IBM Bug Proxy 2008-04-09 03:32:39 EDT
Created attachment 301757 [details]
Fix detection of need for time adjustment.

When making the adjtimex() system call in ADJ_OFFSET_SINGLESHOT mode, we fail to
correctly detect that time needs to be adjusted.  This patch checks the current
time step at the beginning of a decrementer tick and adjusts the various time
parameters to cope with the requested change in time.  When we have
successfully adjusted time as requested, we revert to the "stable" values.

Fixes both the 32-bit (compat) and 64-bit cases.
Comment 5 IBM Bug Proxy 2008-04-09 03:32:40 EDT
Created attachment 301758 [details]
Fix detection of need for time adjustment. v2

When making the adjtimex() system call in ADJ_OFFSET_SINGLESHOT mode, we fail to
correctly detect that time needs to be adjusted.  This patch checks the current
time step at the beginning of a decrementer tick and adjusts the various timing
parameters to cope with the requested change in time.  When we have
successfully adjusted time as requested, we revert to the "stable" values.
    
Fixes both the 32-bit (compat) and 64-bit cases.

Previous version had a DEBUG #define enabled by mistake.
Can we get the original reporter to test this?
Comment 6 Miroslav Lichvar 2008-04-09 12:39:28 EDT
With this patch applied it seems to work well.
Comment 7 IBM Bug Proxy 2008-04-09 19:48:31 EDT
------- Comment From tbreeds@au1.ibm.com 2008-04-09 19:41 EDT-------
(In reply to comment #9)
> ------- Comment From mlichvar@redhat.com 2008-04-09 12:39 EST-------
> With this patch applied it seems to work well.

Okay, I'll update this to "submitted for inclusion by RHEL".
Comment 8 Miroslav Lichvar 2008-04-10 06:16:43 EDT
Unfortunately, the patch breaks ntpd running without the -x option: the clock
frequency offset is adjusted, but it doesn't have any effect.
Comment 9 Peter Martuccelli 2008-04-22 15:58:52 EDT
We are past code freeze and I do not see a patch proposed to resolve this issue
as an exception.  Radek you may want to consider reverting the ntp changes for R4.7.
Comment 12 IBM Bug Proxy 2008-05-09 15:24:56 EDT
------- Comment From mjwolf@us.ibm.com 2008-05-09 15:22 EDT-------
Is Red Hat still looking for a patch for this problem?  I'm confused by the
comment above:

"------- Comment From peterm@redhat.com 2008-04-22 15:58 EST-------
We are past code freeze and I do not see a patch proposed to resolve this issue
as an exception.  Radek you may want to consider reverting the ntp changes for
R4.7."
Comment 13 IBM Bug Proxy 2008-07-01 15:48:50 EDT
------- Comment From mjwolf@us.ibm.com 2008-07-01 15:47 EDT-------
Reopening this bug and marking as NEEDINFO.  Would Red Hat like us to look at
this for RHEL4u8?
Comment 14 IBM Bug Proxy 2008-07-15 09:51:21 EDT
------- Comment From mjwolf@us.ibm.com 2008-07-15 09:49 EDT-------
Can IBM get a copy of the patched ntpd that showed the problem with adjtime()?
Comment 15 Miroslav Lichvar 2008-07-15 11:39:23 EDT
ntp packages showing the problem are available here:
http://people.redhat.com/mlichvar/tmp/ntpadjtime/
Comment 16 RHEL Product and Program Management 2008-09-03 09:03:53 EDT
Updating PM score.
Comment 17 IBM Bug Proxy 2008-10-14 14:32:17 EDT
Hmm....from what I can see, the NTP along with adjtime are working as expected.

Here are my tests along with my results and why I think there is no problem here to be fixed.

I am running with the ntp package from Comment 20 and kernel
Linux version 2.6.9-78.EL (brewbuilder@js20-bc1-8.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Wed Jul 9 15:37:15 EDT 2008

Make sure NTP server is up and running......
root> pgrep ntp
21838

Make sure drift file is set to 0.......
root> cat /var/lib/ntp/drift
0.000

Set the system clock to time from NTP server......
root> ntpdate -u 91.189.94.4  ;  date  ;  hwclock --show
14 Oct 13:05:04 ntpdate[23026]: step time server 91.189.94.4 offset 223005059.003048 sec
Tue Oct 14 13:05:04 CDT 2008
Thu 20 Sep 2001 04:14:06 PM CDT  -0.285046 seconds

Make sure clocks are still as expected
root> date  ;  hwclock --show
Tue Oct 14 13:05:17 CDT 2008
Thu 20 Sep 2001 04:14:19 PM CDT  -0.624382 seconds

Set hw clock to same as system clock.....
root> date  ;  hwclock --show  ;  hwclock --localtime --systohc   ;  date  ;  hwclock --show
Tue Oct 14 13:05:37 CDT 2008
Thu 20 Sep 2001 04:14:39 PM CDT  -0.384789 seconds
Tue Oct 14 13:05:40 CDT 2008
Tue 14 Oct 2008 01:05:41 PM CDT  -0.997783 seconds

Make sure clocks are still as expected....progressing together.....
root> date  ;  hwclock --show
Tue Oct 14 13:05:54 CDT 2008
Tue 14 Oct 2008 01:05:55 PM CDT  -0.010500 seconds

Set system clock to Sept 2001.....
root> date  ;  hwclock --show  ;  date 092011082001  ;  date  ;  hwclock --show
Tue Oct 14 13:06:00 CDT 2008
Tue 14 Oct 2008 01:06:01 PM CDT  -0.090070 seconds
Thu Sep 20 11:08:00 CDT 2001
Thu Sep 20 11:08:00 CDT 2001
Tue 14 Oct 2008 01:06:02 PM CDT  -0.996832 seconds

Make sure the two clocks are still as expected, and, are progressing.....
root> date  ;  hwclock --show
Thu Sep 20 11:08:08 CDT 2001
Tue 14 Oct 2008 01:06:10 PM CDT  -0.088859 seconds

Try to adjust the hw clock to be in sync with the system clock.....
root> date  ;  hwclock --show  ;  hwclock  --adjust  ;  date  ;  hwclock --show
Thu Sep 20 11:08:19 CDT 2001
Tue 14 Oct 2008 01:06:21 PM CDT  -0.631395 seconds
Thu Sep 20 11:08:21 CDT 2001
Tue 14 Oct 2008 01:06:23 PM CDT  -0.997863 seconds

Wait a few seconds then check times again......
root> date  ;  hwclock --show
Thu Sep 20 11:10:38 CDT 2001
Thu 20 Sep 2001 04:10:39 PM CDT  -0.996076 seconds
Comment 18 IBM Bug Proxy 2008-10-14 14:41:52 EDT
I ran some tests on another machine to help gather data points, and, noticed the same behavior there as well.

In particular, I ran some analysis tests on an x86 based Ubuntu system running with kernel.....
Linux version 2.6.20-17-generic (root@terranova) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #2 SMP Wed Aug 20 16:47:34 UTC 2008

And NTP .....  14 Oct 13:32:21 ntpdate[25739]: ntpdate 4.2.2p4@1.1585-o Wed Mar  7 20:43:31 UTC 2007 (1)

Some helpful links........

http://linux.die.net/man/8/ntpd
http://linux.die.net/man/2/settimeofday
http://linux.about.com/library/cmd/blcmdl8_hwclock.htm
http://ecorrado.us/scholarly/documents/chlug-time.pdf
http://linux.die.net/man/1/ntpd
http://linux.about.com/od/commands/l/blcmdl2_gettime.htm
http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/Server/2003_Server/Q_22945569.html
http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch24_:_The_NTP_Server

This bug, as defined, does not appear to be a problem. However, if there is something else (which is not being covered by this particular bug), please open another bugzilla and add an abundance of data to help debug and resolve the problem, thanks in advance!!
Comment 19 Miroslav Lichvar 2008-10-15 10:35:12 EDT
I can still reproduce the problem with kernel-2.6.9-78.EL.

- clock offset adjusted to about 0.3 sec (the machine has a slightly slower clock, if uncorrected the offset will increase)
- added "statistics loopstats" to ntp.conf
- ntpd started with the -x option to use adjtime() and -s /tmp/ to specify a stats dir

The loopstats file in the stats dir shows:

54754 37605.169 0.325754698 0.000000 0.162877349 0.000000 4
54754 37632.213 0.311736644 4.756724 0.141229953 38.053790 4
54754 37649.243 0.297691084 9.299129 0.122510179 49.057199 4
54754 37666.272 0.293997093 13.785169 0.106113003 55.614098 4
54754 37683.302 0.278460071 15.112970 0.092224329 49.320700 4
...
54754 40947.741 0.130068301 494.654183 0.000783042 30.638022 4
54754 40964.760 0.128114085 496.486870 0.001189372 30.314612 4
54754 40981.777 0.128193493 498.442948 0.001030792 30.563231 4
54754 40998.794 0.128413508 498.810342 0.000899444 26.631221 4
54754 41015.811 0.128723566 500.000000 0.000794219 27.907437 4
54754 41032.829 0.128131188 500.000000 0.000748877 28.788194 4
54754 41049.845 0.128259471 500.000000 0.000651710 25.855169 4
54754 41066.863 0.128458423 500.000000 0.000573097 27.336054 4
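
Records like the ones above can be read programmatically to track the offset and frequency columns. The following is a minimal sketch: the struct and function names are hypothetical, and the field meanings (modified Julian day, seconds past midnight, offset in seconds, frequency in ppm, jitter, wander, log2 of the poll interval) are assumed from the ntpd monitoring documentation, not stated in this report.

```c
#include <assert.h>
#include <stdio.h>

/* One loopstats record (field meanings are assumptions, see above). */
struct loopstats_rec {
        long   mjd;        /* modified Julian day */
        double sec;        /* seconds past midnight */
        double offset;     /* clock offset, seconds */
        double freq_ppm;   /* frequency offset, ppm */
        double jitter;     /* jitter, seconds */
        double wander;     /* frequency wander, ppm */
        int    log2_poll;  /* 4 means a 2^4 = 16 s poll interval */
};

/* Returns 1 if all seven fields were parsed from `line`, 0 otherwise. */
int parse_loopstats(const char *line, struct loopstats_rec *r)
{
        assert(line != NULL && r != NULL);
        return sscanf(line, "%ld %lf %lf %lf %lf %lf %d",
                      &r->mjd, &r->sec, &r->offset, &r->freq_ppm,
                      &r->jitter, &r->wander, &r->log2_poll) == 7;
}
```

Feeding each line of the file to parse_loopstats() and printing r.offset gives the series shown in the third column.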


The third column is the offset. ntpd is unable to reach zero; it hovers around 128 ms, which is where the adjustment ntpd passes to adjtime() crosses the 1000 us value when the poll interval is 16 s and the frequency offset has reached the 500 ppm maximum.
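
The 128 ms figure can be checked with a little arithmetic. The sketch below rests on an assumption that is not stated in this report: that ntpd's once-per-second adjustment is offset / (CLOCK_PLL * 2^poll) plus the frequency offset, with CLOCK_PLL = 16 as in the ntpd loop filter. The function name is hypothetical.

```c
#include <assert.h>

/* Assumed PLL gain constant (CLOCK_PLL in ntpd's loop filter). */
#define ASSUMED_CLOCK_PLL 16.0

/* Per-second slew that ntpd would request, in microseconds, under the
 * assumed formula. log2_poll is the last loopstats column. */
double slew_per_second_us(double offset_s, double freq_ppm, int log2_poll)
{
        double pll_term_us;

        assert(log2_poll >= 0 && log2_poll < 31);
        pll_term_us = offset_s * 1e6 /
                      (ASSUMED_CLOCK_PLL * (double)(1 << log2_poll));
        /* 1 ppm of frequency offset is 1 us of correction per second. */
        return pll_term_us + freq_ppm;
}
```

With an offset of 0.128 s, 500 ppm and a poll exponent of 4, the two terms are about 500 us each, so the request sits right at 1000 us. Slightly smaller offsets fall under the threshold, which would explain why the offset stops shrinking there.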

A plot of the offset is here:
http://people.redhat.com/mlichvar/tmp/ntpadjtime/offset.png


Probably the easiest way to reproduce the bug is to run adjtime(999 us) in a loop and compare the clock with a reference.
Comment 21 IBM Bug Proxy 2009-03-18 09:50:53 EDT
Tony Breeds writes:
"I've been working to try and reproduce this bug.  With the patch already
attached to this bug and the RPMS from RHEL 4.8 Beta, I do not see the
problems detailed in comment 31.

I'm not denying the existence of the problem but I'm unable to reproduce it
on my hardware."

I'm going to close this bug as unreproducible.
Comment 22 IBM Bug Proxy 2009-03-18 11:02:28 EDT
Hello Red Hat,
We are unable to reproduce this problem at this point using RHEL 4.8.
Please let us know if you are still able to recreate the problem on RHEL 4.8.

Thanks!
Comment 23 IBM Bug Proxy 2009-03-18 13:01:23 EDT
Reopening the bug.

Just to be clear: we can reproduce the original problem stated in the bug.  However, when we apply the
patch that is currently attached to this bug, we no longer see errors. We cannot reproduce the secondary
problem that was reported when the patch was tested at the distro.
Comment 25 Miroslav Lichvar 2009-03-31 16:25:24 EDT
I can still reproduce the secondary problem with kernel 2.6.9-86 and the "Fix detection of need for time adjustment. v2" patch, git commit 0b221e6. Changing the kernel frequency offset with ntptime -f doesn't change the real frequency.
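
The frequency change described above can also be made directly with adjtimex() in ADJ_FREQUENCY mode, which takes ntptime out of the picture. A minimal sketch, assuming a Linux system and root privileges; the helper names are hypothetical. The freq field of struct timex is expressed in ppm scaled by 2^16.

```c
#include <assert.h>
#include <string.h>
#include <sys/timex.h>

/* Convert ppm to the scaled units of struct timex.freq
 * (ppm with a 16-bit binary fraction), truncating toward zero. */
long ppm_to_timex_freq(double ppm)
{
        return (long)(ppm * 65536.0);
}

/* Hypothetical helper: set the kernel frequency offset.
 * Returns the adjtimex() result: the clock state (>= 0) on success,
 * -1 with errno set on failure. */
int set_kernel_freq_ppm(double ppm)
{
        struct timex t;

        /* Stay within the +/-500 ppm maximum mentioned in this bug. */
        assert(ppm >= -500.0 && ppm <= 500.0);

        memset(&t, 0, sizeof (t));
        t.modes = ADJ_FREQUENCY;
        t.freq = ppm_to_timex_freq(ppm);
        return adjtimex(&t);
}
```

After a call such as set_kernel_freq_ppm(100.0), the clock should drift by roughly 100 us per second relative to a reference. On a kernel showing the secondary problem, the stored value changes but the real frequency does not.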

However, since the ntp bug depending on this probably won't get a chance to be fixed in RHEL4, I think this bug is very low priority.
Comment 26 Ameet Paranjape 2009-04-02 11:04:39 EDT
Miroslav,

Thanks for the update.  Does this mean you are willing to accept IBM's patch for the original problem?  Reading through the bug maybe we should open a new Bugzilla for the secondary problem and I can reverse mirror that to IBM.  What do you suggest?

-Ameet
Comment 27 Miroslav Lichvar 2009-04-02 11:34:17 EDT
Ameet,

the secondary problem is much worse than the original one, it would break ntp badly. I'd suggest to leave it as it is for now or close it as WONTFIX.
Comment 28 IBM Bug Proxy 2009-05-12 16:10:57 EDT
------- Comment From kumarr@linux.ibm.com 2009-05-12 16:00 EDT-------
(In reply to comment #43)
> Ameet,
>
> the secondary problem is much worse than the original one, it would break ntp
> badly. I'd suggest to leave it as it is for now or close it as WONTFIX.
>

I am closing this bug as WILL_NOT_FIX, since this is unlikely to be addressed in the near future.
If this becomes a critical issue, please reopen and raise the severity of this bug.
Comment 29 Ameet Paranjape 2009-07-28 10:52:33 EDT
Closing on the Red Hat side as well.
