Created attachment 342558 [details]
nanosleep02.c

Description of problem:
This is the RHEL5.3.z candidate kernel which is due to GA tomorrow (7 May).
LTP nanosleep02 has failed on pq0-0.lab.bos.redhat.com:

<<<test_start>>>
tag=nanosleep02 stime=1241550838
cmdline="nanosleep02"
contacts=""
analysis=exit
initiation_status="ok"
<<<test_output>>>
nanosleep02    0  WARN  :  This test could fail if the system was under load
nanosleep02    0  WARN  :  due to the limitation of the way it calculates the
nanosleep02    0  WARN  :  system call execution time.
nanosleep02    1  FAIL  :  Remaining sleep time 3999009 usec doesn't match with the expected 3999357 usec time
nanosleep02    1  FAIL  :  child process exited abnormally
<<<execution_status>>>
duration=1 termination_type=exited termination_id=1 corefile=no
cutime=1 cstime=1
<<<test_end>>>

This has been seen on kernel-debug and kernel-xen so far. NTP has been
stopped prior to the test. I have not seen such a failure before.

Version-Release number of selected component (if applicable):
kernel-2.6.18-128.1.10.el5

How reproducible:
unknown

Steps to Reproduce:
1. reserve pq0-0.lab.bos.redhat.com
2. yum -y install rh-tests-kernel-distribution-ltp-20090228
3. cd /mnt/tests/kernel/distribution/ltp/20090228
4. make testbuild
5. cd ltp-full-20090228/testcases/kernel/syscalls/nanosleep
6. ./nanosleep02

Actual results:
Failed.

Expected results:
Passed.
Created attachment 342602 [details]
simple reproducer

This simple reproducer does the following:
- parent forks a child and sleeps a second.
- child calls nanosleep(&timereq, &timerem) for 5 seconds.
- parent sends a signal to wake up the child.
- compare timerem with (timereq - nanosleep syscall time).

gettimeofday(&otime, NULL);
nanosleep(&timereq, &timerem);
gettimeofday(&ntime, NULL);

req = timereq.tv_sec * 1000000 + timereq.tv_nsec / 1000;
rem = timerem.tv_sec * 1000000 + timerem.tv_nsec / 1000;
elapsed = (ntime.tv_sec - otime.tv_sec) * 1000000 + ntime.tv_usec - otime.tv_usec;

if (rem - (req - elapsed) > USEC_PRECISION)
        Failed.
else
        Passed.

On kernels with the patch for bug 485323 like -133.el5 and -128.1.6.el5, the
test can pass without any problem (20 times in a row).

# ./nanosleep
timereq.tv_sec = 5
timereq.tv_nsec = 9999
timerem.tv_sec = 3
timerem.tv_nsec = 999261678
otime.tv_sec = 1241593048
otime.tv_usec = 68589
ntime.tv_sec = 1241593049
ntime.tv_usec = 69367
req = timereq.tv_sec * 1000000 + timereq.tv_nsec / 1000
req = 5000009
rem = timerem.tv_sec * 1000000 + timerem.tv_nsec / 1000
rem = 3999261
elapsed = (ntime.tv_sec - otime.tv_sec) * 1000000 + ntime.tv_usec - otime.tv_usec
elapsed = 1000778
rem - (req - elapsed) > USEC_PRECISION
USEC_PRECISION = 250000
call succeeded
Functionality of nanosleep() is correct

On kernels without the patch like -134.el5 and -128.1.8.el5, the test usually
fails (seen it fail either 7 in 10 attempts or 1 in 20 attempts). I can
reliably reproduce this on all IA-64 machines tested.
# ./nanosleep
timereq.tv_sec = 5
timereq.tv_nsec = 9999
timerem.tv_sec = 3
timerem.tv_nsec = 999863999
otime.tv_sec = 1241592788
otime.tv_usec = 355705
ntime.tv_sec = 1241592789
ntime.tv_usec = 355794
req = timereq.tv_sec * 1000000 + timereq.tv_nsec / 1000
req = 5000009
rem = timerem.tv_sec * 1000000 + timerem.tv_nsec / 1000
rem = 3999863
elapsed = (ntime.tv_sec - otime.tv_sec) * 1000000 + ntime.tv_usec - otime.tv_usec
elapsed = 1000089
rem - (req - elapsed) > USEC_PRECISION
USEC_PRECISION = 250000
Remaining sleep time 3999863 usec doesn't match with the expected 3999920 usec time
child process exited abnormally

The problem looks like we are either getting a smaller remaining sleep time
(rem) or a bigger syscall time by gettimeofday().
(In reply to comment #1)
>
> On kernels with the patch for bug 485323 like -133.el5 and -128.1.6.el5, the

Correction -- without the patch.

> On kernels without the patch like -134.el5 and -128.1.8.el5, the test is

Correction -- with the patch.
Hi Prarit, can you please look at this? It seems this issue is most probably
caused by your patch
linux-2.6-ia64-use-current_kernel_time-xtime-in-hrtimer_start.patch
(In reply to comment #1)
>
> The problem looks like we are either getting a smaller remaining sleep time
> (rem) or a bigger syscall time by gettimeofday().

Correction -- a smaller syscall time by gettimeofday().

Not sure if this helps, but on the patched kernel, the following syscall-time
measuring program has shown that nanosleep() slept a little less.

# cat nano.c
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>
#include <errno.h>
#include <assert.h>

#define DT(t0, t1) \
        (t1.tv_sec - t0.tv_sec + 1e-6 * (t1.tv_usec - t0.tv_usec))

#define SEC_MAX 10

int main(void)
{
        struct timeval t0, t1;
        int sec, ret;

        printf("\nsleep:\n");
        for (sec = 1; sec <= SEC_MAX; sec++) {
                struct timespec ts = { .tv_sec = sec };

                gettimeofday(&t0, NULL);
                nanosleep(&ts, NULL);
                gettimeofday(&t1, NULL);
                printf("%d %f\n", sec, DT(t0, t1));
        }
        return 0;
}

# ./nano

sleep:
1 1.001110
2 2.001231
3 3.000405
4 4.000541
5 5.000691
6 6.000837
7 6.999980 <--- smaller one
8 8.000126
9 9.001258

It is a little bit harder to trigger this though.
I'll poke around. How often does this occur? P.
Prarit, the simple reproducer fails fairly often -- I've seen it fail either
7 in 10 attempts or 1 in 20 attempts. It has been reliably reproduced on
several IA-64 machines.
Cai, I've put in a reserve for an ia64 system for RHTS. I'm waiting for that
to go through and then I'll make this a priority.

I have a gut feeling that we're experiencing an overflow somewhere ...

If you have a few minutes, could you write down the names of the systems
you've seen this on?

Thanks,

P.
Almost all IA-64 machines tested either automatically or manually:

bull1.rhts.bos.redhat.com
nec-tx7-1.rhts.bos.redhat.com
hp-rx3600-01.rhts.bos.redhat.com
hp-diablo-01.rhts.bos.redhat.com
altix3.rhts.bos.redhat.com
nec-ibl1.rhts.bos.redhat.com
pq0-0.lab.bos.redhat.com
Looking back at the history of this test, I have not seen a single failure
on IA-64 with -92.el5, -122.el5, -123.el5, -125.el5, -128.el5, and -129.el5.
To state the test status more clearly:

On kernels without the patch for bug 485323 like -133.el5 and -128.1.6.el5,
the test can pass without any problem (100 times in a row).

On kernels with the patch like -134.el5 and -128.1.8.el5, the test usually
fails (seen it fail either 7 in 10 attempts or 1 in 20 attempts). I can
reliably reproduce this on all IA-64 machines tested.

So the test result matrix is:

-133.el5:      PASSED
-134.el5:      FAILED
-128.1.6.el5:  PASSED
-128.1.8.el5:  FAILED
Cai, thanks for the list of systems.

I have a patch that resolves the issue and I'm currently testing. It seems,
however, that this can occur on *any* IA64 system, or at least it did on the
three systems I'm using. Oddly, it happens more often when the cpu count is
low ...

My test procedure is to run the test described above, as well as the test in
485323, in a tight loop. So far, after 12 hours of testing on three ia64
systems (HP, SGI, and NEC) I haven't seen any issues with either test.

I will attach the patch here, and POST later today after some more testing.

I will warn you, however, that there is a drawback to the patch I'm posting.
hrtimer_get_softirq_time() is called on every timer interrupt. When an
hrtimer is active this means that an extra gettimeofday() call will be made.
This is a minor performance hit. I've reworked the code so that this "extra"
call is only made when timer interpolation is configured into the kernel,
and timer interpolation is actually active.

I've tried a few other solutions and they all end up resulting in an extra
gettimeofday() call. So I'm going with the cleanest solution...

P.
Created attachment 343430 [details] RHEL5 fix for this issue
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
I forwarded the patch in Comment #12 to Fujitsu to review (Linda approved
it.) Here is Fujitsu's feedback on it:

[Questions/confirmation about the regression]
Is this regression issue "nanosleep() can sleep less than (1 msec at most?)
the requested time, depending on the timing of xtime and nanosleep() to
invoke"? Can it sleep like 2 msecs or more less than the requested time?

[Concerns about the patch]
This patch reverts the hrtimer_v2.patch (IT#263792) that got included in
134.el5 and adds something to the hrtimer.patch. Looking at the code, it
fixes both IT#263792 and BZ#499289. The concern is that, as also written in
comment #10 of BZ#485323, it'd introduce some performance impact because the
"reading ITC" operation will be run at every single timer interrupt.
Probably, we need to do measurements on machines to figure out the exact
impact.

[Proposal patch]
Fujitsu made a proposal patch as well to avoid the performance issue. This
patch should fix BZ#499289. It adds a correction value (offset) of ITC to
xtime when setting up the wake-up timing of nanosleep(). Could you review it
and give us feedback, please? (Although Fujitsu confirmed IT#263792 didn't
reproduce on 144.el5 with this patch applied, it's still in development.)
Created attachment 343634 [details] Proposal patch from Fujitsu
(In reply to comment #17)
> I forwarded the patch in Comment #12 to Fujitsu to review (Linda approved
> it.) Here is Fujitsu's feedback on it:
>
> [Questions/confirmation about the regression]
> Is this regression issue "nanosleep() can sleep less than (1 msec
> at most?) the requested time, depending on the timing of xtime and
> nanosleep() to invoke"? Can it sleep like 2 msecs or more less
> than the requested time?
>

It appears to sleep several msecs less than the requested time. This is
because we are no longer adjusting for the time interpolator.

> [Concerns about the patch]
> This patch reverts the hrtimer_v2.patch (IT#263792) that got included
> in 134.el5 and adds something to the hrtimer.patch. Looking at the
> code, it fixes both IT#263792 and BZ#499289. The concern is that,
> as also written in comment #10 of BZ#485323, it'd introduce some
> performance impact because "reading ITC" operation will be run at
> every single timer interrupt. Probably, we need to do measurement
> on machines to figure out the exact impact.

Yes, I think that is a good idea. I suspect there will be a minimal impact
but measuring it is good.

> [Proposal patch]
> Fujitsu made a proposal patch as well to avoid performance issue.
> This patch should fix BZ#499289. This adds a correction value
> (offset) of ITC to xtime when setting up wake-up timing of nanosleep().
> Could you review it and give us feedback, please?
> (Although Fujitsu confirmed IT#263792 didn't reproduce on 144.el5 with
> this patch applied, it's still development.)

I think we should go with my patch previously mentioned for now, unless we
see a significant performance hit. Your patch introduces a lot of replicated
code.

P.
Prarit,

After some more testing, it turned out Fujitsu's proposal patch didn't
really fix the problem, and we agree to go with your patch. Are you planning
to perform any measurements?
(In reply to comment #20)
> Prarit,
>
> After some more testing done, it turned out the Fujitsu's proposal patch didn't
> really fix the problem. And we agree to go with your patch. Are you planning
> to perform any measuring?

I'm going to discuss it with Shak's team shortly. I'll update the BZ with
results.

P.
in kernel-2.6.18-148.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so. However feel free
to provide a comment indicating that this fix has been verified.
An advisory has been issued which should help the problem described in this
bug report. This report is therefore being closed with a resolution of
ERRATA. For more information on the solution and/or where to find the
updated files, please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html