Bug 666264
| Summary: | ftrace: kernel/trace/ring_buffer.c:1987 rb_reserve_next_event | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | David Mair <dmair> | ||||||||||
| Component: | kernel | Assignee: | Jiri Olsa <jolsa> | ||||||||||
| Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||||||||
| Severity: | medium | Docs Contact: | |||||||||||
| Priority: | medium | ||||||||||||
| Version: | 6.0 | CC: | james.leddy, jarod, jburke, mzywusko, srostedt | ||||||||||
| Target Milestone: | rc | ||||||||||||
| Target Release: | --- | ||||||||||||
| Hardware: | x86_64 | ||||||||||||
| OS: | Linux | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | kernel-2.6.32-130.el6 | Doc Type: | Bug Fix | ||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | |||||||||||||
| : | 666990 (view as bug list) | Environment: | |||||||||||
| Last Closed: | 2011-05-19 12:30:57 UTC | Type: | --- | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Bug Depends On: | |||||||||||||
| Bug Blocks: | 645519, 666990 | ||||||||||||
| Attachments: |
|
||||||||||||
|
Description
David Mair
2010-12-29 22:39:22 UTC
Dave, was this report filed from ABRT? If so we have two bugs, ABRT did not report everything from the --[ please cut here ]-- line.
This is regarding timestamps:
1981 if (unlikely(*delta > (1ULL << 59) && !once++)) {
1982 printk(KERN_WARNING "Delta way too big! %llu"
1983 " ts=%llu write stamp = %llu\n",
1984 (unsigned long long)*delta,
1985 (unsigned long long)*ts,
1986 (unsigned long long)cpu_buffer->write_stamp);
1987 WARN_ON(1);
1988 }
1989
As far as the ring buffer is concerned, we either need a way for the ring buffer to know that the computer was suspended recently, or we need to get rid of that warning entirely.
(In reply to comment #2) > Dave, was this report filed from ABRT? If so we have two bugs, ABRT did not > report everything from the --[ please cut here ]-- line. > > This is regarding timestamps: > > 1981 if (unlikely(*delta > (1ULL << 59) && !once++)) { > 1982 printk(KERN_WARNING "Delta way too big! %llu" > 1983 " ts=%llu write stamp = %llu\n", > 1984 (unsigned long long)*delta, > 1985 (unsigned long long)*ts, > 1986 (unsigned long long)cpu_buffer->write_stamp); > 1987 WARN_ON(1); > 1988 } > 1989 > > As far as the ring buffer is concerned, we either need a way for the ring > buffer to know that the computer was suspended recently, or we need to get rid > of that warning entirely. @James M. Leddy: Yes, this was reported by ABRT -- I would not have even been aware there was a problem if ABRT had not brought it to my attention. I'd agree if ABRT is not including everything then we have two bugs here. This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative. hi, can you see/paste the actuall warning message? "Delta way too big... " the code seems to handle 2^59 ns, which should be big enough to handle suspend time, so maybe the time weent crazy, and the message would help.. thanks, jirka I could reproduce with following message, please let me know if you see the same -> "write_stamp = 0" "Delta way too big! 18446744046721091834 ts=18446744046721091834 write stamp = 0" thanks, jirka Yes that 2^59 is nanoseconds: ~6671 days ~18 years (more than we support any version of RHEL ;-) It gets the value from sched_clock, the probable cause is that time went backwards according to sched_clock. Perhaps I could write a patch that can avoid this on resume? Or just remove the warning, but it may cause a strange timestamp on whoever is reading from the buffer. -- Steve (In reply to comment #7) > Yes that 2^59 is nanoseconds: ~6671 days ~18 years (more than we support any > version of RHEL ;-) > > It gets the value from sched_clock, the probable cause is that time went > backwards according to sched_clock. Perhaps I could write a patch that can > avoid this on resume? This makes the most sense to me. > Or just remove the warning, but it may cause a strange > timestamp on whoever is reading from the buffer. I think we're confusing more suspend/resume users with this WARN_ON than potential ftrace users if we didn't have this warning. But the warning only happens if we ever enabled tracing, correct? I'll see if I can get a patch to fix this up. (In reply to comment #9) > But the warning only happens if we ever enabled tracing, correct? > > I'll see if I can get a patch to fix this up. true for my test.. AFAICS the reason is the write_stamp of the rb page is updated when the page is commited. When there's resume, the page stays with write_stamp set to 0, and hance trigerring the warning on the wakeup. jirka > AFAICS the reason is the write_stamp of the rb page is updated when the
> page is commited. When there's resume, the page stays with write_stamp
> set to 0, and hance trigerring the warning on the wakeup.
ok, this is wrong.. ;)
(In reply to comment #11) > > AFAICS the reason is the write_stamp of the rb page is updated when the > > page is commited. When there's resume, the page stays with write_stamp > > set to 0, and hance trigerring the warning on the wakeup. > > ok, this is wrong.. ;) Right, a write to the ring buffer to a commit happens with preemption disabled, so it is impossible for a write to start and then we suspend, and then resume and then finish the commit. And the timestamp written at commit is the per cpu ring buffer timestamp, not the rb page timestamp, which is updated when a writer enters the page. looks like workaround is to use global trace_clock: echo global > /debug/tracing/trace_clock the local trace clod returns bad time after the resume Created attachment 472893 [details]
deny the usage of local clock if it is unstable
AFAICS once the sched_clock_stable is set, I think we should not depend on it
the way we do: using it as a timestamp in the ring buffer
the patch is based on the above assumption and sets the global clock,
when the unstable local clock is detected
anyway, I'm not sure if there're better options in such case.. it probably
has performance impact
maybe we can set the global clock when we got notified of suspend? ;)
Talking with Peter Zijlstra, there exists some boxes that have sched_clock() go backwards on resume, which will cause this error message. But because this error message only happens if we are tracing across a suspend/resume on buggy hardware, does it really need to be fixed? Note, Peter said sched_clock() doesn't do this on newer kernels. Perhaps back porting that change may fix this. The user will only see it if they initiated a trace and then did s2r. Normal users that do not run ftrace will not see this error doing s2r. Jiri, yes it has performance impact, I don't want to had that if we don't need it. I'm about to attach two patches, could you test again with them, and see if it fixes the issue. Thanks, -- Steve Created attachment 473123 [details]
fix sched clock on resume
this is taken from upstream.
Created attachment 473124 [details]
fix bug introduced by previous patch
The previous patch introduced a preemption leak. This fixes it.
(In reply to comment #16) > Jiri, yes it has performance impact, I don't want to had that if we don't need > it. I'm about to attach two patches, could you test again with them, and see if > it fixes the issue. sure, I'll test it during Friday and let you know hi, I tested on current tip tree, which already has both patches above and still fails.. as for RHEL6, I can see following commit already including both patches from above: [x86] tsc, sched: Recompute cyc2ns_offset's during resume from sleep states commit ab7ccb103a196f81ad9f1f80575a02edbffd15b5 Author: Matthew Garrett <mjg> Date: Wed Sep 29 16:57:27 2010 -0400 I tried on RHEL6 and it has the same behaviour warning from the latest RHEL6: Restarting tasks ... Delta way too big! 18446744035752685168 ts=18446744044477216573 write stamp = 8724531405 done. ------------[ cut here ]------------ WARNING: at kernel/trace/ring_buffer.c:1987 rb_reserve_next_event+0x236/0x320() (Not tainted) Hardware name: Montevina platform Modules linked in: Pid: 361, comm: udevd Not tainted 2.6.32kp #100 Call Trace: [<ffffffff810427e7>] warn_slowpath_common+0x87/0xc0 [<ffffffff8104283a>] warn_slowpath_null+0x1a/0x20 [<ffffffff810a02d6>] rb_reserve_next_event+0x236/0x320 [<ffffffff810a0572>] ring_buffer_lock_reserve+0xa2/0x160 [<ffffffff810a23ac>] trace_buffer_lock_reserve+0x2c/0x70 [<ffffffff810a2546>] trace_current_buffer_lock_reserve+0x16/0x20 [<ffffffff8100cece>] ftrace_raw_event_sys_exit+0x4e/0xc0 [<ffffffff8100e033>] syscall_trace_leave+0x93/0xc0 [<ffffffff810032e0>] int_check_syscall_exit_work+0x34/0x3d ---[ end trace bd05bfed46c2dc35 ]--- wbr, jirka with this guiltyfunc: bug 620026 (In reply to comment #15) > Talking with Peter Zijlstra, there exists some boxes that have sched_clock() go > backwards on resume, which will cause this error message. But because this > error message only happens if we are tracing across a suspend/resume on buggy > hardware, does it really need to be fixed? Note, Peter said sched_clock() > doesn't do this on newer kernels. Perhaps back porting that change may fix > this. > > The user will only see it if they initiated a trace and then did s2r. Normal > users that do not run ftrace will not see this error doing s2r. I agree, if there's no other patch/fix for this behaviour, I think we could leave this one not fixed with the workaround (setting clock to global). thoughts? ping..? While we're on the subject, we might include the message in the WARN_ON, so that ABRT gets the entire message from ---[ please cut here ]--- see drivers/pci/dmar.c:594 in bug 672526 Created attachment 475383 [details]
warn on unstable trace clock during the suspend
steve, would something like this be acceptable?
thanks,
jirka
Jirka,
What about instead changing the warning in the ring buffer to specify that. Something like:
WARN_ONCE(delta > (1ULL << 59),
"Delta way too big! %llu ts=%llu write stamp = %llu\n"
" If you just came from a suspend/resume, please switch\n"
" to the trace global clock:\n"
" echo global > /sys/kernel/debug/tracing/trace_clock\n",
(unsigned long long)delta,
(unsigned long long)ts,
(unsigned long long)cpu_buffer->write_stamp);
-- Steve
(In reply to comment #27) > Jirka, > > What about instead changing the warning in the ring buffer to specify that. > Something like: > > WARN_ONCE(delta > (1ULL << 59), > "Delta way too big! %llu ts=%llu write stamp = %llu\n" > " If you just came from a suspend/resume, please switch\n" > " to the trace global clock:\n" > " echo global > /sys/kernel/debug/tracing/trace_clock\n", > (unsigned long long)delta, > (unsigned long long)ts, > (unsigned long long)cpu_buffer->write_stamp); > > -- Steve fine with me.. please let me know if you need me to send it upstream thanks, jirka This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative. This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release. sent upstream https://lkml.org/lkml/2011/2/2/128 Peter Zijlstra just informed me that the sched_clock() going backwards was fixed by: cd7240c0b900eb6d690ccee088a6c9b46dae815a x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states Can we see if this prevents the warning to appear? If it does, then we do not need to add the current patch. -- Steve (In reply to comment #32) > Peter Zijlstra just informed me that the sched_clock() going backwards was > fixed by: > > cd7240c0b900eb6d690ccee088a6c9b46dae815a > x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states > > Can we see if this prevents the warning to appear? > > If it does, then we do not need to add the current patch. > > -- Steve as for my test machine, I got the warning again: (I used current tip tree) [ 103.126387] ------------[ cut here ]------------ [ 103.126394] WARNING: at kernel/trace/ring_buffer.c:2173 rb_reserve_next_event+0x450/0x5d0() [ 103.126396] Hardware name: Montevina platform [ 103.126398] Delta way too big! 18446744035988710276 ts=18446744046271032527 write stamp = 10282322251 [ 103.126399] If you just came from a suspend/resume, [ 103.126400] please switch to the trace global clock: [ 103.126400] echo global > /sys/kernel/debug/tracing/trace_clock [ 103.126402] Modules linked in: [ 103.126404] Pid: 1, comm: init Not tainted 2.6.38-rc5-tip+ #460 [ 103.126406] Call Trace: [ 103.126412] [<ffffffff810800df>] ? warn_slowpath_common+0x7f/0xc0 [ 103.126414] [<ffffffff810801d6>] ? warn_slowpath_fmt+0x46/0x50 [ 103.126418] [<ffffffff81046d65>] ? native_sched_clock+0x25/0xa0 [ 103.126421] [<ffffffff810e1e40>] ? rb_reserve_next_event+0x450/0x5d0 [ 103.126424] [<ffffffff810e2136>] ? ring_buffer_lock_reserve+0xb6/0x180 [ 103.126426] [<ffffffff810e662c>] ? trace_buffer_lock_reserve+0x2c/0x70 [ 103.126429] [<ffffffff810e7156>] ? trace_current_buffer_lock_reserve+0x16/0x20 [ 103.126432] [<ffffffff8104a9c1>] ? ftrace_raw_event_sys_enter+0x51/0x130 [ 103.126434] [<ffffffff8104ae71>] ? syscall_trace_enter+0x101/0x1c0 [ 103.126438] [<ffffffff8103f24f>] ? tracesys+0x71/0xd5 [ 103.126439] ---[ end trace cd128abebe4b8374 ]--- with this guiltyfunc: bug 527824 bug 528295 bug 528296 bug 528521 bug 528768 bug 530380 bug 536855 bug 536985 bug 542694 bug 562008 version 2 sent upstream - https://lkml.org/lkml/2011/2/18/172 Patch(es) available on kernel-2.6.32-130.el6 An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html |