Bug 666264

Summary:

ftrace: kernel/trace/ring_buffer.c:1987 rb_reserve_next_event

Product:

Red Hat Enterprise Linux 6

Reporter:

David Mair <dmair>

Component:

kernel

Assignee:

Jiri Olsa <jolsa>

Status:

CLOSED ERRATA

QA Contact:

Red Hat Kernel QE team <kernel-qe>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

6.0

CC:

james.leddy, jarod, jburke, mzywusko, srostedt

Target Milestone:

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

kernel-2.6.32-130.el6

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Clones:

666990 (view as bug list)

Environment:

Last Closed:

2011-05-19 12:30:57 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

645519, 666990

Attachments:

Description	Flags
deny the usage of local clock if it is unstable	none
fix sched clock on resume	none
fix bug introduced by previous patch	none
warn on unstable trace clock during the suspend	none

Description David Mair 2010-12-29 22:39:22 UTC

Description of problem:
Kernel warning occurs when resuming from sleep.  This may only be a warning message and therefore nothing to fix; however, this event triggers an ABRT warning so, IMHO, we need to either correct the condition that causes the warning message or we need to tweak ABRT to disregard warning messages. I suspect this situation will lead to many tickets being filed once RHEL6 becomes more widely deployed.  One case has already been opened against this during the beta. BZ620026

Version-Release number of selected component (if applicable):
kernel-2.6.32-71.el6.x86_64

How reproducible:
Every time

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
architecture: x86_64
cmdline: not_applicable
comment: Not sure if the docking station is necessary to reproduce this or not.
component: kernel
executable: kernel
kernel: 2.6.32-71.el6.x86_64
package: kernel
reason: WARNING: at kernel/trace/ring_buffer.c:1987 rb_reserve_next_event+0x2b2/0x350() (Tainted: G        W  ---------------- )
release: Red Hat Enterprise Linux Workstation release 6.0 (Santiago)
sosreport.tar.xz: /var/spool/abrt/kerneloops-1291124686-2417-1/sosreport.tar.xz
time: 1291507840
uid: 0

backtrace
-----
WARNING: at kernel/trace/ring_buffer.c:1987 rb_reserve_next_event+0x2b2/0x350() (Tainted: G        W  ---------------- )
Hardware name: 2241B36
Modules linked in: authenc deflate zlib_deflate ctr camellia cast5 rmd160 crypto_null ccm serpent blowfish twofish_x86_64 twofish_common xcbc cbc sha256_generic sha512_generic des_generic tunnel4 xfrm_ipcomp tunnel6 ext2 fuse rfcomm sco bridge bnep l2cap autofs4 sunrpc fcoe libfcoe 8021q garp libfc stp llc scsi_transport_fc scsi_tgt cpufreq_ondemand acpi_cpufreq freq_table cachefiles fscache ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log kvm uinput thinkpad_acpi hwmon wmi btusb bluetooth sg uvcvideo videodev v4l2_compat_ioctl32 i2c_i801 iTCO_wdt iTCO_vendor_support arc4 ecb iwlagn iwlcore mac80211 cfg80211 rfkill snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000e ext4 mbcache jbd2 cryptd aes_x86_64 aes_generic xts gf128mul dm_crypt sr_mod cdrom sd_mod crc_t10di
f sdhci_pci sdhci mmc_core firewire_ohci firewire_core crc_itu_t yenta_socket rsrc_nonstatic ahci usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mod [last unloaded: af_key]
Pid: 24887, comm: pm-suspend Tainted: G        W  ----------------  2.6.32-71.el6.x86_64 #1
Call Trace:
[<ffffffff8106b857>] warn_slowpath_common+0x87/0xc0
[<ffffffff8106b8aa>] warn_slowpath_null+0x1a/0x20
[<ffffffff810e9a62>] rb_reserve_next_event+0x2b2/0x350
[<ffffffff810e9cb2>] ring_buffer_lock_reserve+0xa2/0x160
[<ffffffff810ebbec>] trace_buffer_lock_reserve+0x2c/0x70
[<ffffffff810ebe36>] trace_current_buffer_lock_reserve+0x16/0x20
[<ffffffff810aa67b>] ftrace_raw_event_id_module_refcnt+0x6b/0x100
[<ffffffff813dbeea>] ? cpufreq_cpu_get+0x8a/0x160
[<ffffffff810aa728>] ftrace_raw_event_module_get+0x18/0x20
[<ffffffff813dbfa8>] cpufreq_cpu_get+0x148/0x160
[<ffffffff813dc02d>] cpufreq_resume+0x6d/0xc0
[<ffffffff8132c52e>] __sysdev_resume+0x4e/0xe0
[<ffffffff8132c649>] sysdev_resume+0x89/0x190
[<ffffffff810b2002>] suspend_devices_and_enter+0x162/0x1b0
[<ffffffff810b2172>] enter_state+0x122/0x190
[<ffffffff810b17da>] state_store+0x9a/0x100
[<ffffffff8125aaa7>] kobj_attr_store+0x17/0x20
[<ffffffff811df145>] sysfs_write_file+0xe5/0x170
[<ffffffff8116c818>] vfs_write+0xb8/0x1a0
[<ffffffff8116d251>] sys_write+0x51/0x90
[<ffffffff81013387>] tracesys+0xd9/0xde

reproduce
-----
1.  Disconnect power from laptop
2.  Allow laptop to go into sleep mode
3.  Reconnect laptop to power (in my case a docking station)
4.  Resume the laptop

Also it appears that this was reported

Comment 2 James M. Leddy 2010-12-30 18:21:15 UTC

Dave, was this report filed from ABRT? If so we have two bugs, ABRT did not report everything from the --[ please cut here ]-- line. 

This is regarding timestamps:

1981        if (unlikely(*delta > (1ULL << 59) && !once++)) {
1982                printk(KERN_WARNING "Delta way too big! %llu"
1983                       " ts=%llu write stamp = %llu\n",
1984                       (unsigned long long)*delta,
1985                       (unsigned long long)*ts,
1986                       (unsigned long long)cpu_buffer->write_stamp);
1987                WARN_ON(1);
1988        }
1989

As far as the ring buffer is concerned, we either need a way for the ring buffer to know that the computer was suspended recently, or we need to get rid of that warning entirely.

Comment 3 David Mair 2011-01-03 14:48:33 UTC

(In reply to comment #2)
> Dave, was this report filed from ABRT? If so we have two bugs, ABRT did not
> report everything from the --[ please cut here ]-- line. 
> 
> This is regarding timestamps:
> 
> 1981        if (unlikely(*delta > (1ULL << 59) && !once++)) {
> 1982                printk(KERN_WARNING "Delta way too big! %llu"
> 1983                       " ts=%llu write stamp = %llu\n",
> 1984                       (unsigned long long)*delta,
> 1985                       (unsigned long long)*ts,
> 1986                       (unsigned long long)cpu_buffer->write_stamp);
> 1987                WARN_ON(1);
> 1988        }
> 1989
> 
> As far as the ring buffer is concerned, we either need a way for the ring
> buffer to know that the computer was suspended recently, or we need to get rid
> of that warning entirely.

@James M. Leddy:

Yes, this was reported by ABRT -- I would not have even been aware there was a problem if ABRT had not brought it to my attention. I'd agree if ABRT is not including everything then we have two bugs here.

Comment 4 RHEL Program Management 2011-01-07 04:39:13 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 5 Jiri Olsa 2011-01-10 17:22:37 UTC

hi,

can you see/paste the actuall warning message?
"Delta way too big... "

the code seems to handle 2^59 ns, which should be big enough
to handle suspend time, so maybe the time weent crazy, and the
message would help..

thanks,
jirka

Comment 6 Jiri Olsa 2011-01-10 17:34:31 UTC

I could reproduce with following message,
please let me know if you see the same -> "write_stamp = 0"

"Delta way too big! 18446744046721091834 ts=18446744046721091834 write stamp = 0"

thanks,
jirka

Comment 7 Steven Rostedt 2011-01-10 18:26:06 UTC

Yes that 2^59 is nanoseconds: ~6671 days ~18 years (more than we support any version of RHEL ;-)

It gets the value from sched_clock, the probable cause is that time went backwards according to sched_clock. Perhaps I could write a patch that can avoid this on resume? Or just remove the warning, but it may cause a strange timestamp on whoever is reading from the buffer.

-- Steve

Comment 8 James M. Leddy 2011-01-10 18:35:01 UTC

(In reply to comment #7)
> Yes that 2^59 is nanoseconds: ~6671 days ~18 years (more than we support any
> version of RHEL ;-)
> 
> It gets the value from sched_clock, the probable cause is that time went
> backwards according to sched_clock. Perhaps I could write a patch that can
> avoid this on resume?

This makes the most sense to me.

> Or just remove the warning, but it may cause a strange
> timestamp on whoever is reading from the buffer.

I think we're confusing more suspend/resume users with this WARN_ON than potential ftrace users if we didn't have this warning.

Comment 9 Steven Rostedt 2011-01-10 18:46:22 UTC

But the warning only happens if we ever enabled tracing, correct?

I'll see if I can get a patch to fix this up.

Comment 10 Jiri Olsa 2011-01-11 13:25:00 UTC

(In reply to comment #9)
> But the warning only happens if we ever enabled tracing, correct?
> 
> I'll see if I can get a patch to fix this up.

true for my test.. 

AFAICS the reason is the write_stamp of the rb page is updated when the
 page is commited. When there's resume, the page stays with write_stamp
set to 0, and hance trigerring the warning on the wakeup.

jirka

Comment 11 Jiri Olsa 2011-01-11 17:03:09 UTC

> AFAICS the reason is the write_stamp of the rb page is updated when the
>  page is commited. When there's resume, the page stays with write_stamp
> set to 0, and hance trigerring the warning on the wakeup.

ok, this is wrong.. ;)

Comment 12 Steven Rostedt 2011-01-11 17:29:11 UTC

(In reply to comment #11)
> > AFAICS the reason is the write_stamp of the rb page is updated when the
> >  page is commited. When there's resume, the page stays with write_stamp
> > set to 0, and hance trigerring the warning on the wakeup.
> 
> ok, this is wrong.. ;)

Right, a write to the ring buffer to a commit happens with preemption disabled, so it is impossible for a write to start and then we suspend, and then resume and then finish the commit.

And the timestamp written at commit is the per cpu ring buffer timestamp, not the rb page timestamp, which is updated when a writer enters the page.

Comment 13 Jiri Olsa 2011-01-11 17:44:11 UTC

looks like workaround is to use global trace_clock:
echo global > /debug/tracing/trace_clock

the local trace clod returns bad time after the resume

Comment 14 Jiri Olsa 2011-01-11 20:38:07 UTC

Created attachment 472893 [details]
deny the usage of local clock if it is unstable

AFAICS once the sched_clock_stable is set, I think we should not depend on it
the way we do: using it as a timestamp in the ring buffer

the patch is based on the above assumption and sets the global clock,
when the unstable local clock is detected

anyway, I'm not sure if there're better options in such case.. it probably
has performance impact

maybe we can set the global clock when we got notified of suspend? ;)

Comment 15 Steven Rostedt 2011-01-12 19:43:49 UTC

Talking with Peter Zijlstra, there exists some boxes that have sched_clock() go backwards on resume, which will cause this error message. But because this error message only happens if we are tracing across a suspend/resume on buggy hardware, does it really need to be fixed? Note, Peter said sched_clock() doesn't do this on newer kernels. Perhaps back porting that change may fix this.

The user will only see it if they initiated a trace and then did s2r. Normal users that do not run ftrace will not see this error doing s2r.

Comment 16 Steven Rostedt 2011-01-12 20:02:26 UTC

Jiri, yes it has performance impact, I don't want to had that if we don't need it. I'm about to attach two patches, could you test again with them, and see if it fixes the issue.

Thanks,

-- Steve

Comment 17 Steven Rostedt 2011-01-12 20:03:27 UTC

Created attachment 473123 [details]
fix sched clock on resume

this is taken from upstream.

Comment 18 Steven Rostedt 2011-01-12 20:04:24 UTC

Created attachment 473124 [details]
fix bug introduced by previous patch

The previous patch introduced a preemption leak. This fixes it.

Comment 19 Jiri Olsa 2011-01-13 09:47:26 UTC

(In reply to comment #16)
> Jiri, yes it has performance impact, I don't want to had that if we don't need
> it. I'm about to attach two patches, could you test again with them, and see if
> it fixes the issue.

sure, I'll test it during Friday and let you know

Comment 20 Jiri Olsa 2011-01-14 13:54:42 UTC

hi,

I tested on current tip tree, which already has both patches above
and still fails..


as for RHEL6, I can see following commit already including both
patches from above:

[x86] tsc, sched: Recompute cyc2ns_offset's during resume from sleep states
commit ab7ccb103a196f81ad9f1f80575a02edbffd15b5
Author: Matthew Garrett <mjg>
Date:   Wed Sep 29 16:57:27 2010 -0400


I tried on RHEL6 and it has the same behaviour
warning from the latest RHEL6:

Restarting tasks ...
Delta way too big! 18446744035752685168 ts=18446744044477216573 write stamp = 8724531405
done.
------------[ cut here ]------------
WARNING: at kernel/trace/ring_buffer.c:1987 rb_reserve_next_event+0x236/0x320() (Not tainted)
Hardware name: Montevina platform
Modules linked in:
Pid: 361, comm: udevd Not tainted 2.6.32kp #100
Call Trace:
 [<ffffffff810427e7>] warn_slowpath_common+0x87/0xc0
 [<ffffffff8104283a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff810a02d6>] rb_reserve_next_event+0x236/0x320
 [<ffffffff810a0572>] ring_buffer_lock_reserve+0xa2/0x160
 [<ffffffff810a23ac>] trace_buffer_lock_reserve+0x2c/0x70
 [<ffffffff810a2546>] trace_current_buffer_lock_reserve+0x16/0x20
 [<ffffffff8100cece>] ftrace_raw_event_sys_exit+0x4e/0xc0
 [<ffffffff8100e033>] syscall_trace_leave+0x93/0xc0
 [<ffffffff810032e0>] int_check_syscall_exit_work+0x34/0x3d
---[ end trace bd05bfed46c2dc35 ]---


wbr,
jirka

Comment 21 KernelOops Bot 2011-01-14 13:59:07 UTC

 with this guiltyfunc:  bug 620026

Comment 23 Jiri Olsa 2011-01-17 09:54:45 UTC

(In reply to comment #15)
> Talking with Peter Zijlstra, there exists some boxes that have sched_clock() go
> backwards on resume, which will cause this error message. But because this
> error message only happens if we are tracing across a suspend/resume on buggy
> hardware, does it really need to be fixed? Note, Peter said sched_clock()
> doesn't do this on newer kernels. Perhaps back porting that change may fix
> this.
> 
> The user will only see it if they initiated a trace and then did s2r. Normal
> users that do not run ftrace will not see this error doing s2r.

I agree, if there's no other patch/fix for this behaviour, I think we could
leave this one not fixed with the workaround (setting clock to global).

thoughts?

Comment 24 Jiri Olsa 2011-01-25 16:36:16 UTC

ping..?

Comment 25 James M. Leddy 2011-01-25 17:03:46 UTC

While we're on the subject, we might include the message in the WARN_ON, so that ABRT gets the entire message from ---[ please cut here ]--- 

see drivers/pci/dmar.c:594 in bug 672526

Comment 26 Jiri Olsa 2011-01-26 12:29:37 UTC

Created attachment 475383 [details]
warn on unstable trace clock during the suspend

steve, would something like this be acceptable?

thanks,
jirka

Comment 27 Steven Rostedt 2011-01-26 14:02:59 UTC

Jirka,

What about instead changing the warning in the ring buffer to specify that. Something like:

 WARN_ONCE(delta > (1ULL << 59),
	  "Delta way too big! %llu ts=%llu write stamp = %llu\n"
          " If you just came from a suspend/resume, please switch\n"
          " to the trace global clock:\n"
          "   echo global > /sys/kernel/debug/tracing/trace_clock\n",
	  (unsigned long long)delta,
	  (unsigned long long)ts,
	  (unsigned long long)cpu_buffer->write_stamp);

-- Steve

Comment 28 Jiri Olsa 2011-01-26 15:19:58 UTC

(In reply to comment #27)
> Jirka,
> 
> What about instead changing the warning in the ring buffer to specify that.
> Something like:
> 
>  WARN_ONCE(delta > (1ULL << 59),
>    "Delta way too big! %llu ts=%llu write stamp = %llu\n"
>           " If you just came from a suspend/resume, please switch\n"
>           " to the trace global clock:\n"
>           "   echo global > /sys/kernel/debug/tracing/trace_clock\n",
>    (unsigned long long)delta,
>    (unsigned long long)ts,
>    (unsigned long long)cpu_buffer->write_stamp);
> 
> -- Steve

fine with me.. please let me know if you need me to send it upstream

thanks,
jirka

Comment 29 RHEL Program Management 2011-02-01 06:15:22 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 30 RHEL Program Management 2011-02-01 19:03:08 UTC

This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 31 Jiri Olsa 2011-02-02 13:02:01 UTC

sent upstream https://lkml.org/lkml/2011/2/2/128

Comment 32 Steven Rostedt 2011-02-17 15:23:18 UTC

Peter Zijlstra just informed me that the sched_clock() going backwards was fixed by:

cd7240c0b900eb6d690ccee088a6c9b46dae815a
x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states

Can we see if this prevents the warning to appear?

If it does, then we do not need to add the current patch.

-- Steve

Comment 33 Jiri Olsa 2011-02-17 16:43:52 UTC

(In reply to comment #32)
> Peter Zijlstra just informed me that the sched_clock() going backwards was
> fixed by:
> 
> cd7240c0b900eb6d690ccee088a6c9b46dae815a
> x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states
> 
> Can we see if this prevents the warning to appear?
> 
> If it does, then we do not need to add the current patch.
> 
> -- Steve

as for my test machine, I got the warning again:
(I used current tip tree)


[  103.126387] ------------[ cut here ]------------
[  103.126394] WARNING: at kernel/trace/ring_buffer.c:2173 rb_reserve_next_event+0x450/0x5d0()
[  103.126396] Hardware name: Montevina platform
[  103.126398] Delta way too big! 18446744035988710276 ts=18446744046271032527 write stamp = 10282322251
[  103.126399] If you just came from a suspend/resume,
[  103.126400] please switch to the trace global clock:
[  103.126400]   echo global > /sys/kernel/debug/tracing/trace_clock
[  103.126402] Modules linked in:
[  103.126404] Pid: 1, comm: init Not tainted 2.6.38-rc5-tip+ #460
[  103.126406] Call Trace:
[  103.126412]  [<ffffffff810800df>] ? warn_slowpath_common+0x7f/0xc0
[  103.126414]  [<ffffffff810801d6>] ? warn_slowpath_fmt+0x46/0x50
[  103.126418]  [<ffffffff81046d65>] ? native_sched_clock+0x25/0xa0
[  103.126421]  [<ffffffff810e1e40>] ? rb_reserve_next_event+0x450/0x5d0
[  103.126424]  [<ffffffff810e2136>] ? ring_buffer_lock_reserve+0xb6/0x180
[  103.126426]  [<ffffffff810e662c>] ? trace_buffer_lock_reserve+0x2c/0x70
[  103.126429]  [<ffffffff810e7156>] ? trace_current_buffer_lock_reserve+0x16/0x20
[  103.126432]  [<ffffffff8104a9c1>] ? ftrace_raw_event_sys_enter+0x51/0x130
[  103.126434]  [<ffffffff8104ae71>] ? syscall_trace_enter+0x101/0x1c0
[  103.126438]  [<ffffffff8103f24f>] ? tracesys+0x71/0xd5
[  103.126439] ---[ end trace cd128abebe4b8374 ]---

Comment 34 KernelOops Bot 2011-02-17 16:44:13 UTC

 with this guiltyfunc:  bug 527824 bug 528295 bug 528296 bug 528521 bug 528768 bug 530380 bug 536855 bug 536985 bug 542694 bug 562008

Comment 37 Jiri Olsa 2011-02-21 12:03:29 UTC

version 2 sent upstream - https://lkml.org/lkml/2011/2/18/172

Comment 40 Aristeu Rozanski 2011-04-07 13:50:27 UTC

Patch(es) available on kernel-2.6.32-130.el6

Comment 45 errata-xmlrpc 2011-05-19 12:30:57 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html