| Summary: | perf samples too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 50000 | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Jan Welker <jan> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 19 | CC: | alexey.brodkin, aschorr, bugzilla-redhat, collura, davidbranquinho, gansalmon, horst, igeorgex, itamar, jonathan, jones.peter.busi, js, kernel-maint, lampe, madhu.chinakonda, marcelo.barbosa, michele, michele, spetreolle |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-01-04 14:36:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Jan Welker
2013-09-30 15:57:08 UTC
I do not know if this is related, but I did a BTRFS scrub (heavy disk I/O) while the error occurred.

Same thing with the latest kernel 3.11.2-201.fc19.x86_64

And with 3.12.0-0.rc6.git4.2.fc21.x86_64 ...

I got this message on 3.11.3-201.fc19.x86_64:
Oct 30 03:22:32 ti15 kernel: [ID kern.warning] [56333.143955] perf samples too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

I got it on 3.11.7-300.fc20.x86_64

And I got it on 3.11.8-200.fc19.x86_64.

This change came via 3.11:
commit 14c63f17b1fde5a575a28e96547a22b451c71fb5
Author: Dave Hansen <dave.hansen.com>
Date: Fri Jun 21 08:51:36 2013 -0700

perf: Drop sample rate when sampling is too slow

This patch keeps track of how long perf's NMI handler is taking,
and also calculates how many samples perf can take a second. If
the sample length times the expected max number of samples
exceeds a configurable threshold, it drops the sample rate.
This way, we don't have a runaway sampling process eating up the
CPU.

This patch can tend to drop the sample rate down to level where
perf doesn't work very well. *BUT* the alternative is that my
system hangs because it spends all of its time handling NMIs.
I'll take a busted performance tool over an entire system that's
busted and undebuggable any day.

BTW, my suspicion is that there's still an underlying bug here.
Using the HPET instead of the TSC is definitely a contributing
factor, but I suspect there are some other things going on.
But, I can't go dig down on a bug like that with my machine
hanging all the time.

The warnings are harmless. From Documentation/sysctl/kernel.txt:

perf_cpu_time_max_percent:

Hints to the kernel how much CPU time it should be allowed to
use to handle perf sampling events. If the perf subsystem
is informed that its samples are exceeding this limit, it
will drop its sampling frequency to attempt to reduce its CPU
usage.

Some perf sampling happens in NMIs. If these samples
unexpectedly take too long to execute, the NMIs can become
stacked up next to each other so much that nothing else is
allowed to execute.

0: disable the mechanism. Do not monitor or correct perf's
sampling rate no matter how much CPU time it takes.

1-100: attempt to throttle perf's sample rate to this
percentage of CPU. Note: the kernel calculates an
"expected" length of each sample event. 100 here means
100% of that expected length. Even if this is set to
100, you may still see sample throttling if this
length is exceeded. Set to 0 if you truly do not care
how much CPU is consumed.
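
As a rough illustration of the check described in the commit message and the sysctl text above, the following sketch reproduces the numbers in the logged warning. It is not the kernel's code: the function names are hypothetical, the starting rate of 100000 samples per second and the value 25 for perf_cpu_time_max_percent are assumptions chosen because they yield the 2500 ns limit seen in the log, and halving the rate is inferred from the logged drop to 50000.

```python
NSEC_PER_SEC = 1_000_000_000


def allowed_sample_ns(sample_rate_hz: int, cpu_time_max_percent: int) -> int:
    """Per-sample time budget in nanoseconds (hypothetical helper).

    Each sample "owns" one period of 1/sample_rate seconds; the budget is
    the configured percentage of that period.
    """
    period_ns = NSEC_PER_SEC // sample_rate_hz
    return period_ns * cpu_time_max_percent // 100


def next_sample_rate(sample_rate_hz: int, cpu_time_max_percent: int,
                     avg_sample_ns: int) -> int:
    """Return the sample rate after one throttle check (hypothetical helper).

    Assumption: the rate is halved when the average sample exceeds the
    budget, which matches the logged 100000 -> 50000 step.
    """
    if cpu_time_max_percent == 0:
        # 0 disables the mechanism: never monitor or correct the rate.
        return sample_rate_hz
    if avg_sample_ns <= allowed_sample_ns(sample_rate_hz, cpu_time_max_percent):
        return sample_rate_hz
    return max(1, (sample_rate_hz + 1) // 2)


if __name__ == "__main__":
    # Assumed starting point: 100000 samples/s, 25 percent.
    print(allowed_sample_ns(100_000, 25))       # 2500
    print(next_sample_rate(100_000, 25, 2506))  # 50000
```

Under these assumptions, a sample averaging 2506 ns is only 6 ns over the budget, which is why the message shows up on otherwise healthy machines under heavy load.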

I'd say this one can be closed, unless anyone thinks otherwise.

Josh, what's your take on this one? Ok to close?

*********** MASS BUG UPDATE **************
We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs. Fedora 19 has now been rebased to 3.12.6-200.fc19. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20. If you experience different issues, please open a new bug report for those.

I am going ahead and closing this one as it is pretty much by design and can be simply disabled.

Please reopen. I am seeing this in connection with bug 1048605. I tried adding a fixed-address ethernet port at 10.0.0.1 and starting "ncat -u -l -v -p 6666" and opening the firewall to UDP port 6666 on another ethernet-connected machine, and adding netconsole=6665.0.2/p4p1,6666.1 to the first line on the Fedora-Jam version mentioned in that bug. How can the bug be "simply disabled"?

(In reply to Peter H. Jones from comment #11)
> How can the bug be "simply disabled"?
Just have a look at the patch: https://lkml.org/lkml/2013/5/29/640
> perf_cpu_time_max_percent:
> [...]
> 0: disable the mechanism. Do not monitor or correct perf's
> sampling rate no matter how much CPU time it takes.
--> echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent

This just means you opted out of monitoring and correcting. Now you really get it. Disabling is something else.
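
Before changing anything, it can help to see what the two sysctls named in this report are currently set to. The following is a minimal sketch, assuming the values are exposed as plain integers under /proc/sys/kernel (the path used by the echo command above); the function name and its `base` parameter are hypothetical and exist only so the helper can be pointed at a different directory.

```python
from pathlib import Path

# The two sysctls mentioned in this report.
PERF_SYSCTLS = ("perf_cpu_time_max_percent", "perf_event_max_sample_rate")


def read_perf_sysctls(base: str = "/proc/sys/kernel") -> dict:
    """Return the current values of the perf throttling sysctls.

    Files that do not exist (for example on kernels older than 3.11)
    are skipped rather than treated as an error.
    """
    values = {}
    for name in PERF_SYSCTLS:
        path = Path(base) / name
        if path.exists():
            values[name] = int(path.read_text().strip())
    return values


if __name__ == "__main__":
    for name, value in read_perf_sysctls().items():
        print(f"kernel.{name} = {value}")
```

If kernel.perf_event_max_sample_rate is lower than expected, the throttle has already fired at least once since it was last set. Writing 0 to perf_cpu_time_max_percent, as in the echo command above, needs root.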