Messages log says:

perf samples too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

Kernel: 3.11.1-200.fc19.x86_64

I expect kernel.perf_event_max_sample_rate to be 100000.
I do not know if this is related, but I was running a BTRFS scrub (heavy disk I/O) when the error occurred.
Same thing with the latest kernel 3.11.2-201.fc19.x86_64
And with 3.12.0-0.rc6.git4.2.fc21.x86_64 ...
I got this message on 3.11.3-201.fc19.x86_64: Oct 30 03:22:32 ti15 kernel: [ID kern.warning] [56333.143955] perf samples too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
I got it on 3.11.7-300.fc20.x86_64
And I got it on 3.11.8-200.fc19.x86_64.
This change came via 3.11:

commit 14c63f17b1fde5a575a28e96547a22b451c71fb5
Author: Dave Hansen <dave.hansen.com>
Date: Fri Jun 21 08:51:36 2013 -0700

    perf: Drop sample rate when sampling is too slow

    This patch keeps track of how long perf's NMI handler is taking, and also calculates how many samples perf can take a second. If the sample length times the expected max number of samples exceeds a configurable threshold, it drops the sample rate. This way, we don't have a runaway sampling process eating up the CPU.

    This patch can tend to drop the sample rate down to level where perf doesn't work very well. *BUT* the alternative is that my system hangs because it spends all of its time handling NMIs. I'll take a busted performance tool over an entire system that's busted and undebuggable any day.

    BTW, my suspicion is that there's still an underlying bug here. Using the HPET instead of the TSC is definitely a contributing factor, but I suspect there are some other things going on. But, I can't go dig down on a bug like that with my machine hanging all the time.

The warnings are harmless.

From Documentation/sysctl/kernel.txt:

perf_cpu_time_max_percent:

Hints to the kernel how much CPU time it should be allowed to use to handle perf sampling events. If the perf subsystem is informed that its samples are exceeding this limit, it will drop its sampling frequency to attempt to reduce its CPU usage.

Some perf sampling happens in NMIs. If these samples unexpectedly take too long to execute, the NMIs can become stacked up next to each other so much that nothing else is allowed to execute.

0: disable the mechanism. Do not monitor or correct perf's sampling rate no matter how CPU time it takes.

1-100: attempt to throttle perf's sample rate to this percentage of CPU. Note: the kernel calculates an "expected" length of each sample event. 100 here means 100% of that expected length. Even if this is set to 100, you may still see sample throttling if this length is exceeded. Set to 0 if you truly do not care how much CPU is consumed.
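For anyone wondering where the 2500 and the 50000 in the log message come from, here is a rough sketch of the arithmetic as I understand it from the commit description. It is not the kernel code. The 25% value for perf_cpu_time_max_percent and the 1000 Hz timer tick are assumptions on my part (neither is stated in this bug), and the function names are made up for illustration.

```python
NSEC_PER_SEC = 1_000_000_000


def allowed_sample_ns(max_sample_rate: int, cpu_time_max_percent: int) -> int:
    """Time budget per sample, in ns, for a given max rate and CPU percentage."""
    # At max_sample_rate samples per second, each sample "owns" this many ns.
    sample_period_ns = NSEC_PER_SEC // max_sample_rate
    # Only cpu_time_max_percent of that period may be spent handling the sample.
    return sample_period_ns * cpu_time_max_percent // 100


def lowered_sample_rate(max_sample_rate: int, hz: int = 1000) -> int:
    """Rate after one throttling step: samples per timer tick are halved, rounding up."""
    samples_per_tick = -(-max_sample_rate // hz)   # ceiling division
    samples_per_tick = -(-samples_per_tick // 2)   # halve, rounding up
    return samples_per_tick * hz


if __name__ == "__main__":
    budget = allowed_sample_ns(100_000, 25)   # assumed default: 25%
    print(budget)                             # 2500
    print(2506 > budget)                      # True, so the rate gets lowered
    print(lowered_sample_rate(100_000))       # 50000 (assuming HZ=1000)
```

With those assumed values the numbers line up with the message reported here: a 2506 ns sample exceeds the 2500 ns budget, and halving 100000 gives 50000.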
I'd say this one can be closed, unless anyone thinks otherwise. Josh, what's your take on this one? Ok to close?
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.
I am going ahead and closing this one, as it is pretty much by design and can simply be disabled.
Please reopen. I am seeing this in connection with bug 1048605. I tried adding a fixed-address ethernet port at 10.0.0.1, starting "ncat -u -l -v -p 6666" and opening the firewall to UDP port 6666 on another ethernet-connected machine, and adding netconsole=6665.0.2/p4p1,6666.1 to the first line on the Fedora-Jam version mentioned in that bug. How can the bug be "simply disabled"?
(In reply to Peter H. Jones from comment #11)
> How can the bug be "simply disabled"?

Just have a look at the patch: https://lkml.org/lkml/2013/5/29/640

> perf_cpu_time_max_percent:
> [...]
> 0: disable the mechanism. Do not monitor or correct perf's
> sampling rate no matter how CPU time it takes.

--> echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
This just means you opted out of monitoring and correcting. Now you really get it. Disabling is something else.
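If it helps anyone hitting this, a small sketch for checking the two sysctls discussed in this bug before changing anything. The helper names are hypothetical, and the sysctl_dir parameter only exists so the functions can be pointed at a test directory. Writing under /proc/sys needs root, and as noted above, setting 0 only turns off the monitoring and throttling.

```python
from pathlib import Path

SYSCTL_DIR = "/proc/sys/kernel"
PERF_KEYS = ("perf_event_max_sample_rate", "perf_cpu_time_max_percent")


def read_perf_sysctls(sysctl_dir: str = SYSCTL_DIR) -> dict:
    """Return the current values of the two perf sysctls as integers."""
    return {key: int(Path(sysctl_dir, key).read_text().strip()) for key in PERF_KEYS}


def set_cpu_time_max_percent(percent: int, sysctl_dir: str = SYSCTL_DIR) -> None:
    """Write perf_cpu_time_max_percent; 0 opts out of monitoring and throttling."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    Path(sysctl_dir, "perf_cpu_time_max_percent").write_text(f"{percent}\n")


# Example use on a real system (the write requires root):
#   print(read_perf_sysctls())
#   set_cpu_time_max_percent(0)
```

Like the echo command, this does not survive a reboot.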