Red Hat Bugzilla – Bug 920289
Regular hard freezes with 3.9 kernels (intel_pstate_driver)
Last modified: 2013-10-08 15:15:52 EDT
Created attachment 708543
photo of the trace
Running 3.9.0-0.rc1.git0.5.1.fc19.x86_64 - which is a scratch build from the Rawhide spec as of 3.9.0-0.rc1.git0.5.fc19 with http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=50db54340c0412752577001b5ab3b54e6f3b9383 added - I've been seeing the system hard freeze every so often (once or twice a day). Usually it just completely freezes at the desktop, nothing in the system logs after a reboot and no ssh access possible, so I can't get much useful info. But yesterday I saw a kernel oops. Not 100% sure it's the same bug - it could be some _other_ one - but either way, it's obviously worth reporting. I took a picture of the trace: I'll attach it.
System is a self-built one: i7-2600k CPU on a P8P67 Deluxe, Nouveau 9600 GT graphics. I can provide lspci output or whatever else if the hardware's important.
Can you add "pause_on_oops=60" to your command line and try to capture the top of the stacktrace?
added, running, of course it hasn't crashed since then...
there may be a better trace of this in bug 923102.
Can you attach your kernel config? I will try to reproduce. I don't have a system that can use the nouveau driver, though.
I still haven't seen this again. I think it got fixed at rc2 or rc3, for me.
the trace in 923102 is from rc3, so this clearly isn't fixed.
let's just dupe this bug there, because that at least seems to be debuggable.
*** This bug has been marked as a duplicate of bug 923102 ***
re-opening, as it's not the same bug.
Adam, 3.9.0-0.rc3.git1.2 has a patch that should reduce the spew a little so hopefully we can see the top of the trace.
OK, I'll grab that and try not to do anything very important while I wait for it to explode =)
In case it helps, from yum history (how did we ever live without that?) it looks like I ran the nodebug build of kernel-3.9.0-0.rc2.git0.3.fc19.x86_64 for about a week, between 2013-03-12 and 2013-03-19, which was the time I didn't see the bug. On 2013-03-19 I updated to kernel-3.9.0-0.rc3.git0.5.fc20.x86_64 (also from nodebug), and it started happening again.
I've been running 3.9.0-0.rc3.git1.3.fc19.x86_64 for over a day now, and have not hit the bug. I wonder if the changes to kernel config for Rawhide post-f19 affect this for me somehow?
The bug was never easily reproducible. I have been running mainline 3.9.x (almost daily recompiles), and the first time this happened to me was soon after I remember enabling the PSTATE timer config option. After that it happened twice, followed by a lull. Then with yesterday's git pull it surfaced again. So I am pretty sure it is still there, at least in mainline - it just takes time to pop up.
I posted some initial analysis and latest oops photo yesterday - http://marc.info/?l=linux-kernel&m=136443537224510&w=2
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.
(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)
More information and reason for this action is here:
I was finally able to see this happen and capture debug info. Here is the patch to keep the driver from setting up a race with itself.
I will get this queued for the next RC
Author: Dirk Brandewie <email@example.com>
Date: Thu Apr 4 08:55:29 2013 -0700
cpufreq/intel_pstate: Set timer timeout correctly
The current calculation of the delay time is wrong and a cut and paste
error from a previous experimental driver. This can result in the
timeout being set to jiffies + 1 which setup the driver to race with
it's self if the apic timer interrupt happen at just the right time.
Reported-by: Adam Williamson <firstname.lastname@example.org>
Reported-by: Parag Warudkar <email@example.com>
Signed-off-by: Dirk Brandewie <firstname.lastname@example.org>
drivers/cpufreq/intel_pstate.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 43ffe1c..4d6b988 100644
@@ -502,7 +502,6 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
 	sample_time = cpu->pstate_policy->sample_rate_ms;
 	delay = msecs_to_jiffies(sample_time);
-	delay -= jiffies % delay;
 	mod_timer_pinned(&cpu->timer, jiffies + delay);
Thanks for that, sorry I wasn't able to provide better debugging info.
I just booted 3.9.0-0.rc6.git0.1.fc19.x86_64 and it hard froze in less than 30 minutes. Didn't show the trace though :/ I'll see if I can catch it and see if it's different now.
*********** MASS BUG UPDATE **************
We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.
Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.
If you experience different issues, please open a new bug report for those.
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.
I haven't been in the same country as the system lately, so I couldn't check this...I'll re-open it if it comes up again when I'm home.