Bug 920289
| Field | Value |
|---|---|
| Summary | Regular hard freezes with 3.9 kernels (intel_pstate driver) |
| Product | Fedora |
| Component | kernel |
| Version | 19 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED INSUFFICIENT_DATA |
| Severity | urgent |
| Priority | unspecified |
| Keywords | Reopened |
| Reporter | Adam Williamson <awilliam> |
| Assignee | Kernel Maintainer List <kernel-maint> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | dirk.brandewie, gansalmon, itamar, jeder, jonathan, kernel-maint, madhu.chinakonda, parag.lkml, perfbz |
| Doc Type | Bug Fix |
| Type | Bug |
| Bug Blocks | 952244 |
| Last Closed | 2013-10-08 17:31:47 UTC |
Description

Adam Williamson 2013-03-11 17:31:24 UTC

Can you add "pause_on_oops=60" to your command line and try to capture the top of the stack trace?

Added, running; of course it hasn't crashed since then... There may be a better trace of this in bug 923102.

Can you attach your kernel config? I will try to reproduce, though I don't have a system that can use the nouveau driver.

I still haven't seen this again. I think it got fixed at rc2 or rc3, for me.

The trace in 923102 is from rc3, so this clearly isn't fixed. Let's just dupe this bug there, because that one at least seems to be debuggable.

*** This bug has been marked as a duplicate of bug 923102 ***

Re-opening, as it's not the same bug. Adam, 3.9.0-0.rc3.git1.2 has a patch that should reduce the spew a little, so hopefully we can see the top of the trace.

OK, I'll grab that and try not to do anything very important while I wait for it to explode =)

In case it helps, from yum history (how did we ever live without that?) it looks like I ran the nodebug build of kernel-3.9.0-0.rc2.git0.3.fc19.x86_64 for about a week, between 2013-03-12 and 2013-03-19, which was the time I didn't see the bug. On 2013-03-19 I updated to kernel-3.9.0-0.rc3.git0.5.fc20.x86_64 (also from nodebug), and it started happening again.

I've been running 3.9.0-0.rc3.git1.3.fc19.x86_64 for over a day now and have not hit the bug. I wonder if the changes to the kernel config for Rawhide post-F19 affect this for me somehow? The bug was never easily reproducible.

I have been running mainline 3.9.x (almost daily recompiles), and the first time this happened to me was soon after I remember enabling the PSTATE timer config option. After that it happened twice, followed by a lull. Then with yesterday's git pull it surfaced again. So I am pretty sure it is definitely there, at least in mainline; it just takes time to pop up.
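As an aside for anyone reproducing this: the "pause_on_oops=60" boot parameter suggested above can be made persistent on Fedora via grubby. This is only a sketch of one way to do it; the exact tooling depends on how your bootloader configuration is managed.

```
# Append pause_on_oops=60 to the kernel command line for all installed
# kernels (Fedora's grubby tool; adjust if you edit GRUB config directly):
sudo grubby --update-kernel=ALL --args="pause_on_oops=60"

# After rebooting, confirm the parameter took effect:
cat /proc/cmdline
```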
I posted some initial analysis and the latest oops photo yesterday: http://marc.info/?l=linux-kernel&m=136443537224510&w=2

This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could also affect pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during the Fedora 19 End Of Life. Thank you.) More information and the reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

I was finally able to see this happen and capture debug info. Here is the patch to keep the driver from setting up a race with itself. I will get this queued for the next RC.

```
commit f404a661b000499b002919ffda43c8cb8c5d614d
Author: Dirk Brandewie <dirk.brandewie>
Date:   Thu Apr 4 08:55:29 2013 -0700

    cpufreq/intel_pstate: Set timer timeout correctly

    The current calculation of the delay time is wrong and a cut-and-paste
    error from a previous experimental driver. This can result in the
    timeout being set to jiffies + 1, which sets the driver up to race
    with itself if the APIC timer interrupt happens at just the right
    time.

    https://bugzilla.redhat.com/show_bug.cgi?id=920289
    Reported-by: Adam Williamson <awilliam>
    Reported-by: Parag Warudkar <parag.lkml>
    Signed-off-by: Dirk Brandewie <dirk.brandewie>
---
 drivers/cpufreq/intel_pstate.c | 1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 43ffe1c..4d6b988 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -502,7 +502,6 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
 	sample_time = cpu->pstate_policy->sample_rate_ms;
 	delay = msecs_to_jiffies(sample_time);
-	delay -= jiffies % delay;
 	mod_timer_pinned(&cpu->timer, jiffies + delay);
 }
```

Thanks for that; sorry I wasn't able to provide better debugging info.
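To see why the removed line `delay -= jiffies % delay;` is dangerous, note that it rounds the timeout down toward the next sampling boundary, so just before a boundary the timer can be armed only one jiffy ahead. A small Python sketch of the arithmetic (HZ = 1000 and a 10 ms sample rate are assumed illustrative values, not taken from the driver's actual configuration):

```python
HZ = 1000  # assumed tick rate: one jiffy per millisecond

def msecs_to_jiffies(ms):
    # simplified model of the kernel helper for HZ = 1000
    return ms * HZ // 1000

def buggy_delay(jiffies, sample_rate_ms=10):
    delay = msecs_to_jiffies(sample_rate_ms)
    delay -= jiffies % delay  # the line the patch deletes
    return delay              # timer is armed at jiffies + delay

def fixed_delay(jiffies, sample_rate_ms=10):
    # after the patch, the delay is always a full sample period
    return msecs_to_jiffies(sample_rate_ms)

# Just before a 10-jiffy boundary, the buggy code arms the timer only
# one jiffy ahead, which is what set up the race with the timer interrupt:
print(buggy_delay(1009))  # -> 1
print(fixed_delay(1009))  # -> 10
```

With the buggy formula, the effective delay depends on where in the sample period `jiffies` happens to fall, shrinking to a single jiffy in the worst case; the fix makes it a constant full period.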
I just booted 3.9.0-0.rc6.git0.1.fc19.x86_64 and it hard froze in less than 30 minutes. It didn't show the trace, though :/ I'll see if I can catch it and see if it's different now.

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Because of this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those.

This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen it and attach the relevant data from the latest kernel you are running, along with any data that might have been requested previously.

I haven't been in the same country as the system lately, so I couldn't check this... I'll re-open it if it comes up again when I'm home.