| Summary: | CPU lockup during boot | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Bruno Wolff III <bruno> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rawhide | CC: | aquini, awilliam, bruno, gansalmon, itamar, jistone, jonathan, kernel-maint, loganjerry, madhu.chinakonda, redhat, tflink |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | AcceptedBlocker | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-07-22 22:04:19 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 713560 | | |
Attachments: |
|
Created attachment 505470 [details]
/proc/cpuinfo
Created attachment 505471 [details]
lspci -vvv output
I have also filed a kernel.org bug for this issue: https://bugzilla.kernel.org/show_bug.cgi?id=37872

This is still happening with kernel-PAE-3.0-0.rc5.git0.1.fc16.i686.

I am still seeing this with kernel-PAE-3.0-0.rc6.git6.1.fc16.i686. I am proposing as an alpha blocker, but it may be that the set of hardware affected is small.

There is a suggested patch to try in the kernel bugtracker.

Here is the patch that might fix things:

    --- linux-2.6.orig/kernel/sched.c
    +++ linux-2.6/kernel/sched.c
    @@ -7750,6 +7750,9 @@ static void init_cfs_rq(struct cfs_rq *c
     #endif
     #endif
     	cfs_rq->min_vruntime = (u64)(-(1LL << 20));
    +#ifndef CONFIG_64BIT
    +	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
    +#endif
     }
     
     static void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)

bruno: are you set up to test the patch yourself, or would it help for someone to build a patched kernel for you to try?

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

I think I can do it myself. I don't do a lot of kernel building, but I have successfully done it in the past.

I seem to have the build started, but it will take a while to finish. Since I can't test it until I get home from work anyway, taking a while to run should be OK.

just in case, the Short Idiot's Guide To Kernel Patch Testing, Written By A Short Idiot:

    fedpkg co kernel
    cd kernel
    cp /path/to/patch.patch .
    nano kernel.spec

(bump baserelease: I usually add 0.1, so it is 'newer' than the current official build but will be superseded by the next official build)

(add the patch at the end of the big list of patches, probably around Patch12000 - e.g.:)

    Patch12000: patch.patch

(find the big list of ApplyPatch statements and add one to the bottom:)

    ApplyPatch patch.patch

(optionally, add a bit to the changelog)

(save and exit)

    fedpkg srpm
    mock -r fedora-rawhide-x86_64 /path/to/kernel.src.rpm

go get dinner and wait =)

I got it started a couple of hours ago. The ApplyPatch stuff did throw me off a bit.
I ended up doing it more with fedpkg, like I would for a regular package. I just did a local build, not a mock build. It's still running, so that's good. My memory from the last time is that it took about 12 hours to do a build. So I am hoping it's done before I need to sleep.

I applied the patch to the -rc6.git6 kernel and I was able to boot both machines that had been locking up into graphical desktops.

The fix has been posted to lkml, but is not yet in Linus' tree.

Discussed at 2011-07-15 blocker review meeting. Given the severity of the impact, and Paul McKenney's suggestion that several users are affected by this - http://lkml.org/lkml/2011/7/12/298 - accepted as an Alpha blocker, under criterion "The installer must boot (if appropriate) and run on all primary architectures from default live image, DVD, and boot.iso install media" (or "In most cases (see Blocker_Bug_FAQ), a system installed according to any of the above criteria (or the appropriate Beta or Final criteria, when applying this criterion to those releases) must boot to the 'firstboot' utility on the first boot after installation, without unintended user intervention. This includes correctly accessing any encrypted partitions when the correct passphrase is supplied. The firstboot utility must be able to create a working user account"; both subsume the idea that the system must boot).

The fix is now in Linus' tree.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c64be78ffb415278d7d32d6f55de95c73dcc19a4
So it should show up in the next rawhide kernel update that uses an upstream update.

Note that kernel-3.0-0.rc7.git1.1.fc16 is using a kernel tree from a couple of days ago and doesn't have the fix in it. No need to waste time testing the fix with that kernel.

Somehow it didn't manage to get into kernel-3.0-0.rc7.git3.1.fc16 either. I double checked patch-3.0-rc7-git3.bz2 and it wasn't in there.
It's in the next snapshot:

    $ git describe --contains c64be78ffb415278d7d32d6f55de95c73dcc19a4
    v3.0-rc7-git4~4

I built a local kernel with rc7-git6, and was finally able to boot my i686 VM.

I tried out 3.0-0.rc7.git10.1.fc16 on one of the two machines I saw the problem on and things are working. I hope to test the other machine in the morning.

I tested the second system and things are now working there as well.

kernel-3.0-0.rc7.git10.1.fc16 is in rawhide this morning. I think this can probably be closed now.

As the latest kernel has been tested and verified to fix this issue, I am closing the bug. |
Created attachment 505469 [details]
Picture of traceback

Description of problem:
Moderately late in the boot process my i686 machines lock up with 3.0 kernels. (2.6.39 kernels work OK with respect to this issue.)

Version-Release number of selected component (if applicable):
kernel-PAE-3.0-0.rc3.git5.1.fc16.i686

How reproducible:
Seems to happen most boots.

Steps to Reproduce:
1. Reboot

Actual results:
Machine locks up with a backtrace.

Expected results:

Additional info:
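
For readers wondering why a one-line initialization fixes a boot lockup on 32-bit only: the patch quoted in the comments sets min_vruntime_copy to match min_vruntime at init time. As far as I understand the 3.0 scheduler, 32-bit builds cannot read a u64 atomically, so readers fetch the copy, then the value, and retry until the two agree. The sketch below is a user-space illustration of that idea, not kernel code. The struct and function names (cfs_rq_sketch, init_unpatched, init_patched, read_min_vruntime) are hypothetical, the retry loop is bounded so it can report failure instead of hanging, and it assumes the copy starts out zeroed when it is not explicitly initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct cfs_rq; only the two
 * fields touched by the patch are modelled here. */
struct cfs_rq_sketch {
	uint64_t min_vruntime;
	uint64_t min_vruntime_copy; /* in the kernel: 32-bit builds only */
};

/* Models the init without the patch. Assumption: the copy is left at
 * zero because nothing has written it yet. */
static void init_unpatched(struct cfs_rq_sketch *rq)
{
	rq->min_vruntime = (uint64_t)(-(1LL << 20));
	rq->min_vruntime_copy = 0;
}

/* Models the init with the patch: the copy matches from the start. */
static void init_patched(struct cfs_rq_sketch *rq)
{
	rq->min_vruntime = (uint64_t)(-(1LL << 20));
	rq->min_vruntime_copy = rq->min_vruntime;
}

/* Reader-side retry loop. The kernel version retries without a bound;
 * this sketch gives up after max_tries so a mismatch is observable.
 * Returns true and stores the value in *out on a consistent read. */
static bool read_min_vruntime(const struct cfs_rq_sketch *rq,
			      unsigned max_tries, uint64_t *out)
{
	assert(rq != NULL && out != NULL);
	for (unsigned i = 0; i < max_tries; i++) {
		uint64_t copy = rq->min_vruntime_copy;
		/* the kernel places a read barrier between these loads */
		uint64_t val = rq->min_vruntime;
		if (val == copy) {
			*out = val;
			return true;
		}
	}
	return false; /* an unbounded loop would spin here forever */
}
```

With init_unpatched the reader never sees the two fields agree, which in an unbounded loop would look like the CPU lockup reported here. With init_patched the first read succeeds.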