Bug 714478 - CPU lockup during boot
Summary: CPU lockup during boot
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On:
Blocks: F16Alpha, F16AlphaBlocker
 
Reported: 2011-06-19 14:49 UTC by Bruno Wolff III
Modified: 2011-07-22 22:04 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-22 22:04:19 UTC
Type: ---
Embargoed:


Attachments
Picture of traceback (1.09 MB, image/jpeg)
2011-06-19 14:49 UTC, Bruno Wolff III
/proc/cpuinfo (1.01 KB, text/plain)
2011-06-19 14:50 UTC, Bruno Wolff III
lspci -vvv output (12.41 KB, text/plain)
2011-06-19 14:51 UTC, Bruno Wolff III


Links
System        ID     Private  Priority  Status  Summary  Last Updated
Linux Kernel  37872  0        None      None    None     Never

Description Bruno Wolff III 2011-06-19 14:49:13 UTC
Created attachment 505469
Picture of traceback

Description of problem:
Moderately late in the boot process, my i686 machines lock up with 3.0 kernels. (2.6.39 kernels are not affected by this issue.)

Version-Release number of selected component (if applicable):
kernel-PAE-3.0-0.rc3.git5.1.fc16.i686

How reproducible:
Seems to happen most boots.

Steps to Reproduce:
1. Reboot
  
Actual results:
Machine locks up with a backtrace.

Expected results:


Additional info:

Comment 1 Bruno Wolff III 2011-06-19 14:50:26 UTC
Created attachment 505470
/proc/cpuinfo

Comment 2 Bruno Wolff III 2011-06-19 14:51:51 UTC
Created attachment 505471
lspci -vvv output

Comment 3 Bruno Wolff III 2011-06-19 15:03:12 UTC
I have also filed a kernel.org bug for this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=37872

Comment 4 Bruno Wolff III 2011-06-29 11:48:33 UTC
This is still happening with kernel-PAE-3.0-0.rc5.git0.1.fc16.i686.

Comment 5 Bruno Wolff III 2011-07-11 13:36:22 UTC
I am still seeing this with kernel-PAE-3.0-0.rc6.git6.1.fc16.i686.

I am proposing this as an Alpha blocker, but the set of affected hardware may be small.

Comment 6 Bruno Wolff III 2011-07-11 14:45:07 UTC
There is a suggested patch to try in the kernel bugtracker.

Comment 7 Bruno Wolff III 2011-07-11 14:51:04 UTC
Here is the patch that might fix things:
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -7750,6 +7750,9 @@ static void init_cfs_rq(struct cfs_rq *c
 #endif
 #endif
        cfs_rq->min_vruntime = (u64)(-(1LL << 20));
+#ifndef CONFIG_64BIT
+	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
+#endif
 }

 static void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
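
The patch only makes sense alongside the reader side of min_vruntime_copy. As we understand the 3.0 scheduler, a 64-bit load of min_vruntime is not atomic on 32-bit kernels, so the writer stores min_vruntime and then min_vruntime_copy (with smp_wmb() between them), and a lockless reader retries until the two values match. If min_vruntime_copy starts at zero while min_vruntime starts at (u64)(-(1LL << 20)), that retry loop cannot succeed until some writer updates both fields, which would fit a lockup during boot. The C sketch below is a single-threaded model of that reasoning, not kernel code: struct toy_cfs_rq, toy_init(), toy_update(), toy_read() and the max_tries bound are all hypothetical, memory barriers are omitted, and it assumes the structure starts out zero-filled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the patch touches; the real
 * struct cfs_rq in kernel/sched.c has many more members. */
struct toy_cfs_rq {
	uint64_t min_vruntime;
	uint64_t min_vruntime_copy; /* only present on !CONFIG_64BIT builds */
};

/* Models init_cfs_rq(). init_copy = true mirrors the patched code;
 * init_copy = false models the unpatched code, assuming the copy is
 * left at the zero from a zero-filled allocation. */
static void toy_init(struct toy_cfs_rq *rq, bool init_copy)
{
	rq->min_vruntime = (uint64_t)(-(1LL << 20));
	rq->min_vruntime_copy = init_copy ? rq->min_vruntime : 0;
}

/* Writer side: store the value first, then the copy. The kernel places
 * smp_wmb() between the two stores; this single-threaded model omits it. */
static void toy_update(struct toy_cfs_rq *rq, uint64_t v)
{
	rq->min_vruntime = v;
	rq->min_vruntime_copy = v;
}

/* Reader side: retry until both fields agree. The kernel loop has no
 * bound; max_tries exists only so this model can report a "hang". */
static bool toy_read(const struct toy_cfs_rq *rq, uint64_t *out, int max_tries)
{
	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		uint64_t copy = rq->min_vruntime_copy;
		/* kernel: smp_rmb() between the two loads */
		uint64_t val = rq->min_vruntime;
		if (val == copy) {
			*out = val;
			return true;
		}
	}
	return false; /* the unbounded version would spin here forever */
}
```

In this model, toy_init(&rq, false) followed by toy_read() gives up after max_tries attempts, while toy_init(&rq, true), or any toy_update() call, lets the first read succeed. Because the real loop is unbounded, a mismatch left over from initialization would show up as a hang rather than an error.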

Comment 8 Adam Williamson 2011-07-11 15:10:39 UTC
Bruno: are you set up to test the patch yourself, or would it help for someone to build a patched kernel for you to try?



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 9 Bruno Wolff III 2011-07-11 16:01:14 UTC
I think I can do it myself. I don't do a lot of kernel building, but I have successfully done it in the past. I seem to have the build started, but it will take a while to finish. Since I can't test it until I get home from work anyway, taking a while to run should be OK.

Comment 10 Adam Williamson 2011-07-11 18:55:20 UTC
Just in case, the Short Idiot's Guide To Kernel Patch Testing, Written By A Short Idiot:

fedpkg co kernel
cd kernel
cp /path/to/patch.patch .
nano kernel.spec
(bump baserelease: I usually add 0.1, so it is 'newer' than the current official build but will be superseded by the next official build)
(add the patch at the end of the big list of patches, probably around Patch12000 - e.g.:)
Patch12000: patch.patch
(find the big list of ApplyPatch statements and add one to the bottom:)
ApplyPatch patch.patch
(optionally, add a bit to the changelog)
(save and exit)
fedpkg srpm
mock -r fedora-rawhide-x86_64 /path/to/kernel.src.rpm

go get dinner and wait =)

Comment 11 Bruno Wolff III 2011-07-11 19:03:47 UTC
I got it started a couple of hours ago. The ApplyPatch stuff did throw me off a bit. I ended up doing it more with fedpkg, like I would for a regular package. I just did a local build, not a mock build. It's still running, so that's good. My memory from the last time is that it took about 12 hours to do a build. So I am hoping it's done before I need to sleep.

Comment 12 Bruno Wolff III 2011-07-11 23:21:28 UTC
I applied the patch to the -rc6.git6 kernel and I was able to boot both machines that had been locking up into graphical desktops.

Comment 13 Bruno Wolff III 2011-07-15 12:31:06 UTC
The fix has been posted to lkml, but is not yet in Linus' tree.

Comment 14 Adam Williamson 2011-07-15 17:23:40 UTC
Discussed at 2011-07-15 blocker review meeting. Given the severity of the impact, and Paul McKenney's suggestion that several users are affected by this - http://lkml.org/lkml/2011/7/12/298 - accepted as an Alpha blocker, under criterion "The installer must boot (if appropriate) and run on all primary architectures from default live image, DVD, and boot.iso install media" (or "In most cases (see Blocker_Bug_FAQ), a system installed according to any of the above criteria (or the appropriate Beta or Final criteria, when applying this criterion to those releases) must boot to the 'firstboot' utility on the first boot after installation, without unintended user intervention. This includes correctly accessing any encrypted partitions when the correct passphrase is supplied. The firstboot utility must be able to create a working user account", both subsume the idea that the system must boot).

Comment 15 Bruno Wolff III 2011-07-15 18:19:50 UTC
The fix is now in Linus' tree.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c64be78ffb415278d7d32d6f55de95c73dcc19a4

So it should show up in the next rawhide kernel build that is based on a newer upstream snapshot.

Comment 16 Bruno Wolff III 2011-07-15 21:51:14 UTC
Note that kernel-3.0-0.rc7.git1.1.fc16 is built from a kernel tree from a couple of days ago and doesn't have the fix in it. There is no need to waste time testing the fix with that kernel.

Comment 17 Bruno Wolff III 2011-07-16 01:32:10 UTC
Somehow it didn't manage to get into kernel-3.0-0.rc7.git3.1.fc16 either. I double checked patch-3.0-rc7-git3.bz2 and it wasn't in there.

Comment 18 Josh Stone 2011-07-19 03:59:57 UTC
It's in the next snapshot:

$ git describe --contains c64be78ffb415278d7d32d6f55de95c73dcc19a4
v3.0-rc7-git4~4

I built a local kernel with rc7-git6, and was finally able to boot my i686 VM.

Comment 19 Bruno Wolff III 2011-07-22 05:27:20 UTC
I tried out 3.0-0.rc7.git10.1.fc16 on one of the two machines I saw the problem on and things are working. I hope to test the other machine in the morning.

Comment 20 Bruno Wolff III 2011-07-22 11:06:10 UTC
I tested the second system and things are now working there as well.

Comment 21 Bruno Wolff III 2011-07-22 13:50:11 UTC
kernel-3.0-0.rc7.git10.1.fc16 is in rawhide this morning. I think this can probably be closed now.

Comment 22 Tim Flink 2011-07-22 22:04:19 UTC
As the latest kernel has been tested and verified to fix this issue, I am closing the bug.

