Bug 438960 - G965 chipset box grinds to a halt on boot with 6GiB ram
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 8
Hardware: x86_64 Linux
Priority: low
Severity: low
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-03-26 03:50 EDT by Noa Resare
Modified: 2008-04-09 01:14 EDT
Fixed In Version: 2.6.24.4-64.fc8
Doc Type: Bug Fix
Last Closed: 2008-04-01 17:37:43 EDT

Attachments:
hopefully complete patchset (22.47 KB, text/plain), 2008-03-27 12:13 EDT, Dave Jones
fixes. take 2. (54.11 KB, patch), 2008-03-27 13:20 EDT, Dave Jones

Description Noa Resare 2008-03-26 03:50:56 EDT
Description of problem:

I have an x86_64 box with an Asus P5B-VM motherboard with the Intel G965 chipset. According to the 
motherboard documentation the machine is supposed to support up to 8GiB RAM. The machine 
operates flawlessly with 2 * 2048MiB PC2-6400 memory, but when I add another 2 * 1024MiB to the 
empty slots, the regular kernel doesn't boot. (The xen kernel works, however)

Version-Release number of selected component (if applicable):
Bios version: 1001
kernel: 2.6.24.3-34.fc8

How reproducible:
Always (if you have the right hardware :)


Steps to Reproduce:
1. Insert the memory, verify in bios that 6GiB is indeed detected
2. Boot the kernel
  
Actual results:
The boot process does continue, but extremely slowly (activating the LVM volumes takes several minutes), and 
"Starting udev:" never seems to return. (I have only waited about 20 minutes, though.)

Expected results:
A startup with regular speed

Additional info:
The system works and detects the memory correctly if using the xen kernel (2.6.21.7-2.fc8xen)
Comment 1 Dave Jones 2008-03-26 10:23:19 EDT
Does this failure also happen with the kernel-PAE package?

Comment 2 Noa Resare 2008-03-26 10:43:51 EDT
Well, running an i686 kernel with an x86_64 distribution seems a bit problematic on its own. I pulled 
kernel-PAE-2.6.24.3-34 from the F8 i386 updates and installed it, but when booting I got five lines of 

'request_module: runaway loop modprobe binfmt-464c'

Am I doing something wrong in an obvious way?
Comment 3 Dave Jones 2008-03-26 10:52:40 EDT
oh, I missed that this was on x86-64 (too early, not enough caffeine yet).
Ignore the PAE suggestion.

Could you grab the 2.6.25rc x86-64 kernel from rawhide and see if that's any
better?
If this bug is already fixed there, it may be something we can selectively backport
if we can pinpoint it to a single patch.
Comment 4 Noa Resare 2008-03-26 11:12:12 EDT
Upgrading to kernel-2.6.25-0.155.rc6.git8.fc9 does the trick.

So, if you have a collection of kernel packages between 2.6.24.3-34 and 2.6.25-0.155.rc6.git8 I'd be 
happy to binary search for when the fix went in.
Comment 5 Dave Jones 2008-03-26 11:19:51 EDT
Adding Thomas and Ingo to the Cc. Perhaps they remember something in particular
that got merged which could be responsible for this.

You can find a ton of built rpms at
http://koji.fedoraproject.org/koji/packageinfo?packageID=8 though it looks like
a lot of the interim builds between .24 and .25rc3 got purged.

We might have to resort to hand-building kernels and using git-bisect to narrow
it down. Are you familiar with this process?
Comment 6 Noa Resare 2008-03-26 11:28:25 EDT
I haven't built kernels in a few years, so if you have a pointer to some document describing the current 
state of affairs in kernel land it would be helpful.

I need to pick up my son at daycare now, but when I'm back I'll try out some pre-built kernels and 
perhaps have a go at rolling my own :)
Comment 7 Dave Jones 2008-03-26 11:47:58 EDT
http://fedoraproject.org/wiki/BuildingUpstreamKernel should be useful.
If any parts are unclear, send me an email, and I'll walk you through the process.
Comment 8 Noa Resare 2008-03-26 15:23:03 EDT
Some casual testing indicates that the first .25rc1 kernel built
(kernel-2.6.25-0.33.rc1.fc9) works but all 2.6.24 kernels have the problem. I
have played around with git bisect between 2.6.24 and 2.6.25-rc1, and after one
iteration I'm down to ~3000 patches to test. (It took a while to figure out that
to find a fix, good means bad and bad means good.)

Compiling takes a lot of time though. Any good and easy to use shortcuts?
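
The "good means bad" inversion mentioned above can be sketched as a plain binary search. This is only an illustration in Python, not what git does internally: the commit names, the history length, and the `boots` predicate are hypothetical, and it assumes a linear history in which every commit after the fix keeps working.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], boots: Callable[[str], bool]) -> str:
    """Return the first commit at which the kernel boots normally.

    Assumes commits[0] is known broken, commits[-1] is known working,
    and that every commit after the fix keeps working.
    """
    lo = 0                  # known broken: this end is what you tell git is "good"
    hi = len(commits) - 1   # known working: this end is what you tell git is "bad"
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots(commits[mid]):
            hi = mid        # fix already present, i.e. `git bisect bad`
        else:
            lo = mid        # fix not present yet, i.e. `git bisect good`
    return commits[hi]


# Hypothetical history of 3000 commits where the fix lands at index 1234.
history = [f"c{i}" for i in range(3000)]
fix_at = 1234
found = first_fixed(history, lambda c: int(c[1:]) >= fix_at)
print(found)  # c1234
```

Each test halves the remaining range, so roughly 3000 candidates need about a dozen builds.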
Comment 9 Noa Resare 2008-03-27 07:55:06 EDT
A few (heh) iterations of git bisect point to commit

99fc8d424bc5d803fe92cad56c068fe64e73747a x86, 32-bit: trim memory not covered by
wb mtrrs

Judging from the commit description it seems like this is our thing. If at all
possible, it would be really nice if this fix could be backported to the next
errata F8 kernel. If testing is needed, I'm here.
Comment 11 Dave Jones 2008-03-27 11:44:29 EDT
Excellent. Thanks for the great help debugging, Noa. And thanks, Thomas, for
pinpointing the solution so quickly. I'll get this diff into the next update.
Comment 12 Dave Jones 2008-03-27 11:58:19 EDT
Hmm. That seems to be dependent upon other changes in .25.
Does this patch look OK? It's a combination of
093af8d7f0ba3c6be1485973508584ef081e9f93 and
76c324182bbd29dfe4298ca65efb15be18055df1, but just the setup_32.c changes on
top of .24.

diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c
index a441deb..1fc93de 100644
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -47,6 +47,7 @@
 
 #include <video/edid.h>
 
+#include <asm/mtrr.h>
 #include <asm/apic.h>
 #include <asm/e820.h>
 #include <asm/mpspec.h>
@@ -328,8 +329,6 @@ static unsigned long __init setup_memory(void)
         */
        min_low_pfn = PFN_UP(init_pg_tables_end);
 
-       find_max_pfn();
-
        max_low_pfn = find_max_low_pfn();
 
 #ifdef CONFIG_HIGHMEM
@@ -616,6 +615,12 @@ void __init setup_arch(char **cmdline_p)
        strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
        *cmdline_p = command_line;
 
+       /* update e820 for memory not covered by WB MTRRs */
+       find_max_pfn();
+       mtrr_bp_init();
+       if (mtrr_trim_uncached_memory(max_pfn))
+               find_max_pfn();
+
        max_low_pfn = setup_memory();
 
 #ifdef CONFIG_VMI
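
As a rough model of what "update e820 for memory not covered by WB MTRRs" means, the Python sketch below clips a list of RAM ranges at the highest address covered by a write-back range. It is a simplification and not the kernel's implementation: the function names are hypothetical, and the memory layout is invented for illustration rather than taken from the reporter's machine.

```python
from typing import List, Tuple

PAGE_SHIFT = 12          # 4 KiB pages
GiB = 1 << 30
Range = Tuple[int, int]  # [start, end) in bytes


def highest_wb_end(wb_mtrrs: List[Range]) -> int:
    """Highest byte address (exclusive) covered by any write-back range."""
    return max((end for _, end in wb_mtrrs), default=0)


def trim_ram(e820_ram: List[Range], wb_mtrrs: List[Range]) -> List[Range]:
    """Drop or clip RAM ranges that extend past the highest write-back address."""
    limit = highest_wb_end(wb_mtrrs)
    trimmed = []
    for start, end in e820_ram:
        if start >= limit:
            continue                      # entirely uncached: drop the range
        trimmed.append((start, min(end, limit)))
    return trimmed


def max_pfn(ram: List[Range]) -> int:
    """Highest page frame number (exclusive) across all RAM ranges."""
    return max((end >> PAGE_SHIFT for _, end in ram), default=0)


# Invented layout: 6 GiB of RAM with 1 GiB remapped above 4 GiB, while the
# write-back ranges stop at 6 GiB, leaving the top 1 GiB of RAM uncached.
e820_ram = [(0, 3 * GiB), (4 * GiB, 7 * GiB)]
wb_mtrrs = [(0, 4 * GiB), (4 * GiB, 6 * GiB)]

trimmed = trim_ram(e820_ram, wb_mtrrs)
print(trimmed == [(0, 3 * GiB), (4 * GiB, 6 * GiB)])  # True
print(max_pfn(trimmed) < max_pfn(e820_ram))           # True
```

This mirrors the order of calls in the hunk above: compute the maximum page frame first, trim, and recompute it only if something was trimmed.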
Comment 13 Dave Jones 2008-03-27 12:01:47 EDT
argh, this won't work at all, because we also don't have
99fc8d424bc5d803fe92cad56c068fe64e73747a in .24
Comment 14 Dave Jones 2008-03-27 12:13:09 EDT
Created attachment 299355 [details]
hopefully complete patchset 

This should be all the dependent patches in one.
Noa, can you try this on top of v2.6.24?

You can do this with the git tree you already have by doing..

git reset --hard v2.6.24
patch -p1 < ~/mtrr.diff

and then building like you did the others.

Thanks
Comment 15 Noa Resare 2008-03-27 12:34:35 EDT
It seems like your patch is missing the update_e820() function. Compilation
fails with this:

arch/x86/kernel/cpu/mtrr/main.c: In function ‘mtrr_trim_uncached_memory’:
arch/x86/kernel/cpu/mtrr/main.c:730: error: implicit declaration of function ‘update_e820’
make[3]: *** [arch/x86/kernel/cpu/mtrr/main.o] Error 1
Comment 16 Thomas Gleixner 2008-03-27 13:13:00 EDT
Sorry for the confusion. Dave just pointed out to me that you run a 64-bit
kernel, so the commit that bisect pointed at changed something for
64-bit as well. I'll have a look.

Comment 17 Dave Jones 2008-03-27 13:20:34 EDT
Created attachment 299365 [details]
fixes. take 2.

This one contains a lot more changes, but should be more complete.

You can unapply the previous one with git diff | patch -p1 -R
before applying this one in the same manner as before.
Comment 18 Noa Resare 2008-03-27 14:32:31 EDT
The new version of the patch applied to a clean 2.6.24 compiles and the
resulting kernel fixes the booting problem. Good work!
Comment 19 Dave Jones 2008-03-27 14:40:02 EDT
awesome, I'll get that into a build.
Thanks again for your testing.
Comment 20 Fedora Update System 2008-03-28 16:58:14 EDT
kernel-2.6.24.4-63.fc8 has been submitted as an update for Fedora 8
Comment 21 Fedora Update System 2008-03-29 14:31:30 EDT
kernel-2.6.24.4-64.fc8 has been submitted as an update for Fedora 8
Comment 22 Noa Resare 2008-03-29 15:13:38 EDT
I just tried out kernel-2.6.24.4-64.fc8 from koji and it works beautifully with my 6 gigs of memory. 
Thanks a lot Dave and others for the quick response to my report. 
Comment 23 Fedora Update System 2008-04-01 17:37:33 EDT
kernel-2.6.24.4-64.fc8 has been pushed to the Fedora 8 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 24 Fedora Update System 2008-04-09 01:14:24 EDT
kernel-2.6.24.4-64.fc8 has been pushed to the Fedora 8 stable repository.  If problems still persist, please make note of it in this bug report.
