Bug 445893 - Grub sometimes does not detect entire memory map
Summary: Grub sometimes does not detect entire memory map
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: grub
Version: 5.2
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jan Grulich
QA Contact:
URL:
Whiteboard:
Depends On: 250299
Blocks:
 
Reported: 2008-05-09 17:09 UTC by Dave Wysochanski
Modified: 2016-06-13 15:56 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-03 12:49:09 UTC
Target Upstream Version:
Embargoed:


Attachments
meminfo, cpuinfo, and xm list (3.78 KB, text/plain)
2008-05-09 17:23 UTC, Dave Wysochanski
Patch fixing some HW memory issues (325 bytes, patch)
2013-03-22 11:04 UTC, Václav Pavlín


Links
System ID Private Priority Status Summary Last Updated
CentOS 1905 0 None None None Never

Description Dave Wysochanski 2008-05-09 17:09:14 UTC
+++ This bug was initially created as a clone of Bug #250299 +++

Description of problem:

Sometimes, grub does not recognize the entire e820 memory map, when the map is
not provided by the BIOS in the multiboot information data structure.
Ultimately, this appears to be an over-optimization by the compiler when
building grub, in combination with buggy BIOS.

The result is that some downstream kernels (in my particular case, the Xen
kernel) do not recognize all the memory available on a system.

Version-Release number of selected component (if applicable):

grub-0.97-13

How reproducible:

From the lack of discussion on the Fedora and Xen mailing lists, I presume this
problem is rare in the real world, and may be hard to reproduce.

The machine this happens on for me has two Dual-Core AMD Opteron 2210 CPUs w/16GB
of memory, using an American Megatrends BIOS. Upon boot on an unpatched grub,
Xen only recognizes slightly less than 4GB. The Multiboot information data
structure has the flag for the mem_lower/mem_upper fields set (and Xen memory
detection matches the values in these fields), and does not have the flag for
the memory map set.
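
For reference, the two flags described above can be checked roughly as follows. This is a minimal sketch, assuming the Multiboot 0.6.96 boot information layout, where bit 0 of flags marks mem_lower/mem_upper as valid and bit 6 marks mmap_length/mmap_addr as valid. The struct is trimmed to the fields up to mmap_addr, and the helper names (mb_has_basic_memory, mb_has_memory_map, mb_fallback_top_kib) are hypothetical, not taken from grub or Xen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading fields of the Multiboot boot information structure (assumed 0.6.96 layout). */
struct mb_info {
    uint32_t flags;
    uint32_t mem_lower;    /* KiB below 1 MiB; valid if flags bit 0 is set */
    uint32_t mem_upper;    /* KiB above 1 MiB, up to the first memory hole */
    uint32_t boot_device;
    uint32_t cmdline;
    uint32_t mods_count;
    uint32_t mods_addr;
    uint32_t syms[4];
    uint32_t mmap_length;  /* valid if flags bit 6 is set */
    uint32_t mmap_addr;
};

#define MB_FLAG_MEMORY (1u << 0)  /* mem_lower/mem_upper are valid */
#define MB_FLAG_MMAP   (1u << 6)  /* mmap_length/mmap_addr are valid */

/* Nonzero if the boot loader filled in mem_lower/mem_upper. */
int mb_has_basic_memory(const struct mb_info *mbi)
{
    assert(mbi != NULL);
    return (mbi->flags & MB_FLAG_MEMORY) != 0;
}

/* Nonzero if the boot loader passed a full memory map. */
int mb_has_memory_map(const struct mb_info *mbi)
{
    assert(mbi != NULL);
    return (mbi->flags & MB_FLAG_MMAP) != 0;
}

/* Highest address (in KiB) a kernel can infer from mem_upper alone:
 * the 1 MiB starting point plus the contiguous region above it. */
uint64_t mb_fallback_top_kib(const struct mb_info *mbi)
{
    assert(mbi != NULL);
    return 1024u + (uint64_t)mbi->mem_upper;
}
```

When only the first flag is set, a kernel has nothing beyond mem_upper to go on, which stops at the first hole in upper memory. That would be consistent with the "slightly less than 4GB" seen here.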

Patch to fix problem:

--- grub-0.97/stage2/common.c.e820      2007-07-30 11:36:19.000000000 -0800
+++ grub-0.97/stage2/common.c           2007-07-30 11:36:55.000000000 -0800
@@ -142,6 +142,7 @@
 init_bios_info (void)
 {
 #ifndef STAGE1_5
-  unsigned long cont, memtmp, addr;
+  unsigned long memtmp, addr;
+  volatile unsigned long cont;
   int drive;
 #endif
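
The patch only changes the qualifier on cont. As a rough illustration of the kind of loop such a variable drives, the sketch below walks an e820-style table using a continuation value: each call returns one range plus the value to pass to the next call, and 0 means the map is finished. The names e820_entry, fake_get_mmap_entry, and sum_usable_memory are hypothetical stand-ins for the real BIOS call and are not grub's code. In this portable version volatile does not change the result. It only mirrors the patch by forcing the compiler to store and reload cont on every iteration instead of caching or eliminating it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One e820-style range: base address, length, and type (1 = usable RAM). */
struct e820_entry {
    uint64_t base;
    uint64_t length;
    uint32_t type;
};

/* Hypothetical stand-in for the BIOS call: copies entry number `cont`
 * into *out and returns the continuation value for the next call,
 * or 0 when the table is exhausted. */
static unsigned long fake_get_mmap_entry(const struct e820_entry *table,
                                         size_t count,
                                         struct e820_entry *out,
                                         unsigned long cont)
{
    assert(cont < count);
    *out = table[cont];
    return (cont + 1 < count) ? cont + 1 : 0;
}

/* Walk the whole map and return the number of usable bytes. */
uint64_t sum_usable_memory(const struct e820_entry *table, size_t count)
{
    /* volatile, as in the patch: every read and write of cont really happens. */
    volatile unsigned long cont = 0;
    uint64_t usable = 0;
    struct e820_entry entry;

    if (count == 0)
        return 0;

    do {
        cont = fake_get_mmap_entry(table, count, &entry, cont);
        if (entry.type == 1)
            usable += entry.length;
    } while (cont != 0);

    return usable;
}
```

If the loop ends early because the continuation value is mishandled, every range after that point is lost, which matches the symptom of memory above the first few GB going missing.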

-- Additional comment from peter.peltonen on 2007-08-14 08:56 EST --

I encountered the same problem with i386 and x86_64 CentOS5 + xen-enabled
kernels. Only 2,9GB of my 6GB RAM was recognized. Non-xen 64bit and 32bit
PAE-kernels saw the memory correctly. After patching grub with this patch and
reinstalling grub (not just the rpm -- grub has to be reinstalled from grub
prompt) all memory was recognized correctly.

-- Additional comment from dustin.henning on 2007-10-23 13:14 EST --
I too experienced this issue.  The system in question was a Core 2 Quad E6600 
with 8GiB on an Intel P965 Express chipset.  Like the original reporter, this 
system also utilizes an AMIBIOS.
In my case, only 3.2GiB was recognized prior to the patch, and the base (SMP) 
kernel did not recognize the full amount of memory until I added mem=10G to the 
kernel arguments.  Once the base (SMP) kernel was booted detecting 7.8GiB, the 
problem did not reoccur when the kernel argument was removed.  Said kernel 
argument (and derivatives) had no effect on the xen kernels when placed after 
either/both kernel lines (xen.gz and module vmlinuz).

-- Additional comment from nathan.robertson on 2008-02-07 09:49 EST --
I too am experiencing this issue on an AMD64 machine with 8GB of memory.  Does
anyone know if there is an updated Grub package with this patch applied?

-- Additional comment from eric.moret on 2008-02-14 09:25 EST --
Any progress in applying this patch? I too have this issue.

-- Additional comment from eric.moret on 2008-02-15 11:11 EST --
You can grab the fixed package at:
ftp://ftp.zouric.com/public/grub-0.97-14.x86_64.rpm
SRPMS at
ftp://ftp.zouric.com/public/grub-0.97-14.src.rpm

-- Additional comment from amyagi on 2008-02-15 11:29 EST --
(In reply to comment #5)
> You can grab the fixed package at:
> ftp://ftp.zouric.com/public/grub-0.97-14.x86_64.rpm
> SRPMS at
> ftp://ftp.zouric.com/public/grub-0.97-14.src.rpm

Thanks for making the patched grub available. According to your note on the
CentOS forum:

http://www.centos.org/modules/newbb/viewtopic.php?topic_id=12491&forum=38

you have fixed the problem on a Hetzner root server DS8000 ?

-- Additional comment from grover66 on 2008-02-17 00:35 EST --
After installing the above grub rpm, you will have to run "grub-install
/dev/sda" (for example) to make it all work.

-Mike

-- Additional comment from eric.moret on 2008-03-20 02:41 EST --
(In reply to comment #6)
> you have fixed the problem on a Hetzner root server DS8000 ?

Yes, that is correct. I now have my 8GB of RAM recognized on a Hetzner DS8000.

-- Additional comment from steve on 2008-05-07 02:18 EST --
(In reply to comment #5)
> You can grab the fixed package at:
> ftp://ftp.zouric.com/public/grub-0.97-14.x86_64.rpm
> SRPMS at
> ftp://ftp.zouric.com/public/grub-0.97-14.src.rpm

Many thanks for this patch Eric, it has also allowed me to see the full 6GB on a
CentOS/Xen install on a Core 2 Quad Acer.

Steve

Comment 1 Dave Wysochanski 2008-05-09 17:19:40 UTC
Running latest grub for RHEL5 - grub-0.97-13.2 and I see the same problem on my
local machine.  I have pics of screen captures if my mail ever comes back up.

Sounds similar/identical to the fedora bug though:
- 8GB RAM, only ~3GB seen by grub
- American Megatrends BIOS

Oddly, dom0 only has 1GB RAM.

I have not tried the small patch.

Comment 2 Dave Wysochanski 2008-05-09 17:23:20 UTC
Created attachment 304968 [details]
meminfo, cpuinfo, and xm list

Comment 3 Dave Wysochanski 2008-05-09 19:49:12 UTC
Tried the patch mentioned above but it did not work.

However upgrading to the latest 5.2 nightly kernel (and other packages) helped
the machine recognize more memory.  I now have 6GB recognized by the kernel (not
8 but probably ok for now).  The low usable memory was my main problem.

Comment 4 cherradi 2008-05-11 19:21:58 UTC
Hi there,
I've got a Dell 1900 with CentOS 5.1 64 bit and 4 GB ram.
Only 3 GB are recognized with kernel-xen-2.6.18-53.1.19.el5.
I've downloaded & installed grub-0.97-14.x86_64.rpm and did grub-install /dev/sda.
But it did not help and I still got 3 GB recognized.
Can you please help me to fix this.
Thank you in advance.

Comment 5 Vadym Chepkov 2010-06-25 08:15:23 UTC
It has been more than two years since the report, and this is still an issue in Red Hat 5.5.

Comment 6 Dave Wysochanski 2011-04-11 14:04:01 UTC
Peter - can you check this out?

Comment 7 Akemi Yagi 2011-04-11 14:49:59 UTC
Vadym and others who are affected by this bug:

RHEL 5.6 still has grub-0.97-13.5. The patched version (  grub-0.97-13.5.bz250299 ) mentioned in Bug #250299 is available for testing at:

http://centos.toracat.org/grub-0.97/CentOS-5/

Can you give it a try and see if that fixes the issue?

Comment 8 Vadym Chepkov 2011-04-13 03:51:32 UTC
No, it does not.


# cat /proc/meminfo 
MemTotal:      3374032 kB
MemFree:       3315092 kB

# rpm -q grub
grub-0.97-13.5.bz250299

(I also did grub-install /dev/sda)

# dmidecode

Handle 0x1100, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 1024 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM1
        Bank Locator: Not Specified
        Type: DDR
        Type Detail: Synchronous
        Speed: 533 MHz
        Manufacturer: AD00000000000000
        Serial Number: 00000000
        Asset Tag: Not Specified
        Part Number:  HYMP512U64BP8-Y5           

Handle 0x1101, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 1024 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM2
        Bank Locator: Not Specified
        Type: DDR
        Type Detail: Synchronous
        Speed: 533 MHz
        Manufacturer: AD00000000000000
        Serial Number: 00000000
        Asset Tag: Not Specified
        Part Number: HYMP512U64BP8-Y5  


Handle 0x1102, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 1024 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM3
        Bank Locator: Not Specified
        Type: DDR
        Type Detail: Synchronous
        Speed: 533 MHz
        Manufacturer: AD00000000000000
        Serial Number: 00000000
        Asset Tag: Not Specified
        Part Number: HYMP512U64BP8-Y5  

Handle 0x1103, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 1024 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM4
        Bank Locator: Not Specified
        Type: DDR
        Type Detail: Synchronous
        Speed: 533 MHz
        Manufacturer: AD00000000000000
        Serial Number: 00000000
        Asset Tag: Not Specified
        Part Number: HYMP512U64BP8-Y5  


# dmesg|grep Memory:
Memory: 3369632k/3406360k available (2190k kernel code, 35392k reserved, 911k data, 228k init, 2488856k highmem)
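
For scale: the four 1024 MB DIMMs listed above total 4194304 kB, while /proc/meminfo reports MemTotal of 3374032 kB, a shortfall of 820272 kB (about 801 MB). A throwaway sketch of that arithmetic, with hypothetical helper names (installed_kib, missing_kib):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum DIMM sizes given in MB (as dmidecode prints them) and return kB. */
uint64_t installed_kib(const uint32_t *dimm_mb, size_t count)
{
    uint64_t total_mb = 0;

    assert(dimm_mb != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        total_mb += dimm_mb[i];
    return total_mb * 1024u;
}

/* Difference between installed memory and MemTotal, both in kB. */
int64_t missing_kib(uint64_t installed, uint64_t mem_total)
{
    return (int64_t)installed - (int64_t)mem_total;
}
```

Part of any such gap is normally memory reserved by the kernel and firmware, so the figure is an upper bound on what the boot loader could be hiding, not an exact measure.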

Comment 9 Vadym Chepkov 2011-09-19 01:38:52 UTC
Still a problem in 5.7

Comment 10 Akemi Yagi 2011-12-08 20:33:01 UTC
Because I do not have hardware that is affected by this bug, I cannot check if it still exists in RHEL 6.2. But I see that the section of code that appears in the patch provided by the submitter remains unchanged in grub-0.97-75.el6. So, I suspect the problem persists in RHEL 6.2. 

A case that is _potentially_ related (but with CentOS 6.0):

http://www.centos.org/modules/newbb/viewtopic.php?topic_id=34608&forum=57

Comment 11 John Robinson 2012-02-26 12:47:18 UTC
This updated GRUB fixes an issue I was having with the shipping GRUB on (CentOS) el5.

I have an Asus P5Q Pro motherboard, and recently installed an IBM ServeRAID M1015 (LSI MegaRAID 9240-8i). With the new card in, the stock GRUB gets as far as a black-and-white screen (no splash image etc) and says:

GNU GRUB version 0.97 (0K lower / 0K upper memory)
[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename. ]

Yes, no memory detected. There's the GRUB prompt but it hasn't even initialised the keyboard properly so you can't ask for a memory map or indeed go any further.

With the original IBM firmware, it started OK, but that firmware doesn't support JBOD mode. With the latest IBM firmware, there's this problem. With the latest LSI firmware, we have this problem. It looks like the MegaRAID BIOS does something screwy with the memory map, perhaps in conjunction with the Asus/AMI BIOS doing something screwy with the memory map, so that the stock GRUB can't boot, while this one can.

In the interim, before finding this bug report and patched GRUB, I had been able to boot just fine from rescue media and with lilo.

In my Googling, I also found someone else with a similar problem years ago, which they worked around by changing compiler options (sorry, I've lost that link), and there are other examples of people having similar problems with Asus P5Q series motherboards, e.g. http://www.linuxquestions.org/questions/linux-kernel-70/grub-error-28-selected-item-cannot-fit-into-memory-747029/ and with the more recent MegaRAIDs, e.g. http://forum.proxmox.com/threads/6965-grub-boot-problem-after-raid-controller-replacement

I was thinking of upgrading to el6 soon but will hold off if this bug is likely to be present there too.

If you want more info about my setup, tell me what you want. In particular, if you want the output of grub's displaymem command, tell me how to get grub to save it somewhere, it's over a screen long so I'd screw up retyping it.

Comment 12 Martin Dengler 2012-04-03 03:09:41 UTC
This problem is also present in grub2, as I'm seeing it using F16 on a Dell Optiplex 980 with (last I checked) the latest BIOS.

Smolt system profile:
http://www.smolts.org/client/show/pub_63981753-52a1-4470-b69d-dc3e05ecfbaa .

I will get some dmesg data, but as I'm using grub2 I'm not sure what's required
/ useful.

Comment 13 Václav Pavlín 2013-03-21 13:35:32 UTC
This bug would probably need quite extensive changes to the code, which are not desired in this late phase of the RHEL-5 release cycle. Closing wontfix.

Comment 14 John Robinson 2013-03-21 15:56:46 UTC
Given that for me this is a showstopper bug (GRUB sees zero memory and refuses to boot), that the RPM linked to in comment #7 fixes it for me, and that the only change in that RPM is applying the patch from bug #250299 - marking one variable as volatile - I would have thought it was worth another look.

Comment 15 Václav Pavlín 2013-03-22 11:04:01 UTC
Created attachment 714482 [details]
Patch fixing some HW memory issues

Sorry John,

I got confused by the conflicting reports that the patch works for some people and not for others. Sure, the patch is small, and if it works for you, we can try to apply it.

Comment 16 RHEL Program Management 2014-03-07 12:14:01 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the last planned RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX. To request that Red Hat re-consider this request, please re-open the bugzilla via appropriate support channels and provide additional business and/or technical details about its importance to you.

Comment 17 RHEL Program Management 2014-06-03 12:49:09 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

