Bug 453671 - Memory test bails prematurely when testing greater than 256 GB in RH4.6 x64
Status: CLOSED WONTFIX
Product: Red Hat Hardware Certification Program
Classification: Red Hat
Component: Test Suite (tests)
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: medium
Assigned To: Greg Nichols
QA Contact: Lawrence Lim
Keywords: Reopened
Depends On:
Blocks: 465506
Reported: 2008-07-01 15:56 EDT by Gregg Shick
Modified: 2014-03-25 20:55 EDT (History)
Doc Type: Bug Fix
Last Closed: 2008-12-16 15:44:44 EST

Attachments
test results passing with 256 and failing with greater than 256GB. (798.46 KB, application/x-rpm)
2008-07-01 15:56 EDT, Gregg Shick
/var/log/messages output (291.85 KB, text/plain)
2008-07-16 14:34 EDT, Gregg Shick
runtest.sh results run outside of hts test harness (389 bytes, text/plain)
2008-07-16 14:34 EDT, Gregg Shick
dmesg file (15.07 KB, text/plain)
2008-10-09 17:14 EDT, Gregg Shick
xm info file (1.00 KB, text/plain)
2008-10-09 17:14 EDT, Gregg Shick

Description Gregg Shick 2008-07-01 15:56:51 EDT
Description of problem: Memory test bails prematurely when testing greater than
256 GB in RH4.6 x64

Version-Release number of selected component (if applicable): RH4.6 x64 / HTS 5.2-16
Proliant 785 / 8 processors / 512GB memory

How reproducible: Every time


Steps to Reproduce:
1. Install RH4.6 x64 on a system with 512GB
2. Install HTS 5.2-16
3. Execute memory test
  
Actual results: Test fails after only a few seconds of runtime.  

Expected results: Test at minimum runs to completion.  


Additional info:  It will run successfully and pass with 256GB.  The breaking
point is somewhere between 320 and 384GB.
Comment 1 Gregg Shick 2008-07-01 15:56:52 EDT
Created attachment 310713 [details]
test results passing with 256 and failing with greater than 256GB.
Comment 2 David Aquilina 2008-07-09 11:06:52 EDT
Gregg, 

Does the test fail if you run it manually outside of the test harness? To do so,
switch to the directory containing the memory test and run runtest.sh:

cd /usr/share/hts/tests/memory
./runtest.sh

Please capture any output produced, as well as /var/log/messages. 

Thanks! 
Comment 3 Gregg Shick 2008-07-16 14:34:14 EDT
Created attachment 311977 [details]
/var/log/messages output
Comment 4 Gregg Shick 2008-07-16 14:34:47 EDT
Created attachment 311978 [details]
runtest.sh results run outside of hts test harness
Comment 5 Gregg Shick 2008-07-16 14:36:22 EDT
David

The test also fails outside of hts.  I tried increasing swap to 1TB (David
Hester's suggestion).  That also had no effect.
Comment 6 Greg Nichols 2008-07-16 19:45:35 EDT
To run the test outside of the harness, use "make run", which will compile
threaded_memtest.c as part of the test run.

Please give this a try.
Comment 7 Sandy Garza 2008-07-17 10:56:34 EDT
Is there a memory limitation on RHEL 4.6? What is the max memory support for 
4.6?
Comment 8 Rob Landry 2008-07-17 18:03:05 EDT
rhr2 has been deprecated, so I am closing these remaining bugs as WONTFIX.  Future bugs
against the "hts" test suite should be opened against the "Red Hat Hardware
Certification Program" product, selecting either the "Test Suite (harness)" or "Test
Suite (tests)" component.
Comment 9 Gregg Shick 2008-07-18 09:20:45 EDT
This was tested using hts-5.2-16.el4.noarch, not rhr2.
Comment 10 Gregg Shick 2008-07-21 14:22:08 EDT
(In reply to comment #6)
> To run the test outside of the harness, use "make run", which will compile
> threaded_memtest.c as part of the test run.
> 
> Please give this a try.

Greg

Do you have more specific instructions for doing this?  Thanks..
Comment 11 David Aquilina 2008-08-20 11:56:16 EDT
(In reply to comment #10)
> Do you have more specific instructions for doing this?  Thanks..

cd /usr/share/hts/tests/memory
make run

... should do the trick.
Comment 13 Micah Parrish 2008-08-28 03:05:14 EDT
chmod a+x ./runtest.sh ./memory.py
./runtest.sh
/usr/share/hts/tests/memory/memory.py
Running ./memory.py:
System Memory: 515479 MB
Free Memory: 515010 MB
Swap Memory: 1983 MB
Starting Threaded Memory Test
running for more than free memory at 516034 MB for 60 sec.
mmap: Cannot allocate memory
Warning: memsize > free_mem. You will probably hit swap.
Detected 32 processors.
RAM: 99.6% free (501G/503G)
Testing 503G RAM for 60 seconds using 64 threads:
thread 0: mapping 8063M RAM
thread 1: mapping 8063M RAM
thread 2: mapping 8063M RAM
thread 3: mapping 8063M RAM
thread 4: mapping 8063M RAM
thread 5: mapping 8063M RAM
thread 6: mapping 8063M RAM
thread 7: mapping 8063M RAM
thread 8: mapping 8063M RAM
thread 9: mapping 8063M RAM
thread 10: mapping 8063M RAM
thread 11: mapping 8063M RAM
thread 12: mapping 8063M RAM
thread 13: mapping 8063M RAM
thread 14: mapping 8063M RAM
thread 15: mapping 8063M RAM
thread 16: mapping 8063M RAM
thread 17: mapping 8063M RAM
thread 18: mapping 8063M RAM
thread 19: mapping 8063M RAM
thread 20: mapping 8063M RAM
thread 21: mapping 8063M RAM
thread 22: mapping 8063M RAM
thread 23: mapping 8063M RAM
thread 24: mapping 8063M RAM
thread 25: mapping 8063M RAM
thread 26: mapping 8063M RAM
thread 27: mapping 8063M RAM
thread 28: mapping 8063M RAM
thread 29: mapping 8063M RAM
thread 30: mapping 8063M RAM
thread 31: mapping 8063M RAM
thread 32: mapping 8063M RAM
thread 33: mapping 8063M RAM
thread 34: mapping 8063M RAM
thread 35: mapping 8063M RAM
thread 36: mapping 8063M RAM
thread 37: mapping 8063M RAM
thread 38: mapping 8063M RAM
thread 39: mapping 8063M RAM
thread 40: mapping 8063M RAM
thread 41: mapping 8063M RAM
thread 42: mapping 8063M RAM
thread 43: mapping 8063M RAM
done.
...finished running ./memory.py, exit code=1
recovered exit code=1
hts-report-result /HTS/hts/memory FAIL /tmp/tmp.P28871
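A quick way to read the log above: the run planned 64 threads of 8063M each (roughly 516034 MB / 64), and the last thread that logged a mapping was thread 43. The sketch below recomputes how much had been requested by that point. It assumes the "M" in the log means MiB and that thread numbers are contiguous from 0. The names `LOG_EXCERPT` and `summarize` are illustrative only and are not part of HTS.

```python
import re

# Only the log lines the calculation needs, copied from the output above.
LOG_EXCERPT = """\
running for more than free memory at 516034 MB for 60 sec.
Testing 503G RAM for 60 seconds using 64 threads:
thread 0: mapping 8063M RAM
thread 43: mapping 8063M RAM
"""


def summarize(log):
    """Return thread counts and the total size requested before the run stopped."""
    planned = int(re.search(r"using (\d+) threads", log).group(1))
    mappings = [(int(t), int(mb))
                for t, mb in re.findall(r"thread (\d+): mapping (\d+)M RAM", log)]
    # Assumption: threads are numbered 0..N-1 with no gaps, so the highest
    # index seen plus one is the number of threads that logged a mapping.
    started = max(t for t, _ in mappings) + 1
    per_thread_mb = mappings[0][1]
    return {
        "threads_planned": planned,
        "threads_started": started,
        "per_thread_mb": per_thread_mb,
        "requested_mb": started * per_thread_mb,
    }


if __name__ == "__main__":
    info = summarize(LOG_EXCERPT)
    print(info)
    print("requested GiB: %.1f" % (info["requested_mb"] / 1024.0))
```

Under those assumptions, 44 of the 64 threads logged a mapping, for about 346 GiB requested in total, which is consistent with the breaking point between 320 and 384GB reported in the description. The log does not show whether thread 43's own mapping succeeded.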
Comment 14 Micah Parrish 2008-08-28 03:43:00 EDT
I also tried to run memhog.  It runs with memhog 255g and fails with memhog 256g.  The failure message is:

numactl: mmap: Cannot allocate memory
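The 255g/256g boundary above was found by hand. A minimal sketch of automating that search with anonymous mappings follows. `can_map` and `largest_ok` are hypothetical helper names, not part of memhog or HTS. The search assumes the result is monotone (every size below the limit maps, every size above it fails). Python's `mmap.mmap(-1, n)` creates a shared anonymous mapping on Unix, which may not behave identically to the mapping the C test creates.

```python
import mmap


def can_map(size_bytes):
    """Try one anonymous mapping of size_bytes; True if the kernel grants it."""
    try:
        m = mmap.mmap(-1, size_bytes)
    except (OSError, OverflowError, ValueError, MemoryError):
        return False
    m.close()  # Release immediately; the pages are never touched.
    return True


def largest_ok(lo, hi, ok=can_map):
    """Binary search for the largest n in [lo, hi) for which ok(n) is true.

    Assumes ok(lo) is true, ok(hi) is false, and ok is monotone in between.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    GIB = 1 << 30
    # Hypothetical usage on the affected machine: search between 1 GiB and
    # 512 GiB in whole-GiB steps. This only reserves address space.
    limit_gib = largest_ok(1, 512, ok=lambda g: can_map(g * GIB))
    print("largest single mapping: %d GiB" % limit_gib)
```

Because the mapping is released without being touched, this probes how much address space one process can reserve rather than how much RAM is usable, which is the kind of failure the `mmap: Cannot allocate memory` message points at.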
Comment 15 Rob Landry 2008-08-28 14:44:27 EDT
We'll need to look, but I think we're just running into the process size ceiling of x86_64. If so, we will need to split the test across processes as we do on x86, but at a larger threshold than ~4GB.
Comment 17 David Aquilina 2008-09-17 15:31:00 EDT
Greg, 

Can you please open up a certification request with the INFO test run and the (failed) memory test logs? 512G is larger than the current maximum so we'll need to have Engineering take a look at the system as well. Please post the certification # once you've done so. 

Thanks! 

-David
Comment 19 Gregg Shick 2008-10-09 17:14:24 EDT
Created attachment 319940 [details]
dmesg file
Comment 20 Gregg Shick 2008-10-09 17:14:40 EDT
Created attachment 319941 [details]
xm info file
Comment 24 David Aquilina 2008-10-15 15:34:18 EDT
(In reply to comment #14)
> I also tried to run memhog.  It runs with memhog 255g and fails with memhog
> 256g.  The failure message is:
> 
> numactl: mmap: Cannot allocate memory

Can you provide us with a pointer to or copy of memhog? 

thanks!
Comment 25 Micah Parrish 2008-11-03 12:48:16 EST
It's proprietary, part of a test suite called Xorsyst, formerly known as busy.  I assume you can have it with the proper license.  Contact dustin.puim@hp.com if you still need it.
Comment 26 Sandy Garza 2008-12-01 11:38:48 EST
David,

Our engineer is asking "Why does the test pass when system memory is 256GB, and fail when 512GB? What does "process limit" have to do with system RAM?"

Thanks.
Comment 27 David Aquilina 2008-12-01 13:46:24 EST
Sandy, 

Currently a single process is used to run the memory test, so when that process hits the process size limit it's unable to allocate any additional memory. 

-David
Comment 28 Sandy Garza 2008-12-03 10:17:48 EST
David,

Is RH proposing a code change to the Cert Test? If so, what is the change?

If RH is proposing a code change to the kernel, what is the change? A code snippet would be helpful.

Thanks, Sandy
Comment 29 David Aquilina 2008-12-03 15:33:30 EST
Sandy, 

We've been waiting to hear what HP's desire is here. If you do not care about increasing the process size limit, then we can look into changing the certification test suite to use multiple processes above 256G. This would, however, still leave any one process limited to 256G, which could cause problems for customers who bump against this limit.

If you want to raise the process limit, you'll need to open an RFE with Ron to do so.

-David
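The multiple-process approach described above could start from a size plan like the one sketched below. This is not the HTS implementation. `plan_split` is a hypothetical helper, and the 255 GiB cap is an assumption taken from the memhog result in comment 14. Each planned size would then be handed to its own worker process, which is not shown here.

```python
def plan_split(total_mb, per_process_cap_mb):
    """Split total_mb into the fewest near-equal chunks that each fit the cap."""
    if total_mb <= 0 or per_process_cap_mb <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: the fewest processes that can cover the total.
    n = -(-total_mb // per_process_cap_mb)
    base, extra = divmod(total_mb, n)
    # The first `extra` chunks take one extra MB so the sizes sum to total_mb.
    return [base + 1] * extra + [base] * (n - extra)


if __name__ == "__main__":
    cap_mb = 255 * 1024  # Assumed per-process ceiling, from comment 14.
    print(plan_split(516034, cap_mb))  # Test size from the log in comment 13.
```

Splitting evenly, rather than filling one process to the cap and giving the remainder to another, keeps every worker comfortably below the ceiling.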
Comment 30 Sandy Garza 2008-12-09 10:53:24 EST
David,
We would like to close the BZ for the following reasons:

1. According to RH, the failure reported in this BZ seems to be expected behavior

2. We have no customer requests to increase the process size limit in the RH4.x kernel

3. RH4.x itself officially supports only 256G of system RAM for AMD64.
Comment 31 Rob Landry 2008-12-16 15:44:44 EST
Closing as WONTFIX for the reasons HP provided above. In addition, RHEL5.x should not encounter a similar issue, so this is not a generic problem.
