Bug 178887

Summary: MEMORY2 test segfaults
Product: [Retired] Red Hat Ready Certification Tests
Component: rhr2-tests
Version: 2
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Hardware: All
OS: Linux
Reporter: Rainer Koenig <Rainer.Koenig>
Assignee: Will Woods <wwoods>
QA Contact: Rob Landry <rlandry>
CC: djuran, richardl
Fixed In Version: RHBA-2006-0273
Doc Type: Bug Fix
Last Closed: 2006-05-08 16:23:21 UTC
Attachments (description, flags):
- hardware.log for CELSIUS M440 (flags: none)
- output.log for CELSIUS M440 (flags: none)
- hardware.log for CELSIUS R630 (flags: none)
- output.log for CELSIUS R630 (flags: none)
- updated lmbench package for RHEL4 - not signed, not for public consumption (flags: none)

Description Rainer Koenig 2006-01-25 08:25:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de-DE; rv:1.7.8) Gecko/20050517 Firefox/1.0.4 (Debian package 1.0.4-2)

Description of problem:
Running the MEMORY2 test results in FAILED. The output.log ends like this:

+ echo 'Testing memory read latency (cache-line size detection etc.)'
Testing memory read latency (cache-line size detection etc.)
+ /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_mem_rd 8000 16 32 64 128 256 512 1024
"stride=16
/usr/share/rhr/tests/MEMORY2: line 29: 12246 Segmentation fault      $lmbench_dir/lat_mem_rd $MB 16 32 64 128 256 512 1024
17516180600441360101755021819678460706


Version-Release number of selected component (if applicable):
2.6.9.22-EL.smp

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL4 Update 2 for x86_64 on the machine
2. Install dt-15.14-2.EL4.x86_64.rpm, ltp-20050804-1.EL4.x86_64.rpm,
lmbench-2.0.4-1.EL4.x86_64.rpm and rhr2-2.0-1.EL4.x86_64.rpm
3. Run the MEMORY2 test
  

Actual Results:  The MEMORY2 test ends with the result "FAILED".

Expected Results:  MEMORY2 test should pass. :-)

Additional info:

I tried this on 2 independent hardware systems:
a) Fujitsu Siemens CELSIUS R630 with 16 GB Memory
b) Fujitsu Siemens CELSIUS M440 with 8 GB Memory
I'll upload attachments with the corresponding output.log and hardware.log files.

Both machines have ECC modules and the BIOS event log shows no errors at all.
On the R630 I was running our own Memory Test utility that we use for RAM module qualification. After 16 hours I got 8 cycles and 0 errors.

From that I have to draw the conclusion that the hardware is ok, but there might be an issue with the new MEMORY2 test.

I also tried MEMORY2 on a CELSIUS H230 (2 GB RAM, i386 architecture) and didn't get any error there. So maybe it depends on the architecture or the amount of memory.

Comment 1 Rainer Koenig 2006-01-25 08:26:45 UTC
Created attachment 123653 [details]
hardware.log for CELSIUS M440

Comment 2 Rainer Koenig 2006-01-25 08:27:11 UTC
Created attachment 123654 [details]
output.log for CELSIUS M440

Comment 3 Rainer Koenig 2006-01-25 08:27:37 UTC
Created attachment 123655 [details]
hardware.log for CELSIUS R630

Comment 4 Rainer Koenig 2006-01-25 08:28:23 UTC
Created attachment 123656 [details]
output.log for CELSIUS R630

Comment 5 Rainer Koenig 2006-01-25 09:54:56 UTC
I did another test on the CELSIUS M440, this time with only 4 GB of RAM. In that
configuration MEMORY2 runs without a segfault. Could it be a problem with the 4 GB boundary?
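
One way to narrow down where the failure starts is to bisect on the size argument passed to lat_mem_rd. The following is only a sketch under stated assumptions: the binary path and stride arguments are copied from the output.log excerpt above, the helper names `segfaults` and `first_failing_size` are made up for illustration, and the search assumes that once a size fails, every larger size fails too (which this report does not confirm).

```python
import signal
import subprocess
from typing import Callable

# Path and stride arguments as they appear in the output.log excerpt.
LAT_MEM_RD = "/usr/lib/lmbench/bin/x86_64-linux-gnu/lat_mem_rd"
STRIDES = ["16", "32", "64", "128", "256", "512", "1024"]


def segfaults(mb: int) -> bool:
    """Run lat_mem_rd with the given size in MB and report whether it died with SIGSEGV."""
    result = subprocess.run(
        [LAT_MEM_RD, str(mb), *STRIDES],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a negative return code means the child was killed by that signal.
    return result.returncode == -signal.SIGSEGV


def first_failing_size(lo: int, hi: int,
                       fails: Callable[[int], bool] = segfaults) -> int:
    """Return the smallest failing size in MB.

    Assumes `lo` is known to pass, `hi` is known to fail, and that
    failure is monotonic in the size (an assumption, not a known fact).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Example call on an affected machine (not executed here):
# print(first_failing_size(1024, 8000))
```

The `fails` parameter exists so the search can be exercised with a stand-in predicate on a machine where lmbench is not installed.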

Comment 6 Richard Li 2006-01-25 13:16:24 UTC
Rainer, with your own Memory Test utility that you ran on the R630, was it run
on the same version of RHEL that the MEMORY2 test failed on?

Comment 7 Rainer Koenig 2006-01-25 14:15:04 UTC
Hi,
I'm afraid there is a misunderstanding. Our memory test tool doesn't run
on Linux; it's a sort of DOS tool that we boot from CD. It can also handle
configurations above 4 GB and is, I guess, based on the memtest86 that ships with
some other Linux distributions.
Regards
Rainer

Comment 8 Richard Li 2006-01-25 14:19:08 UTC
Thanks for the clarification. Our MEMORY2 test checks that Linux can access
every addressable segment of memory. I think we can rule out bad memory given
your report above. Thus, I see two possibilities:

- A bug in the MEMORY2 test
- A bug related to how RHEL and the hardware interact

We will investigate. Any additional insight you may have on the issue would be
helpful.

Comment 9 Will Woods 2006-01-25 19:02:02 UTC
Created attachment 123683 [details]
updated lmbench package for RHEL4 - not signed, not for public consumption

Comment 10 Will Woods 2006-01-25 19:07:39 UTC
This is probably due to a bug in lmbench - on certain arches, attempting to run
lat_mem_rd with >=2048MB RAM causes it to segfault.

I've attached an updated lmbench package above, which should fix this bug. Could
you try upgrading lmbench with this package, and then run MEMORY2 again? 
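
For illustration only, and not taken from the lmbench source or from the fix: a common way for a size of 2048 MB or more to break C code is 32-bit signed overflow when megabytes are converted to bytes. The sketch below emulates the wraparound typically seen in that case. Whether this is the actual cause inside lat_mem_rd is an assumption, and the helper names `to_int32` and `mb_to_bytes_int32` are hypothetical.

```python
def to_int32(value: int) -> int:
    """Wrap a Python int to a signed 32-bit value (two's-complement wraparound)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def mb_to_bytes_int32(mb: int) -> int:
    """Hypothetical helper: megabytes to bytes, computed as if in a 32-bit signed int."""
    return to_int32(mb * 1024 * 1024)


if __name__ == "__main__":
    # 2047 MB still fits; from 2048 MB on, the byte count no longer fits in 31 bits.
    for mb in (2047, 2048, 4096, 8000):
        print(f"{mb} MB -> {mb_to_bytes_int32(mb)} bytes (as 32-bit signed)")
```

A negative or truncated byte count used as an allocation size or loop bound would be consistent with a crash that appears only at 2048 MB and above, but the report itself does not state the root cause.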


Comment 11 Rainer Koenig 2006-01-26 09:09:40 UTC
I tried the updated lmbench on my M440 machine with 8 GB. This time there were no
errors. The R630 is still busy with the old memory test; once that has completed I'll
run the test there as well. But so far it looks like the problem is solved.

Comment 12 Richard Li 2006-01-26 14:08:40 UTC
Thanks for the bug report! Feel free to use the new package for submitting test
results. We will work on issuing an erratum for this package.

Comment 13 linas alinskas 2006-02-02 18:47:25 UTC
Hi. I also used the new lmbench RPM and did not see the segfault. It looks like
this is fixed. Thanks.

Comment 14 David Juran 2006-03-06 11:24:23 UTC
According to comment 7 in Bug 182713, this problem affects at least the
i386 architecture as well.

/David

Comment 16 Red Hat Bugzilla 2006-05-08 16:23:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0273.html