Bug 223165
| Summary: | RHEL5 certification MEMORY test took too long on 256GB memory | ||
|---|---|---|---|
| Product: | [Retired] Red Hat Hardware Certification Program | Reporter: | erikj |
| Component: | Test Suite (tests) | Assignee: | Greg Nichols <gnichols> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 5 | CC: | cww, edwardsg, gbeshers, iboverma, jh, martinez, niwa.hideyuki, wei, wwlinuxengineering |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | ia64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2007-04-09 17:12:41 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 230220 | ||
| Bug Blocks: | 222068 | ||
Description
erikj
2007-01-18 04:14:51 UTC
Did the test complete? The use of lat_mem_rd should end with a stride of 1024. I'm interested in whether these tests completed, just took excessively long, or whether some other problem caused the test to hang. Thanks!

Jerry - could you answer comment #3?

The test didn't complete. It took about 2 days to finish one stride in lat_mem_rd, and we killed the test after 4 days (we need the big machine for other uses). So it just took excessively long. Thanks.

John Hesterberg requested we bump the severity to high.

*** Bug 227975 has been marked as a duplicate of this bug. ***

I made the following changes in the interest of reducing test time for large-memory machines:

1) bw_mem cp and bcopy are now limited to 1/4 of available memory.

3) lat_mem_rd uses a 1G array size, or available memory if less than 1G.

Fix is in R25.

Is this a NUMA system? Was it the xen kernel being tested?

Yes, a NUMA system. No, xen is not being used. This is Itanium, and xen no workie yet.

Did the memory test with hts-5.0-25 and found that there were redundant output lines in the memory bandwidth test. The most time-consuming test, lat_mem_rd on big memory, wasn't changed. Here are the related lines from MEMORY2:
==============
# limit latency test arraysize to 1 GB
if (( "$MB" > "1024")); then
arraysize=1024
else
arraysize=$MB
fi
echo "Testing memory read latency (cache-line size detection etc.)"
echo "Running: lat_mem_rd $arraysize 16 32 64 128 256 512 1024"
lat_mem_rd $MB 16 32 64 128 256 512 1024
echo "done."
=============
It looks like lat_mem_rd is still invoked with $MB instead of $arraysize.
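The intended behavior of the snippet above can be sketched as a small standalone script. This is only an illustration: the function names `latency_arraysize` and `build_lat_mem_rd_cmd` are hypothetical and do not come from MEMORY2, the size argument is assumed to be in megabytes, and the command is only echoed, not executed.

```shell
#!/bin/sh
# Hypothetical helper (not from MEMORY2): clamp the latency-test array
# size to 1024 MB, or use the given size if it is smaller.
latency_arraysize() {
    if [ "$1" -gt 1024 ]; then
        echo 1024
    else
        echo "$1"
    fi
}

# Hypothetical helper: build the command line from the clamped size, so
# the echoed "Running:" text and the real invocation cannot diverge.
build_lat_mem_rd_cmd() {
    echo "lat_mem_rd $(latency_arraysize "$1") 16 32 64 128 256 512 1024"
}

# Example: a machine reporting 262144 MB (256 GB) of memory.
echo "Running: $(build_lat_mem_rd_cmd 262144)"
```

Deriving both the message and the invocation from one string is one way to avoid the mismatch seen here, where the echo line used $arraysize but the actual call used $MB.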
Changing to Assigned per above.

Fixed R26.

If you wanted to provide it (attach it here?), George or Jerry could probably test out a fix on a 256GB machine. We're having a hiccup giving you direct access to the 256GB machine (but working on it).

The fix is just to change the variable in line 113 of MEMORY2 (/usr/share/hts/tests/memory/MEMORY2), as in:

lat_mem_rd $arraysize 16 32 64 128 256 512 1024

So please make that change and try it out.

I have run this on altix3.lab.boston.redhat.com (pw: altix3) and left the results available. Unless I am confused about when I started the test, it ran for the better part of 4 days. I will save off the information before doing anything more with the system.

The attached file is a log from a modified MEMORY2 which ran just the bw_mem tests. Things to note:

| Size | rd | wr | rdwr | bzero | cp | bcopy |
|---|---|---|---|---|---|---|
| 1024m | 1:39.20 | 0:32.29 | 0:54.74 | 0:12.34 | 1:49.47 | 1:45.97 |
| 16384m | 27:29.96 | 9:00.16 | 15:46.34 | 3:13.94 | 27:10.77 | 30:39.53 |

397740m 2:20:43 (NOTE: -N1; default is 11 times)

795481m 5:17:24

Also a libc error, see BZ230220.

Fixed R29
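To compare the bw_mem timings listed above, it can help to convert them to seconds. The following is a rough sketch, assuming that two-field values are minutes:seconds and three-field values are hours:minutes:seconds; the helper name `to_seconds` is hypothetical and not part of the test suite.

```shell
#!/bin/sh
# Hypothetical helper: convert "m:ss.ss" or "h:mm:ss" into seconds.
# Each colon-separated field is folded in as base-60.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

rd_1g=$(to_seconds 1:39.20)     # rd time reported for 1024m
rd_16g=$(to_seconds 27:29.96)   # rd time reported for 16384m

# Ratio of rd run times for a 16x increase in test size.
awk -v a="$rd_1g" -v b="$rd_16g" \
    'BEGIN { printf "rd: %.1fx the time for 16x the size\n", b / a }'
```

For the two complete rows, the rd time grows by a factor of about 16.6 for a 16x larger size, which suggests roughly linear scaling over that range.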