Bug 623787 - core test - stress portion should scale for memory size
Summary: core test - stress portion should scale for memory size
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Hardware Certification Program
Classification: Retired
Component: Test Suite (tests)
Version: 1.2
Hardware: s390x
OS: Linux
Priority: low
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Greg Nichols
QA Contact: Guangze Bai
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-08-12 19:00 UTC by Greg Nichols
Modified: 2015-02-08 21:36 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2011-05-09 16:15:35 UTC
Embargoed:


Attachments
core test patch to scale stress test for free memory (3.19 KB, patch)
2011-01-21 19:05 UTC, Greg Nichols
test.py patch moving the function for getting memory limits to the Test base class (2.27 KB, patch)
2011-01-21 19:07 UTC, Greg Nichols
memory test patch to call the new method for memory limits (2.69 KB, patch)
2011-01-21 19:08 UTC, Greg Nichols


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0497 0 normal SHIPPED_LIVE v7 bug fix and enhancement update 2011-05-09 16:11:16 UTC

Description Greg Nichols 2010-08-12 19:00:06 UTC
Description of problem:

The stress portion of the core test hangs the system. This occurs either when running the core test from v7 or when running the stress test independently:

[root@ibm-z10-15 ~]# stress --cpu 12 --io 12 --vm 12 --vm-bytes 128M --timeout 10m
stress: info: [2025] dispatching hogs: 12 cpu, 12 io, 12 vm, 0 hdd

Seen on system: ibm-z10-15.rhts.eng.bos.redhat.com

Version-Release number of selected component (if applicable):

v7 1.2 R16
RHEL6 Snapshot 7

Comment 1 Greg Nichols 2010-08-24 13:52:27 UTC
Also, the stress portion can fail with logs as follows:

<output>
Running ./core.py:
Clock Info: ------------------------------------------
kernel: Switching to clocksource tod

Clock Source per system log: tod
Clock Source in /sys/devices/system/clocksource/clocksource*/current_clocksource: tod

CPU Vendor: IBM/S390
Running clock tests
Testing for clock jitter on 2 cpus
PASSED, largest jitter seen was 0.015548
clock direction test: start time 1282624049, stop time 1282624109, sleeptime 60, delta 0
PASSED
Running stress for 10 min.
stress: FAIL: [3561] (416) <-- worker 3562 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3563 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3564 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3565 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3566 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3567 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3568 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (452) failed run completed in 601s
Error:
"stress --cpu 12 --io 12 --vm 12 --vm-bytes 128M --timeout 10m" has output on stderr
...finished running ./core.py, exit code=1
</output>

Comment 2 Greg Nichols 2010-08-27 17:57:45 UTC
Some investigation reveals that the stress processes are being OOM-killed. The core test uses 12 "vm hog" processes at 128MB each. The test probably needs to check whether the defaults are going to exhaust memory.
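
The arithmetic behind such a check is small: 12 vm workers at 128 MB each touch roughly 1536 MB. The following is a minimal sketch of a pre-flight check, not the v7 code. The function names and the headroom fraction are hypothetical.

```python
def vm_footprint_mb(workers, mb_per_worker):
    """Approximate memory touched by the stress --vm workers, in MB."""
    return workers * mb_per_worker


def defaults_fit(free_mb, workers=12, mb_per_worker=128, headroom=0.8):
    """Return True if the vm hogs fit within a fraction of free memory.

    headroom is an assumed safety margin, not a value taken from v7.
    """
    return vm_footprint_mb(workers, mb_per_worker) <= free_mb * headroom


print(vm_footprint_mb(12, 128))   # 1536
print(defaults_fit(1024))         # False: 1536 MB does not fit in 1 GB
print(defaults_fit(4096))         # True
```

With a check like this, the test could reduce its arguments instead of launching workers that the OOM killer will terminate.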

Comment 3 Greg Nichols 2010-08-27 18:41:33 UTC
Changing summary. This is really a scaling issue: the examples above are from systems that are too small to run the test, but large-scale systems should probably also be stressed more by the test.

It's also not an issue dependent on arch. For example, it also applies to the size of guests in fv_core.
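
One way to scale in both directions is to derive the stress arguments from free memory and CPU count rather than hard-coding 12 workers. This is only a sketch under stated assumptions: `scaled_stress_command`, the 50% memory fraction, and the policy of one cpu and io worker per CPU are all hypothetical, and only the stress options already shown in this report are used.

```python
def scaled_stress_command(free_mb, cpus, mb_per_worker=128,
                          fraction=0.5, timeout="10m"):
    """Build a stress command line sized to the machine.

    fraction is an assumed share of free memory given to the vm hogs.
    """
    budget_mb = int(free_mb * fraction)
    if budget_mb < mb_per_worker:
        # Small system: run a single vm worker and shrink its allocation.
        vm_workers = 1
        vm_bytes = max(1, budget_mb)
    else:
        # Larger system: add vm workers until the budget is used.
        vm_workers = budget_mb // mb_per_worker
        vm_bytes = mb_per_worker
    return ("stress --cpu %d --io %d --vm %d --vm-bytes %dM --timeout %s"
            % (cpus, cpus, vm_workers, vm_bytes, timeout))


print(scaled_stress_command(512, 2))
print(scaled_stress_command(65536, 16))
```

A small guest gets a command it can survive, while a large system is given proportionally more vm workers.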

Comment 5 Greg Nichols 2011-01-21 19:05:45 UTC
Created attachment 474663
core test patch to scale stress test for free memory

Also, this patch divides the core test into subtests.

Comment 6 Greg Nichols 2011-01-21 19:07:03 UTC
Created attachment 474664
test.py patch moving the function for getting memory limits to the Test base class
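
For illustration, a shared helper for memory limits could be built on /proc/meminfo. The sketch below is an assumption about how such a helper might look, not the contents of the attached patch. `parse_meminfo` and `free_memory_mb` are hypothetical names, and counting buffers and page cache as reclaimable is an assumed policy.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def free_memory_mb(text):
    """Estimate usable memory in MB: free plus buffers plus page cache."""
    info = parse_meminfo(text)
    kb = info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
    return kb // 1024


# Sample input with made-up numbers, used in place of the real file.
SAMPLE = """MemTotal:        2048000 kB
MemFree:          512000 kB
Buffers:           10240 kB
Cached:           102400 kB
"""
print(free_memory_mb(SAMPLE))   # 610
```

On a live system the text would come from reading /proc/meminfo. Keeping the parsing separate from the file read lets both the core and memory tests share it and makes it easy to test.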

Comment 7 Greg Nichols 2011-01-21 19:08:13 UTC
Created attachment 474665
memory test patch to call the new method for memory limits

Comment 12 Caspar Zhang 2011-04-30 08:35:21 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
In v7 1.2, the stress portion of the core test hung the system because the memory size passed as a test argument was too large. This issue is fixed in v7 1.3: the stress arguments now scale to fit the memory size.

Comment 13 errata-xmlrpc 2011-05-09 16:15:35 UTC
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0497.html

