Bug 623787 - core test - stress portion should scale for memory size
Status: CLOSED ERRATA
Product: Red Hat Hardware Certification Program
Classification: Red Hat
Component: Test Suite (tests)
Version: 1.2
Platform: s390x Linux
Priority: low  Severity: urgent
Assigned To: Greg Nichols
Guangze Bai
Depends On:
Blocks:
Reported: 2010-08-12 15:00 EDT by Greg Nichols
Modified: 2015-02-08 16:36 EST (History)
5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
In v7 1.2, the stress portion of the core test hung the system because the memory size passed as a test argument was too large. This issue is fixed in v7 1.3: the stress arguments now scale to fit the system's memory size.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-05-09 12:15:35 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
core test patch to scale stress test for free memory (3.19 KB, patch)
2011-01-21 14:05 EST, Greg Nichols
test.py patch moving the function for getting memory limits to the Test base class (2.27 KB, patch)
2011-01-21 14:07 EST, Greg Nichols
memory test patch to call the new method for memory limits (2.69 KB, patch)
2011-01-21 14:08 EST, Greg Nichols


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0497 normal SHIPPED_LIVE v7 bug fix and enhancement update 2011-05-09 12:11:16 EDT

Description Greg Nichols 2010-08-12 15:00:06 EDT
Description of problem:

The stress portion of the core test hangs the system. This occurs both when running the core test from v7 and when running the stress test independently:

[root@ibm-z10-15 ~]# stress --cpu 12 --io 12 --vm 12 --vm-bytes 128M --timeout 10m
stress: info: [2025] dispatching hogs: 12 cpu, 12 io, 12 vm, 0 hdd

Seen on system: ibm-z10-15.rhts.eng.bos.redhat.com
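For reference, the memory committed by the invocation above is easy to work out; the sketch below just does the arithmetic (the 12-worker, 128M figures are the defaults shown in the command):

```python
# Memory committed by the vm-hog workers in the invocation above:
# "stress --vm 12 --vm-bytes 128M" allocates 128 MB in each of 12 workers.
vm_workers = 12
vm_bytes_mb = 128

total_mb = vm_workers * vm_bytes_mb
print(total_mb)  # 1536 MB of anonymous memory, before the cpu/io hogs
```

On a small guest this alone can exceed available memory, which is consistent with the OOM kills seen below.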

Version-Release number of selected component (if applicable):

v7 1.2 R16
RHEL6 Snapshot 7
Comment 1 Greg Nichols 2010-08-24 09:52:27 EDT
Also, the stress portion can fail with logs as follows:

<output>
Running ./core.py:
Clock Info: ------------------------------------------
kernel: Switching to clocksource tod

Clock Source per system log: tod
Clock Source in /sys/devices/system/clocksource/clocksource*/current_clocksource: tod

CPU Vendor: IBM/S390
Running clock tests
Testing for clock jitter on 2 cpus
PASSED, largest jitter seen was 0.015548
clock direction test: start time 1282624049, stop time 1282624109, sleeptime 60, delta 0
PASSED
Running stress for 10 min.
stress: FAIL: [3561] (416) <-- worker 3562 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3563 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3564 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3565 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3566 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3567 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3568 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (452) failed run completed in 601s
Error:
"stress --cpu 12 --io 12 --vm 12 --vm-bytes 128M --timeout 10m" has output on stderr
...finished running ./core.py, exit code=1
</output>
Comment 2 Greg Nichols 2010-08-27 13:57:45 EDT
Some investigation reveals the stress processes are being OOM-killed. The core test uses 12 "vm hog" processes at 128MB each. The test probably needs to check whether the defaults will exhaust memory.
Comment 3 Greg Nichols 2010-08-27 14:41:33 EDT
Changing summary. This is really a scaling issue: the above example is from a system too small to run the test with the defaults, but large-scale systems should probably also be stressed harder by the test.

It's also not an issue dependent on arch. For example, it also applies to the size of guests in fv_core.
Comment 5 Greg Nichols 2011-01-21 14:05:45 EST
Created attachment 474663 [details]
core test patch to scale stress test for free memory

Also, this patch divides the core test into subtests.
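The general idea of scaling the stress arguments to free memory can be sketched as follows. This is a minimal illustration, not the attached patch: the function name, the 50% headroom factor, and sourcing free memory from /proc/meminfo are all assumptions.

```python
def scaled_vm_workers(free_kb, vm_bytes_mb=128, headroom=0.5):
    """Choose how many 'stress --vm' workers fit in memory.

    Keeps the combined vm-hog allocation within a fraction
    (headroom) of free memory, so small systems are not OOM-killed
    and large systems get proportionally more stress.
    Illustrative sketch only; not the actual v7 patch.
    """
    budget_mb = (free_kb / 1024) * headroom
    return max(1, int(budget_mb // vm_bytes_mb))

# In a real test, free_kb would come from the MemFree line of
# /proc/meminfo. Example: a guest with 1 GB free gets 4 workers,
# a host with 64 GB free gets 256.
print(scaled_vm_workers(1024 * 1024))       # -> 4
print(scaled_vm_workers(64 * 1024 * 1024))  # -> 256
```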
Comment 6 Greg Nichols 2011-01-21 14:07:03 EST
Created attachment 474664 [details]
test.py patch moving the function for getting memory limits to the Test base class
Comment 7 Greg Nichols 2011-01-21 14:08:13 EST
Created attachment 474665 [details]
memory test patch to call the new method for memory limits
Comment 12 Caspar Zhang 2011-04-30 04:35:21 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
In v7 1.2, the stress portion of the core test hung the system because the memory size passed as a test argument was too large. This issue is fixed in v7 1.3: the stress arguments now scale to fit the system's memory size.
Comment 13 errata-xmlrpc 2011-05-09 12:15:35 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0497.html
