Bug 623787

Summary: core test - stress portion should scale for memory size
Product: [Retired] Red Hat Hardware Certification Program
Reporter: Greg Nichols <gnichols>
Component: Test Suite (tests)
Assignee: Greg Nichols <gnichols>
Status: CLOSED ERRATA
QA Contact: Guangze Bai <gbai>
Severity: urgent
Docs Contact:
Priority: low
Version: 1.2
CC: czhang, rlandry, sdenham, yshao, yuchen
Target Milestone: ---   
Target Release: ---   
Hardware: s390x   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
In v7 1.2, the stress portion of the core test could hang the system because the memory size passed as an argument to the stress test was too large. This issue is fixed in v7 1.3: the argument in the stress portion now scales to fit the memory size.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-05-09 16:15:35 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: core test patch to scale stress test for free memory; Flags: none
Description: test.py patch moving the function for getting memory limits to the Test base class; Flags: none
Description: memory test patch to call the new method for memory limits; Flags: none

Description Greg Nichols 2010-08-12 19:00:06 UTC
Description of problem:

The stress portion of the core test hangs the system. This occurs either when running the core test from v7 or when running the stress test independently:

[root@ibm-z10-15 ~]# stress --cpu 12 --io 12 --vm 12 --vm-bytes 128M --timeout 10m
stress: info: [2025] dispatching hogs: 12 cpu, 12 io, 12 vm, 0 hdd

Seen on system: ibm-z10-15.rhts.eng.bos.redhat.com

Version-Release number of selected component (if applicable):

v7 1.2 R16
RHEL6 Snapshot 7

Comment 1 Greg Nichols 2010-08-24 13:52:27 UTC
Also, the stress portion can fail with logs as follows:

<output>
Running ./core.py:
Clock Info: ------------------------------------------
kernel: Switching to clocksource tod

Clock Source per system log: tod
Clock Source in /sys/devices/system/clocksource/clocksource*/current_clocksource: tod

CPU Vendor: IBM/S390
Running clock tests
Testing for clock jitter on 2 cpus
PASSED, largest jitter seen was 0.015548
clock direction test: start time 1282624049, stop time 1282624109, sleeptime 60, delta 0
PASSED
Running stress for 10 min.
stress: FAIL: [3561] (416) <-- worker 3562 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3563 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3564 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3565 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3566 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3567 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3568 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (452) failed run completed in 601s
Error:
"stress --cpu 12 --io 12 --vm 12 --vm-bytes 128M --timeout 10m" has output on stderr
...finished running ./core.py, exit code=1
</output>

Comment 2 Greg Nichols 2010-08-27 17:57:45 UTC
Some investigation reveals that the stress processes are being OOM-killed. The core test uses 12 "vm hog" processes at 128 MB each (1536 MB in total). The test probably needs to check whether the defaults are going to exhaust memory.
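
The check suggested above could be sketched roughly as follows. This is only an illustration, not the code from any patch attached to this bug: the function names (parse_free_kb, defaults_fit), the choice to count MemFree + Buffers + Cached as "free", and the 75% headroom are all assumptions. Only the 12 x 128 MB defaults come from the failing command line.

```python
import re

VM_WORKERS = 12      # from the failing command: --vm 12
VM_BYTES_MB = 128    # from the failing command: --vm-bytes 128M


def parse_free_kb(meminfo_text):
    """Sum MemFree, Buffers and Cached (in kB) from /proc/meminfo-style text.

    Treating these three fields as "free" is an assumption of this sketch.
    """
    total = 0
    for field in ("MemFree", "Buffers", "Cached"):
        # Anchor at line start so "Cached" does not also match "SwapCached".
        match = re.search(r"^%s:\s+(\d+)\s+kB" % field, meminfo_text, re.MULTILINE)
        if match:
            total += int(match.group(1))
    return total


def defaults_fit(free_kb, workers=VM_WORKERS, vm_bytes_mb=VM_BYTES_MB, headroom=0.75):
    """Return True if the vm hogs' combined allocation fits in a fraction of free memory.

    The headroom fraction is a hypothetical safety margin, not a value from v7.
    """
    demand_kb = workers * vm_bytes_mb * 1024
    return demand_kb <= free_kb * headroom
```

On a live system the text would come from reading /proc/meminfo, for example `parse_free_kb(open("/proc/meminfo").read())`. With the defaults, the demand is 12 * 128 * 1024 = 1572864 kB, so any system with less than about 2 GB free by this measure would fail the check.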

Comment 3 Greg Nichols 2010-08-27 18:41:33 UTC
Changing summary. This is really a scaling issue: the above examples are from systems that are too small to run the test, but large-scale systems should probably also be stressed more by the test.

It's also not an issue dependent on architecture. For example, it also applies to the size of guests in fv_core.
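
One way to scale in both directions is to derive the number of vm hogs from the memory available, so that small systems get fewer workers and large systems get more. The sketch below is an assumption about how that might look, not the attached patch: the function name scaled_stress_command, the 50% memory fraction, and the decision to keep the cpu and io worker counts at 12 are all hypothetical. It uses only the stress options that appear in the command line in this report.

```python
def scaled_stress_command(free_mb, cpu_workers=12, io_workers=12,
                          vm_bytes_mb=128, fraction=0.5, timeout="10m"):
    """Build a stress argument list whose vm hogs use about `fraction` of free memory.

    free_mb is the memory considered available, in MB. All defaults other than
    the 12/12/128M/10m values seen in this report are assumptions.
    """
    budget_mb = int(free_mb * fraction)
    # Keep the per-worker size fixed and vary the number of workers.
    vm_workers = max(1, budget_mb // vm_bytes_mb)
    # If even a single default-sized worker exceeds the budget, shrink it.
    if budget_mb < vm_bytes_mb:
        vm_bytes_mb = max(1, budget_mb)
    return [
        "stress",
        "--cpu", str(cpu_workers),
        "--io", str(io_workers),
        "--vm", str(vm_workers),
        "--vm-bytes", "%dM" % vm_bytes_mb,
        "--timeout", timeout,
    ]
```

With 512 MB available this yields 2 vm workers of 128M each, while 64 GB available yields 256 workers, so the same code path covers both the too-small and the under-stressed cases.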

Comment 5 Greg Nichols 2011-01-21 19:05:45 UTC
Created attachment 474663
core test patch to scale stress test for free memory

Also, this patch divides the core test into subtests.

Comment 6 Greg Nichols 2011-01-21 19:07:03 UTC
Created attachment 474664
test.py patch moving the function for getting memory limits to the Test base class

Comment 7 Greg Nichols 2011-01-21 19:08:13 UTC
Created attachment 474665
memory test patch to call the new method for memory limits

Comment 12 Caspar Zhang 2011-04-30 08:35:21 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
In v7 1.2, the stress portion of the core test could hang the system because the memory size passed as an argument to the stress test was too large. This issue is fixed in v7 1.3: the argument in the stress portion now scales to fit the memory size.

Comment 13 errata-xmlrpc 2011-05-09 16:15:35 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0497.html