623787 – core test - stress portion should scale for memory size

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Hardware Certification Program
Classification:	Retired
Component:	Test Suite (tests)
Sub Component:
Version:	1.2
Hardware:	s390x
OS:	Linux
Priority:	low
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Greg Nichols
QA Contact:	Guangze Bai
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-08-12 19:00 UTC by Greg Nichols
Modified:	2015-02-08 21:36 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2011-05-09 16:15:35 UTC
Embargoed:

Attachments
core test patch to scale stress test for free memory (3.19 KB, patch) 2011-01-21 19:05 UTC, Greg Nichols
test.py patch moving the function for getting memory limits to the Test base class (2.27 KB, patch) 2011-01-21 19:07 UTC, Greg Nichols
memory test patch to call the new method for memory limits (2.69 KB, patch) 2011-01-21 19:08 UTC, Greg Nichols

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2011:0497	0	normal	SHIPPED_LIVE	v7 bug fix and enhancement update	2011-05-09 16:11:16 UTC

Description Greg Nichols 2010-08-12 19:00:06 UTC

Description of problem:

The stress portion of the core test hangs the system. This occurs either when running the core test from v7 or when running the stress test independently:

[root@ibm-z10-15 ~]# stress --cpu 12 --io 12 --vm 12 --vm-bytes 128M --timeout 10m
stress: info: [2025] dispatching hogs: 12 cpu, 12 io, 12 vm, 0 hdd

Seen on system: ibm-z10-15.rhts.eng.bos.redhat.com

Version-Release number of selected component (if applicable):

v7 1.2 R16
RHEL6 Snapshot 7

Comment 1 Greg Nichols 2010-08-24 13:52:27 UTC

Also, the stress portion can fail with logs as follows:

<output>
Running ./core.py:
Clock Info: ------------------------------------------
kernel: Switching to clocksource tod

Clock Source per system log: tod
Clock Source in /sys/devices/system/clocksource/clocksource*/current_clocksource: tod

CPU Vendor: IBM/S390
Running clock tests
Testing for clock jitter on 2 cpus
PASSED, largest jitter seen was 0.015548
clock direction test: start time 1282624049, stop time 1282624109, sleeptime 60, delta 0
PASSED
Running stress for 10 min.
stress: FAIL: [3561] (416) <-- worker 3562 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3563 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3564 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3565 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3566 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3567 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3568 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (452) failed run completed in 601s
Error:
"stress --cpu 12 --io 12 --vm 12 --vm-bytes 128M --timeout 10m" has output on stderr
...finished running ./core.py, exit code=1
</output>

Comment 2 Greg Nichols 2010-08-27 17:57:45 UTC

Some investigation reveals that the stress processes are being OOM killed. The core test uses 12 "vm hog" processes at 128MB each (1536MB in total). The test probably needs to check whether the defaults are going to exhaust memory.
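A minimal sketch of such a check, assuming the test reads MemFree from /proc/meminfo. The helper names parse_meminfo_kb and stress_fits are hypothetical and are not taken from the attached patches; only the defaults (12 workers at 128MB) come from the command line above.

```python
def parse_meminfo_kb(text, field="MemFree"):
    """Return the value in kB of one field from /proc/meminfo-style text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            # Lines look like "MemFree:   1048576 kB"; take the number.
            return int(rest.split()[0])
    raise KeyError(field)


def stress_fits(free_kb, vm_workers=12, vm_mb=128):
    """True if the vm hogs together need no more than the free memory."""
    needed_kb = vm_workers * vm_mb * 1024
    return needed_kb <= free_kb


if __name__ == "__main__":
    # Assumption: running on Linux, where /proc/meminfo is available.
    with open("/proc/meminfo") as f:
        free_kb = parse_meminfo_kb(f.read())
    print("defaults fit in free memory:", stress_fits(free_kb))
```

With the defaults, a system with 1GB free would fail this check, since the hogs alone ask for 1536MB.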

Comment 3 Greg Nichols 2010-08-27 18:41:33 UTC

Changing summary. This is really a scaling issue: the above examples are from systems that are too small to run the test with its defaults, but large-scale systems should probably also be stressed more by the test.

It's also not an issue that depends on arch. For example, it also applies to the size of guests in fv_core.
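One way the scaling could work, sketched under stated assumptions: keep 128MB per vm worker and size the worker count from a fraction of free memory, so small systems get fewer hogs and large systems get more. The function name scaled_stress_args, the 50% fraction, and the choice to tie --cpu and --io to the CPU count are illustrative assumptions, not what the attached patch does.

```python
def scaled_stress_args(free_mb, cpus, vm_mb=128, fraction=0.5, timeout="10m"):
    """Build a stress command line whose vm hogs fit a memory budget."""
    # Assumption: only a fraction of free memory is handed to the vm hogs.
    budget_mb = int(free_mb * fraction)
    # At least one worker, otherwise as many as fit in the budget.
    vm_workers = max(1, budget_mb // vm_mb)
    if vm_workers * vm_mb > budget_mb:
        # Budget is smaller than one default-sized worker: shrink the worker.
        vm_mb = max(1, budget_mb)
    return [
        "stress",
        "--cpu", str(cpus),
        "--io", str(cpus),
        "--vm", str(vm_workers),
        "--vm-bytes", "%dM" % vm_mb,
        "--timeout", timeout,
    ]


if __name__ == "__main__":
    # A small system (512MB free, 2 CPUs) and a large one (64GB free, 16 CPUs).
    print(" ".join(scaled_stress_args(512, 2)))
    print(" ".join(scaled_stress_args(65536, 16)))
```

The returned list can be passed directly to subprocess.call, which avoids shell quoting.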

Comment 5 Greg Nichols 2011-01-21 19:05:45 UTC

Created attachment 474663 [details]
core test patch to scale stress test for free memory

Also, this patch divides the core test into subtests.

Comment 6 Greg Nichols 2011-01-21 19:07:03 UTC

Created attachment 474664 [details]
test.py patch moving the function for getting memory limits to the Test base class

Comment 7 Greg Nichols 2011-01-21 19:08:13 UTC

Created attachment 474665 [details]
memory test patch to call the new method for memory limits

Comment 12 Caspar Zhang 2011-04-30 08:35:21 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
In v7 1.2, the stress portion of the core test hung the system because the memory size passed as a test argument was too large. This issue is fixed in v7 1.3: the arguments to the stress portion now scale to fit the memory size.

Comment 13 errata-xmlrpc 2011-05-09 16:15:35 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0497.html
