Description of problem:
The stress portion of the core test hangs the system. This occurs either when running the core test from v7 or when running the stress test independently:

[root@ibm-z10-15 ~]# stress --cpu 12 --io 12 --vm 12 --vm-bytes 128M --timeout 10m
stress: info: [2025] dispatching hogs: 12 cpu, 12 io, 12 vm, 0 hdd

Seen on system: ibm-z10-15.rhts.eng.bos.redhat.com

Version-Release number of selected component (if applicable):
v7 1.2 R16
RHEL6 Snapshot 7
Also, the stress portion can fail with logs as follows:
<output>
Running ./core.py:
Clock Info:
------------------------------------------
kernel: Switching to clocksource tod
Clock Source per system log: tod
Clock Source in /sys/devices/system/clocksource/clocksource*/current_clocksource: tod
CPU Vendor: IBM/S390
Running clock tests
Testing for clock jitter on 2 cpus
PASSED, largest jitter seen was 0.015548
clock direction test: start time 1282624049, stop time 1282624109, sleeptime 60, delta 0
PASSED
Running stress for 10 min.
stress: FAIL: [3561] (416) <-- worker 3562 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3563 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3564 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3565 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3566 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3567 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (416) <-- worker 3568 got signal 9
stress: WARN: [3561] (418) now reaping child worker processes
stress: FAIL: [3561] (422) kill error: No such process
stress: FAIL: [3561] (452) failed run completed in 601s
Error: "stress --cpu 12 --io 12 --vm 12 --vm-bytes 128M --timeout 10m" has output on stderr
...finished running ./core.py, exit code=1
</output>
Some investigation reveals that the stress processes are being OOM killed. The core test uses 12 "vm hog" processes at 128MB each, which is 1536MB in total. The test probably needs to check whether the defaults are going to exhaust memory.
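To make the arithmetic concrete, here is a minimal sketch of the kind of pre-check the test could do before launching stress. It is not the code from v7: the function names are hypothetical, and treating MemFree + Buffers + Cached from /proc/meminfo as "available" memory is an assumption made only for illustration.

```python
import re


def parse_meminfo_kb(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def vm_hogs_fit(meminfo_text, workers=12, mb_per_worker=128):
    """Return (needed_mb, free_mb, fits) for the requested vm hogs.

    Assumption: free memory is approximated as MemFree + Buffers + Cached.
    """
    info = parse_meminfo_kb(meminfo_text)
    free_kb = info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
    free_mb = free_kb // 1024
    needed_mb = workers * mb_per_worker
    return needed_mb, free_mb, needed_mb <= free_mb


if __name__ == "__main__":
    # Reads the live system values; only works on Linux.
    with open("/proc/meminfo") as handle:
        needed, free, fits = vm_hogs_fit(handle.read())
    print("need %d MB, have about %d MB free, fits: %s" % (needed, free, fits))
```

With the defaults from this report, the check asks for 12 x 128MB = 1536MB, so any system with less free memory than that would be flagged before stress is started.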
Changing summary. This is really a scaling issue: the examples above are from systems that are too small to run the test, but large-scale systems should probably also be stressed more by the test. It's also not an issue dependent on arch. For example, it also applies to the size of guests in fv_core.
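As a rough illustration of scaling in both directions, the sketch below builds a stress command line whose vm worker count follows the free memory. It is only an assumption about how scaling could be done, not the actual v7 change: the function name, the 25% memory reserve, and the choice to keep the cpu and io counts fixed are all hypothetical.

```python
def scaled_stress_command(free_mb, cpu_workers=12, io_workers=12,
                          mb_per_worker=128, reserve_fraction=0.25,
                          timeout="10m"):
    """Build a stress argument list with vm hogs scaled to free memory.

    Assumptions (hypothetical, for illustration only):
    - reserve_fraction of free memory is left untouched for the rest of the system
    - at least one vm worker is always started, shrinking its size if needed
    """
    usable_mb = int(free_mb * (1 - reserve_fraction))
    vm_workers = max(1, usable_mb // mb_per_worker)
    if usable_mb < mb_per_worker:
        # Too small for even one default-sized hog: shrink the hog instead.
        mb_per_worker = max(1, usable_mb)
    return [
        "stress",
        "--cpu", str(cpu_workers),
        "--io", str(io_workers),
        "--vm", str(vm_workers),
        "--vm-bytes", "%dM" % mb_per_worker,
        "--timeout", timeout,
    ]


if __name__ == "__main__":
    for free_mb in (100, 768, 65536):
        print(free_mb, " ".join(scaled_stress_command(free_mb)))
```

Under these assumptions a small system gets fewer (or smaller) vm hogs than the fixed 12 x 128MB, while a large system gets proportionally more, which matches the two halves of the scaling concern described above.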
Created attachment 474663: core test patch to scale stress test for free memory. This patch also divides the core test into subtests.
Created attachment 474664: test.py patch moving the function for getting memory limits to the Test base class
Created attachment 474665: memory test patch to call the new method for memory limits
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: In v7 1.2, the stress portion of the core test hangs the system because the memory size passed as a test argument was too large. This issue is fixed in v7 1.3: the argument in the stress portion now scales to fit the memory size.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0497.html