Description of problem:
Since the -90.el5 kernel I have on multiple occasions seen s390 systems get into a state where thousands of crashme processes are created and will not terminate. The processes persist, and eventually the guest runs out of process descriptors and becomes unusable.

Version-Release number of selected component (if applicable):
Any 5.2 kernel since -90

How reproducible:
Very frequently, up to 100% of the time, apparently depending on the random seed used.

Steps to Reproduce:
1. Install 5.2 with the -90 or -91 kernel.
2. Install stress-kernel from rhts and run 'make build' to build and install the test. The ctcs test will probably work too.
3. Add a non-privileged user to the system, e.g. crashme.
4. As the user crashme, run:
/opt/ctcs/bin/crashme 8192 -1103949072 100 0:0:10 9
5. Watch as thousands of processes are created. The command output will look similar to:

Subprocess 1: Barfed
Subprocess 1: try 1
Subprocess 1: Got signal 4 illegal instruction
Subprocess 1: Barfed
Subprocess 1: try 2
Subprocess 1: Got signal 4 illegal instruction
Subprocess 1: Barfed
Subprocess 1: try 3
Subprocess 1: Got signal 14 alarm clock
Subprocess 1: Got signal 14 alarm clock
Subprocess 1: Got signal 14 alarm clock
Subprocess 1: Got signal 14 alarm clock
Subprocess 1: Got signal 14 alarm clock
Subprocess 1: Barfed

Actual results:
Thousands of crashme processes are created, and none of them can be killed.

Expected results:
The command completes in approximately 10 seconds, as it does on x86_64, and the system recovers completely.

Additional info:
First observed instance of the problem: the test was able to mostly finish, but a few processes were left lying around:
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=20823&type=Single

In the other instances that occurred, I wasn't able to get any useful output out of the system.
I've successfully reproduced this problem with both -89 and -87.

[crashme@z213 ctcs]$ runin/bin/crashme 8192 -1103949072 100 0:0:10 2
Crashme: (c) Copyright 1990-1994 George J. Carrette
Version: 2.5 20-APR-2005
crashme 8192 -1103949072 100 0:0:10 2
Subprocess run for 10 seconds (0 00:00:10)
Time limit reached after run 1
Test complete, total real time: 37 seconds (0 00:00:37)
exit status ... number of cases
9 ... 1

It looks like the crashme process can't kill its own child processes. Increasing the verbosity (the last argument in the command) seems to indicate that it can't kill the first child process, and kill -9 won't kill any of them either.

Correction to the reproducer instructions: the correct path to the crashme command is /opt/ctcs/runin/bin
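For anyone reproducing this, it may help to record which state the lingering crashme processes are in when kill -9 has no effect. The following is a minimal sketch, not part of crashme or the stress-kernel suite; the helper names `parse_stat` and `count_states` are made up for illustration. It assumes a Linux procfs mounted at /proc and reads the state letter from /proc/<pid>/stat. Nothing in this report establishes which state the stuck processes were actually in.

```python
import os
from collections import Counter


def parse_stat(text):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    start = text.index("(")
    end = text.rindex(")")
    comm = text[start + 1:end]
    state = text[end + 1:].split()[0]
    return comm, state


def count_states(name, proc_root="/proc"):
    """Count processes whose command name equals `name`, by state letter."""
    counts = Counter()
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return counts  # procfs not available here
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or unexpected format
        if comm == name:
            counts[state] += 1
    return counts
```

Calling `count_states("crashme")` returns a tally keyed by state letter. A large count under D (uninterruptible sleep) would suggest the processes are stuck inside the kernel, where SIGKILL cannot be acted on until they return, while a count under Z (zombie) would instead suggest children that have already exited but were never reaped by their parent.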
Spoken to Jan. This is not a security bug. We should open up this bug.

I ran crashme on kernel 2.6.18-92.1.10.el5 (s390x), and I can run or kill crashme easily. I did not observe the problem as described in comment #1.

If this test is skipped in our kernel testing because of this bug, it should be re-enabled.

Thanks.
(In reply to comment #11)
> Spoken to Jan. This is not a security bug. We should open up this bug.
>
> I ran crashme on kernel 2.6.18-92.1.10.el5 (s390x), and I can run or kill
> crashme easily. I did not observe the problem as described in comment #1.

Correction: not comment #1, but the description.