From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021224

Description of problem:
Processes that make extensive use of pipes end up being unkillable. They do not die. You cannot 'strace -p' or otherwise attach to the process. They continue to consume CPU time and refuse to quit running. After you 'kill -9' the process and wait around 6 hours, it usually exits. (Not always.)

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
Easiest way I've found is to use dvdrip.
1. Run 'dvdrip' and tell it to encode a DVD.
2. This runs multiple transcode pipes.
3. This process never finishes. The pipes from transcode never exit.

Actual Results:
Reboot usually.

Expected Results:
Transcode processes should exit after the EOF is read from the pipes.

Additional info:
Reproducible 100% using any of the following kernels: kernel-2.4.18-17.8.0, kernel-2.4.18-18.8.0, kernel-2.4.20-2.2. Bug has been confirmed by several others, and is not restricted to my specific hardware or to transcode. (It just happens that transcode is the easiest way for me to replicate this bug.)
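For anyone trying to build a smaller test case than dvdrip, the expected behaviour described above (the reader sees EOF once the writer exits, and then both processes terminate) could be sketched roughly as follows. This is only an illustration in Python, not what transcode does internally; `run_pipeline`, `PRODUCER` and `CONSUMER` are hypothetical names made up for this sketch.

```python
import subprocess
import sys

# Hypothetical stand-ins for the two ends of a transcode-style pipe.
PRODUCER = "import sys; sys.stdout.buffer.write(b'x' * int(sys.argv[1]))"
CONSUMER = "import sys; print(len(sys.stdin.buffer.read()))"


def run_pipeline(nbytes, timeout=60):
    """Pipe nbytes from a producer child into a consumer child.

    Returns (producer exit code, consumer exit code, bytes the consumer read).
    On a healthy kernel the consumer sees EOF as soon as the producer exits.
    """
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER, str(nbytes)],
        stdout=subprocess.PIPE,
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
    )
    # Drop the parent's copy of the pipe's read end; only the consumer needs it.
    producer.stdout.close()
    # communicate() raises TimeoutExpired if the consumer never reaches EOF.
    out, _ = consumer.communicate(timeout=timeout)
    producer_rc = producer.wait(timeout=timeout)
    return producer_rc, consumer.returncode, int(out)


if __name__ == "__main__":
    print(run_pipeline(1 << 20))
```

If the timeout fires instead of the function returning, the pipeline is stuck in roughly the way the report describes, although that alone does not show the process is unkillable.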
I have similar things happening to me; however, I am not convinced it is pipe related, but rather mmap related. It happens to me when running transcode by itself, rpm, pine and gimp, and the one thing they all have in common is mmap. It always happens while doing heavy file IO. When the process is "unkillable", it's in state "schedule" in the kernel. Sometimes, after extended periods of time, the process will actually die, but this doesn't always happen. Mostly I have to reboot to clear it. Unfortunately, I cannot reproduce this in a testable fashion, but I am trying.
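A rough way to record what a stuck process is doing, without attaching to it, is to read its entry under /proc. The sketch below is an assumption-laden illustration: it relies on /proc/<pid>/wchan, which newer kernels provide and the 2.4 kernels discussed here may not (there, 'ps' with a wchan column gives similar information). The helper names are hypothetical. A state of "D" means uninterruptible sleep, which is the state in which 'kill -9' has no immediate effect.

```python
def parse_stat_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split only after the last ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def process_state(pid):
    """Read the current state letter (R, S, D, Z, ...) of a process."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())


def wait_channel(pid):
    """Return the kernel symbol the process sleeps in, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/wchan") as f:
            return f.read().strip() or None
    except OSError:
        # Kernel does not expose wchan this way, or the process is gone.
        return None


if __name__ == "__main__":
    import os
    pid = os.getpid()
    print(pid, process_state(pid), wait_channel(pid))
```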
Ok, I am now able to reliably hang gimp under 2.4.18-18.8.0. I can open this large photo in gimp and do several Mirror flips in succession; usually after the 4th or 5th, it hangs (unkillable). But I just went to 2.5.18-19.8.0, and I cannot reproduce this hang any more. It may not be that it's fixed; the race condition may simply have moved elsewhere. I will keep thrashing and see what I can find. Running Athlon-600, UDMA66 drives, using the Athlon-optimised kernel. Could this perhaps be related to the old Athlon memcpy optimisation issue with Via boards?
For reference, my system is Athlon-1200 using rhat's athlon kernel, nVidia Corporation NV25 [GeForce4 Ti4200] (rev a3) with NVIDIA driver -4191, and ATA100 drive. [VIA Technologies, Inc. VT82C586B PIPC Bus Master IDE (rev 06)] I have tried other NVIDIA revs as well: 2880, 2960.
Retried with 2.4.20-virgin, no pipe/mmap problems at all. I still loaded the NVIDIA_kernel module (4191) under this kernel, so that rules out a simple "Nvidia caused it" issue, as well as any hardware specific bugs with my setup.

Actual dvdrip (transcode) log (for reference) was:

Sun Dec 29 08:50:35 2002 Starting job (1): Transcoding video - title #1, pass 1
Sun Dec 29 08:50:35 2002 Executing command: mkdir -m 0775 -p '/home/dj/tmp/Matrix/tmp' && cd /home/dj/tmp/Matrix/tmp && transcode -a 0 -x vob,null -i /home/dj/tmp/Matrix/vob/001 -w 1951,250,100 -b 192,0,0 -s 1.412 -V -C 1 -I 1 -f 24,1 -g 720x480 -M 2 -j 62,8,62,6 -Z 752x320 -R 1 -y divx4,null --psu_mode --nav_seek /home/dj/tmp/Matrix/tmp/Matrix-001-nav.log --no_split -o /dev/null && echo DVDRIP_SUCCESS (PID=1528)
Sun Dec 29 13:46:13 2002 Successfully finished job (1): Transcoding video - title #1, pass 1
Sun Dec 29 13:46:13 2002 Starting job (2): Transcoding video - title #1, pass 2
Sun Dec 29 13:46:13 2002 Executing command: mkdir -m 0775 -p '/home/dj/tmp/Matrix/tmp' && cd /home/dj/tmp/Matrix/tmp && transcode -a 0 -x vob -i /home/dj/tmp/Matrix/vob/001 -w 1951,250,100 -b 192,0,0 -s 1.412 -V -C 1 -I 1 -f 24,1 -g 720x480 -M 2 -j 62,8,62,6 -Z 752x320 -R 2 -y divx4 -E 48000 --psu_mode --nav_seek /home/dj/tmp/Matrix/tmp/Matrix-001-nav.log --no_split -o /home/dj/tmp/Matrix/avi/001/Matrix-001.avi && echo DVDRIP_SUCCESS (PID=5516)
Sun Dec 29 18:56:52 2002 Successfully finished job (2): Transcoding video - title #1, pass 2

Please let me know if you'd like any more details.
It doesn't rule out an nvidia module interaction with mmap. It could be a kernel bug, but until it's reproduced with the RH kernel without the nvidia module it's not that interesting. Alan
I've reproduced this as well, doing a 180deg rotation on a very large image in gimp. I'm running a virgin 2.4.18-3 rhat kernel with no binary modules loaded (lsmod |head -1 reports "Not tainted"). The machine is an IBM T23 laptop with a pIII.
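As a complement to the lsmod check, the taint state can also be read programmatically on kernels that expose it as /proc/sys/kernel/tainted. That path, and whether a given 2.4 errata kernel provides it, is an assumption here, and the function names are hypothetical. A value of 0 means not tainted; bit 0 indicates that a proprietary module was loaded at some point since boot.

```python
from pathlib import Path

TAINT_PROPRIETARY_MODULE = 1 << 0  # bit 0: a proprietary module was loaded


def describe_taint(value):
    """Turn the numeric taint value into a short human-readable summary."""
    if value == 0:
        return "Not tainted"
    if value & TAINT_PROPRIETARY_MODULE:
        return "Tainted: proprietary module loaded"
    return "Tainted: other flags set"


def read_taint(path="/proc/sys/kernel/tainted"):
    """Return the kernel taint value, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (kernel without this sysctl) or unexpected contents.
        return None


if __name__ == "__main__":
    value = read_taint()
    print("unknown" if value is None else describe_taint(value))
```

Note that the flag is sticky: unloading the module does not clear it, so a reboot without the module is needed before the kernel reports as untainted again.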
After many, many iterations and crashes, I've narrowed transcode's issue down to the "Use PSU core" option, which is ON by default in dvdrip. If you disable this, the kernel has no issue. Left enabled, the process becomes unkillable. This testing has been done without the NVIDIA kernel modules loaded. It has also been tested on a non-Red Hat kernel, where it runs fine. (Virgin 2.4.20 was used in testing as well.) For the past week, my kernel has been untainted so that I could properly run these tests.
(the other mmap related reports sound like stuff fixed in errata already)
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/