Description of problem:
If you start up two programs (kmail and konqueror in this case; not
sure if it's specific to these two or not) in quick succession, bash
goes into a loop on a waitpid call using up all spare CPU cycles and
becomes nonresponsive. Only closing one of the programs you started
will bring it back.
Version-Release number of selected component (if applicable):
How reproducible:
Always, but it takes very specific steps to reproduce. Feels very much like
some kind of race condition to me.
Steps to Reproduce:
1. Open a new bash shell. In my case I'm opening a new tab in
konsole; I haven't tested whether that element is necessary. I have
not managed to reproduce this without this step, but given the
speed dependency I might just not have been fast enough those times.
2. Start up program #1 in the background. The command I use is
"kmail &>/dev/null &". I'm not sure whether it has to be kmail, or
whether the & is needed, since kmail normally daemonizes automatically
anyway; I just happened to start it with & when I discovered this
because I wasn't thinking.
3. Quickly, before program #1 finishes loading (you will probably
want to do this against a slow disk, and make sure program #1 and #2
aren't in disk cache), start up program #2 in background. In my case
"konqueror &>/dev/null &". If you do this after program #1 has
finished loading, the problem doesn't seem to occur. Also not sure
if konq has to be program #2 for this to happen. Konq does need &
since it doesn't daemonize on its own.
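To take some of the hand speed out of steps 2 and 3, the two launches can be wrapped in a small helper so that program #2 starts immediately after program #1. This is only a sketch: `start_quiet` is a made-up name that does not appear in the original report, and it is untested whether launching the programs this way still triggers the hang.

```shell
# Hypothetical helper (not part of the original report): run a command in the
# background with stdout and stderr discarded, then print the PID it got.
# ">/dev/null 2>&1" is the portable spelling of bash's "&>/dev/null".
start_quiet() {
    "$@" >/dev/null 2>&1 &
    echo "$!"
}

# In the interactive shell from step 1, start both programs back to back:
#   start_quiet kmail; start_quiet konqueror
```

The printed PIDs are handy later for telling which of the two programs the shell is waiting on.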
Actual results:
You end up with something like
[wes@ip68-110-7-34 ~]$ kmail &>/dev/null &
[wes@ip68-110-7-34 ~]$ konqueror &>/dev/null &
in your shell, but the prompt does not respond to typing and the CPU
is pegged. If you attach strace to the hung bash, you see the call
waitpid(-1, 0xfeffe774, WNOHANG|WUNTRACED) = 0
repeated over and over. If you attach gdb to the hung bash, you get
something like the call stack below. Sometimes you'll catch it with
these two frames on top:
#0 0xf6fe9782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0xf6f1ea63 in __waitpid_nocancel () from /lib/tls/libc.so.6
with the frames below following them (numbered +2); other times you get just the frames below:
#0 0x08076c38 in kill_pid ()
#1 0x08077250 in kill_pid ()
#2 <signal handler called>
#3 0xf6fe9782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#4 0xf6f4be43 in __read_nocancel () from /lib/tls/libc.so.6
#5 0x080b8be6 in rl_getc ()
#6 0x080b8ae2 in rl_read_key ()
#7 0x080aa104 in readline_internal_char ()
#8 0x080aa4b1 in readline ()
#9 0x0805ded1 in yy_input_name ()
#10 0x080f44f0 in ?? ()
#11 0x080873c5 in termination_unwind_protect ()
#12 0x0805fc66 in execute_prompt_command ()
#13 0x08060c64 in execute_prompt_command ()
#14 0x080636a0 in yyparse ()
#15 0x0805d9d4 in parse_command ()
#16 0x0805da79 in read_command ()
#17 0x0805dbe2 in reader_loop ()
#18 0x0805ce1c in main ()
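Traces like the ones above can be gathered by attaching to the hung shell from a second terminal. The following is a rough sketch rather than the exact commands used for this report: `list_children` and `HUNG_PID` are hypothetical names, `ps --ppid` assumes the procps `ps` found on Linux, and the PID of the hung bash has to be looked up by hand.

```shell
# Hypothetical helper: list the direct children of a shell (PID, state, name),
# e.g. to see which background jobs the hung bash could be waiting on.
# Assumes procps ps (Linux), which supports --ppid; the empty "=" headers
# suppress the header line.
list_children() {
    ps -o pid= -o stat= -o comm= --ppid "$1"
}

# From a second terminal, with HUNG_PID set to the PID of the hung bash:
#   list_children "$HUNG_PID"
#   strace -p "$HUNG_PID"    # shows the repeating waitpid() call
#   gdb -p "$HUNG_PID"       # then type "bt" at the (gdb) prompt
```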
The programs that were started load and run normally, however. As soon as
you quit one of them, the hung bash comes back to life.
Expected results:
bash shouldn't hang.
Additional info:
kmail from kdepim-3.3.0-1
konqueror from kdebase-3.3.0-5
which are all current rawhide, as is everything else on the system.
Er, quick addition/correction... bash only comes back when you close
program #2 (konqueror), which suggests that's what the waitpid is on.
<< * Fri Sep 24 2004 Tim Waugh <email@example.com> 3.0-15
- Minor fix for job handling. >>
That seems to fix it, so I'm closing this bug as rawhide.
Thanks. I hadn't got round to testing that it fixes this particular
bug, so it's nice that it does. :-)