LTC Owner is: sripathi.com
LTC Originator is: sripathi.com

I was running strace on pthread_cond_many testcase when I saw a number of the following BUG messages in dmesg. I have not yet tested this with default (non-RT) RHEL5.

BUG: scheduling with irqs disabled: strace/0x00000000/2011
caller is rt_spin_lock_slowlock+0x102/0x1af

Call Trace:
 [<ffffffff8026d828>] dump_trace+0xbd/0x3d8
 [<ffffffff8026db87>] show_trace+0x44/0x6d
 [<ffffffff8026ddc8>] dump_stack+0x13/0x15
 [<ffffffff80264dc6>] schedule+0x87/0x10b
 [<ffffffff80265b06>] rt_spin_lock_slowlock+0x102/0x1af
 [<ffffffff802661af>] rt_spin_lock+0x1f/0x21
 [<ffffffff8029af0c>] force_sig_info+0x26/0xb5
 [<ffffffff8029b018>] force_sig_specific+0x11/0x13
 [<ffffffff80298659>] ptrace_attach+0xdf/0x10b
 [<ffffffff802986d7>] sys_ptrace+0x52/0xb8
 [<ffffffff8025f42c>] tracesys+0x151/0x1be
 [<00000034ecec71c9>]
---------------------------
| preempt count: 00000000 ]
| 0-level deep critical section nesting:
----------------------------------------

Kernel: 2.6.20-0119.rt8
glibc: glibc-2.5-12
Hardware: LS21
Kernel cmdline: ro root=LABEL=/1 rhgb quiet acpi=noirq

Recreation steps:
Start pthread_cond_many testcase. From another terminal, attach strace to pthread_cond_many process. Very soon, BUGs appear in dmesg.

I think I know the root cause of this problem. I'll post details soon.

In ptrace_attach, this is what happens:

    task_lock
    local_irq_disable
    write_lock(tasklist_lock)        (using trylocks)
    Some work
    __ptrace_link
    Send SIGSTOP to target thread
    write_unlock_irq(tasklist_lock)
    task_unlock

local_irq_disable + write_lock will work as write_lock_irq and write_unlock_irq will re-enable interrupts. However, on -rt, write_unlock_irq doesn't do local_irq_enable. But since we have explicitly called local_irq_disable, interrupts remain blocked!

To fix the problem, I think we should call write_unlock(tasklist_lock) and local_irq_enable() instead of write_unlock_irq. Also, we should call them BEFORE sending SIGSTOP to the target thread.
I think there is no need to hold the tasklist lock during sending of SIGSTOP. For the vanilla kernel too, I think we should do write_unlock_irq(tasklist_lock) before sending SIGSTOP.

The following patch solves the problem on 2.6.20-rt8. I want to send this to LKML/Ingo soon. Does anyone have comments?

--- linux-2.6.20.x86_64_org/kernel/ptrace.c	2007-04-19 18:19:37.000000000 +0530
+++ linux-2.6.20.x86_64/kernel/ptrace.c	2007-04-19 16:43:32.000000000 +0530
@@ -205,10 +205,16 @@ repeat:
 
 	__ptrace_link(task, current);
 
+	write_unlock(&tasklist_lock);
+	local_irq_enable();
+
 	force_sig_specific(SIGSTOP, task);
+	goto out2;
 
 bad:
-	write_unlock_irq(&tasklist_lock);
+	write_unlock(&tasklist_lock);
+	local_irq_enable();
+out2:
 	task_unlock(task);
 out:
 	return retval;

What if some other process is reading the task_list at the time you are sending it the stop signal? Will the code in force_sig_specific take care of that by its own locking?

Sripathi, thanks for clarifying it offline!

I have posted this to LKML/Ingo: http://lkml.org/lkml/2007/04/20/41
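
The interrupt-state mismatch described in the analysis above can be illustrated with a small userspace model. This is only a sketch, not kernel code: every model_* name is hypothetical, the interrupt flag is a plain boolean, and the -rt behaviour of write_unlock_irq (leaving interrupts alone) is taken as an assumption from this report rather than from the -rt sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one CPU's interrupt flag; not kernel code. */
struct model_cpu {
	bool irqs_enabled;	/* true while interrupts are enabled */
	bool rt;		/* true models -rt, false models mainline */
};

void model_local_irq_disable(struct model_cpu *cpu) { cpu->irqs_enabled = false; }
void model_local_irq_enable(struct model_cpu *cpu) { cpu->irqs_enabled = true; }

/* Assumption from the report: on -rt this does not re-enable interrupts. */
void model_write_unlock_irq(struct model_cpu *cpu)
{
	if (!cpu->rt)
		model_local_irq_enable(cpu);
}

/* Original ordering. Returns the interrupt state once the attach is done. */
bool model_attach_original(struct model_cpu *cpu)
{
	model_local_irq_disable(cpu);
	/* trylock, __ptrace_link and SIGSTOP would all run here, interrupts off */
	model_write_unlock_irq(cpu);
	return cpu->irqs_enabled;
}

/* Patched ordering. Returns the interrupt state at the point SIGSTOP is sent. */
bool model_attach_patched(struct model_cpu *cpu)
{
	bool irqs_at_sigstop;

	model_local_irq_disable(cpu);
	/* trylock and __ptrace_link here, then a plain write_unlock */
	model_local_irq_enable(cpu);
	irqs_at_sigstop = cpu->irqs_enabled;	/* SIGSTOP would be sent here */
	return irqs_at_sigstop;
}

/* Self-check of the four combinations discussed in this report. */
void model_self_check(void)
{
	struct model_cpu mainline = { .irqs_enabled = true, .rt = false };
	struct model_cpu rt = { .irqs_enabled = true, .rt = true };

	assert(model_attach_original(&mainline));	/* mainline: interrupts back on */
	assert(!model_attach_original(&rt));		/* -rt: interrupts stay off */
	assert(model_attach_patched(&mainline));
	assert(model_attach_patched(&rt));
}
```

In this model the original ordering leaves interrupts disabled only in the -rt case, while the patched ordering has them enabled again before SIGSTOP is sent in both cases. The model says nothing about whether dropping tasklist_lock before the signal is safe; that question is the one raised in the follow-up comment.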
----- Additional Comments From sripathi.com (prefers email at sripathik.com) 2007-05-11 10:49 EDT -------
I got no reply from Ingo/anyone else about my earlier mail (Apr 20). Hence I tried to fix it in another way by introducing write_trylock_irqsave API in mainline and -rt.

Mainline patches are at http://lkml.org/lkml/2007/05/09/76 and http://lkml.org/lkml/2007/05/09/79 .
-rt patches are at http://lkml.org/lkml/2007/05/10/47 and http://lkml.org/lkml/2007/05/10/48.

The mainline patches have been accepted into -mm. I am awaiting response for -rt patches.
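
For readers unfamiliar with the irqsave naming, the save/restore behaviour that a write_trylock_irqsave style call would be expected to have on mainline can be sketched in userspace as follows. This is a hedged illustration only: the model_* names are hypothetical, the lock is a single boolean, and the code is not taken from the patches linked above. On -rt the corresponding call is assumed not to touch hardware interrupts at all, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; not the kernel implementation. */
struct model_rwlock {
	bool write_held;	/* true while a writer holds the lock */
};

bool model_irqs_enabled = true;	/* models the CPU interrupt flag */

/*
 * Save the current interrupt state, disable interrupts, then try the lock.
 * On failure the saved state is restored, so the caller sees no change.
 */
bool model_write_trylock_irqsave(struct model_rwlock *lock, bool *flags)
{
	*flags = model_irqs_enabled;
	model_irqs_enabled = false;
	if (lock->write_held) {
		model_irqs_enabled = *flags;
		return false;
	}
	lock->write_held = true;
	return true;
}

/* Release the lock and put the interrupt state back to what was saved. */
void model_write_unlock_irqrestore(struct model_rwlock *lock, bool flags)
{
	assert(lock->write_held);
	lock->write_held = false;
	model_irqs_enabled = flags;
}
```

The point of pairing the two calls is that the unlock restores whatever state the trylock saved, so the caller never has to combine an explicit local_irq_disable with an unlock variant that may or may not re-enable interrupts.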
Unable to reproduce this with 2.6.21-4.el5rtdebug and 2.6.21-3.el5rt. Checked kernel/ptrace.c, it doesn't have your patches. I'm using http://www.kernel.org/pub/linux/kernel/people/dvhart/realtime/tests/tests.tar.bz2 with './run.sh all': waited for the "./pthread_cond_many --broadcast 400 5000" processes to start, then ran strace on them, no BUG messages. Machine is a Dell PowerEdge 1950 with two dual-core Xeon processors. Will try now with the same kernel as you used (2.6.20-0119.rt8).
Tried with http://people.redhat.com/mingo/realtime-preempt/yum/x86_64/kernel-rt-2.6.20-0119.rt8.x86_64.rpm:

[root@mica ~]# uname -r
2.6.20-0119.rt8

And couldn't reproduce with it either. I'm running it now with this patch:

[root@mica latency]# diff -u pthread_cond_many.sh.orig pthread_cond_many.sh
--- pthread_cond_many.sh.orig	2007-05-15 12:47:11.000000000 -0300
+++ pthread_cond_many.sh	2007-05-15 12:48:27.000000000 -0300
@@ -9,11 +9,11 @@
 nproc=5
 
 i=0
-./pthread_cond_many $1 --broadcast $iter $nthread > 2100.$i.out &
+strace -f ./pthread_cond_many $1 --broadcast $iter $nthread > 2100.$i.out 2> /dev/null &
 i=1
 while test $i -lt $nproc
 do
-	./pthread_cond_many --broadcast $iter $nthread > 2100.$i.out &
+	strace -f ./pthread_cond_many --broadcast $iter $nthread > 2100.$i.out 2> /dev/null &
 	i=`expr $i + 1`
 done
 wait
[root@mica latency]# pwd
/home/acme/rt/IBM/rtlinux-tests/perf/latency
[root@mica latency]#

and running it like this:

[root@mica latency]# pwd
/home/acme/rt/IBM/rtlinux-tests/perf/latency
[root@mica latency]# ./pthread_cond_many.sh --realtime
------- Additional Comments From sripathi.com (prefers email at sripathik.com) 2007-05-16 01:44 EDT -------
(In reply to comment #13)
> ----- Additional Comments From acme 2007-05-15 11:11 EST -------
> Unable to reproduce this with 2.6.21-4.el5rtdebug and 2.6.21-3.el5rt. Checked
> kernel/ptrace.c, it doesn't have your patches. I'm using
> http://www.kernel.org/pub/linux/kernel/people/dvhart/realtime/tests/tests.tar.bz2
> with './run.sh all', wait for the "./pthread_cond_many --broadcast 400 5000"
> processes to start, ran strace on them, no BUG messages. Machine is a Dell
> PowerEdge 1950 with to dual core Xeon processors. Will try now with the same
> kernel as you used (2.6.20-0119.rt8).

I tried exactly the same just now and reproduced the problem on 2.6.21-2.el5rt. I pulled down the tests from kernel.org, started the tests by hand using "./pthread_cond_many --broadcast 400 5000" and ran "strace -f -v -o strace.out <pid of first pthread_cond_many process>" and immediately I see a bunch of BUGs in dmesg. My hardware is LS20 blade, but I don't think the problem is hardware dependent.
Tried now with 2.6.21-4.el5rt using exactly the same sequence described in your latest entry in this ticket: got the BUGs. Will now apply your patches to the rt kernel rpm and retest. Strange: the only difference from my test is running ./pthread_cond_many directly instead of through the shell script. Anyway, reproduced; rebuilding the rpm with your patches. Thanks.
Did it. The BUGs are gone, and from my perspective the patches are OK. I will talk with Steven Rostedt for a second opinion, then ask Clark to put those patches in our 2.6.21-rt kernel rpms and ask Ingo to consider them for upstream rt-preempt acceptance. Thanks!
Patches were merged, at least the 2.6.21-rt6 patch has it, thanks a lot for submitting them! It is already merged in the internal repo for kernel-rt and should be included in the 2.6.21-11.el5rt kernel-rt rpm release.
----- Additional Comments From sripathi.com (prefers email at sripathik.com) 2007-05-24 12:37 EDT -------
I have tested this with the 2.6.21-14.el5rt kernel (which I believe contains Ingo's patch-2.6.21-rt7) and the problem is no longer seen. strace does not produce any BUGs now.

Thanks!
-Sripathi.