=Comment: #0=================================================
John G. Stultz <jstultz.com> - 2008-02-20 21:06 EDT

Problem description:
The 2.6.24.1-21 kernel uses IRQ- instead of IRQ_ for the process comm string. This breaks the set_kthread_prio script and results in the IRQ threads not getting rt priority.

I believe the reverse happened in the SR2->SR3 timeframe. We should check w/ Clark/Steven to see what the rationale is here.
------- Comment From dvhltc.com 2008-02-25 20:01 EDT-------
I checked our SR3 kernel, and it has IRQ- naming, not IRQ_.

[root@elm3b207 ~]# ps aux | grep IRQ
root 102 0.0 0.0 0 0 ? S< 13:05 0:00 [IRQ-11]
root 329 0.0 0.0 0 0 ? S< 13:05 0:00 [IRQ-8]
...
[root@elm3b207 ~]# uname -r
2.6.21.4-ibmrt1.23

Checking the set_kthread_prio script, I see that the regexes do indeed search for ^IRQ_. Checking the IRQ threads revealed that NONE of them were running with realtime priority!

[root@elm3b207 ~]# ps -eLo rtprio,comm | grep IRQ
- IRQ-11
- IRQ-8
- IRQ-12
- IRQ-1
- IRQ-3
- IRQ-19
- IRQ-26
- IRQ-6
- IRQ-25
- IRQ-4

This is all on a fresh R1-SR3.dat deploy.
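
The manual check above could be wrapped in a small filter so it is easy to repeat on other machines. This is only a sketch: the function name unprioritized_irq_threads is made up here and is not part of the shipped tooling, and it assumes that `ps -eLo rtprio,comm` prints `-` in the first column for threads with no realtime priority, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not shipped tooling).
# Reads `ps -eLo rtprio,comm` output on stdin and prints the names of IRQ
# threads, under either the IRQ- or the IRQ_ naming, that have no realtime
# priority (rtprio column is "-").
unprioritized_irq_threads() {
    awk '$1 == "-" && $2 ~ /^IRQ[_-]/ { print $2 }'
}

# Example use on a live system:
#   ps -eLo rtprio,comm | unprioritized_irq_threads
```

An empty result would suggest every IRQ thread has an rt priority; any names printed are threads the priority script missed.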
------- Comment From dvhltc.com 2008-02-25 20:38 EDT-------
So from patch-2.6.21.4-rt10 we see:

+static int start_irq_thread(int irq, struct irq_desc *desc)
+{
+	if (desc->thread || !ok_to_create_irq_threads)
+		return 0;
+
+	desc->thread = kthread_create(do_irqd, desc, "IRQ-%d", irq);

and from util/set_kthread_prio we see:

} else if (cmd ~ /^IRQ_/ && "IRQ_default" in config) {
	prio = config["IRQ_default"];
	opts = confopts["IRQ_default"];

Note that the SR3-iFix1 kernel uses IRQ- and the SR3-iFix1 set_kthread_prio script uses IRQ_. From this, I can't imagine how SR3 ever had real-time priority hardware interrupt stubs. This is truly incredible given the lack of failures we've seen during testing.
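
The mismatch between the kernel's thread names and the script's pattern can be reproduced outside the script. A minimal sketch, assuming the script's `cmd ~ /^IRQ_/` test behaves like an ordinary awk regex match; comm_matches is a hypothetical helper written only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper for illustration only: report whether a thread name
# matches an awk regular expression.
#   $1 = regular expression, $2 = thread name (comm string)
comm_matches() {
    printf '%s\n' "$2" |
        awk -v re="$1" '{ if ($0 ~ re) print "match"; else print "no match" }'
}

# Name as created by kthread_create(..., "IRQ-%d", irq):
comm_matches '^IRQ_' 'IRQ-11'   # prints "no match"
# Name the script's pattern expects:
comm_matches '^IRQ_' 'IRQ_11'   # prints "match"
```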
------- Comment From dvhltc.com 2008-02-25 20:48 EDT-------
To confirm I didn't botch the code review and that the ABAT deploy wasn't somehow at fault, I did an ABAT deploy of rhel5.1 on elm3b102 and then installed SR3-iFix1 manually.

[root@elm3b102 ~]# uname -r
2.6.21.4-ibmrt1.23
[root@elm3b102 ~]# ps aux | grep IRQ
root 166 0.0 0.0 0 0 ? S< 20:41 0:00 [IRQ-9]
root 507 0.0 0.0 0 0 ? S< 20:41 0:00 [IRQ-8]
...
[root@elm3b102 ~]# ps -eLo rtprio,comm | grep IRQ
- IRQ-9
- IRQ-8
- IRQ-12
...

Applying the following patch:

39c39
< } else if (cmd ~ /^IRQ_/ &&
---
> } else if (cmd ~ /^IRQ[_-]/ &&

Rerunning the script we have:

# ps -eLo rtprio,comm | grep IRQ
95 IRQ-9
95 IRQ-8
...

This fix allows both the old and new versions of the IRQ naming convention to work.

I would like to understand WHY this doesn't affect our testing more. But I think we will have to send an update to the customer immediately, perhaps not as an iFix, perhaps just the one-line patch above with instructions for how to apply it (maybe even a script that fixes the problem with a sed command). Thoughts on delivery?

Bumping priority to P1; we need to understand why this isn't showing up as a bigger problem and how we'll get the fix out to customers.
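
For the sed-based delivery idea, a rough sketch of what such a fix-up script might look like follows. The function name patch_irq_regex is hypothetical, and the path to the installed set_kthread_prio is left as a caller-supplied argument because this report does not state where the script is installed.

```shell
#!/bin/sh
# Hypothetical fix-up helper (a sketch, not a tested deliverable).
# Rewrites the /^IRQ_/ match in the given copy of set_kthread_prio to
# /^IRQ[_-]/ so that both IRQ_ and IRQ- thread names are accepted.
#   $1 = path to the script to patch (supplied by the caller)
patch_irq_regex() {
    script="$1"
    tmp="$script.tmp.$$"
    # \^ keeps the caret literal in the search pattern. Writing the result
    # back with cat preserves the original file's mode and ownership.
    sed 's|/\^IRQ_/|/^IRQ[_-]/|' "$script" > "$tmp" &&
        cat "$tmp" > "$script" &&
        rm -f "$tmp"
}

# Example use (replace the argument with the real installed path):
#   patch_irq_regex /path/to/set_kthread_prio
```

Running it a second time changes nothing, since the original /^IRQ_/ text no longer appears after the first pass. The set_kthread_prio script would still need to be rerun afterwards for the IRQ threads to pick up their priorities.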
------- Comment From jstultz.com 2008-02-26 18:15 EDT-------
This is an IBM issue, and doesn't affect MRG or RH. I'm rejecting it.
Closing on our side.