Bug 433956 - [MRG] 2.6.24.1-21 kernel uses
Status: CLOSED NOTABUG
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: beta
Hardware: x86_64 All
Priority: low
Severity: medium
Assigned To: Red Hat Real Time Maintenance
Reported: 2008-02-22 07:08 EST by IBM Bug Proxy
Modified: 2008-03-11 10:29 EDT

Doc Type: Bug Fix
Last Closed: 2008-03-11 10:29:44 EDT

External Trackers
Tracker                      ID     Priority  Status  Summary  Last Updated
IBM Linux Technology Center  42562  None      None    None     Never

Description IBM Bug Proxy 2008-02-22 07:08:16 EST
=Comment: #0=================================================
John G. Stultz <jstultz@us.ibm.com> - 2008-02-20 21:06 EDT
Problem description:

The 2.6.24.1-21 kernel uses IRQ- instead of IRQ_ for the process comm string.
This breaks the set_kthread_prio script, and results in the IRQ threads not
getting rt priority.

I believe the reverse happened in the SR2->SR3 timeframe. We should check with
Clark/Steven to see what the rationale is here.
Comment 1 IBM Bug Proxy 2008-02-25 20:08:25 EST
------- Comment From dvhltc@us.ibm.com 2008-02-25 20:01 EDT-------
I checked our SR3 kernel, and it has IRQ- naming, not IRQ_.

[root@elm3b207 ~]# ps aux | grep IRQ
root       102  0.0  0.0      0     0 ?        S<   13:05   0:00 [IRQ-11]
root       329  0.0  0.0      0     0 ?        S<   13:05   0:00 [IRQ-8]
...

[root@elm3b207 ~]# uname -r
2.6.21.4-ibmrt1.23

Checking the set_kthread_prio script, I see that the regexes do indeed search
for ^IRQ_.  Checking the IRQ threads revealed that NONE of them were running
with realtime priority!

[root@elm3b207 ~]# ps -eLo rtprio,comm | grep IRQ
- IRQ-11
- IRQ-8
- IRQ-12
- IRQ-1
- IRQ-3
- IRQ-19
- IRQ-26
- IRQ-6
- IRQ-25
- IRQ-4

This is all on a fresh R1-SR3.dat deploy.
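
The check above can be narrowed to just the affected threads. The following is a minimal sketch, not part of the shipped tooling: irq_threads_without_rtprio is a hypothetical helper name, and it assumes the two-column "rtprio comm" layout shown in the ps output above. It accepts both the IRQ- and IRQ_ spellings so it works regardless of which naming a kernel uses.

```shell
#!/bin/sh
# Hypothetical helper (not from set_kthread_prio): reads the output of
# `ps -eLo rtprio,comm` on stdin and prints the names of IRQ threads whose
# rtprio column is "-", i.e. threads with no realtime priority.
irq_threads_without_rtprio() {
    # $1 is the rtprio column, $2 is the comm column.
    # Both separators (IRQ- and IRQ_) are matched on purpose.
    awk '$1 == "-" && $2 ~ /^IRQ[_-]/ { print $2 }'
}

# Usage on a live system (left commented so sourcing this file has no effect):
#   ps -eLo rtprio,comm | irq_threads_without_rtprio
```

An empty result would mean every IRQ thread that ps lists has a realtime priority assigned.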
Comment 2 IBM Bug Proxy 2008-02-25 20:40:25 EST
------- Comment From dvhltc@us.ibm.com 2008-02-25 20:38 EDT-------
So from patch-2.6.21.4-rt10 we see:

+static int start_irq_thread(int irq, struct irq_desc *desc)
+{
+       if (desc->thread || !ok_to_create_irq_threads)
+               return 0;
+
+       desc->thread = kthread_create(do_irqd, desc, "IRQ-%d", irq);

and from util/set_kthread_prio we see:

} else if (cmd ~ /^IRQ_/ &&
           "IRQ_default" in config) {
        prio = config["IRQ_default"];
        opts = confopts["IRQ_default"];

Note that the SR3-iFix1 kernel uses IRQ- and the SR3-iFix1 set_kthread_prio
script uses IRQ_.  From this, I can't imagine how SR3 ever had real-time
priority hardware interrupt stubs.  This is truly incredible given the lack of
failures we've seen during testing.
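
The mismatch can be reproduced outside the script. This is a small sketch under stated assumptions: irq_regex_matches is a hypothetical helper, not something in set_kthread_prio, and it only mimics the `cmd ~ /regex/` test from the awk fragment quoted above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the thread name in $2
# matches the awk regex in $1, mirroring the `cmd ~ /.../` test used in
# the set_kthread_prio fragment quoted above.
irq_regex_matches() {
    awk -v re="$1" -v cmd="$2" 'BEGIN { exit !(cmd ~ re) }'
}

# Usage: test the script's pattern against both naming conventions.
for name in IRQ_11 IRQ-11; do
    if irq_regex_matches '^IRQ_' "$name"; then
        echo "$name: matched by ^IRQ_"
    else
        echo "$name: NOT matched by ^IRQ_"
    fi
done
```

With the kernel naming its threads IRQ-%d, the IRQ_default branch is never taken for them, which is consistent with the "-" rtprio values seen earlier.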
Comment 3 IBM Bug Proxy 2008-02-25 20:56:25 EST
------- Comment From dvhltc@us.ibm.com 2008-02-25 20:48 EDT-------
To confirm I didn't botch the code review and that the ABAT deploy wasn't
somehow at fault, I did an ABAT deploy of RHEL 5.1 on elm3b102 and then installed
SR3-iFix1 manually.

[root@elm3b102 ~]# uname -r
2.6.21.4-ibmrt1.23

[root@elm3b102 ~]# ps aux | grep IRQ
root       166  0.0  0.0      0     0 ?        S<   20:41   0:00 [IRQ-9]
root       507  0.0  0.0      0     0 ?        S<   20:41   0:00 [IRQ-8]
...

[root@elm3b102 ~]# ps -eLo rtprio,comm | grep IRQ
- IRQ-9
- IRQ-8
- IRQ-12
...

Applying the following patch:
39c39
<                       } else if (cmd ~ /^IRQ_/ &&
---
>                       } else if (cmd ~ /^IRQ[_-]/ &&

Rerunning the script we have:
# ps -eLo rtprio,comm | grep IRQ
95 IRQ-9
95 IRQ-8
...

This fix allows both the old and new versions of the IRQ naming convention
to work.  I would like to understand WHY this doesn't affect our testing more.
But I think we will have to send an update to the customer immediately, perhaps
not as an iFix, perhaps just the one-line patch above with instructions for how
to apply it (maybe even a script that fixes the problem with a sed command).
Thoughts on delivery?

Bumping priority to P1, we need to understand why this isn't showing up as a
bigger problem and how we'll get the fix out to customers.
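
A sed-based version of the one-line patch might look roughly like this. It is a sketch only, not the fix that was delivered: patch_irq_regex is a hypothetical helper name, and the path in the usage note is a placeholder for wherever set_kthread_prio is installed.

```shell
#!/bin/sh
# Hypothetical helper: widen the awk match /^IRQ_/ to /^IRQ[_-]/ in the
# given script so both IRQ_ and IRQ- thread names are recognised.
# Config keys such as "IRQ_default" are left untouched because only the
# text between the two slashes is rewritten.
patch_irq_regex() {
    script=$1
    tmp=$(mktemp) || return 1
    if sed 's|/\^IRQ_/|/^IRQ[_-]/|' "$script" > "$tmp"; then
        # Write back through the original file to keep its mode and owner.
        cat "$tmp" > "$script"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Usage (placeholder path; back up the script first):
#   patch_irq_regex /path/to/set_kthread_prio
```

Running it a second time changes nothing, since /^IRQ_/ no longer appears once the line has been rewritten. The priority script would still need to be rerun afterwards for the IRQ threads to pick up their priorities.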
Comment 4 IBM Bug Proxy 2008-02-26 18:16:33 EST
------- Comment From jstultz@us.ibm.com 2008-02-26 18:15 EDT-------
This is an IBM issue, and doesn't affect MRG or RH. I'm rejecting it.
Comment 5 Clark Williams 2008-03-11 10:29:44 EDT
closing on our side.
