Bug 152441 - LTC17185- CAN-2005-0916 AIO panic ppc64
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: powerpc Linux
Priority: medium
Severity: medium
Assigned To: David Howells
QA Contact: Brian Brock
Keywords: Security
Reported: 2005-03-29 10:50 EST by Mark J. Cox (Product Security)
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2005-08-24 09:31:13 EDT


Attachments
Fix AIO cleanup crash (4.74 KB, patch)
2005-05-18 11:01 EDT, David Howells
Description Mark J. Cox (Product Security) 2005-03-29 10:50:58 EST
http://groups-beta.google.com/group/linux.kernel/browse_thread/thread/13b43bd5783842f6/7ce3c5a514a497ab?q=io_queue_init&rnum=3#7ce3c5a514a497ab

"When testing AIO on PPC64 (a power5 machine) running 2.6.11
with CONFIG_HUGETLB_PAGE=y, I ran into a kernel panic when a
process exits that has done AIO (io_queue_init()) but has not
done the io_queue_release().  The exit_aio() code is cleaning
up and panicking when trying to free the aio ring buffer."

See link for reproducer
Patched upstream

fixed=2.6-bk (20050328)
http://linux.bkbits.net:8080/linux-2.6/cset@4248c8c0es30_4YVdwa6vteKi7h_nw
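
The sequence described above (an AIO context is created and the process exits without releasing it) can be sketched without libaio by calling io_setup directly, which is the system call that io_queue_init() wraps. This is only an illustrative sketch and not the reproducer from the linked thread. The helper name create_aio_context and the syscall numbers in IO_SETUP_NR are assumptions and should be checked against the kernel headers of the machine under test.

```python
import ctypes
import platform

# Assumed io_setup syscall numbers; verify against <asm/unistd.h> on the target.
IO_SETUP_NR = {"x86_64": 206, "ppc64": 227, "ppc64le": 227, "aarch64": 0}


def create_aio_context(nr_events=128):
    """Create a kernel AIO context and deliberately never destroy it.

    Returns (rc, errno, ctx); rc is 0 on success and -1 on failure.
    """
    nr = IO_SETUP_NR[platform.machine()]  # KeyError on architectures not listed
    libc = ctypes.CDLL(None, use_errno=True)
    ctx = ctypes.c_ulong(0)  # aio_context_t must be zero before io_setup
    rc = libc.syscall(ctypes.c_long(nr), ctypes.c_uint(nr_events),
                      ctypes.byref(ctx))
    err = ctypes.get_errno() if rc != 0 else 0
    return rc, err, ctx.value


if __name__ == "__main__":
    rc, err, ctx = create_aio_context()
    print("io_setup rc=%d errno=%d ctx=%#x" % (rc, err, ctx))
    # No io_destroy() here: the process exits with the context still live,
    # so the kernel's exit_aio() path has to tear down the ring buffer.
```

On a kernel that carries the fix, the script should simply print the result and exit. Whether an unfixed kernel panics is likely to depend on the configuration reported above (ppc64 with CONFIG_HUGETLB_PAGE=y).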
Comment 1 Mark J. Cox (Product Security) 2005-03-30 02:59:50 EST
(Note: the report is confusing and mentions ia64, which Mitre takes to mean that
ia64 is also vulnerable to this issue.)
Comment 3 David Howells 2005-05-18 11:01:35 EDT
Created attachment 114511
Fix AIO cleanup crash

This patch has been modified slightly so that it applies to RHEL4.
Comment 4 David Howells 2005-05-18 11:28:50 EDT
I can't reproduce this crash on my ppc64 machines, at least not using the test 
program in the report. I've given the kernel some huge pages to play with, but 
it's not using them as far as I can tell. I'm not sure how to do so either. 
 
I've passed hugepages=40 on the kernel command line and I can see 27 pages 
in /proc/meminfo. I've tried mounting a hugetlbfs filesystem somewhere and 
running the testprogram whilst cd'd into that but open() fails to create a 
file there; so I'm not sure how to test this. 
 
Anyone any ideas? 
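
One way to check whether any huge pages are actually being consumed while the test program runs is to compare the HugePages_Total and HugePages_Free counters in /proc/meminfo. The following is a minimal sketch under that assumption; the function names parse_hugepages and hugepages_in_use are hypothetical helpers, not part of any existing tool.

```python
def parse_hugepages(meminfo_text):
    """Extract the HugePages_* counters and Hugepagesize from /proc/meminfo text."""
    stats = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and (key.startswith("HugePages_") or key == "Hugepagesize"):
            # The first token after the colon is the number; a unit may follow.
            stats[key] = int(rest.split()[0])
    return stats


def hugepages_in_use(stats):
    """Number of huge pages currently allocated out of the pool."""
    return stats.get("HugePages_Total", 0) - stats.get("HugePages_Free", 0)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        stats = parse_hugepages(f.read())
    print(stats, "in use:", hugepages_in_use(stats))
```

If the in-use count stays at zero while the test program is running, the AIO ring buffer is presumably not landing in a huge page region, which would be consistent with the crash not reproducing.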
Comment 5 David Howells 2005-07-26 13:32:13 EDT
I've added this to patchtest.49 on: 
 
http://people.redhat.com/~dhowells/.pickup/ibm/squadrons/rhel4.shtml 
 
Can someone at IBM have a poke at the problem, see if they can reproduce it 
and whether this patched kernel fixes it? I've tried, but I can't reproduce it 
for myself. I'm not sure whether I'm doing it right though. 
Comment 6 IBM Bug Proxy 2005-07-26 14:31:41 EDT
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dwgibson@au1.ibm.com




------- Additional Comments From markwiz@us.ibm.com  2005-07-26 14:20 EDT -------
Adding David Gibson to the CC list. 
Comment 7 IBM Bug Proxy 2005-07-26 17:15:59 EDT
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
        Owning Team|LTC                         |LTC Kernel
            Version|RHEL4                       |Other




------- Additional Comments From marksmit@us.ibm.com  2005-07-26 16:52 EDT -------
Perhaps someone in the LTC (kernel or screen team) can route this properly.
This occurs on the 2.6.11 kernel, which to my knowledge is Fedora Core 4, not
RHEL4.
Changing release to "other" because LTC Bugzilla does not have an FC4 entry. 
Comment 8 IBM Bug Proxy 2005-07-26 20:47:56 EDT
---- Additional Comments From dwgibson@au1.ibm.com(prefers email via dwg@au1.ibm.com)  2005-07-26 20:39 EDT -------
Ok, I'm looking into this.  I think the patch should fix the problem.  Can
someone give me a pointer to the FC4 2.6.11 kernel RPM to test?

Although, come to think of it, the RHEL4 kernel is probably affected as well. 
Comment 9 IBM Bug Proxy 2005-07-27 01:36:57 EDT
---- Additional Comments From dwgibson@au1.ibm.com(prefers email via dwg@au1.ibm.com)  2005-07-27 01:33 EDT -------
Ok, I tested this on RHEL4 (2.6.9-6.37.EL) and couldn't reproduce the bug.  I'm
guessing the bug was introduced by some other change between 2.6.9 and 2.6.11
(so that tsk->mm is cleared earlier than it used to be).

I'll try 2.6.11, either the FC4 rpm, or just a kernel.org build when I get a
chance.  Unfortunately our POWER5 seems to be on the blink at the moment. 
Comment 10 David Woodhouse 2005-08-12 12:20:14 EDT
The FC4 kernel is currently 2.6.12-1.1398 and is available from
http://download.fedora.redhat.com/pub/fedora/linux/core/updates/4/ppc/

Since this bug isn't observed in RHEL4, I don't think you need anything from us?
Comment 11 IBM Bug Proxy 2005-08-16 20:12:04 EDT
---- Additional Comments From dwgibson@au1.ibm.com(prefers email via dwg@au1.ibm.com)  2005-08-16 20:08 EDT -------
Um.. ok.  In which case, what do you want from us?  As far as I knew, this bug
was here because RH wanted us to test something, now I'm entirely unclear as to
what.. 
Comment 12 IBM Bug Proxy 2005-08-23 15:53:19 EDT
---- Additional Comments From mjwolf@us.ibm.com  2005-08-23 15:45 EDT -------
I may be wrong, but I think RedHat was requesting help to either prove or
disprove that the problem would be seen in RHEL4. 
Comment 13 IBM Bug Proxy 2005-08-23 16:34:26 EDT
---- Additional Comments From rosalesa@us.ibm.com(prefers email via rosalesa@austin.ibm.com)  2005-08-23 16:17 EDT -------
----Summary----
RedHat was seeking assistance on reproducing this problem with a 2.6.11 fc4
kernel on a ppc64 platform.

RedHat was asking for assistance on trying to reproduce the problem:
-Quoting Mark J. Cox (RedHat)-
>http://groups-beta.google.com/group/linux.kernel/browse_thread/thread/13b43bd5783842f6/7ce3c5a514a497ab?q=io_queue_init&rnum=3#7ce3c5a514a497ab
>"When testing AIO on PPC64 (a power5 machine) running 2.6.11
>with CONFIG_HUGETLB_PAGE=y, I ran into a kernel panic when a
>process exits that has done AIO (io_queue_init()) but has not
>done the io_queue_release().  The exit_aio() code is cleaning
>up and panicking when trying to free the aio ring buffer."

with a 2.6.11 kernel, RedHat's FedoraCore4 kernel, on a ppc64 platform,
-Quoting David Howells (RedHat)-
>I can't reproduce this crash on my ppc64 machines, at least not using the test 
>program in the report. I've given the kernel some huge pages to play with, but 
>it's not using them as far as I can tell. I'm not sure how to do so either. 
>I've passed hugepages=40 on the kernel command line and I can see 27 pages 
>in /proc/meminfo. I've tried mounting a hugetlbfs filesystem somewhere and 
>running the testprogram whilst cd'd into that but open() fails to create a 
>file there; so I'm not sure how to test this. 
>Anyone any ideas?

David Howells gave the following link to the proposed fix, but first needed
assistance trying to recreate the problem on a 2.6.11 kernel:
http://people.redhat.com/~dhowells/.pickup/ibm/squadrons/rhel4.shtml 

David Gibson (IBM) was working on recreating this problem on FC4 once a pointer
to FC4 was given, as the problem was not seen on RHEL4. RedHat provided the
following link to FC4:
http://download.fedora.redhat.com/pub/fedora/linux/core/updates/4/ppc/

Some of the confusion may have occurred after the following RedHat update:
>---------dwmw2@redhat.com  2005-08-12 12:20 EST 
>The FC4 kernel is currently 2.6.12-1.1398 and is available from
>http://download.fedora.redhat.com/pub/fedora/linux/core/updates/4/ppc/
>Since this bug isn't observed in RHEL4, I don't think you need anything from us?

Thus the question going forward is: RedHat, would you like IBM to continue to
assist in checking whether this problem can be reproduced on a ppc64 platform
using the FC4 2.6.11 kernel, and in checking the fix given by David Howells
(RedHat)? Maybe David Howells (RedHat) can confirm whether assistance is still
needed from IBM?

-Thanks 
Comment 14 IBM Bug Proxy 2005-08-24 00:22:14 EDT
---- Additional Comments From dwgibson@au1.ibm.com(prefers email via dwg@au1.ibm.com)  2005-08-24 00:11 EDT -------
I see.  I was rather confused by the fact that the FC4 kernel RPM includes the
fix for this problem already, in the patch patch-2.6.12-rc5-neutered.bz2.

I have tested the repro case on the FC4 kernel build with pSeries_defconfig and
could not get a crash.  Which is unsurprising given that that kernel includes
the fix.  That was running the FC4 kernel with a Debian root filesystem, rather
than a Red Hat one, though (I doubt that would make a difference, but just in
case..).

I have now spent quite a few hours trying and failing to build the FC4 source
rpm into a ppc64 binary kernel rpm to test on a RHEL root filesystem.  rpm
--rebuild simply builds a ppc32 rpm, rpmbuild with the --target option refuses
to do anything on my Debian system (because it attempts to access the local rpm
database) and I don't have a Red Hat system set up for builds convenient to hand. 
Comment 15 David Howells 2005-08-24 09:28:16 EDT
Thanks for your help. I think you've confirmed what I've found: that the 
problem probably doesn't occur in RHEL-4. I'll discuss it here to see whether 
we want to include the patch anyway. 
Comment 16 IBM Bug Proxy 2005-08-24 13:07:52 EDT
---- Additional Comments From rosalesa@us.ibm.com(prefers email via rosalesa@austin.ibm.com)  2005-08-24 13:01 EDT -------
David Howells (RedHat),
Please keep us updated on whether the patch will be included in RHEL4 and in
what release, and also when you would like this bug on the IBM side to be closed.
-Thanks. 
Comment 17 IBM Bug Proxy 2005-09-08 16:44:33 EDT
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |REJECTED
         Resolution|                            |NOTABUG




------- Additional Comments From rosalesa@us.ibm.com(prefers email via rosalesa@austin.ibm.com)  2005-09-08 16:43 EDT -------
I will go ahead and close this bug to match RedHat's Bugzilla status, as it does
not appear that RedHat needs any more assistance from IBM on this bug (which was
the reason this bug was initially opened). Since this problem does not occur in
RHEL4 I will mark this bug as "NOTABUG."
-Thanks 
