Bug 116041 - Asynch writes with default v6.06.00b11 QLogic RHEL 3.0 driver causes system panic.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jeff Moyer
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-02-17 19:43 UTC by Heather Conway
Modified: 2007-11-30 22:07 UTC

Fixed In Version: v2.4.21-15.EL
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-07-23 15:27:10 UTC
Target Upstream Version:
Embargoed:


Attachments
AIO test sources (20.00 KB, text/plain)
2004-04-09 14:14 UTC, Heather Conway

Description Heather Conway 2004-02-17 19:43:34 UTC
Description of problem:
When I run asynchronous writes to CLARiiON devices on a RHEL 3.0 
system, the machine panics immediately.
The problem seems to happen only on RHEL 3.0 systems, both with and 
without PowerPath.
The system was running RHEL 3.0 with two QLA2200 HBAs and the default 
v6.06.00b11 driver. The I/O is generated by a program that uses the 
libaio library.


Comment 1 Rik van Riel 2004-02-17 19:48:05 UTC
Could you please provide more specific information on what exactly
the panic in question is?

Comment 2 Heather Conway 2004-02-21 18:27:06 UTC
The issue reproduces with the v6.07.00 driver as well.
The panic with v6.06.00b11 is below:

@lcla233 a_io]# ./a_io.linux -w 2 16 65536
Opening /dev/raw/raw2; fd = 3
Opening /dev/raw/raw3; fd = 4
Opening /dev/raw/raw4; fd = 5
Opening /dev/raw/raw5; fd = 6
Opening /dev/raw/raw6; fd = 7
Opening /dev/raw/raw7; fd = 8
Opening /dev/raw/raw8; fd = 9
Opening /dev/raw/raw9; invalid operand: 0000
parport_pc lp parport autofs nfs lockd sunrpc e100 floppy emcppn 
emcpmpc emcpmp sg emcp qla2200_conf qla2200 loop lvm-mod keybdev 
mousedev hid input usb-uhci
CPU: 0
EIP: 0060:[<02155a7e>] Tainted: P
EFLAGS: 00010202

EIP is at __free_pages_ok [kernel] 0x3ee (2.4.21-9.ELhugemem/i686)
eax: 00000001 ebx: 03186554 ecx: 00000000 edx: 00000000
esi: 1ddd1980 edi: 1ddd1a48 ebp: 00000000 esp: 10d19d14
ds: 0068 es: 0068 ss: 0068
Process a_io.linux (pid: 6826, stackpage=10d19000)
Stack: 00000400 00000800 1fb239ac 1fa5ca80 00000000 00000040 00000040 1fa5cb80
10d19d88 00008000 00220740 00000040 021cdf7a 1ddd1988 1ddd1980 1ddd1a48
00000000 02141ae7 1ddd1980 0f342f80 00010000 00000001 02189661 00000216
Call Trace: [<021cdf7a>] generic_make_request [kernel] 0xea (0x10d19d44)
[<02141ae7>] unmap_kvec [kernel] 0x47 (0x10d19d58)
[<02189661>] generic_aio_complete_rw [kernel] 0x31 (0x10d19d6c)
[<02189747>] generic_aio_complete_write [kernel] 0x27 (0x10d19d84)
[<021671d1>] end_buffer_io_kiobuf_async [kernel] 0x91 (0x10d19d98)
[<22816607>] __scsi_end_request [scsi_mod] 0x127 (0x10d19db8)
[<22816910>] scsi_io_completion_Rsmp_d4f2f11b [scsi_mod] 0x180 (0x10d19ddc)
[<2282dd5b>] rw_intr [sd_mod] 0x7b (0x10d19e30)
[<22a01b48>] emcpFreeAsyncPirp [emcp] 0x28 (0x10d19e50)
[<22a01d01>] PowerPlatformTopIodone [emcp] 0x1a1 (0x10d19e70)
[<229f8afe>] PowerIodone [emcp] 0x4e (0x10d19ea0)
[<22af2409>] PnIodoneCommon [emcppn] 0xd9 (0x10d19ec0)
[<22af2461>] PnIodone [emcppn] 0x11 (0x10d19ef0)
[<229f8afe>] PowerIodone [emcp] 0x4e (0x10d19f10)
[<2280e46f>] scsi_finish_command [scsi_mod] 0x9f (0x10d19f30)
[<2280e238>] scsi_softirq_handler [scsi_mod] 0x138 (0x10d19f54)
[<0212f072>] tasklet_action [kernel] 0x62 (0x10d19f64)
[<0212eed5>] do_softirq [kernel] 0xd5 (0x10d19f7c)
[<0210e146>] do_IRQ [kernel] 0x146 (0x10d19f98)
[<0210e000>] do_IRQ [kernel] 0x0 (0x10d19fbc)

Code: Bad EIP value.

Kernel panic: Fatal exception
In interrupt handler - not syncing

2/18/04 3:31:42 PM Ismail Moumni:
Please use this stack trace instead, as it contains no PowerPath 
functions:

ot@lcla233 a_io]# ./a_io.linux -w 3 16 65536
Opening /dev/raw/raw3; fd = 3
Opening /dev/raw/raw4; fd = 4
Opening /dev/raw/raw5; fd = 5
Opening /dev/raw/raw6; fd = 6
Openinvalid operand: 0000
qla2200 parport_pc lp parport autofs nfs lockd sunrpc e100 floppy 
microcode loop lvm-mod keybdev mousedev input hid usb-uhci usbcore 
ext3 jbd aic7xxx sd_mod s
CPU: 0
EIP: 0060:[<021562bf>] Not tainted
EFLAGS: 00010202

EIP is at __free_pages_ok [kernel] 0x3df (2.4.21-9.ELcustom/i686)
eax: 00000001 ebx: 035e5a64 ecx: 00000000 edx: 00000000
esi: 1cef2e80 edi: 1cef2f48 ebp: 1d2efdec esp: 1d2efda4
ds: 0068 es: 0068 ss: 0068
Process a_io.linux (pid: 2916, stackpage=1d2ef000)
Stack: 00000000 00000000 021d4650 1b2b816c 00000000 00000040 1d2efdcc 0214fd98
04636afc 000001f0 1d2efde8 0217ed95 04636afc 000001f0 00000000 1cef2e88
1cef2e80 1cef2f48 1d2efdf8 02156beb 1b2b84cc 1d2efe10 0214231a 00000000
Call Trace: [<0210d299>] show_stack [kernel] 0x79
[<0210d439>] show_registers [kernel] 0x169
[<0210d663>] die [kernel] 0x63
[<0210e3c4>] do_trap [kernel] 0xb4
[<0210d84d>] do_invalid_op [kernel] 0x5d

Code: Bad EIP value.

Kernel panic: Fatal exception
In interrupt handler - not syncing


Entering kdb (current=0x1d2ee000, pid 2916) on processor 0 due to KDB_ENTER()
[0]kdb>
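
The difference between the two traces above (one passing through PowerPath code, one not) can be checked mechanically. The following is a minimal sketch, not part of the original report: it parses call-trace entries of the form `[<addr>] symbol [module] 0xoffset` and reports which modules appear. The helper names are hypothetical, and treating every module whose name starts with `emcp` as PowerPath is an assumption based on the module names visible in the first trace.

```python
import re

# Matches call-trace entries such as "[<02141ae7>] unmap_kvec [kernel] 0x47".
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\S+)\s+\[([^\]]+)\]\s+0x([0-9a-f]+)"
)

# Assumption: PowerPath modules are those whose names start with "emcp"
# (emcp, emcppn in the first trace). This prefix is illustrative only.
POWERPATH_PREFIX = "emcp"


def parse_frames(trace_text):
    """Return (symbol, module) pairs for every frame found in the text."""
    return [(m.group(2), m.group(3)) for m in FRAME_RE.finditer(trace_text)]


def modules_in_trace(trace_text):
    """Return the set of module names that own at least one frame."""
    return {module for _, module in parse_frames(trace_text)}


def has_powerpath_frames(trace_text):
    """True if any frame belongs to a module matching the assumed prefix."""
    return any(m.startswith(POWERPATH_PREFIX) for m in modules_in_trace(trace_text))


# Excerpts copied from the two traces in this comment.
first_trace = """
[<02141ae7>] unmap_kvec [kernel] 0x47 (0x10d19d58)
[<22816607>] __scsi_end_request [scsi_mod] 0x127 (0x10d19db8)
[<22a01b48>] emcpFreeAsyncPirp [emcp] 0x28 (0x10d19e50)
[<22af2461>] PnIodone [emcppn] 0x11 (0x10d19ef0)
"""
second_trace = """
[<0210d299>] show_stack [kernel] 0x79
[<0210d439>] show_registers [kernel] 0x169
[<0210d663>] die [kernel] 0x63
"""

if __name__ == "__main__":
    print(sorted(modules_in_trace(first_trace)), has_powerpath_frames(first_trace))
    print(sorted(modules_in_trace(second_trace)), has_powerpath_frames(second_trace))
```

Because the second trace contains only `[kernel]` frames, a check like this supports reading the panic as independent of PowerPath.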
 
 


Comment 3 Jeff Moyer 2004-02-23 15:36:33 UTC
This is probably a generic AIO bug.  Heather, can you attach the
sources of your test?  I'd like to know under what condition you
trigger this BUG().  It may be that this is a duplicate of bz #113213.
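
To compare the two reported panics, the `EIP is at` lines can be parsed to see where each one faulted. This is a minimal sketch with a hypothetical helper name; the two input lines are copied from comment 2. It only extracts fields from the oops text and makes no claim about the root cause.

```python
import re

# Matches lines such as:
#   EIP is at __free_pages_ok [kernel] 0x3ee (2.4.21-9.ELhugemem/i686)
EIP_RE = re.compile(
    r"EIP is at (?P<symbol>\S+) \[(?P<module>[^\]]+)\] 0x(?P<offset>[0-9a-f]+) "
    r"\((?P<release>[^/)]+)/(?P<arch>[^)]+)\)"
)


def parse_eip_line(line):
    """Return a dict with symbol, module, offset, release, arch, or None."""
    m = EIP_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["offset"] = int(info["offset"], 16)  # hex offset into the function
    return info


# The two "EIP is at" lines reported in comment 2.
eip_lines = [
    "EIP is at __free_pages_ok [kernel] 0x3ee (2.4.21-9.ELhugemem/i686)",
    "EIP is at __free_pages_ok [kernel] 0x3df (2.4.21-9.ELcustom/i686)",
]
faults = [parse_eip_line(line) for line in eip_lines]

if __name__ == "__main__":
    for fault in faults:
        print(fault["symbol"], hex(fault["offset"]), fault["release"])
```

Both panics fault in `__free_pages_ok`, on two different builds of the 2.4.21-9.EL kernel (hugemem and custom).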

Comment 4 Heather Conway 2004-04-09 14:14:25 UTC
Created attachment 99272
AIO test sources

Comment 5 Jeff Moyer 2004-04-12 14:10:24 UTC
Hi, Heather,

Looking over your stack trace again, it looks like this bug was indeed
fixed in our latest U2 candidate.  See bug 113213.

Give the U2 candidate kernel a try, and let me know if that works for you.

Thanks!
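
When trying a newer kernel, it can help to confirm that the running release is at least the one listed in this bug's "Fixed In Version" field (v2.4.21-15.EL). The sketch below is a hedged illustration with hypothetical helper names: it compares only the numeric version and build components and ignores suffixes such as `EL`, `ELhugemem`, or `ELcustom`. A locally built kernel may not follow Red Hat's numbering, so treat the result as a hint rather than proof.

```python
import re

# Taken from the bug's "Fixed In Version" field.
FIXED_IN = "2.4.21-15.EL"

# Captures the dotted version and the leading numeric part of the build,
# e.g. "2.4.21-9.ELhugemem" -> ("2.4.21", "9").
RELEASE_RE = re.compile(r"^v?(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)")


def release_key(release):
    """Return a comparable (version_tuple, build_tuple) for a release string."""
    m = RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version, build = m.groups()
    return (
        tuple(int(part) for part in version.split(".")),
        tuple(int(part) for part in build.split(".")),
    )


def has_fix(release, fixed_in=FIXED_IN):
    """True if the release is numerically at or above the fixed version."""
    return release_key(release) >= release_key(fixed_in)


if __name__ == "__main__":
    for rel in ("2.4.21-9.ELhugemem", "2.4.21-9.ELcustom", "2.4.21-15.EL"):
        print(rel, has_fix(rel))
```

On a live system the argument could come from `platform.release()` in the Python standard library.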

Comment 6 Heather Conway 2004-04-12 16:46:00 UTC
Will do - thank you for the update.

Comment 7 Jeff Moyer 2004-05-11 21:16:05 UTC
Did you have a chance to ensure this was fixed in your environment?

Thanks.


Comment 8 Heather Conway 2004-07-23 15:27:10 UTC
We haven't been able to replicate this problem with RHEL 3.0 U2, so 
I'm closing this bug.
Thanks for your help.
Heather

