Description of problem: When I run asynchronous writes to CLARiiON devices on a RHEL 3.0 system, the machine panics immediately. The problem seems to happen only on RHEL 3.0 systems, both with and without PowerPath. The system was running RHEL 3.0 with 2 QLA2200 HBAs and the default v6.06.b011 driver, and the I/O is generated by a program that uses the libaio library.
Could you please provide more specific information on what exactly the panic in question is?
Issue produced with v6.07.00 driver as well. Panic with v6.06.00b11 below:

@lcla233 a_io]# ./a_io.linux -w 2 16 65536
Opening /dev/raw/raw2; fd = 3
Opening /dev/raw/raw3; fd = 4
Opening /dev/raw/raw4; fd = 5
Opening /dev/raw/raw5; fd = 6
Opening /dev/raw/raw6; fd = 7
Opening /dev/raw/raw7; fd = 8
Opening /dev/raw/raw8; fd = 9
Opening /dev/raw/raw9;
invalid operand: 0000
parport_pc lp parport autofs nfs lockd sunrpc e100 floppy emcppn emcpmpc emcpmp sg emcp qla2200_conf qla2200 loop lvm-mod keybdev mousedev hid input usb-uhci
CPU: 0
EIP: 0060:[<02155a7e>] Tainted: P
EFLAGS: 00010202
EIP is at __free_pages_ok [kernel] 0x3ee (2.4.21-9.ELhugemem/i686)
eax: 00000001 ebx: 03186554 ecx: 00000000 edx: 00000000
esi: 1ddd1980 edi: 1ddd1a48 ebp: 00000000 esp: 10d19d14
ds: 0068 es: 0068 ss: 0068
Process a_io.linux (pid: 6826, stackpage=10d19000)
Stack: 00000400 00000800 1fb239ac 1fa5ca80 00000000 00000040 00000040 1fa5cb80
       10d19d88 00008000 00220740 00000040 021cdf7a 1ddd1988 1ddd1980 1ddd1a48
       00000000 02141ae7 1ddd1980 0f342f80 00010000 00000001 02189661 00000216
Call Trace:
[<021cdf7a>] generic_make_request [kernel] 0xea (0x10d19d44)
[<02141ae7>] unmap_kvec [kernel] 0x47 (0x10d19d58)
[<02189661>] generic_aio_complete_rw [kernel] 0x31 (0x10d19d6c)
[<02189747>] generic_aio_complete_write [kernel] 0x27 (0x10d19d84)
[<021671d1>] end_buffer_io_kiobuf_async [kernel] 0x91 (0x10d19d98)
[<22816607>] __scsi_end_request [scsi_mod] 0x127 (0x10d19db8)
[<22816910>] scsi_io_completion_Rsmp_d4f2f11b [scsi_mod] 0x180 (0x10d19ddc)
[<2282dd5b>] rw_intr [sd_mod] 0x7b (0x10d19e30)
[<22a01b48>] emcpFreeAsyncPirp [emcp] 0x28 (0x10d19e50)
[<22a01d01>] PowerPlatformTopIodone [emcp] 0x1a1 (0x10d19e70)
[<229f8afe>] PowerIodone [emcp] 0x4e (0x10d19ea0)
[<22af2409>] PnIodoneCommon [emcppn] 0xd9 (0x10d19ec0)
[<22af2461>] PnIodone [emcppn] 0x11 (0x10d19ef0)
[<229f8afe>] PowerIodone [emcp] 0x4e (0x10d19f10)
[<2280e46f>] scsi_finish_command [scsi_mod] 0x9f (0x10d19f30)
[<2280e238>] scsi_softirq_handler [scsi_mod] 0x138 (0x10d19f54)
[<0212f072>] tasklet_action [kernel] 0x62 (0x10d19f64)
[<0212eed5>] do_softirq [kernel] 0xd5 (0x10d19f7c)
[<0210e146>] do_IRQ [kernel] 0x146 (0x10d19f98)
[<0210e000>] do_IRQ [kernel] 0x0 (0x10d19fbc)
Code: Bad EIP value.
Kernel panic: Fatal exception
In interrupt handler - not syncing

2/18/04 3:31:42 PM Ismail Moumni: please use this stack trace instead as there are no powerpath functions in this one:

ot@lcla233 a_io]# ./a_io.linux -w 3 16 65536
Opening /dev/raw/raw3; fd = 3
Opening /dev/raw/raw4; fd = 4
Opening /dev/raw/raw5; fd = 5
Opening /dev/raw/raw6; fd = 6
Openinvalid operand: 0000
qla2200 parport_pc lp parport autofs nfs lockd sunrpc e100 floppy microcode loop lvm-mod keybdev mousedev input hid usb-uhci usbcore ext3 jbd aic7xxx sd_mod s
CPU: 0
EIP: 0060:[<021562bf>] Not tainted
EFLAGS: 00010202
EIP is at __free_pages_ok [kernel] 0x3df (2.4.21-9.ELcustom/i686)
eax: 00000001 ebx: 035e5a64 ecx: 00000000 edx: 00000000
esi: 1cef2e80 edi: 1cef2f48 ebp: 1d2efdec esp: 1d2efda4
ds: 0068 es: 0068 ss: 0068
Process a_io.linux (pid: 2916, stackpage=1d2ef000)
Stack: 00000000 00000000 021d4650 1b2b816c 00000000 00000040 1d2efdcc 0214fd98
       04636afc 000001f0 1d2efde8 0217ed95 04636afc 000001f0 00000000 1cef2e88
       1cef2e80 1cef2f48 1d2efdf8 02156beb 1b2b84cc 1d2efe10 0214231a 00000000
Call Trace:
[<0210d299>] show_stack [kernel] 0x79
[<0210d439>] show_registers [kernel] 0x169
[<0210d663>] die [kernel] 0x63
[<0210e3c4>] do_trap [kernel] 0xb4
[<0210d84d>] do_invalid_op [kernel] 0x5d
Code: Bad EIP value.
Kernel panic: Fatal exception
In interrupt handler - not syncing
Entering kdb (current=0x1d2ee000, pid 2916) on processor 0 due to KDB_ENTER()
[0]kdb>
This is probably a generic AIO bug. Heather, can you attach the sources of your test? I'd like to know under what condition you trigger this BUG(). It may be that this is a duplicate of bz #113213.
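For anyone triaging traces like the ones above, a short script can list which modules appear in an oops call trace. That makes it quick to check whether PowerPath frames (emcp, emcppn) are involved or whether the path stays in the generic kernel and SCSI layers. This is only a minimal sketch and was not used in this report: the names `FRAME_RE` and `frames` are hypothetical, and the pattern is an assumption that matches only the 2.4-style `[<addr>] symbol [module] 0xoffset` frame lines shown in this thread.

```python
import re

# Assumed frame layout (taken from the traces in this report), e.g.:
#   [<02141ae7>] unmap_kvec [kernel] 0x47 (0x10d19d58)
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\S+)\s+\[([\w-]+)\]\s+0x([0-9a-f]+)"
)


def frames(oops_text):
    """Return a (symbol, module, offset) tuple for each frame in oops_text."""
    return [
        (m.group(2), m.group(3), int(m.group(4), 16))
        for m in FRAME_RE.finditer(oops_text)
    ]


# Excerpt copied from the first trace in this report.
SAMPLE = """
[<02141ae7>] unmap_kvec [kernel] 0x47 (0x10d19d58)
[<02189661>] generic_aio_complete_rw [kernel] 0x31 (0x10d19d6c)
[<22816607>] __scsi_end_request [scsi_mod] 0x127 (0x10d19db8)
[<22a01d01>] PowerPlatformTopIodone [emcp] 0x1a1 (0x10d19e70)
[<22af2461>] PnIodone [emcppn] 0x11 (0x10d19ef0)
"""

parsed = frames(SAMPLE)
# Unique module names, sorted for stable output.
modules = sorted({module for _, module, _ in parsed})
print(modules)
```

Running the same function over a trace taken without PowerPath loaded should show no emcp or emcppn entries, which is the kind of check that points toward a generic AIO problem rather than a driver-specific one.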
Created attachment 99272: AIO test sources
Hi, Heather, Looking over your stack trace again, it looks like this bug was indeed fixed in our latest U2 candidate. See bug 113213. Give the U2 candidate kernel a try, and let me know if that works for you. Thanks!
Will do - thank you for the update.
Did you have a chance to ensure this was fixed in your environment? Thanks.
We haven't been able to replicate this problem with RHEL 3.0 U2, so I'm closing the Bugzilla. Thanks for your help. Heather