From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.6) Gecko/20040116 Description of problem: This system is an 8-CPU (plus hyperthreading) HP DL760 with 20GB RAM running Sybase 12.5. The kernel is 2.4.21-9.0.1.ELhugemem. Storage is a dual HP cciss card, with many 1 and 2GB LVM volumes (striped between the RAID volume on each controller). There is also a ql2300 card connected to an 8GB solid state disk. (sliced up into 1GB LVM volumes). The qla2300 driver is from Qlogic (v6.06.10) as the one Redhat ships is hopelessly broken in this configuration. There were two panics under heavy Sybase load (a couple thousand inserts and updates per second). The first panic was mangled by a broken serial console. Here's the surviving portion: Stack: 29b44600 b3bc2400 fffffffb 00000000 02189671 9e92fe80 fffffffb 7eef7580 00000400 00000000 02189727 00000000 b3bc2400 29b44600 fffffffb 02149ba7 b3bc2400 29b44600 fffffffb 29b44600 02189700 b3bc2400 f1fe9f68 39f43f7c Call Trace: [<02189671>] generic_aio_complete_rw [kernel] 0x31 (0x39f43ee0) [<02189727>] generic_aio_complete_read [kernel] 0x27 (0x39f43ef8) [<02149ba7>] __iodesc_free [kernel] 0x97 (0x39f43f0c) [<02189700>] generic_aio_complete_read [kernel] 0x0 (0x39f43f20) [<0214a48b>] __iodesc_read_finish [kernel] 0x22b (0x39f43f38) [<0214a678>] __iodesc_make_uptodate [kernel] 0xc8 (0x39f43f5c) [<0212f35a>] __run_task_queue [kernel] 0x6a (0x39f43f74) [<0213af7d>] context_thread [kernel] 0x13d (0x39f43f8c) [<021237a0>] schedule [kernel] 0x0 (0x39f43fbc) [<0213ae40>] context_thread [kernel] 0x0 (0x39f43fe0) [<0210958d>] kernel_thread_helper [kernel] 0x5 (0x39f43ff0) Code: 8b 41 18 a9 00 80 00 00 75 13 85 ed 74 06 f0 0f ba 69 18 04 Kernel panic: Fatal exception The code matches unmap_kvec+42, which is: 0x02141ac6 <unmap_kvec+38>: test %ecx,%ecx 0x02141ac8 <unmap_kvec+40>: je 0x2141ae7 <unmap_kvec+71> 0x02141aca <unmap_kvec+42>: mov 0x18(%ecx),%eax 0x02141acd <unmap_kvec+45>: test $0x8000,%eax Source says: if (likely(map != NULL) && !PageReserved(map)) { The second time, I had to copy it off the screen. Here's what I got: EIP is at generic_aio_complete_rw [kernel] 0x42 (2.4.21-9.0.1.ELhugemem/i686) eax: 00000000 ebx: 5f491600 ecx: 1c30e080 edx: 000000f5 esi: 6e0c9380 edi: 00000800 ebp: 00000000 esp: 39f43ee4 ds: 0068 es:0068 ss:0068 Process keventd (pid: 18, stackpage=39f43000) Stack: 00000001 00000800 ccf9e380 00000000 00000000 02189727 00000000 6e0c9380 5f491600 00000800 02149ba7 6e0c9380 5f491600 00000800 5f491600 02189700 6e0c9380 0215defc 065347ac e5021000 00000000 0214a48b ccf9e380 00000000 Call Trace: [<02189727>] generic_aio_complete_read [kernel] 0x27 (0x39f43ef8) [<02149ba7>] __iodesc_free [kernel] 0x97 (0x39f43f0c) [<02189700<] generic_aio_complete_read [kernel] 0x0 (0x39f43f20) [<0215defc>] kmap_high [kernel] 0x5c (0x39f43f28) [<0214a48b>] __iodesc_read_finish [kernel] 0x22b (0x39f43f38) [<0212f35a>] __run_task_queue [kernel] 0x6a (0x39f43f74) [<0213af7d>] context_thread [kernel] 0x13d (0x39f43f8c) [<021237a0>] schedule [kernel] 0x0 (0x39f43fbc) [<0213ae40>] context_kthread [kernel] 0x0 (0x39f43fe0) [<0210958d>] kernel_thread_helper [kernel] 0x5 (0x39f43ff0) Code: 8b 48 04 85 c9 75 05 3b 7e 30 74 24 8b 56 28 85 d2 89 f0 8b Kernel panic: Fatal exception In this case, generic_aio_complete_rw() tried to dereference a null iocb->ctx. I'm now trying to reproduce the problem with netdump enabled and hopefully the serial console fixed. Version-Release number of selected component (if applicable): 2.4.21-9.0.1.ELhugemem How reproducible: Sometimes Steps to Reproduce: happened twice, can't reproduce at-will yet. Additional info:
Created attachment 98988 [details] netdump log of kernel panic A complete panic message this time.
Great. Do you have the netdump image, as well? Could you compress it and attach it if you do?
Unfortunately, I didn't get the netdump image - the netdump server didn't have enough disk space. I assumed http://www.redhat.com/support/wpapers/redhat/netdump/setup.html was current documentation and only a 4GB dump would be saved. I'm banging on the machine some more, I'll hopefully get the netdump image the next time it crashes.
Ok, we need to update the documentation. That bug was fixed some time ago. Of course, I'm not sure how well the dump will compress, so it may be a moot point. ;-) On a side note, could you open up bugs on the shortcomings of the qla2300 driver that ships with RHEL 3? We are updating the driver, and we would like to ensure that we get the fixes that you need. Thanks!
I got a crash dump. It compresses to over 2GB, I probably shouldn't create an attachment that large (anyway Mozilla won't upload a file larger than 2GB). You can get it from: ftp://rhcrash:rhcrash@nitftp.knight-sec.com/goldinst-vmcore-20040401.gz and the panic message is: ftp://rhcrash:rhcrash@nitftp.knight-sec.com/goldinst-panic-20040401.txt I only have this machine until Monday, it's an eval unit from HP. (just so you know in case you need me to test a patch or collect more information). I opened a case on that ql2300 driver.
> Storage is a dual HP cciss card, with many 1 and 2GB LVM volumes > (striped between the RAID volume on each controller). I read this as you had configured each cciss separately, then used the Linux LVM to stripe your volumes across the two cciss devices. Is that correct? From the dump, it looks like the errant process is accessing /dev/hpdisk/s1g09 (i.e. opened a block device, not an lvm partition). Does this make sense given your current configuration? Has the box gone back to HP yet? Any chance you can keep it longer? Thanks! Jeff
That's correct about the volume layout. hpdisk is an LVM volume group. It contains physical volumes /dev/cciss/c1d0 and /dev/cciss/c2d0. /dev/hpdisk/s1g09 is a 1GB LVM volume. (striped between those two physical volumes) Sybase would have opened this particular volume through /dev/raw/raw9. I have the machine until tomorrow (Tuesday) morning now. I doubt I can keep it longer but I'll see.
If you still have the machine, let me know, and we'll turn on the slab debugging. If not, I'll continue to try to reproduce this in-house. Thanks.
I no longer have this machine, nor access to any similarly large x86 machine. Thanks for your help.
My name is Gwendal Grignou, I am working for Silverback Systems where I wrote driver for our new iSCSI chip, iSNAP 2110. Our test is 2 RHEL machines, one running as SCSI initiator, the other one as a target. I open 20 iSCSI sessions (which map to 20 TCP Connections and 20 SCSI host driver interfaces). We use iometer to pmup traffic, throuch a patched version of dynamo, iometer back end thread that generate traffic. After few seconds, the system crashes: with kgdb, it crashes in: generic_aio_complete_rw at aio.c:1135 The problem is iocb->ctx is NULL while the code try to access iocb->ctx->dead (gdb) p _iocb $3 = (void *) 0xd12f0700 (gdb) p *(struct kiocb*)$3 $4 = {list = {next = 0xd12f3480, prev = 0xd12f2c00}, filp = 0x0, ctx = 0x0, user_obj = 0x0, user_data = 3076314848, pos = 387072, buf = 0, nr_transferred = 2048, size = 2048, this_size = 2048, key = 1002, cancel = 0, data = 0x0, users = 0, u = {tq = {list = {next = 0x0, prev = 0x0}, sync = 0, routine = 0, data = 0x0}, list = {next = 0x0, prev = 0x0}}, rlim_fsize = 4294967295} It is easy to reproduce in my configuration. Do you have any idea where I should look to help you debug? Thanks, Gwendal.
Hi, Gwendal, As a start, you could enable slab debugging. Let me know what (if anything) that uncovers. -Jeff
Reposting prior update for inclusion in IT entry, 55870: <start> Hi, Gwendal, As a start, you could enable slab debugging. Let me know what (if anything) that uncovers. -Jeff <End>
For anyone still running into this problem, please try the debug kernel located here: http://people.redhat.com/jmoyer/.bz119457/ Be sure to capture the kernel logs via netlog/netdump or a serial console. When next the machine panics, I'm interested in seeing these logs.
Is there anyone who is still seeing this problem?
Created attachment 122301 [details] Add a missing return statement; without this, an io can be completed twice. I'm currently testing this patch, and it's holding up quite well. I am going to coordinate some more stress testing internally, but I'd like to see if the customer(s) could give this a try. Thanks.
Here is a description of the problem that the posted patch fixes: If a program issues a read for a page which is undergoing write I/O, you can trigger this problem. For example, a program which iissues a number of writes to a file, followed very closely by a number of reads to the same portions of the same file will trigger this bug. What ends up happening is that the AIO reads will catch up with the writes, since everything is being cached. When the read path reaches a page which is still locked by the writer, it follows a code path that is seldom run. The code path tests to see if the page is locked for I/O. If it is locked for I/O, then the reader puts itself on the page wait queue (entries on the page wait queue get woken up when the page is unlocked). The errant code path then completes the I/O, even though it hasn't completed. Because of this, when the page is unlocked, we get called again with real data in the page, and the same I/O is completed a second time. This second I/O completion then causes the dereferencing of NULL pointer in the io context structure. Because of the racy nature of this problem, it's easier to reproduce over a networked file system than it is on a local file system. That isn't to say that one cannot trigger the bug on a local file system; it's simply more difficult to hit the race in the local case.
A fix for this problem has just been committed to the RHEL3 U8 patch pool this evening (in kernel version 2.4.21-40.2.EL).
This issue is on Red Hat Engineering's list of planned work items for the upcoming Red Hat Enterprise Linux 3.8 release. Engineering resources have been assigned and barring unforeseen circumstances, Red Hat intends to include this item in the 3.8 release.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0437.html