From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; EMC IS 55; .NET CLR 1.0.3705; .NET CLR 1.1.4322; InfoPath.1)

Description of problem:
In a 4-node clustering environment, reboots are randomly issued to 3 of the 4 servers, one server at a time. The panic occurred after the reboot was issued and the node started to come back up.

Version-Release number of selected component (if applicable):
kernel-2.6.9-22.EL

How reproducible:
Always

Steps to Reproduce:
1. Configuration
   Server model: rx5670
   HBA model: QLA2312
   Driver version: 8.01.00
   OS kernel: RH 4 UP2 2.6.9-22.EL
   FC switch: Brocade 3800 16 port
   Array model: CLARiiON CX300
2. The test runs in a cluster of 4 nodes and randomly issues reboots to 3 out of 4 nodes during the 24-hour cycle. Reboots are not issued to all the nodes at the same time, only to one node at a time. After the node comes back up, the server loads it with lots of I/O, continues with the test, and then issues another reboot after an hour or so. The panic occurred after the reboot was issued and the node started to come back up.
3.

Additional info:
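The reboot pattern in step 2 can be sketched roughly as below. This is only an illustration of the described schedule, not the actual test harness: the node names, the fixed one-hour interval, and the assumption that the same three nodes are the reboot targets for the whole cycle are all hypothetical.

```python
import random


def reboot_schedule(nodes, hours=24, interval_hours=1, seed=None):
    """Return a list of (hour, node) pairs, with one reboot per slot.

    Assumptions (not from the report): three target nodes are picked
    once per cycle, and reboots are spaced at a fixed interval.
    """
    rng = random.Random(seed)
    # Three of the four nodes are reboot targets; the fourth is left alone.
    targets = rng.sample(nodes, 3)
    schedule = []
    for hour in range(0, hours, interval_hours):
        # Only one node is rebooted in each slot, never several at once.
        schedule.append((hour, rng.choice(targets)))
    return schedule


if __name__ == "__main__":
    # Hypothetical node names, used for illustration only.
    for hour, node in reboot_schedule(["node1", "node2", "node3", "node4"]):
        print(f"hour {hour:2d}: reboot {node}, then resume I/O load")
```

Each slot names exactly one node, which matches the report's point that the panic follows a single-node reboot rather than a simultaneous restart.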
Do you have a trace of the panic?
The problem is not always reproducible. This is not a true clustering environment: it is 4 servers on the same switch, and 3 of them are rebooted at random to generate RSCNs. The console trace follows:

end_request: I/O error, dev sdfz, sector 2799341
kernel BUG at drivers/scsi/scsi.c:292!
scsi_eh_3[729]: bugcheck! 0 [1]
Modules linked in: md5 ipv6 parport_pc lp parport pidentd(U) autofs4 sunrpc ds yenta_socket pcmcia_core deadman(U) vfat fat sg dm_multipath emcphr(U) emcpmpap(U) emcpmpaa(U) emcpmpc(U) emcpmp(U) emcp(U) emcplib(U) button ohci_hcd ehci_hcd e1000 tg3 bonding(U) dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300(U) qla2xxx(U) qla2xxx_conf(U) mptscsih mptbase sd_mod scsi_mod

Pid: 729, CPU 0, comm: scsi_eh_3
psr : 0000101008122010 ifs : 800000000000058e ip : [<a000000200069f00>] Tainted: P
ip is at scsi_put_command+0x1e0/0x200 [scsi_mod]
unat: 0000000000000000 pfs : 000000000000058e rsc : 0000000000000003
rnat: 0000000043de6db3 bsps: 000000000002cc51 pr : 0000000000269941
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000200069f00 b6 : a0000001003418a0 b7 : a000000100256b40
f6 : 0fffbccccccccc8c00000 f7 : 0ffdaa200000000000000
f8 : 100008000000000000000 f9 : 10002a000000000000000
f10 : 0fffcccccccccc8c00000 f11 : 1003e0000000000000000
r1 : a00000010099d0e0 r2 : 000000000032b272 r3 : a00000010079d740
r8 : 0000000000000027 r9 : a000000100732ac0 r10 : a000000100732ab8
r11 : a00000010079d328 r12 : e0000001004c7df0 r13 : e0000001004c0000
r14 : 0000000000004000 r15 : a00000010074e540 r16 : 0000000000000001
r17 : 0000000000000538 r18 : a000000100650198 r19 : a000000100256b40
r20 : c0000000f4050000 r21 : 0000000000000005 r22 : a0000001007b3b30
r23 : a0000001007b3a40 r24 : a0000001007b3a40 r25 : a000000100a3d7c8
r26 : 00000ba5d86cafa5 r27 : 0000001008122010 r28 : 0000000000000000
r29 : 00000000110000c0 r30 : 0000000000000000 r31 : a0000001007b00c0

Call Trace:
 [<a000000100016a60>] show_stack+0x80/0xa0
                                sp=e0000001004c7960 bsp=e0000001004c1220
 [<a000000100017370>] show_regs+0x890/0x8c0
                                sp=e0000001004c7b30 bsp=e0000001004c11d0
 [<a00000010003d7f0>] die+0x150/0x240
                                sp=e0000001004c7b50 bsp=e0000001004c1190
 [<a00000010003d920>] die_if_kernel+0x40/0x60
                                sp=e0000001004c7b50 bsp=e0000001004c1160
 [<a00000010003dac0>] ia64_bad_break+0x180/0x600
                                sp=e0000001004c7b50 bsp=e0000001004c1138
 [<a00000010000f480>] ia64_leave_kernel+0x0/0x260
                                sp=e0000001004c7c20 bsp=e0000001004c1138
 [<a000000200069f00>] scsi_put_command+0x1e0/0x200 [scsi_mod]
                                sp=e0000001004c7df0 bsp=e0000001004c10c8
 [<a000000200078d00>] scsi_next_command+0x40/0x80 [scsi_mod]
                                sp=e0000001004c7df0 bsp=e0000001004c10a0
 [<a000000200078ff0>] scsi_end_request+0x1d0/0x2e0 [scsi_mod]
                                sp=e0000001004c7df0 bsp=e0000001004c1058
 [<a000000200079570>] scsi_io_completion+0x2b0/0xa00 [scsi_mod]
                                sp=e0000001004c7df0 bsp=e0000001004c0fd0
 [<a000000200026710>] sd_rw_intr+0x110/0x700 [sd_mod]
                                sp=e0000001004c7df0 bsp=e0000001004c0f80
 [<a00000020006c190>] scsi_finish_command+0x2d0/0x300 [scsi_mod]
                                sp=e0000001004c7df0 bsp=e0000001004c0f50
 [<a0000002000769d0>] scsi_error_handler+0x16b0/0x2560 [scsi_mod]
                                sp=e0000001004c7df0 bsp=e0000001004c0e38
 [<a000000100018930>] kernel_thread_helper+0x30/0x60
                                sp=e0000001004c7e30 bsp=e0000001004c0e10
 [<a000000100008c60>] start_kernel_thread+0x20/0x40
                                sp=e0000001004c7e30 bsp=e0000001004c0e10
Kernel panic - not syncing: Fatal exception
OK, thanks for the trace. Reassigning to our SCSI expert.
Is it still reproducible with the latest RHEL4, RHEL5, or upstream kernel?
It looks like the same issue as bug 231319.

*** This bug has been marked as a duplicate of 231319 ***