Description of problem:
Got a kernel panic after iSCSI login.
=====
(beiscsi_process_cq():1740):CQ Error notification for cmd.. code 19 cid 0x0
logger: 2011-12-02 05:42:49 /usr/bin/rhts-test-runner.sh 28896 240 hearbeat...
Unable to handle kernel NULL pointer dereference at 0000000000000170 RIP:
[<ffffffff8859e2cf>] :be2iscsi:beiscsi_process_cq+0x155/0x689
PGD 1a0982067 PUD 18e863067 PMD 0
Oops: 0002 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:09:00.0/0000:0a:01.0/0000:0e:00.3/host48/session2/target48:0:1/48:0:1:0/state
CPU 0
Modules linked in: sg be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic cxgb3i autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 xfrm_nalgo crypto_api uio libcxgbi cxgb3 libiscsi_tcp dm_mirror dm_multipath scsi_dh video backlight sbs power_meter i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport ide_cd i5k_amb cdrom hwmon hpilo tpm_tis tpm libiscsi2 pcspkr serio_raw be2net 8021q bnx2 i5000_edac tpm_bios scsi_transport_iscsi2 scsi_transport_iscsi edac_mc dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 9135, comm: dt Not tainted 2.6.18-298.el5 #1
RIP: 0010:[<ffffffff8859e2cf>] [<ffffffff8859e2cf>] :be2iscsi:beiscsi_process_cq+0x155/0x689
RSP: 0000:ffffffff804afe80 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8101a5ccc5c0 RCX: 0000000080000001
RDX: 0000000000000000 RSI: ffff8100cfdb4180 RDI: 0000000080000001
RBP: 0000000080280001 R08: ffff8101a5ccc640 R09: ffff8101a6221290
R10: 000000000000003c R11: 0000000000000246 R12: 00000000000005c0
R13: ffff8100cfdd9f60 R14: 0000000000000001 R15: ffff810196c985a0
FS: 00002ae22393d070(0000) GS:ffffffff8042f000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000170 CR3: 000000019be9c000 CR4: 00000000000006e0
Process dt (pid: 9135, threadinfo ffff810192d6c000, task ffff8101a851a820)
Stack: ffff8100cf7f0ca0 0000000000000000 ffff8101a8b7ad40 0000000002bbf20e
000000000000a800 000000000000000a ffff8100cf7f0040 000000000000000a
0000000000000100 0000000000000000 00000000ffffe2e7 ffffffff8859e817
Call Trace:
<IRQ> [<ffffffff8859e817>] :be2iscsi:be_iopoll+0x14/0x50
[<ffffffff8859ff6a>] :be2iscsi:be_isr_msix+0x1cf/0x1e1
[<ffffffff8014be3a>] blk_iopoll_softirq+0x60/0xe0
[<ffffffff8001251d>] __do_softirq+0x89/0x133
[<ffffffff8005e2fc>] call_softirq+0x1c/0x28
[<ffffffff8006d646>] do_softirq+0x2c/0x7d
[<ffffffff8006d4d6>] do_IRQ+0xee/0xf7
[<ffffffff8005d615>] ret_from_intr+0x0/0xa
<EOI>
Code: 89 90 70 01 00 00 89 c8 c1 ef 08 25 00 00 ff 00 c1 e8 10 74
RIP [<ffffffff8859e2cf>] :be2iscsi:beiscsi_process_cq+0x155/0x689
RSP <ffffffff804afe80>
CR2: 0000000000000170
<0>Kernel panic - not syncing: Fatal exception
=====

Version-Release number of selected component (if applicable):
kernel -298
be2iscsi WITHOUT CHAP

How reproducible:
Not sure. Hit by a Beaker job: https://beaker.engineering.redhat.com/recipes/347188
Retrying at https://beaker.engineering.redhat.com/jobs/165698

Steps to Reproduce:
1. iSCSI discovery via be2iscsi target.
2. Log in.
3. I/O stress with dt.

Actual results:
Kernel panic.

Expected results:
I/O stress finishes correctly.

Additional info:
Sorry, no kdump; I plan to add kdump setup to the automated test.
This looks like Bug #738934, which a customer said they hit without CHAP. I have now hit it in RHEL 5, also without CHAP.
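When comparing this crash against Bug #738934, the fields worth matching are the faulting symbol on the RIP line and the fault address in CR2. A minimal sketch for pulling them out of a pasted oops, assuming the 2.6.18-era x86_64 oops layout shown above; `oops_signature` is a hypothetical helper name, not part of any existing tool.

```python
import re

# Matches e.g. "RIP: 0010:[<addr>] [<addr>] :module:symbol+0x155/0x689"
# as well as the module-less form "... iowrite32+0x17/0x2d".
RIP_RE = re.compile(r"RIP:?\s.*?\]\s+(?::(\w+):)?(\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+")
CR2_RE = re.compile(r"CR2:\s+([0-9a-f]+)")


def oops_signature(text):
    """Return module, symbol, offset and fault address from an oops dump.

    Returns None when no RIP line with a symbol is found. 'module' is None
    for core-kernel symbols, and 'fault_addr' is None when CR2 is absent.
    """
    rip = RIP_RE.search(text)
    if rip is None:
        return None
    cr2 = CR2_RE.search(text)
    return {
        "module": rip.group(1),
        "symbol": rip.group(2),
        "offset": int(rip.group(3), 16),
        "fault_addr": int(cr2.group(1), 16) if cr2 else None,
    }
```

Two oopses with the same module, symbol and offset are likely the same crash site; a matching symbol alone is only a hint.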
When shutting down the OS, I can reliably reproduce (2 out of 2) the following panic:
====
Kernel BUG at lib/iomap.c:93
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/system/cpu/cpu1/cpufreq/cpuinfo_max_freq
CPU 0
Modules linked in: sg lockd sunrpc acpi_cpufreq freq_table mperf ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 libiscsi_tcp dm_mirror dm_multipath scsi_dh video backlight sbs power_meter i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport be2iscsi i5k_amb libiscsi2 be2net ide_cd hwmon scsi_transport_iscsi2 tpm_tis cdrom bnx2 8021q scsi_transport_iscsi pcspkr tpm tpm_bios serio_raw i5000_edac hpilo edac_mc dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 3100, comm: reboot Not tainted 2.6.18-298.el5 #1
RIP: 0010:[<ffffffff8015a776>] [<ffffffff8015a776>] iowrite32+0x17/0x2d
RSP: 0018:ffffffff804afdc8 EFLAGS: 00010287
RAX: 0000000000000000 RBX: ffff810005eb4840 RCX: ffff8100cfa2d400
RDX: 0000000000000021 RSI: 0000000000000040 RDI: 0000000001210002
RBP: 0000000001210002 R08: 000000006a072100 R09: ffff8101a6764d80
R10: 0000000000000000 R11: 0000000000000100 R12: ffff8101a08ba1c0
R13: ffff8101a08ba240 R14: 0000000000000000 R15: ffff8101a889c000
FS: 00002ad044e196e0(0000) GS:ffffffff8042f000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff211c3ff0 CR3: 00000001a0b0c000 CR4: 00000000000006e0
Process reboot (pid: 3100, threadinfo ffff8101a06e2000, task ffff8101a5a337e0)
Stack: ffffffff882e57d6 0000000180062bb4 ffffffff804aff08 0000000037fe3000
ffff8101a08c7a90 0000000000000000 ffffffff804afe70 ffff8101a08b8c88
0000000000000000 ffff8101a889c000 ffffffff882c6140 0000000000000000
Call Trace:
<IRQ> [<ffffffff882e57d6>] :be2iscsi:beiscsi_task_xmit+0x4d2/0x51d
[<ffffffff882c6140>] :libiscsi2:__iscsi_conn_send_pdu+0x1dd/0x241
[<ffffffff882c7264>] :libiscsi2:iscsi_check_transport_timeouts+0x0/0x19b
[<ffffffff882c666d>] :libiscsi2:iscsi_send_nopout+0x5b/0xc3
[<ffffffff8006ec13>] do_timer_jiffy+0x9/0x1d
[<ffffffff882c738d>] :libiscsi2:iscsi_check_transport_timeouts+0x129/0x19b
[<ffffffff882c7264>] :libiscsi2:iscsi_check_transport_timeouts+0x0/0x19b
[<ffffffff8009a091>] run_timer_softirq+0x18d/0x23a
[<ffffffff8001251d>] __do_softirq+0x89/0x133
[<ffffffff8005e2fc>] call_softirq+0x1c/0x28
[<ffffffff8006d646>] do_softirq+0x2c/0x7d
[<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
<EOI> [<ffffffff8000caf6>] __delay+0x8/0x10
[<ffffffff882eb37c>] :be2iscsi:be_mbox_db_ready_wait+0xcd/0xf3
[<ffffffff882eb402>] :be2iscsi:be_mbox_notify+0x60/0x141
[<ffffffff882eb5a7>] :be2iscsi:beiscsi_cmd_q_destroy+0xc4/0xd4
[<ffffffff882e5d8c>] :be2iscsi:hwi_cleanup+0x48/0x195
[<ffffffff882e6015>] :be2iscsi:beiscsi_clean_port+0x13c/0x17a
[<ffffffff882e616b>] :be2iscsi:beiscsi_quiesce+0x118/0x17c
[<ffffffff882e6206>] :be2iscsi:beiscsi_shutdown+0x37/0x40
[<ffffffff801d1bf0>] device_shutdown+0x56/0x88
[<ffffffff8009da2e>] kernel_restart+0x9/0x46
[<ffffffff8009dbb8>] sys_reboot+0x146/0x1c7
[<ffffffff8003ac75>] hrtimer_try_to_cancel+0x4a/0x53
[<ffffffff8005a085>] hrtimer_cancel+0xc/0x16
[<ffffffff80063cf9>] do_nanosleep+0x47/0x70
[<ffffffff80059f72>] hrtimer_nanosleep+0x58/0x118
[<ffffffff800ba605>] audit_syscall_entry+0x1a8/0x1d3
[<ffffffff8005d229>] tracesys+0x71/0xe0
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
Code: 0f 0b 68 b1 9b 2c 80 c2 5d 00 eb fe 0f b7 d6 89 f8 ef c3 89
RIP [<ffffffff8015a776>] iowrite32+0x17/0x2d
RSP <ffffffff804afdc8>
<0>Kernel panic - not syncing: Fatal exception
======
If this is a different issue, let me know and I will file a new bug.
Reproduced reliably (5 out of 5): https://beaker.engineering.redhat.com/jobs/166657
The same task is still running on RHEL 5.7 GA; I will set the regression flag later.
Requesting blocker, as this bug is not a race condition and causes a kernel panic.

Mike,
Can you take a look at this bug? It looks the same as Bug #738934 in RHEL 6.
Let me know if you need the core dump.
Ah shoot, missed it by 1. Could you try the -299 kernel? The fix we did for Bug #738934 went in under bz https://bugzilla.redhat.com/show_bug.cgi?id=744343 in kernel-2.6.18-299.el5.
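For anyone checking other test hosts, here is a minimal sketch of the version comparison above. It assumes the release string follows the 2.6.18-NNN.el5 pattern and that build 299 is the first one carrying the fix, as stated in this comment. The function names are hypothetical, and z-stream backports are not accounted for.

```python
import re

# Assumption taken from this thread: the fix first shipped in 2.6.18-299.el5.
FIXED_BUILD = 299


def el5_build_number(release):
    """Return the build number from a string like '2.6.18-298.el5', else None."""
    match = re.match(r"^2\.6\.18-(\d+)(?:\.\d+)*\.el5", release)
    return int(match.group(1)) if match else None


def has_fix(release):
    """True only when the release is a 2.6.18 el5 build at or above FIXED_BUILD.

    Unrecognized release strings return False, which here means "unknown".
    """
    build = el5_build_number(release)
    return build is not None and build >= FIXED_BUILD
```

On a live system the release string can be taken from `platform.release()` or `uname -r`.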
Ran the I/O stress test on be2iscsi 3 times with kernel -300. No kernel panic occurred. CLOSED - CURRENT RELEASE.