Red Hat Bugzilla – Bug 1071340
FCoE target: kernel panic when initiator connects to target
Last modified: 2015-03-05 06:41:04 EST
Description of problem:
When FCoE initiator server is booting, it causes kernel panic on FCoE target server.

Version-Release number of selected component (if applicable):
3.10.0-97.el7.x86_64
targetcli-2.1.fb34-1.el7.noarch

# modinfo ixgbe
filename: /lib/modules/3.10.0-97.el7.x86_64/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko
version: 3.15.1-k

How reproducible:
sometimes

Steps to Reproduce:
1. Configure FCoE target to present a LUN to initiator
2. Power on initiator
3. kernel panic on server

[ 2457.927134] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
[ 2457.962440] CPU: 2 PID: 1362 Comm: fcoethread/2 Not tainted 3.10.0-97.el7.x86_64 #1
[ 2457.997955] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013
[ 2458.029072] 0000000000000000 ffff88020b446c68 ffffffff815c2e83 ffff88020b446ce0
[ 2458.063218] ffffffff815bcc4e 0000000000000010 ffff88020b446cf0 ffff88020b446c90
[ 2458.099154] 0000000000000000 0000000000000002 0000000000000261 0000000000000002
[ 2458.133645] Call Trace:
[ 2458.145929] <NMI> [<ffffffff815c2e83>] dump_stack+0x19/0x1b
[ 2458.173613] [<ffffffff815bcc4e>] panic+0xc8/0x1d7
[ 2458.196216] [<ffffffff810ed9e0>] ? watchdog_enable_all_cpus.part.2+0x40/0x40
[ 2458.229922] [<ffffffff810edaa2>] watchdog_overflow_callback+0xc2/0xd0
[ 2458.259868] [<ffffffff8112d51e>] __perf_event_overflow+0x8e/0x230
[ 2458.292100] [<ffffffff8112c2e9>] ? perf_event_update_userpage+0x19/0x100
[ 2458.324039] [<ffffffff8112e094>] perf_event_overflow+0x14/0x20
[ 2458.355086] [<ffffffff8102867d>] intel_pmu_handle_irq+0x1bd/0x3c0
[ 2458.384155] [<ffffffff815cbf8b>] perf_event_nmi_handler+0x2b/0x50
[ 2458.415420] [<ffffffff815cb729>] nmi_handle.isra.0+0x59/0x90
[ 2458.444751] [<ffffffff815cb8c9>] do_nmi+0x169/0x340
[ 2458.469814] [<ffffffff815cabb1>] end_repeat_nmi+0x1e/0x2e
[ 2458.499774] [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60
[ 2458.529466] [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60
[ 2458.560565] [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60
[ 2458.591734] <<EOE>> [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc]
[ 2458.626535] [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc]
[ 2458.656563] [<ffffffffa0447af3>] fc_rport_enter_prli+0xe3/0x2b0 [libfc]
[ 2458.689046] [<ffffffffa04493fb>] fc_rport_recv_req+0x53b/0x1280 [libfc]
[ 2458.724200] [<ffffffff8101a0b3>] ? native_sched_clock+0x13/0x80
[ 2458.753588] [<ffffffff8101a129>] ? sched_clock+0x9/0x10
[ 2458.782632] [<ffffffffa0445068>] fc_lport_recv_els_req+0x78/0x150 [libfc]
[ 2458.818103] [<ffffffffa0443d0a>] fc_lport_recv_req+0x8a/0xd0 [libfc]
[ 2458.851128] [<ffffffffa0441513>] fc_exch_recv+0x413/0x640 [libfc]
[ 2458.880624] [<ffffffffa047b329>] fcoe_percpu_receive_thread+0x299/0x53c [fcoe]
[ 2458.915983] [<ffffffffa047b090>] ? fcoe_set_port_id+0x50/0x50 [fcoe]
[ 2458.945653] [<ffffffff8107fc10>] kthread+0xc0/0xd0
[ 2458.968750] [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110
[ 2459.000551] [<ffffffff815d2bec>] ret_from_fork+0x7c/0xb0
[ 2459.025325] [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110
[ 2459.056905] drm_kms_helper: panic occurred, switching back to text console
[ 2459.092846] ------------[ cut here ]------------
[ 2459.114816] WARNING: at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5f/0x70()
[ 2459.151968] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache tcm_fc target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod dm_service_time bnx2fc cnic uio fcoe 8021q garp libfcoe stp mrp libfc llc scsi_transport_fc scsi_tgt sg coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support aesni_intel lrw gf128mul glue_helper ablk_helper cryptd microcode serio_raw e1000e pcspkr ixgbe lpc_ich mfd_core ptp mdio hpilo hpwdt pps_core dca shpchp ipmi_si ipmi_msghandler mperf nfsd auth_rpcgss nfs_acl lockd sunrpc dm_multipath xfs libcrc32c sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci libahci libata i2c_core hpsa dm_mirror dm_region_hash dm_log dm_mod
[ 2459.508241] CPU: 2 PID: 1362 Comm: fcoethread/2 Not tainted 3.10.0-97.el7.x86_64 #1
[ 2459.546639] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013
[ 2459.579601] 0000000000000009 ffff88020b443d98 ffffffff815c2e83 ffff88020b443dd0
[ 2459.614167] ffffffff81059bd1 0000000000000000 ffff88020b454540 000000010020ecbc
[ 2459.649782] ffff88020b414540 0000000000000002 ffff88020b443de0 ffffffff81059caa
[ 2459.684606] Call Trace:
[ 2459.696722] <IRQ> [<ffffffff815c2e83>] dump_stack+0x19/0x1b
[ 2459.723685] [<ffffffff81059bd1>] warn_slowpath_common+0x61/0x80
[ 2459.751411] [<ffffffff81059caa>] warn_slowpath_null+0x1a/0x20
[ 2459.779036] [<ffffffff81036e5f>] native_smp_send_reschedule+0x5f/0x70
[ 2459.808782] [<ffffffff8109dd5d>] trigger_load_balance+0x16d/0x200
[ 2459.838727] [<ffffffff8108fe03>] scheduler_tick+0x103/0x150
[ 2459.864726] [<ffffffff8106aee6>] update_process_times+0x66/0x80
[ 2459.893545] [<ffffffff810b6835>] tick_sched_handle.isra.16+0x25/0x60
[ 2459.923929] [<ffffffff810b68b1>] tick_sched_timer+0x41/0x60
[ 2459.950436] [<ffffffff81083887>] __run_hrtimer+0x77/0x1d0
[ 2459.976394] [<ffffffff810b6870>] ? tick_sched_handle.isra.16+0x60/0x60
[ 2460.007303] [<ffffffff8108408f>] hrtimer_interrupt+0xef/0x230
[ 2460.035256] [<ffffffff81037f57>] local_apic_timer_interrupt+0x37/0x60
[ 2460.065892] [<ffffffff815d4faf>] smp_apic_timer_interrupt+0x3f/0x60
[ 2460.096689] [<ffffffff815d391d>] apic_timer_interrupt+0x6d/0x80
[ 2460.125365] <EOI> <NMI> [<ffffffff81085772>] ? up+0x32/0x50
[ 2460.153627] [<ffffffff815bcd19>] ? panic+0x193/0x1d7
[ 2460.177540] [<ffffffff815bcc83>] ? panic+0xfd/0x1d7
[ 2460.200556] [<ffffffff810ed9e0>] ? watchdog_enable_all_cpus.part.2+0x40/0x40
[ 2460.234351] [<ffffffff810edaa2>] watchdog_overflow_callback+0xc2/0xd0
[ 2460.264699] [<ffffffff8112d51e>] __perf_event_overflow+0x8e/0x230
[ 2460.296476] [<ffffffff8112c2e9>] ? perf_event_update_userpage+0x19/0x100
[ 2460.327933] [<ffffffff8112e094>] perf_event_overflow+0x14/0x20
[ 2460.358240] [<ffffffff8102867d>] intel_pmu_handle_irq+0x1bd/0x3c0
[ 2460.387838] [<ffffffff815cbf8b>] perf_event_nmi_handler+0x2b/0x50
[ 2460.418081] [<ffffffff815cb729>] nmi_handle.isra.0+0x59/0x90
[ 2460.447745] [<ffffffff815cb8c9>] do_nmi+0x169/0x340
[ 2460.470891] [<ffffffff815cabb1>] end_repeat_nmi+0x1e/0x2e
[ 2460.499971] [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60
[ 2460.530675] [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60
[ 2460.561530] [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60
[ 2460.593590] <<EOE>> [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc]
[ 2460.628198] [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc]
[ 2460.659372] [<ffffffffa0447af3>] fc_rport_enter_prli+0xe3/0x2b0 [libfc]
[ 2460.692845] [<ffffffffa04493fb>] fc_rport_recv_req+0x53b/0x1280 [libfc]
[ 2460.729646] [<ffffffff8101a0b3>] ? native_sched_clock+0x13/0x80
[ 2460.759043] [<ffffffff8101a129>] ? sched_clock+0x9/0x10
[ 2460.786126] [<ffffffffa0445068>] fc_lport_recv_els_req+0x78/0x150 [libfc]
[ 2460.822090] [<ffffffffa0443d0a>] fc_lport_recv_req+0x8a/0xd0 [libfc]
[ 2460.855206] [<ffffffffa0441513>] fc_exch_recv+0x413/0x640 [libfc]
[ 2460.885149] [<ffffffffa047b329>] fcoe_percpu_receive_thread+0x299/0x53c [fcoe]
[ 2460.921712] [<ffffffffa047b090>] ? fcoe_set_port_id+0x50/0x50 [fcoe]
[ 2460.953981] [<ffffffff8107fc10>] kthread+0xc0/0xd0
[ 2460.976836] [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110
[ 2461.010859] [<ffffffff815d2bec>] ret_from_fork+0x7c/0xb0
[ 2461.038076] [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110
[ 2461.072842] ---[ end trace 874881bfbaa680ef ]---
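For triage it can help to reduce a trace like the one above to the frames that are not marked with `?` (the `?` entries are addresses found on the stack that the unwinder could not confirm). The following is a minimal sketch, not a tool used in this bug: the names `FRAME_RE` and `reliable_frames` and the "kernel" label for frames without a module suffix are assumptions made for illustration. It matches on the `[<address>] symbol+0xoff/0xsize [module]` pattern only, so it works whether the log has one entry per line or has been flattened into a single line.

```python
import re

# Matches one stack frame as printed by this kernel, for example:
#   [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc]
#   [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?P<unreliable>\?\s+)?"                      # "?" marks an unconfirmed frame
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"                 # module suffix, absent for core kernel
)


def reliable_frames(log_text):
    """Return (symbol, module) for every frame that is not marked with '?'.

    Frames without a module suffix are labelled "kernel" (an illustrative
    label chosen here, not something the log prints).
    """
    frames = []
    for match in FRAME_RE.finditer(log_text):
        if match.group("unreliable"):
            continue
        frames.append((match.group("symbol"), match.group("module") or "kernel"))
    return frames


# A few frames copied from the first trace in this report, flattened on purpose.
SAMPLE = (
    "[ 2458.560565] [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 "
    "[ 2458.591734] <<EOE>> [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc] "
    "[ 2458.626535] [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc] "
    "[ 2458.656563] [<ffffffffa0447af3>] fc_rport_enter_prli+0xe3/0x2b0 [libfc] "
    "[ 2458.945653] [<ffffffff8107fc10>] kthread+0xc0/0xd0"
)

if __name__ == "__main__":
    for symbol, module in reliable_frames(SAMPLE):
        print(f"{module:8s} {symbol}")
```

Run against the full log, the tcm_fc and libfc frames stand out from the NMI and watchdog frames that only describe how the panic was reported.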
Created attachment 870027 vmcore-dmesg
Created attachment 915871 Comment (This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).
I have a proposed fix and am pushing it upstream.
Patch(es) available on kernel-3.10.0-125.el7
Reproduced on kernel 3.10.0-97.el7. Verified on 3.10.0-125.el7: no crash after more than 10 reboots.
This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.
*** Bug 1099051 has been marked as a duplicate of this bug. ***
(In reply to Ludek Smid from comment #20)
> This request was resolved in Red Hat Enterprise Linux 7.0.
>
> Contact your manager or support representative in case you have further
> questions about the request.

No, it wasn't. 123.el7 was the 7.0 kernel; this went into a 7.1 kernel (125.el7).
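Since the fix is tied to a kernel build number rather than to the 7.0 release, a quick check of a host's kernel release string can avoid this kind of confusion. The sketch below is an assumption-laden illustration, not an official Red Hat check: `FIXED_BUILD` is taken from the "kernel-3.10.0-125.el7" statement in this report, the function names are hypothetical, and the comparison deliberately ignores any z-stream backports, which this report says nothing about.

```python
import re

# From this report: "Patch(es) available on kernel-3.10.0-125.el7".
FIXED_BUILD = 125

# Accepts strings such as "3.10.0-97.el7.x86_64" or "3.10.0-123.20.1.el7.x86_64".
RELEASE_RE = re.compile(r"^3\.10\.0-(\d+)(?:\.\d+)*\.el7")


def el7_build(release):
    """Return the main build number of a RHEL 7 kernel release string."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"not a 3.10.0 el7 kernel release: {release!r}")
    return int(match.group(1))


def has_fix(release):
    """True if the build number is at least FIXED_BUILD.

    Assumption: only the main build number is compared. Whether a z-stream
    update of an older build carries the patch is not covered here.
    """
    return el7_build(release) >= FIXED_BUILD


if __name__ == "__main__":
    for release in ("3.10.0-97.el7.x86_64", "3.10.0-123.el7.x86_64", "3.10.0-125.el7.x86_64"):
        print(release, "fixed" if has_fix(release) else "not fixed")
```

On a live system the input would typically be the output of `uname -r`.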
Verified as of kernel 3.10.0-125.el7.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html