Bug 1071340

Summary: FCoE target: kernel panic when initiator connects to target
Product: Red Hat Enterprise Linux 7
Reporter: Bruno Goncalves <bgoncalv>
Component: kernel
Assignee: Andy Grover <agrover>
kernel sub component: Storage
QA Contact: Bruno Goncalves <bgoncalv>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: agrover, coughlan, dhoward, jkurik, qcai, revers
Version: 7.0
Keywords: Reopened, Triaged, ZStream
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: kernel-3.10.0-125.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1084646
Environment:
Last Closed: 2015-03-05 11:41:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1070517, 1070921, 1073810, 1077078, 1083244, 1084646, 1086308, 1088110, 1094654    
Attachments:
Description    Flags
vmcore-dmesg   none
Comment        none

Description Bruno Goncalves 2014-02-28 14:32:45 UTC
Description of problem:
When the FCoE initiator server boots, it triggers a kernel panic on the FCoE target server.

Version-Release number of selected component (if applicable):
3.10.0-97.el7.x86_64

targetcli-2.1.fb34-1.el7.noarch

# modinfo ixgbe
filename:       /lib/modules/3.10.0-97.el7.x86_64/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko
version:        3.15.1-k


How reproducible:
sometimes

Steps to Reproduce:
1. Configure the FCoE target to present a LUN to the initiator
2. Power on the initiator
3. Kernel panic occurs on the target server

[ 2457.927134] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2 
[ 2457.962440] CPU: 2 PID: 1362 Comm: fcoethread/2 Not tainted 3.10.0-97.el7.x86_64 #1 
[ 2457.997955] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013 
[ 2458.029072]  0000000000000000 ffff88020b446c68 ffffffff815c2e83 ffff88020b446ce0 
[ 2458.063218]  ffffffff815bcc4e 0000000000000010 ffff88020b446cf0 ffff88020b446c90 
[ 2458.099154]  0000000000000000 0000000000000002 0000000000000261 0000000000000002 
[ 2458.133645] Call Trace: 
[ 2458.145929]  <NMI>  [<ffffffff815c2e83>] dump_stack+0x19/0x1b 
[ 2458.173613]  [<ffffffff815bcc4e>] panic+0xc8/0x1d7 
[ 2458.196216]  [<ffffffff810ed9e0>] ? watchdog_enable_all_cpus.part.2+0x40/0x40 
[ 2458.229922]  [<ffffffff810edaa2>] watchdog_overflow_callback+0xc2/0xd0 
[ 2458.259868]  [<ffffffff8112d51e>] __perf_event_overflow+0x8e/0x230 
[ 2458.292100]  [<ffffffff8112c2e9>] ? perf_event_update_userpage+0x19/0x100 
[ 2458.324039]  [<ffffffff8112e094>] perf_event_overflow+0x14/0x20 
[ 2458.355086]  [<ffffffff8102867d>] intel_pmu_handle_irq+0x1bd/0x3c0 
[ 2458.384155]  [<ffffffff815cbf8b>] perf_event_nmi_handler+0x2b/0x50 
[ 2458.415420]  [<ffffffff815cb729>] nmi_handle.isra.0+0x59/0x90 
[ 2458.444751]  [<ffffffff815cb8c9>] do_nmi+0x169/0x340 
[ 2458.469814]  [<ffffffff815cabb1>] end_repeat_nmi+0x1e/0x2e 
[ 2458.499774]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2458.529466]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2458.560565]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2458.591734]  <<EOE>>  [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc] 
[ 2458.626535]  [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc] 
[ 2458.656563]  [<ffffffffa0447af3>] fc_rport_enter_prli+0xe3/0x2b0 [libfc] 
[ 2458.689046]  [<ffffffffa04493fb>] fc_rport_recv_req+0x53b/0x1280 [libfc] 
[ 2458.724200]  [<ffffffff8101a0b3>] ? native_sched_clock+0x13/0x80 
[ 2458.753588]  [<ffffffff8101a129>] ? sched_clock+0x9/0x10 
[ 2458.782632]  [<ffffffffa0445068>] fc_lport_recv_els_req+0x78/0x150 [libfc] 
[ 2458.818103]  [<ffffffffa0443d0a>] fc_lport_recv_req+0x8a/0xd0 [libfc] 
[ 2458.851128]  [<ffffffffa0441513>] fc_exch_recv+0x413/0x640 [libfc] 
[ 2458.880624]  [<ffffffffa047b329>] fcoe_percpu_receive_thread+0x299/0x53c [fcoe] 
[ 2458.915983]  [<ffffffffa047b090>] ? fcoe_set_port_id+0x50/0x50 [fcoe] 
[ 2458.945653]  [<ffffffff8107fc10>] kthread+0xc0/0xd0 
[ 2458.968750]  [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110 
[ 2459.000551]  [<ffffffff815d2bec>] ret_from_fork+0x7c/0xb0 
[ 2459.025325]  [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110 
[ 2459.056905] drm_kms_helper: panic occurred, switching back to text console 
[ 2459.092846] ------------[ cut here ]------------ 
[ 2459.114816] WARNING: at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5f/0x70() 
[ 2459.151968] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache tcm_fc target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod dm_service_time bnx2fc cnic uio fcoe 8021q garp libfcoe stp mrp libfc llc scsi_transport_fc scsi_tgt sg coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support aesni_intel lrw gf128mul glue_helper ablk_helper cryptd microcode serio_raw e1000e pcspkr ixgbe lpc_ich mfd_core ptp mdio hpilo hpwdt pps_core dca shpchp ipmi_si ipmi_msghandler mperf nfsd auth_rpcgss nfs_acl lockd sunrpc dm_multipath xfs libcrc32c sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci libahci libata i2c_core hpsa dm_mirror dm_region_hash dm_log dm_mod 
[ 2459.508241] CPU: 2 PID: 1362 Comm: fcoethread/2 Not tainted 3.10.0-97.el7.x86_64 #1 
[ 2459.546639] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013 
[ 2459.579601]  0000000000000009 ffff88020b443d98 ffffffff815c2e83 ffff88020b443dd0 
[ 2459.614167]  ffffffff81059bd1 0000000000000000 ffff88020b454540 000000010020ecbc 
[ 2459.649782]  ffff88020b414540 0000000000000002 ffff88020b443de0 ffffffff81059caa 
[ 2459.684606] Call Trace: 
[ 2459.696722]  <IRQ>  [<ffffffff815c2e83>] dump_stack+0x19/0x1b 
[ 2459.723685]  [<ffffffff81059bd1>] warn_slowpath_common+0x61/0x80 
[ 2459.751411]  [<ffffffff81059caa>] warn_slowpath_null+0x1a/0x20 
[ 2459.779036]  [<ffffffff81036e5f>] native_smp_send_reschedule+0x5f/0x70 
[ 2459.808782]  [<ffffffff8109dd5d>] trigger_load_balance+0x16d/0x200 
[ 2459.838727]  [<ffffffff8108fe03>] scheduler_tick+0x103/0x150 
[ 2459.864726]  [<ffffffff8106aee6>] update_process_times+0x66/0x80 
[ 2459.893545]  [<ffffffff810b6835>] tick_sched_handle.isra.16+0x25/0x60 
[ 2459.923929]  [<ffffffff810b68b1>] tick_sched_timer+0x41/0x60 
[ 2459.950436]  [<ffffffff81083887>] __run_hrtimer+0x77/0x1d0 
[ 2459.976394]  [<ffffffff810b6870>] ? tick_sched_handle.isra.16+0x60/0x60 
[ 2460.007303]  [<ffffffff8108408f>] hrtimer_interrupt+0xef/0x230 
[ 2460.035256]  [<ffffffff81037f57>] local_apic_timer_interrupt+0x37/0x60 
[ 2460.065892]  [<ffffffff815d4faf>] smp_apic_timer_interrupt+0x3f/0x60 
[ 2460.096689]  [<ffffffff815d391d>] apic_timer_interrupt+0x6d/0x80 
[ 2460.125365]  <EOI>  <NMI>  [<ffffffff81085772>] ? up+0x32/0x50 
[ 2460.153627]  [<ffffffff815bcd19>] ? panic+0x193/0x1d7 
[ 2460.177540]  [<ffffffff815bcc83>] ? panic+0xfd/0x1d7 
[ 2460.200556]  [<ffffffff810ed9e0>] ? watchdog_enable_all_cpus.part.2+0x40/0x40 
[ 2460.234351]  [<ffffffff810edaa2>] watchdog_overflow_callback+0xc2/0xd0 
[ 2460.264699]  [<ffffffff8112d51e>] __perf_event_overflow+0x8e/0x230 
[ 2460.296476]  [<ffffffff8112c2e9>] ? perf_event_update_userpage+0x19/0x100 
[ 2460.327933]  [<ffffffff8112e094>] perf_event_overflow+0x14/0x20 
[ 2460.358240]  [<ffffffff8102867d>] intel_pmu_handle_irq+0x1bd/0x3c0 
[ 2460.387838]  [<ffffffff815cbf8b>] perf_event_nmi_handler+0x2b/0x50 
[ 2460.418081]  [<ffffffff815cb729>] nmi_handle.isra.0+0x59/0x90 
[ 2460.447745]  [<ffffffff815cb8c9>] do_nmi+0x169/0x340 
[ 2460.470891]  [<ffffffff815cabb1>] end_repeat_nmi+0x1e/0x2e 
[ 2460.499971]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2460.530675]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2460.561530]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60 
[ 2460.593590]  <<EOE>>  [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc] 
[ 2460.628198]  [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc] 
[ 2460.659372]  [<ffffffffa0447af3>] fc_rport_enter_prli+0xe3/0x2b0 [libfc] 
[ 2460.692845]  [<ffffffffa04493fb>] fc_rport_recv_req+0x53b/0x1280 [libfc] 
[ 2460.729646]  [<ffffffff8101a0b3>] ? native_sched_clock+0x13/0x80 
[ 2460.759043]  [<ffffffff8101a129>] ? sched_clock+0x9/0x10 
[ 2460.786126]  [<ffffffffa0445068>] fc_lport_recv_els_req+0x78/0x150 [libfc] 
[ 2460.822090]  [<ffffffffa0443d0a>] fc_lport_recv_req+0x8a/0xd0 [libfc] 
[ 2460.855206]  [<ffffffffa0441513>] fc_exch_recv+0x413/0x640 [libfc] 
[ 2460.885149]  [<ffffffffa047b329>] fcoe_percpu_receive_thread+0x299/0x53c [fcoe] 
[ 2460.921712]  [<ffffffffa047b090>] ? fcoe_set_port_id+0x50/0x50 [fcoe] 
[ 2460.953981]  [<ffffffff8107fc10>] kthread+0xc0/0xd0 
[ 2460.976836]  [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110 
[ 2461.010859]  [<ffffffff815d2bec>] ret_from_fork+0x7c/0xb0 
[ 2461.038076]  [<ffffffff8107fb50>] ? kthread_create_on_node+0x110/0x110 
[ 2461.072842] ---[ end trace 874881bfbaa680ef ]---
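
For reading traces like the one above: the frames before the `<<EOE>>` marker belong to the NMI watchdog path that reported the lockup, while the frames from `<<EOE>>` downward show what the CPU was running when it was interrupted. Entries prefixed with `?` are addresses the unwinder found on the stack but could not confirm as part of the call chain. The following is a minimal Python sketch, not a tool used in this bug, that pulls out the confirmed frames of the interrupted task. The names `FRAME_RE`, `SAMPLE`, and `interrupted_frames` are hypothetical, and the regular expression only assumes the frame layout visible in this report.

```python
import re

# One call-trace frame as printed in this report, for example:
#   [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc]
# An optional "? " marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\?\s+)?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)

# A few lines copied from the first trace in this report.
SAMPLE = """\
[ 2458.529466]  [<ffffffff815ca26a>] ? _raw_spin_lock_irq+0x3a/0x60
[ 2458.591734]  <<EOE>>  [<ffffffffa05fa680>] ft_acl_get+0x30/0x160 [tcm_fc]
[ 2458.626535]  [<ffffffffa05fb547>] ft_prli+0x47/0x2c0 [tcm_fc]
[ 2458.656563]  [<ffffffffa0447af3>] fc_rport_enter_prli+0xe3/0x2b0 [libfc]
[ 2458.724200]  [<ffffffff8101a0b3>] ? native_sched_clock+0x13/0x80
[ 2458.945653]  [<ffffffff8107fc10>] kthread+0xc0/0xd0
[ 2459.056905] drm_kms_helper: panic occurred, switching back to text console
"""


def interrupted_frames(log_text):
    """Return (symbol, module) pairs for confirmed frames from <<EOE>> onward.

    module is None for symbols that belong to the core kernel image.
    Parsing stops at the first line after <<EOE>> that is not a frame.
    """
    frames = []
    seen_eoe = False
    for line in log_text.splitlines():
        if "<<EOE>>" in line:
            seen_eoe = True
        if not seen_eoe:
            continue  # still inside the NMI/exception part of the trace
        match = FRAME_RE.search(line)
        if match is None:
            break  # end of this call trace
        if match.group("guess") is None:
            frames.append((match.group("sym"), match.group("mod")))
    return frames


if __name__ == "__main__":
    for symbol, module in interrupted_frames(SAMPLE):
        print(symbol, module or "(kernel)")
```

Applied to this report, the first confirmed frame is `ft_acl_get` in `tcm_fc`, with unconfirmed `_raw_spin_lock_irq` entries just above it. That is consistent with the fcoethread spinning on a lock with interrupts disabled, although the trace alone does not prove which lock or who held it.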

Comment 5 Bruno Goncalves 2014-03-03 16:20:44 UTC
Created attachment 870027 [details]
vmcore-dmesg

Comment 8 Bruno Goncalves 2014-03-19 10:48:43 UTC
Created attachment 915871 [details]
Comment

(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).

Comment 12 Andy Grover 2014-04-04 17:32:53 UTC
I have a proposed fix and am pushing it upstream.

Comment 17 Jarod Wilson 2014-06-04 15:51:27 UTC
Patch(es) available on kernel-3.10.0-125.el7

Comment 19 Bruno Goncalves 2014-06-09 13:23:08 UTC
Reproduced on kernel 3.10.0-97.el7.

Verified on 3.10.0-125.el7: no crash after more than 10 reboots.

Comment 20 Ludek Smid 2014-06-13 10:14:19 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Comment 21 Bruno Goncalves 2014-07-08 09:22:05 UTC
*** Bug 1099051 has been marked as a duplicate of this bug. ***

Comment 22 Jarod Wilson 2014-09-29 19:10:31 UTC
(In reply to Ludek Smid from comment #20)
> This request was resolved in Red Hat Enterprise Linux 7.0.
> 
> Contact your manager or support representative in case you have further
> questions about the request.

No it wasn't. 123.el7 was the 7.0 kernel, this went into a 7.1 kernel (125.el7).
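
The build numbers in this thread (97 reproduces, 123 shipped as 7.0, 125 carries the fix) can be turned into a quick check of whether a given kernel release string is at or beyond the fixed build. This is a rough Python sketch under stated assumptions: the helper names `el7_build` and `has_fix` are hypothetical, the pattern only handles `3.10.0-<build>...el7` strings, and it compares the leading build number only. A 7.0 z-stream kernel (for example a `3.10.0-123.x.y.el7` build) may carry a backport that this comparison cannot see, so a `False` result for such kernels should be confirmed against the relevant erratum.

```python
import re

# From "Fixed In Version: kernel-3.10.0-125.el7" in this report.
FIXED_BUILD = 125


def el7_build(release):
    """Return the leading build number of a RHEL 7 kernel release string.

    "3.10.0-97.el7.x86_64" gives 97; strings that do not look like a
    RHEL 7 3.10.0 kernel give None.
    """
    match = re.match(r"3\.10\.0-(\d+)[\d.]*\.el7", release)
    return int(match.group(1)) if match else None


def has_fix(release):
    """True if the leading build number is at or above the fixed build.

    Assumption: z-stream backports are NOT detected by this check.
    """
    build = el7_build(release)
    if build is None:
        raise ValueError("not a RHEL 7 3.10.0 kernel release: %r" % release)
    return build >= FIXED_BUILD


if __name__ == "__main__":
    for release in ("3.10.0-97.el7.x86_64", "3.10.0-123.el7", "3.10.0-125.el7"):
        print(release, has_fix(release))
```

On a running system the string returned by `platform.release()` (the same value `uname -r` prints) could be passed to `has_fix`.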

Comment 24 Bruno Goncalves 2014-09-30 11:56:43 UTC
Verified as of kernel 3.10.0-125.el7.

Comment 26 errata-xmlrpc 2015-03-05 11:41:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html