Bug 1278224

Summary: panic in iscsi_target.c
Product: Red Hat Enterprise Linux 7
Reporter: artem
Component: kernel
Assignee: Andy Grover <agrover>
Kernel sub component: Kernel Target (LIO)
QA Contact: Martin Hoyer <mhoyer>
Status: CLOSED ERRATA
Docs Contact:
Severity: urgent
Priority: unspecified
CC: agrover, alex, artem, cleech, mhoyer, svpcom
Version: 7.2
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: kernel-3.10.0-482.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 14:05:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: cfg for 100 targets

Description artem 2015-11-05 01:01:00 UTC
Description of problem:
Kernel panic
[ 3724.152026] kernel BUG at drivers/target/iscsi/iscsi_target.c:4473!
[ 3724.160310] invalid opcode: 0000 [#1] SMP 

Version-Release number of selected component (if applicable):
 3.10.0-306.0.1.el7.x86_64

How reproducible:
Happened on cycle 84 of restarting four iSER targets, with four initiators connecting to and disconnecting from each target.

Steps to Reproduce:
1. 3-node FlashGrid cluster, 4 iSER targets per node
2. In a loop, restart the flashgrid service on all nodes with a 30-second pause (a sketch of such a loop follows).
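
A minimal sketch of the restart loop (illustrative only: the node names and the ssh fan-out are assumptions about the test harness; the "flashgrid" service name is from the steps above):

  # Restart the flashgrid service on every node, 30 seconds apart, in a loop.
  while true; do
      for node in node1 node2 node3; do
          ssh "$node" systemctl restart flashgrid
      done
      sleep 30
  done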

Additional info:
-----------------------------------
[ 3724.135050] 1 connection(s) still exist for iSCSI session to iqn.2015-04.io.flashgrid:rac1
[ 3724.145603] ------------[ cut here ]------------
[ 3724.152026] kernel BUG at drivers/target/iscsi/iscsi_target.c:4473!
[ 3724.160310] invalid opcode: 0000 [#1] SMP 
[ 3724.166176] Modules linked in: nfsv3 nfs_acl dm_queue_length target_core_pscsi target_core_file target_core_iblock rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache dm_multipath intel_powerclamp coretemp ib_isert iscsi_target_mod target_core_mod intel_rapl ib_iser kvm_intel kvm libiscsi crct10dif_pclmul scsi_transport_iscsi crc32_pclmul crc32c_intel ghash_clmulni_intel ib_ipoib aesni_intel lrw rdma_ucm gf128mul glue_helper ib_ucm ablk_helper ib_uverbs cryptd ib_umad rdma_cm ib_cm iw_cm iTCO_wdt ipmi_ssif dm_mod ipmi_devintf iTCO_vendor_support ipmi_si sb_edac mei_me lpc_ich dcdbas mei ipmi_msghandler edac_core shpchp pcspkr mfd_core wmi acpi_power_meter binfmt_misc ext4 mbcache jbd2 mlx4_ib mlx4_en ib_sa ib_mad vxlan ip6_udp_tunnel ib_core udp_tunnel ib_addr sd_mod crc_t10dif
[ 3724.255074]  mgag200 crct10dif_common syscopyarea sysfillrect sysimgblt i2c_algo_bit bnx2x drm_kms_helper ttm mdio libcrc32c drm ptp mlx4_core nvme i2c_core megaraid_sas pps_core
[ 3724.274767] CPU: 3 PID: 45592 Comm: kworker/u289:4 Not tainted 3.10.0-306.0.1.el7.x86_64 #1
[ 3724.285694] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.3.6 06/03/2015
[ 3724.295847] Workqueue: isert_comp_wq isert_do_control_comp [ib_isert]
[ 3724.304661] task: ffff880c37adc500 ti: ffff880bee56c000 task.ti: ffff880bee56c000
[ 3724.314660] RIP: 0010:[<ffffffffa074190b>]  [<ffffffffa074190b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod]
[ 3724.328164] RSP: 0018:ffff880bee56fd78  EFLAGS: 00010246
[ 3724.335841] RAX: 000000000000004e RBX: ffff880bfba4b400 RCX: 0000000000000000
[ 3724.345552] RDX: 0000000000000000 RSI: ffff880c4e46d6c8 RDI: 0000000000000246
[ 3724.355277] RBP: ffff880bee56fd90 R08: 0000000000000086 R09: 0000000000005519
[ 3724.364994] R10: 00000000000003ff R11: 0000000000000001 R12: 0000000000000000
[ 3724.368819] sd 1331:0:0:0: alua: Detached
[ 3724.374798] sd 1343:0:0:0: alua: Detached
[ 3724.383821] sd 1341:0:0:0: alua: Detached
[ 3724.389164] sd 1342:0:0:0: alua: Detached
[ 3724.392272] sd 1338:0:0:0: alua: Detached
[ 3724.395615] sd 1336:0:0:0: alua: Detached
[ 3724.397048] sd 1340:0:0:0: alua: Detached
[ 3724.398468] sd 1334:0:0:0: alua: Detached
[ 3724.423751] R13: ffff880be322a000 R14: ffff880c3318e800 R15: 0000000000002420
[ 3724.433361] FS:  0000000000000000(0000) GS:ffff880c4e460000(0000) knlGS:0000000000000000
[ 3724.444093] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3724.452241] CR2: 00007f37a42f4000 CR3: 0000000001946000 CR4: 00000000001407e0
[ 3724.461976] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3724.471699] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3724.481420] Stack:
[ 3724.485420]  ffff880c34e54800 0000000000000000 ffff880bfba4b400 ffff880bee56fda0
[ 3724.495554]  ffffffffa0747cd2 ffff880bee56fdb0 ffffffffa06c508e ffff880bee56fdc0
[ 3724.505709]  ffffffffa06c6b20 ffff880bee56fdf0 ffffffffa07419e5 ffff880bf5ca9988
[ 3724.515845] Call Trace:
[ 3724.520394]  [<ffffffffa0747cd2>] lio_tpg_close_session+0x12/0x20 [iscsi_target_mod]
[ 3724.530906]  [<ffffffffa06c508e>] target_release_session+0x1e/0x20 [target_core_mod]
[ 3724.541412]  [<ffffffffa06c6b20>] target_put_session+0x40/0x50 [target_core_mod]
[ 3724.551518]  [<ffffffffa07419e5>] iscsit_logout_post_handler+0xb5/0x200 [iscsi_target_mod]
[ 3724.562621]  [<ffffffffa070e0c4>] isert_do_control_comp+0xc4/0xf0 [ib_isert]
[ 3724.572366]  [<ffffffff8109c59b>] process_one_work+0x17b/0x470
[ 3724.580777]  [<ffffffff8109d36b>] worker_thread+0x11b/0x400
[ 3724.588858]  [<ffffffff8109d250>] ? rescuer_thread+0x400/0x400
[ 3724.597209]  [<ffffffff810a4a8f>] kthread+0xcf/0xe0
[ 3724.604472]  [<ffffffff810a49c0>] ? kthread_create_on_node+0x140/0x140
[ 3724.613580]  [<ffffffff81643458>] ret_from_fork+0x58/0x90
[ 3724.621380]  [<ffffffff810a49c0>] ? kthread_create_on_node+0x140/0x140
[ 3724.630395] Code: 60 dd 75 a0 31 c0 e8 c5 1b bd e0 e9 12 ff ff ff 8b b7 b8 00 00 00 48 8b 97 f0 01 00 00 31 c0 48 c7 c7 60 52 75 a0 e8 4f af ee e0 <0f> 0b 48 c7 c6 38 52 75 a0 48 c7 c7 b0 dd 75 a0 31 c0 e8 8e 1b 
[ 3724.655568] RIP  [<ffffffffa074190b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod]
[ 3724.666412]  RSP <ffff880bee56fd78>

Comment 3 Andy Grover 2015-11-05 19:29:49 UTC
Hi Artem,

I think this is a race between the login and logout code. Would it be ok for me to make this bug publicly viewable? This would let me reference it when discussing the issue upstream.

Comment 4 artem 2015-11-05 19:36:50 UTC
(In reply to Andy Grover from comment #3)
> Hi Artem,
> 
> I think this is a race between the login and logout code. Would it be ok for
> me to make this bug publicly viewable? This would let me reference it when
> discussing the issue upstream.

Sure

Comment 5 artem 2015-12-28 23:39:17 UTC
Any luck with this bug?
We got an almost identical panic on kernel 3.10.0-327.3.1.el7.


[ 1116.996324] sd 1100:0:0:0: alua: Detached
[ 1117.041915] sd 1136:0:0:0: alua: Detached
[ 1117.077978] show_signal_msg: 34 callbacks suppressed
[ 1117.084445] iscsiadm[63090]: segfault at 40 ip 00007f0ed969c5b0 sp 00007ffed6d67450 error 4 in iscsiadm[7f0ed966e000+7a000]
[ 1117.120251] sd 1057:0:0:0: alua: Detached
[ 1117.133158] sd 1061:0:0:0: alua: Detached
[ 1117.147288] sd 1059:0:0:0: alua: Detached
[ 1117.154597]  connection1057:0: detected conn error (1011)
[ 1117.246026] sd 1074:0:0:0: alua: Detached
[ 1117.261870] sd 1070:0:0:0: alua: Detached
[ 1117.293683] sd 1086:0:0:0: alua: Detached
[ 1117.303633] sd 1058:0:0:0: alua: Detached
[ 1117.313978] sd 1062:0:0:0: alua: Detached
[ 1117.330080]  connection1062:0: detected conn error (1011)
[ 1117.342118] sd 1063:0:0:0: alua: Detached
[ 1117.357164] sd 1125:0:0:0: alua: Detached
[ 1117.357218] sd 1084:0:0:0: alua: Detached
[ 1117.371230] sd 1118:0:0:0: alua: Detached
[ 1117.376824] sd 1088:0:0:0: alua: Detached
[ 1117.401932]  connection1088:0: detected conn error (1011)
[ 1117.405973] sd 1090:0:0:0: alua: Detached
[ 1117.406556] sd 1069:0:0:0: alua: Detached
[ 1117.423590]  connection1118:0: detected conn error (1011)
[ 1117.442144] sd 1102:0:0:0: alua: Detached
[ 1117.445572] sd 1081:0:0:0: alua: Detached
[ 1117.451705] sd 1087:0:0:0: alua: Detached
[ 1117.457592] sd 1060:0:0:0: alua: Detached
[ 1117.458261] sd 1134:0:0:0: alua: Detached
[ 1117.460056] sd 1130:0:0:0: alua: Detached
[ 1117.461869] sd 1072:0:0:0: alua: Detached
[ 1117.480426] sd 1067:0:0:0: alua: Detached
[ 1117.488646] sd 1138:0:0:0: alua: Detached
[ 1117.508845]  connection1102:0: detected conn error (1011)
[ 1117.513740] sd 1094:0:0:0: alua: Detached
[ 1117.531806]  connection1134:0: detected conn error (1011)
[ 1117.531856] sd 1101:0:0:0: alua: Detached
[ 1117.544218]  connection1081:0: detected conn error (1011)
[ 1117.545087]  connection1072:0: detected conn error (1011)
[ 1117.545090]  connection1067:0: detected conn error (1011)
[ 1117.546620] sd 1066:0:0:0: alua: Detached
[ 1117.570188]  connection1060:0: detected conn error (1011)
[ 1117.591323] sd 1078:0:0:0: alua: Detached
[ 1117.591344]  connection1101:0: detected conn error (1011)
[ 1117.594550]  connection1094:0: detected conn error (1011)
[ 1117.598501]  connection1066:0: detected conn error (1011)
[ 1117.615319] sd 1083:0:0:0: alua: Detached
[ 1117.615511] sd 1065:0:0:0: alua: Detached
[ 1117.622793] sd 1089:0:0:0: alua: Detached
[ 1117.629740] sd 1092:0:0:0: alua: Detached
[ 1117.630165] sd 1077:0:0:0: alua: Detached
[ 1117.630290] sd 1071:0:0:0: alua: Detached
[ 1117.630662] sd 1075:0:0:0: alua: Detached
[ 1117.631041] sd 1117:0:0:0: alua: Detached
[ 1117.631922] sd 1143:0:0:0: alua: Detached
[ 1117.632156] sd 1109:0:0:0: alua: Detached
[ 1117.636650] sd 1144:0:0:0: alua: Detached
[ 1117.640147] sd 1127:0:0:0: alua: Detached
[ 1117.693098] sd 1108:0:0:0: alua: Detached
[ 1117.694817] sd 1085:0:0:0: alua: Detached
[ 1117.703864]  connection1078:0: detected conn error (1011)
[ 1117.711156] sd 1110:0:0:0: alua: Detached
[ 1117.727565] sd 1082:0:0:0: alua: Detached
[ 1117.729281]  connection1143:0: detected conn error (1011)
[ 1117.729835]  connection1127:0: detected conn error (1011)
[ 1117.730450]  connection1117:0: detected conn error (1011)
[ 1117.731420]  connection1144:0: detected conn error (1011)
[ 1117.731464] sd 1113:0:0:0: alua: Detached
[ 1117.732038]  connection1065:0: detected conn error (1011)
[ 1117.733177]  connection1077:0: detected conn error (1011)
[ 1117.733749]  connection1092:0: detected conn error (1011)
[ 1117.735274]  connection1085:0: detected conn error (1011)
[ 1117.735304]  connection1109:0: detected conn error (1011)
[ 1117.738090]  connection1075:0: detected conn error (1011)
[ 1117.739409]  connection1089:0: detected conn error (1011)
[ 1117.749207]  connection1130:0: detected conn error (1011)
[ 1117.769466] sd 1133:0:0:0: alua: Detached
[ 1117.797064] sd 1099:0:0:0: alua: Detached
[ 1117.797322] sd 1103:0:0:0: alua: Detached
[ 1117.797651] sd 1104:0:0:0: alua: Detached
[ 1117.800058] sd 1107:0:0:0: alua: Detached
[ 1117.802510] sd 1129:0:0:0: alua: Detached
[ 1117.802622] sd 1095:0:0:0: alua: Detached
[ 1117.802824] sd 1119:0:0:0: alua: Detached
[ 1117.803104] sd 1115:0:0:0: alua: Detached
[ 1117.803184] sd 1073:0:0:0: alua: Detached
[ 1117.803244] sd 1120:0:0:0: alua: Detached
[ 1117.803306] sd 1091:0:0:0: alua: Detached
[ 1117.803369] sd 1079:0:0:0: alua: Detached
[ 1117.803421] sd 1064:0:0:0: alua: Detached
[ 1117.803496] sd 1105:0:0:0: alua: Detached
[ 1117.803535] sd 1131:0:0:0: alua: Detached
[ 1117.805940] sd 1139:0:0:0: alua: Detached
[ 1117.806669] sd 1068:0:0:0: alua: Detached
[ 1117.807928] sd 1141:0:0:0: alua: Detached
[ 1117.811465] sd 1114:0:0:0: alua: Detached
[ 1117.812912] sd 1098:0:0:0: alua: Detached
[ 1117.813091] sd 1135:0:0:0: alua: Detached
[ 1117.815014] sd 1112:0:0:0: alua: Detached
[ 1117.828987] sd 1096:0:0:0: alua: Detached
[ 1117.834454] sd 1142:0:0:0: alua: Detached
[ 1117.838657] sd 1106:0:0:0: alua: Detached
[ 1117.839686] sd 1080:0:0:0: alua: Detached
[ 1117.840115] sd 1097:0:0:0: alua: Detached
[ 1117.841528] sd 1116:0:0:0: alua: Detached
[ 1117.851700] sd 1123:0:0:0: alua: Detached
[ 1117.856030] sd 1121:0:0:0: alua: Detached
[ 1117.856359] sd 1122:0:0:0: alua: Detached
[ 1117.858753] sd 1093:0:0:0: alua: Detached
[ 1117.865049] sd 1140:0:0:0: alua: Detached
[ 1117.866797] sd 1124:0:0:0: alua: Detached
[ 1117.872904] sd 1126:0:0:0: alua: Detached
[ 1117.875825] sd 1132:0:0:0: alua: Detached
[ 1117.986967] 1 connection(s) still exist for iSCSI session to iqn.2015-04.io.flashgrid:rac1
[ 1117.986969] 1 connection(s) still exist for iSCSI session to iqn.2015-04.io.flashgrid:rac1
[ 1117.986971] 1 connection(s) still exist for iSCSI session to iqn.2015-04.io.flashgrid:rac1
[ 1117.987001] ------------[ cut here ]------------
[ 1117.987003] kernel BUG at drivers/target/iscsi/iscsi_target.c:4472!
[ 1117.987005] invalid opcode: 0000 [#1] SMP 
[ 1117.987030] Modules linked in: nfsv3 nfs_acl dm_queue_length target_core_pscsi target_core_file target_core_iblock rpcsec_gss_krb5 a0
[ 1117.987034]  crc_t10dif syscopyarea crct10dif_common sysfillrect sysimgblt bnx2x i2c_algo_bit drm_kms_helper ttm mdio libcrc32c drm e
[ 1117.987036] CPU: 1 PID: 2546 Comm: kworker/u289:1 Not tainted 3.10.0-327.3.1.el7.x86_64 #1
[ 1117.987037] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.3.6 06/03/2015
[ 1117.987041] Workqueue: isert_comp_wq isert_do_control_comp [ib_isert]
[ 1117.987042] task: ffff880bff258b80 ti: ffff880bf4168000 task.ti: ffff880bf4168000
[ 1117.987053] RIP: 0010:[<ffffffffa073d88b>]  [<ffffffffa073d88b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod]
[ 1117.987054] RSP: 0018:ffff880bf416bd78  EFLAGS: 00010246
[ 1117.987055] RAX: 000000000000004e RBX: ffff880bba78f000 RCX: 000000000000137e
[ 1117.987055] RDX: 0000000000000000 RSI: 0000000000000086 RDI: 0000000000000246
[ 1117.987056] RBP: ffff880bf416bd90 R08: 0000000000000086 R09: ffffffff81d5a22e
[ 1117.987057] R10: 00000000000504ac R11: 0000000000100000 R12: 0000000000000000
[ 1117.987057] R13: ffff880bc758b000 R14: ffff880c316ae900 R15: 0000000000002420
[ 1117.987058] FS:  0000000000000000(0000) GS:ffff880c4e420000(0000) knlGS:0000000000000000
[ 1117.987059] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1117.987060] CR2: 00007f689a28fbe8 CR3: 000000000194a000 CR4: 00000000001407e0
[ 1117.987060] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1117.987061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1117.987061] Stack:
[ 1117.987064]  ffff880bdc5a3800 0000000000000000 ffff880bba78f000 ffff880bf416bda0
[ 1117.987067]  ffffffffa0743c52 ffff880bf416bdb0 ffffffffa06dc09e ffff880bf416bdc0
[ 1117.987069]  ffffffffa06ddb30 ffff880bf416bdf0 ffffffffa073d965 ffff880bdffc1ca0
[ 1117.987069] Call Trace:
[ 1117.987077]  [<ffffffffa0743c52>] lio_tpg_close_session+0x12/0x20 [iscsi_target_mod]
[ 1117.987087]  [<ffffffffa06dc09e>] target_release_session+0x1e/0x20 [target_core_mod]
[ 1117.987093]  [<ffffffffa06ddb30>] target_put_session+0x40/0x50 [target_core_mod]
[ 1117.987098]  [<ffffffffa073d965>] iscsit_logout_post_handler+0xb5/0x200 [iscsi_target_mod]
[ 1117.987101]  [<ffffffffa05340b4>] isert_do_control_comp+0xc4/0xf0 [ib_isert]
[ 1117.987106]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[ 1117.987108]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[ 1117.987110]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[ 1117.987113]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
[ 1117.987116]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[ 1117.987120]  [<ffffffff81645818>] ret_from_fork+0x58/0x90
[ 1117.987123]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[ 1117.987138] Code: 60 9d 75 a0 31 c0 e8 f5 7a bd e0 e9 12 ff ff ff 8b b7 b8 00 00 00 48 8b 97 f0 01 00 00 31 c0 48 c7 c7 88 12 75 a0  
[ 1117.987142] RIP  [<ffffffffa073d88b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod]
[ 1117.987143]  RSP <ffff880bf416bd78>

Comment 6 artem 2016-03-23 19:19:11 UTC
Andy,
Do you know of any progress with this issue?
It keeps happening with the newer kernel 3.10.0-327.10.1.el7.x86_64.
In a very large configuration with 120 targets and 4 initiators per target, we are seeing it very frequently.


[48092.485432] kernel BUG at drivers/target/iscsi/iscsi_target.c:4472!
[48092.486410] invalid opcode: 0000 [#1] SMP 
[48092.487401] Modules linked in: dm_queue_length ib_iser(OE) libiscsi target_core_pscsi target_core_file target_core_iblock ib_isert(OE) iscsi_target_mod target_core_mod nfsv3 rpcsec_gss_krb5 nfsv4 dm_multipath dns_resolver nfs fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_en(OE) vxlan ip6_udp_tunnel udp_tunnel mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) mlx4_core(OE) mlx_compat(OE) intel_powerclamp coretemp intel_rapl kvm_intel kvm sb_edac pcspkr iTCO_wdt edac_core iTCO_vendor_support dcdbas ipmi_ssif sg ipmi_devintf lpc_ich ipmi_si ipmi_msghandler mfd_core mei_me mei shpchp acpi_power_meter nfsd auth_rpcgss binfmt_misc nfs_acl lockd grace ip_tables xfs sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul
[48092.493891]  crct10dif_common crc32_pclmul crc32c_intel mgag200 ghash_clmulni_intel syscopyarea sysfillrect sysimgblt i2c_algo_bit aesni_intel drm_kms_helper lrw gf128mul ttm glue_helper bnx2x ablk_helper cryptd drm ptp i2c_core pps_core megaraid_sas nvme mdio libcrc32c wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod scsi_transport_iscsi [last unloaded: mlx5_core]
[48092.498732] CPU: 16 PID: 294498 Comm: kworker/u577:3 Tainted: G           OE  ------------   3.10.0-327.10.1.el7.x86_64 #1
[48092.499881] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.5.4 10/002/2015
[48092.501065] Workqueue: isert_comp_wq isert_do_control_comp [ib_isert]
[48092.502231] task: ffff880a74f87300 ti: ffff880b17160000 task.ti: ffff880b17160000
[48092.503495] RIP: 0010:[<ffffffffa095f88b>]  [<ffffffffa095f88b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod]
[48092.504686] RSP: 0018:ffff880b17163d78  EFLAGS: 00010246
[48092.505815] RAX: 000000000000004f RBX: ffff880bbde05400 RCX: 0000000000000000
[48092.507044] RDX: 0000000000000000 RSI: ffff880c4d40d6c8 RDI: 0000000000000246
[48092.508165] RBP: ffff880b17163d90 R08: 0000000000000086 R09: 000000000000ea80
[48092.509268] R10: 00000000000003ff R11: 0000000000000001 R12: 0000000000000000
[48092.510495] R13: ffff88006fbd1000 R14: ffff880bc1f68900 R15: 0000000000004820
[48092.511787] FS:  0000000000000000(0000) GS:ffff880c4d400000(0000) knlGS:0000000000000000
[48092.512937] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[48092.514241] CR2: 00007ff3f4d290a0 CR3: 000000000194a000 CR4: 00000000001407e0
[48092.515525] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[48092.516681] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[48092.517804] Stack:
[48092.519036]  ffff880b384c4000 0000000000000000 ffff880bbde05400 ffff880b17163da0
[48092.520322]  ffffffffa0965c52 ffff880b17163db0 ffffffffa08fe09e ffff880b17163dc0
[48092.521552]  ffffffffa08ffb30 ffff880b17163df0 ffffffffa095f965 ffff880ba94024b8
[48092.522791] Call Trace:
[48092.524062]  [<ffffffffa0965c52>] lio_tpg_close_session+0x12/0x20 [iscsi_target_mod]
[48092.525236]  [<ffffffffa08fe09e>] target_release_session+0x1e/0x20 [target_core_mod]
[48092.526502]  [<ffffffffa08ffb30>] target_put_session+0x40/0x50 [target_core_mod]
[48092.527903]  [<ffffffffa095f965>] iscsit_logout_post_handler+0xb5/0x200 [iscsi_target_mod]
[48092.529167]  [<ffffffffa0850e64>] isert_do_control_comp+0xc4/0xf0 [ib_isert]
[48092.529934]  [<ffffffff8109d5db>] process_one_work+0x17b/0x470
[48092.530725]  [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
[48092.531492]  [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
[48092.532255]  [<ffffffff810a5acf>] kthread+0xcf/0xe0
[48092.533008]  [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
[48092.533770]  [<ffffffff81645998>] ret_from_fork+0x58/0x90
[48092.534513]  [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
[48092.535263] Code: 60 bd 97 a0 31 c0 e8 d5 5b 9b e0 e9 12 ff ff ff 8b b7 b8 00 00 00 48 8b 97 f0 01 00 00 31 c0 48 c7 c7 88 32 97 a0 e8 2d f4 cc e0 <0f> 0b 48 c7 c6 60 32 97 a0 48 c7 c7 b0 bd 97 a0 31 c0 e8 9e 5b 
[48092.536812] RIP  [<ffffffffa095f88b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod]
[48092.537677]  RSP <ffff880b17163d78>

Comment 7 artem 2016-03-24 00:01:49 UTC
It looks like we are hitting this BUG():


int iscsit_close_session(struct iscsi_session *sess)
{
        struct iscsi_portal_group *tpg = sess->tpg;
        struct se_portal_group *se_tpg = &tpg->tpg_se_tpg;

        if (atomic_read(&sess->nconn)) {
                pr_err("%d connection(s) still exist for iSCSI session"
                        " to %s\n", atomic_read(&sess->nconn),
                        sess->sess_ops->InitiatorName);
                BUG();
        }

        spin_lock_bh(&se_tpg->session_lock);
        atomic_set(&sess->session_logout, 1);
        atomic_set(&sess->session_reinstatement, 1);
        iscsit_stop_time2retain_timer(sess);
        spin_unlock_bh(&se_tpg->session_lock);
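
The file and line in the oops ("kernel BUG at drivers/target/iscsi/iscsi_target.c:4472!") map straight to this check. A quick way to confirm it in a given source tree (the exact line drifts slightly between kernel builds, e.g. 4472 here vs 4473 in the original report):

  # Print the code around the reported BUG() line.
  sed -n '4465,4480p' drivers/target/iscsi/iscsi_target.c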

Comment 8 artem 2016-03-24 00:25:38 UTC
Here is the dump again, with two additional messages at the top that provide important pointers to the source of the problem.

[ 3633.644221] isert: isert_disconnected_handler: conn ffff8801e5a96000 teminating in state 5
[ 3633.665332] 1 connection(s) still exist for iSCSI session to iqn.2015-04.io.flashgrid:ora03
[ 3633.666229] ------------[ cut here ]------------
[ 3633.667025] kernel BUG at drivers/target/iscsi/iscsi_target.c:4472!
[ 3633.667187] invalid opcode: 0000 [#1] SMP 
[ 3633.667187] Modules linked in: dm_queue_length sd_mod sg target_core_pscsi target_core_file target_core_iblock nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver dm_multipath dm_mod nfs fscache xprtrdma(OE) ib_isert(OE) iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic ib_iser(OE) libiscsi scsi_transport_iscsi ib_srpt(OE) ib_srp(OE) scsi_transport_srp(OE) ib_ipoib(OE) rdma_ucm(OE) ib_ucm(OE) ib_uverbs(OE) ib_umad(OE) rdma_cm(OE) ib_cm(OE) iw_cm(OE) ib_sa(OE) ib_mad(OE) intel_rapl crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ppdev parport_pc i2c_piix4 parport pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c ata_generic mlx5_ib(OE) ib_core(OE) ib_addr(OE) pata_acpi cirrus syscopyarea xen_blkfront sysfillrect xen_netfront
[ 3633.667187]  sysimgblt mlx5_core(OE) drm_kms_helper vxlan ttm crct10dif_pclmul crct10dif_common ip6_udp_tunnel crc32c_intel udp_tunnel mlx_compat(OE) drm ptp ata_piix nvme serio_raw libata pps_core i2c_core floppy
[ 3633.675826] CPU: 2 PID: 25112 Comm: kworker/u31:5 Tainted: G           OE  ------------   3.10.0-327.10.1.el7.x86_64 #1
[ 3633.675826] Hardware name: Xen HVM domU, BIOS 4.3.1OVM 01/24/2016
[ 3633.675826] Workqueue: isert_comp_wq isert_do_control_comp [ib_isert]
[ 3633.675826] task: ffff8800c8abdc00 ti: ffff8801ca0b0000 task.ti: ffff8801ca0b0000
[ 3633.679905] RIP: 0010:[<ffffffffa05e488b>]  [<ffffffffa05e488b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod]
[ 3633.679905] RSP: 0018:ffff8801ca0b3d78  EFLAGS: 00010246
[ 3633.679905] RAX: 000000000000004f RBX: ffff8801fa0f3800 RCX: 0000000000000000
[ 3633.683363] RDX: 0000000000000000 RSI: ffff88020fc4d6c8 RDI: 0000000000000246
[ 3633.683363] RBP: ffff8801ca0b3d90 R08: 0000000000000086 R09: 0000000000004b99
[ 3633.683363] R10: 00000000000003ff R11: 0000000000000002 R12: 0000000000000000
[ 3633.683363] R13: ffff8800d6888000 R14: ffff8800e0ddf400 R15: 00000000000003e0
[ 3633.683363] FS:  0000000000000000(0000) GS:ffff88020fc40000(0000) knlGS:0000000000000000
[ 3633.683363] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3633.689750] CR2: 00007fe1a3a9e000 CR3: 00000001c0ec9000 CR4: 00000000001406e0
[ 3633.689750] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3633.689750] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3633.689750] Stack:
[ 3633.689750]  ffff8800c84c9000 0000000000000000 ffff8801fa0f3800 ffff8801ca0b3da0
[ 3633.689750]  ffffffffa05eac52 ffff8801ca0b3db0 ffffffffa056809e ffff8801ca0b3dc0
[ 3633.689750]  ffffffffa0569b30 ffff8801ca0b3df0 ffffffffa05e4965 ffff8800c031dd58
[ 3633.689750] Call Trace:
[ 3633.689750]  [<ffffffffa05eac52>] lio_tpg_close_session+0x12/0x20 [iscsi_target_mod]
[ 3633.689750]  [<ffffffffa056809e>] target_release_session+0x1e/0x20 [target_core_mod]
[ 3633.689750]  [<ffffffffa0569b30>] target_put_session+0x40/0x50 [target_core_mod]
[ 3633.689750]  [<ffffffffa05e4965>] iscsit_logout_post_handler+0xb5/0x200 [iscsi_target_mod]
[ 3633.689750]  [<ffffffffa05b0e64>] isert_do_control_comp+0xc4/0xf0 [ib_isert]
[ 3633.689750]  [<ffffffff8109d5db>] process_one_work+0x17b/0x470
[ 3633.689750]  [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
[ 3633.689750]  [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
[ 3633.689750]  [<ffffffff810a5acf>] kthread+0xcf/0xe0
[ 3633.689750]  [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
[ 3633.689750]  [<ffffffff81645998>] ret_from_fork+0x58/0x90
[ 3633.689750]  [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
[ 3633.689750] Code: 60 0d 60 a0 31 c0 e8 d5 0b d3 e0 e9 12 ff ff ff 8b b7 b8 00 00 00 48 8b 97 f0 01 00 00 31 c0 48 c7 c7 88 82 5f a0 e8 2d a4 04 e1 <0f> 0b 48 c7 c6 60 82 5f a0 48 c7 c7 b0 0d 60 a0 31 c0 e8 9e 0b 
[ 3633.689750] RIP  [<ffffffffa05e488b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod]
[ 3633.689750]  RSP <ffff8801ca0b3d78>

Comment 9 Andy Grover 2016-03-24 00:39:00 UTC
isert state 5 is ISER_CONN_DOWN.

Comment 10 artem 2016-06-08 14:05:39 UTC
Andy, do you have any updates regarding this bug by any chance?

Comment 11 artem 2016-07-13 22:57:14 UTC
The problem happens during simultaneous logouts (no logins involved), either from the same initiator system or from different initiator systems.

Below are steps for the easiest way to reproduce the problem with one target system plus one initiator system.
It happens on 3.10.0-327.22.2.el7.x86_64 with either the inbox or the Mellanox OFED 3.3 IB/iSER/isert drivers.
The problem does NOT happen on kernel 4.6.2 with inbox drivers.


1) on TARGET SYSTEM create 100 targets on loopback devices:

losetup -D
for i in {1..100}; do
   dd if=/dev/zero of=/dev/shm/loop$i.src bs=1M count=10 2>/dev/null
   losetup /dev/loop$i /dev/shm/loop$i.src
done
/usr/bin/targetctl restore 100targets.cfg  #100targets.cfg attached to bz
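
An optional sanity check before moving to the initiator side (not part of the original steps):

  losetup -a | wc -l            # expect 100 loop devices
  targetcli ls /iscsi           # expect the 100 restored targets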


2) on INITIATOR SYSTEM create and connect iSER initiators:

  for t in {1..100}; do
    iscsiadm -m node --op new  -p 192.168.100.1:3260,1 -T iqn.2015-04.io.flashgrid:ora01.cq542000c0400agn-p$t -I iser
    iscsiadm -m node --login -p 192.168.100.1:3260,1 -T iqn.2015-04.io.flashgrid:ora01.cq542000c0400agn-p$t -I iser
  done;
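
Before proceeding, it is worth confirming that all 100 sessions are actually logged in (a quick check, not in the original steps):

  iscsiadm -m session | wc -l   # expect 100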


3) on TARGET SYSTEM start high-priority processes (number of processes = 8 x number of cores):

 for i in {1..32}; do nice -n -20 dd if=/dev/zero of=/dev/zero bs=512 & done;


4) on INITIATOR SYSTEM log out all initiators simultaneously:

iscsiadm -m node --logout


RESULT: the target system crashed.
100% reproducible on a system with 4 cores.
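
To watch for the oops on the target while step 4 runs, and to tear the setup down after a surviving run, something like the following works (a sketch: targetctl clear and losetup -D undo step 1, and pkill stops the dd load from step 3):

  # on TARGET SYSTEM, in a separate shell, before step 4:
  dmesg -w | grep -E 'connection\(s\) still exist|kernel BUG'

  # cleanup if the system survived:
  pkill -f 'dd if=/dev/zero of=/dev/zero'
  /usr/bin/targetctl clear
  losetup -D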


[26916.770371] 1 connection(s) still exist for iSCSI session to iqn.2015-04.io.flashgrid:rac2
[26916.771860] ------------[ cut here ]------------
[26916.773268] kernel BUG at drivers/target/iscsi/iscsi_target.c:4472!
[26916.774682] invalid opcode: 0000 [#1] SMP
[26916.776080] Modules linked in: loop oracleacfs(POE) oracleadvm(POE) oracleoks(POE) dm_queue_length iscsi_tcp libiscsi_tcp target_core_pscsi target_core_file target_core_iblock nfsv3 dm_multipath rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad intel_powerclamp coretemp intel_rapl kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sb_edac mei_me iTCO_wdt ipmi_devintf mei sg edac_core iTCO_vendor_support ipmi_ssif dcdbas lpc_ich pcspkr ipmi_si mfd_core ipmi_msghandler mxm_wmi wmi shpchp acpi_power_meter nfsd auth_rpcgss nfs_acl lockd binfmt_misc grace sunrpc ip_tables xfs sd_mod crc_t10dif crct10dif_generic
[26916.785156]  mlx5_ib ib_core ib_addr mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit crct10dif_pclmul drm_kms_helper crct10dif_common bnx2x crc32c_intel ttm drm ahci mdio libahci ptp libata i2c_core pps_core nvme megaraid_sas mlx5_core libcrc32c dm_mirror dm_region_hash dm_log dm_mod
[26916.789880] CPU: 11 PID: 263167 Comm: kworker/u769:95 Tainted: P           OE  ------------   3.10.0-327.22.2.el7.x86_64 #1
[26916.791479] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 2.0.2 03/15/2016
[26916.793084] Workqueue: isert_comp_wq isert_do_control_comp [ib_isert]
[26916.794678] task: ffff880c32f4e780 ti: ffff880b708e8000 task.ti: ffff880b708e8000
[26916.796279] RIP: 0010:[<ffffffffa07d288b>]  [<ffffffffa07d288b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod]
[26916.797913] RSP: 0018:ffff880b708ebd78  EFLAGS: 00010246
[26916.799522] RAX: 000000000000004e RBX: ffff880c13a14000 RCX: 0000000000000000
[26916.801131] RDX: 0000000000000000 RSI: ffff880c4c76d6c8 RDI: 0000000000000246
[26916.802728] RBP: ffff880b708ebd90 R08: 0000000000000086 R09: 0000000000000c1b
[26916.804308] R10: 00000000000003ff R11: 0000000000000001 R12: 0000000000000000
[26916.805877] R13: ffff880b32149000 R14: ffff880bfe173f00 R15: 0000000000006020
[26916.807432] FS:  0000000000000000(0000) GS:ffff880c4c760000(0000) knlGS:0000000000000000
[26916.808996] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[26916.810538] CR2: 0000000001549200 CR3: 0000000b6fab7000 CR4: 00000000001407e0
[26916.812088] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[26916.813617] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[26916.815129] Stack:
[26916.816611]  ffff880b7113a800 0000000000000000 ffff880c13a14000 ffff880b708ebda0
[26916.818131]  ffffffffa07d8c52 ffff880b708ebdb0 ffffffffa077109e ffff880b708ebdc0
[26916.819637]  ffffffffa0772b30 ffff880b708ebdf0 ffffffffa07d2965 ffff880bb2943218
[26916.821130] Call Trace:
[26916.822590]  [<ffffffffa07d8c52>] lio_tpg_close_session+0x12/0x20 [iscsi_target_mod]
[26916.824081]  [<ffffffffa077109e>] target_release_session+0x1e/0x20 [target_core_mod]
[26916.825557]  [<ffffffffa0772b30>] target_put_session+0x40/0x50 [target_core_mod]
[26916.827025]  [<ffffffffa07d2965>] iscsit_logout_post_handler+0xb5/0x200 [iscsi_target_mod]
[26916.828483]  [<ffffffffa07090b4>] isert_do_control_comp+0xc4/0xf0 [ib_isert]
[26916.829938]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[26916.831372]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[26916.832798]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[26916.834203]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
[26916.835594]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[26916.836980]  [<ffffffff816467d8>] ret_from_fork+0x58/0x90
[26916.838351]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[26916.839710] Code: 60 ed 7e a0 31 c0 e8 45 33 b4 e0 e9 12 ff ff ff 8b b7 b8 00 00 00 48 8b 97 f0 01 00 00 31 c0 48 c7 c7 88 62 7e a0 e8 7d d2 e5 e0 <0f> 0b 48 c7 c6 60 62 7e a0 48 c7 c7 b0 ed 7e a0 31 c0 e8 0e 33
[26916.842617] RIP  [<ffffffffa07d288b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod]
[26916.844023]  RSP <ffff880b708ebd78>

Comment 12 artem 2016-07-13 22:59:19 UTC
Created attachment 1179494 [details]
cfg for 100 targets

Comment 13 artem 2016-07-21 21:58:53 UTC
With the following iscsi_target.c patches applied to 3.10.0-327.22.2.el7.x86_64, the problem goes away:

https://patchwork.kernel.org/patch/6856841/
https://patchwork.kernel.org/patch/6856851/
https://patchwork.kernel.org/patch/6856861/
https://patchwork.kernel.org/patch/6856871/

These patches are the only differences in iscsi_target.c between 3.10.0-327.22.2.el7.x86_64 and vanilla 4.3.6 kernels.
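
One way to fetch and apply the listed patches to a matching source tree (a sketch, assuming a git checkout of the kernel source; the /mbox/ suffix is patchwork's raw-patch endpoint, which is worth verifying for this instance):

  cd /path/to/kernel-source   # tree matching 3.10.0-327.22.2.el7
  for id in 6856841 6856851 6856861 6856871; do
      curl -sL "https://patchwork.kernel.org/patch/$id/mbox/" | git am
  done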

Comment 14 Andy Grover 2016-07-26 15:14:59 UTC
*** Bug 1312456 has been marked as a duplicate of this bug. ***

Comment 15 Andy Grover 2016-07-27 03:55:21 UTC
The fourth patch should not be needed, because the RHEL kernel has already had it applied; in fact, it has been superseded by commit bd3792205aaeb.

Can you please check whether the first three patches alone resolve the issue, or whether all four patches plus the additional commit above do? Thanks.

Comment 16 artem 2016-07-27 17:25:03 UTC
The first three patches resolve the issue when used with ib_isert from Mellanox OFED 3.3. Since we tested with the MOFED ib_isert module, the fourth patch, which applies to ib_isert, was not actually exercised in our last round of testing.

Comment 17 Rafael Aquini 2016-08-01 16:43:00 UTC
Patch(es) committed to the kernel repository; an interim kernel build is undergoing testing

Comment 19 Rafael Aquini 2016-08-02 13:45:33 UTC
Patch(es) available in kernel-3.10.0-482.el7

Comment 21 Vasily Evseenko 2016-08-02 14:24:32 UTC
Is there a download link for kernel-3.10.0-482.el7?

Comment 22 Martin Hoyer 2016-08-03 14:55:34 UTC
With the kernel updated to 3.10.0-482.el7, the issue is no longer reproducible.

Comment 24 Andy Grover 2016-08-03 15:45:47 UTC
(In reply to Vasily Evseenko from comment #21)
> Is there a download link for kernel-3.10.0-482.el7?

I'm afraid not -- the first public release of a kernel with these fixes will be in the RHEL 7.3 beta, whose release date has not yet been announced.

Comment 26 errata-xmlrpc 2016-11-03 14:05:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2574.html