Bug 1278224
Summary: | panic in iscsi_target.c | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | artem | ||||
Component: | kernel | Assignee: | Andy Grover <agrover> | ||||
kernel sub component: | Kernel Target (LIO) | QA Contact: | Martin Hoyer <mhoyer> | ||||
Status: | CLOSED ERRATA | Docs Contact: | |||||
Severity: | urgent | ||||||
Priority: | unspecified | CC: | agrover, alex, artem, cleech, mhoyer, svpcom | ||||
Version: | 7.2 | ||||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | kernel-3.10.0-482.el7 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2016-11-03 14:05:42 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
artem
2015-11-05 01:01:00 UTC
Hi Artem, I think this is a race between the login and logout code. Would it be ok for me to make this bug publicly viewable? This would let me reference it when discussing the issue upstream. (In reply to Andy Grover from comment #3) > Hi Artem, > > I think this is a race between the login and logout code. Would it be ok for > me to make this bug publicly viewable? This would let me reference it when > discussing the issue upstream. Sure Any luck with this bug? We got almost identical panic on RH 3.10-327.3.1.el7 [ 1116.996324] sd 1100:0:0:0: alua: Detached [ 1117.041915] sd 1136:0:0:0: alua: Detached [ 1117.077978] show_signal_msg: 34 callbacks suppressed [ 1117.084445] iscsiadm[63090]: segfault at 40 ip 00007f0ed969c5b0 sp 00007ffed6d67450 error 4 in iscsiadm[7f0ed966e000+7a000] [ 1117.120251] sd 1057:0:0:0: alua: Detached [ 1117.133158] sd 1061:0:0:0: alua: Detached [ 1117.147288] sd 1059:0:0:0: alua: Detached [ 1117.154597] connection1057:0: detected conn error (1011) [ 1117.246026] sd 1074:0:0:0: alua: Detached [ 1117.261870] sd 1070:0:0:0: alua: Detached [ 1117.293683] sd 1086:0:0:0: alua: Detached [ 1117.303633] sd 1058:0:0:0: alua: Detached [ 1117.313978] sd 1062:0:0:0: alua: Detached [ 1117.330080] connection1062:0: detected conn error (1011) [ 1117.342118] sd 1063:0:0:0: alua: Detached [ 1117.357164] sd 1125:0:0:0: alua: Detached [ 1117.357218] sd 1084:0:0:0: alua: Detached [ 1117.371230] sd 1118:0:0:0: alua: Detached [ 1117.376824] sd 1088:0:0:0: alua: Detached [ 1117.401932] connection1088:0: detected conn error (1011) [ 1117.405973] sd 1090:0:0:0: alua: Detached [ 1117.406556] sd 1069:0:0:0: alua: Detached [ 1117.423590] connection1118:0: detected conn error (1011) [ 1117.442144] sd 1102:0:0:0: alua: Detached [ 1117.445572] sd 1081:0:0:0: alua: Detached [ 1117.451705] sd 1087:0:0:0: alua: Detached [ 1117.457592] sd 1060:0:0:0: alua: Detached [ 1117.458261] sd 1134:0:0:0: alua: Detached [ 1117.460056] sd 1130:0:0:0: alua: Detached [ 1117.461869] sd 1072:0:0:0: alua: Detached [ 1117.480426] sd 1067:0:0:0: alua: Detached [ 1117.488646] sd 1138:0:0:0: alua: Detached [ 1117.508845] connection1102:0: detected conn error (1011) [ 1117.513740] sd 1094:0:0:0: alua: Detached [ 1117.531806] connection1134:0: detected conn error (1011) [ 1117.531856] sd 1101:0:0:0: alua: Detached [ 1117.544218] connection1081:0: detected conn error (1011) [ 1117.545087] connection1072:0: detected conn error (1011) [ 1117.545090] connection1067:0: detected conn error (1011) [ 1117.546620] sd 1066:0:0:0: alua: Detached [ 1117.570188] connection1060:0: detected conn error (1011) [ 1117.591323] sd 1078:0:0:0: alua: Detached [ 1117.591344] connection1101:0: detected conn error (1011) [ 1117.594550] connection1094:0: detected conn error (1011) [ 1117.598501] connection1066:0: detected conn error (1011) [ 1117.615319] sd 1083:0:0:0: alua: Detached [ 1117.615511] sd 1065:0:0:0: alua: Detached [ 1117.622793] sd 1089:0:0:0: alua: Detached [ 1117.629740] sd 1092:0:0:0: alua: Detached [ 1117.630165] sd 1077:0:0:0: alua: Detached [ 1117.630290] sd 1071:0:0:0: alua: Detached [ 1117.630662] sd 1075:0:0:0: alua: Detached [ 1117.631041] sd 1117:0:0:0: alua: Detached [ 1117.631922] sd 1143:0:0:0: alua: Detached [ 1117.632156] sd 1109:0:0:0: alua: Detached [ 1117.636650] sd 1144:0:0:0: alua: Detached [ 1117.640147] sd 1127:0:0:0: alua: Detached [ 1117.693098] sd 1108:0:0:0: alua: Detached [ 1117.694817] sd 1085:0:0:0: alua: Detached [ 1117.703864] connection1078:0: detected conn error (1011) [ 1117.711156] sd 1110:0:0:0: alua: Detached [ 1117.727565] sd 1082:0:0:0: alua: Detached [ 1117.729281] connection1143:0: detected conn error (1011) [ 1117.729835] connection1127:0: detected conn error (1011) [ 1117.730450] connection1117:0: detected conn error (1011) [ 1117.731420] connection1144:0: detected conn error (1011) [ 1117.731464] sd 1113:0:0:0: alua: Detached [ 1117.732038] connection1065:0: detected conn error (1011) [ 1117.733177] connection1077:0: detected conn error (1011) [ 1117.733749] connection1092:0: detected conn error (1011) [ 1117.735274] connection1085:0: detected conn error (1011) [ 1117.735304] connection1109:0: detected conn error (1011) [ 1117.738090] connection1075:0: detected conn error (1011) [ 1117.739409] connection1089:0: detected conn error (1011) [ 1117.749207] connection1130:0: detected conn error (1011) [ 1117.769466] sd 1133:0:0:0: alua: Detached [ 1117.797064] sd 1099:0:0:0: alua: Detached [ 1117.797322] sd 1103:0:0:0: alua: Detached [ 1117.797651] sd 1104:0:0:0: alua: Detached [ 1117.800058] sd 1107:0:0:0: alua: Detached [ 1117.802510] sd 1129:0:0:0: alua: Detached [ 1117.802622] sd 1095:0:0:0: alua: Detached [ 1117.802824] sd 1119:0:0:0: alua: Detached [ 1117.803104] sd 1115:0:0:0: alua: Detached [ 1117.803184] sd 1073:0:0:0: alua: Detached [ 1117.803244] sd 1120:0:0:0: alua: Detached [ 1117.803306] sd 1091:0:0:0: alua: Detached [ 1117.803369] sd 1079:0:0:0: alua: Detached [ 1117.803421] sd 1064:0:0:0: alua: Detached [ 1117.803496] sd 1105:0:0:0: alua: Detached [ 1117.803535] sd 1131:0:0:0: alua: Detached [ 1117.805940] sd 1139:0:0:0: alua: Detached [ 1117.806669] sd 1068:0:0:0: alua: Detached [ 1117.807928] sd 1141:0:0:0: alua: Detached [ 1117.811465] sd 1114:0:0:0: alua: Detached [ 1117.812912] sd 1098:0:0:0: alua: Detached [ 1117.813091] sd 1135:0:0:0: alua: Detached [ 1117.815014] sd 1112:0:0:0: alua: Detached [ 1117.828987] sd 1096:0:0:0: alua: Detached [ 1117.834454] sd 1142:0:0:0: alua: Detached [ 1117.838657] sd 1106:0:0:0: alua: Detached [ 1117.839686] sd 1080:0:0:0: alua: Detached [ 1117.840115] sd 1097:0:0:0: alua: Detached [ 1117.841528] sd 1116:0:0:0: alua: Detached [ 1117.851700] sd 1123:0:0:0: alua: Detached [ 1117.856030] sd 1121:0:0:0: alua: Detached [ 1117.856359] sd 1122:0:0:0: alua: Detached [ 1117.858753] sd 1093:0:0:0: alua: Detached [ 1117.865049] sd 1140:0:0:0: alua: Detached [ 1117.866797] sd 1124:0:0:0: alua: Detached [ 1117.872904] sd 1126:0:0:0: alua: Detached [ 1117.875825] sd 1132:0:0:0: alua: Detached [ 1117.986967] 1 connection(s) still exist for iSCSI session to iqn.2015-04.io.flashgrid:rac1 [ 1117.986969] 1 connection(s) still exist for iSCSI session to iqn.2015-04.io.flashgrid:rac1 [ 1117.986971] 1 connection(s) still exist for iSCSI session to iqn.2015-04.io.flashgrid:rac1 [ 1117.987001] ------------[ cut here ]------------ [ 1117.987003] kernel BUG at drivers/target/iscsi/iscsi_target.c:4472! [ 1117.987005] invalid opcode: 0000 [#1] SMP [ 1117.987030] Modules linked in: nfsv3 nfs_acl dm_queue_length target_core_pscsi target_core_file target_core_iblock rpcsec_gss_krb5 a0 [ 1117.987034] crc_t10dif syscopyarea crct10dif_common sysfillrect sysimgblt bnx2x i2c_algo_bit drm_kms_helper ttm mdio libcrc32c drm e [ 1117.987036] CPU: 1 PID: 2546 Comm: kworker/u289:1 Not tainted 3.10.0-327.3.1.el7.x86_64 #1 [ 1117.987037] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.3.6 06/03/2015 [ 1117.987041] Workqueue: isert_comp_wq isert_do_control_comp [ib_isert] [ 1117.987042] task: ffff880bff258b80 ti: ffff880bf4168000 task.ti: ffff880bf4168000 [ 1117.987053] RIP: 0010:[<ffffffffa073d88b>] [<ffffffffa073d88b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod] [ 1117.987054] RSP: 0018:ffff880bf416bd78 EFLAGS: 00010246 [ 1117.987055] RAX: 000000000000004e RBX: ffff880bba78f000 RCX: 000000000000137e [ 1117.987055] RDX: 0000000000000000 RSI: 0000000000000086 RDI: 0000000000000246 [ 1117.987056] RBP: ffff880bf416bd90 R08: 0000000000000086 R09: ffffffff81d5a22e [ 1117.987057] R10: 00000000000504ac R11: 0000000000100000 R12: 0000000000000000 [ 1117.987057] R13: ffff880bc758b000 R14: ffff880c316ae900 R15: 0000000000002420 [ 1117.987058] FS: 0000000000000000(0000) GS:ffff880c4e420000(0000) knlGS:0000000000000000 [ 1117.987059] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1117.987060] CR2: 00007f689a28fbe8 CR3: 000000000194a000 CR4: 00000000001407e0 [ 1117.987060] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1117.987061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1117.987061] Stack: [ 1117.987064] ffff880bdc5a3800 0000000000000000 ffff880bba78f000 ffff880bf416bda0 [ 1117.987067] ffffffffa0743c52 ffff880bf416bdb0 ffffffffa06dc09e ffff880bf416bdc0 [ 1117.987069] ffffffffa06ddb30 ffff880bf416bdf0 ffffffffa073d965 ffff880bdffc1ca0 [ 1117.987069] Call Trace: [ 1117.987077] [<ffffffffa0743c52>] lio_tpg_close_session+0x12/0x20 [iscsi_target_mod] [ 1117.987087] [<ffffffffa06dc09e>] target_release_session+0x1e/0x20 [target_core_mod] [ 1117.987093] [<ffffffffa06ddb30>] target_put_session+0x40/0x50 [target_core_mod] [ 1117.987098] [<ffffffffa073d965>] iscsit_logout_post_handler+0xb5/0x200 [iscsi_target_mod] [ 1117.987101] [<ffffffffa05340b4>] isert_do_control_comp+0xc4/0xf0 [ib_isert] [ 1117.987106] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470 [ 1117.987108] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400 [ 1117.987110] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400 [ 1117.987113] [<ffffffff810a5aef>] kthread+0xcf/0xe0 [ 1117.987116] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 [ 1117.987120] [<ffffffff81645818>] ret_from_fork+0x58/0x90 [ 1117.987123] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 [ 1117.987138] Code: 60 9d 75 a0 31 c0 e8 f5 7a bd e0 e9 12 ff ff ff 8b b7 b8 00 00 00 48 8b 97 f0 01 00 00 31 c0 48 c7 c7 88 12 75 a0 [ 1117.987142] RIP [<ffffffffa073d88b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod] [ 1117.987143] RSP <ffff880bf416bd78> Andy, Do you know of any progress with this issue? It keeps happening with newer kernel 3.10.0-327.10.1.el7.x86_64 In a very large configuration with 120 targets and 4 initiators per target we are seeing it very frequently. [48092.485432] kernel BUG at drivers/target/iscsi/iscsi_target.c:4472! [48092.486410] invalid opcode: 0000 [#1] SMP [48092.487401] Modules linked in: dm_queue_length ib_iser(OE) libiscsi target_core_pscsi target_core_file target_core_iblock ib_isert(OE) iscsi_target_mod target_core_mod nfsv3 rpcsec_gss_krb5 nfsv4 dm_multipath dns_resolver nfs fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_en(OE) vxlan ip6_udp_tunnel udp_tunnel mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) mlx4_core(OE) mlx_compat(OE) intel_powerclamp coretemp intel_rapl kvm_intel kvm sb_edac pcspkr iTCO_wdt edac_core iTCO_vendor_support dcdbas ipmi_ssif sg ipmi_devintf lpc_ich ipmi_si ipmi_msghandler mfd_core mei_me mei shpchp acpi_power_meter nfsd auth_rpcgss binfmt_misc nfs_acl lockd grace ip_tables xfs sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul [48092.493891] crct10dif_common crc32_pclmul crc32c_intel mgag200 ghash_clmulni_intel syscopyarea sysfillrect sysimgblt i2c_algo_bit aesni_intel drm_kms_helper lrw gf128mul ttm glue_helper bnx2x ablk_helper cryptd drm ptp i2c_core pps_core megaraid_sas nvme mdio libcrc32c wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod scsi_transport_iscsi [last unloaded: mlx5_core] [48092.498732] CPU: 16 PID: 294498 Comm: kworker/u577:3 Tainted: G OE ------------ 3.10.0-327.10.1.el7.x86_64 #1 [48092.499881] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.5.4 10/002/2015 [48092.501065] Workqueue: isert_comp_wq isert_do_control_comp [ib_isert] [48092.502231] task: ffff880a74f87300 ti: ffff880b17160000 task.ti: ffff880b17160000 [48092.503495] RIP: 0010:[<ffffffffa095f88b>] [<ffffffffa095f88b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod] [48092.504686] RSP: 0018:ffff880b17163d78 EFLAGS: 00010246 [48092.505815] RAX: 000000000000004f RBX: ffff880bbde05400 RCX: 0000000000000000 [48092.507044] RDX: 0000000000000000 RSI: ffff880c4d40d6c8 RDI: 0000000000000246 [48092.508165] RBP: ffff880b17163d90 R08: 0000000000000086 R09: 000000000000ea80 [48092.509268] R10: 00000000000003ff R11: 0000000000000001 R12: 0000000000000000 [48092.510495] R13: ffff88006fbd1000 R14: ffff880bc1f68900 R15: 0000000000004820 [48092.511787] FS: 0000000000000000(0000) GS:ffff880c4d400000(0000) knlGS:0000000000000000 [48092.512937] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [48092.514241] CR2: 00007ff3f4d290a0 CR3: 000000000194a000 CR4: 00000000001407e0 [48092.515525] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [48092.516681] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [48092.517804] Stack: [48092.519036] ffff880b384c4000 0000000000000000 ffff880bbde05400 ffff880b17163da0 [48092.520322] ffffffffa0965c52 ffff880b17163db0 ffffffffa08fe09e ffff880b17163dc0 [48092.521552] ffffffffa08ffb30 ffff880b17163df0 ffffffffa095f965 ffff880ba94024b8 [48092.522791] Call Trace: [48092.524062] [<ffffffffa0965c52>] lio_tpg_close_session+0x12/0x20 [iscsi_target_mod] [48092.525236] [<ffffffffa08fe09e>] target_release_session+0x1e/0x20 [target_core_mod] [48092.526502] [<ffffffffa08ffb30>] target_put_session+0x40/0x50 [target_core_mod] [48092.527903] [<ffffffffa095f965>] iscsit_logout_post_handler+0xb5/0x200 [iscsi_target_mod] [48092.529167] [<ffffffffa0850e64>] isert_do_control_comp+0xc4/0xf0 [ib_isert] [48092.529934] [<ffffffff8109d5db>] process_one_work+0x17b/0x470 [48092.530725] [<ffffffff8109e3ab>] worker_thread+0x11b/0x400 [48092.531492] [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400 [48092.532255] [<ffffffff810a5acf>] kthread+0xcf/0xe0 [48092.533008] [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 [48092.533770] [<ffffffff81645998>] ret_from_fork+0x58/0x90 [48092.534513] [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 [48092.535263] Code: 60 bd 97 a0 31 c0 e8 d5 5b 9b e0 e9 12 ff ff ff 8b b7 b8 00 00 00 48 8b 97 f0 01 00 00 31 c0 48 c7 c7 88 32 97 a0 e8 2d f4 cc e0 <0f> 0b 48 c7 c6 60 32 97 a0 48 c7 c7 b0 bd 97 a0 31 c0 e8 9e 5b [48092.536812] RIP [<ffffffffa095f88b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod] [48092.537677] RSP <ffff880b17163d78> It looks like we are hitting this BUG(): int iscsit_close_session(struct iscsi_session *sess) { struct iscsi_portal_group *tpg = sess->tpg; struct se_portal_group *se_tpg = &tpg->tpg_se_tpg; if (atomic_read(&sess->nconn)) { pr_err("%d connection(s) still exist for iSCSI session" " to %s\n", atomic_read(&sess->nconn), sess->sess_ops->InitiatorName); BUG(); } spin_lock_bh(&se_tpg->session_lock); atomic_set(&sess->session_logout, 1); atomic_set(&sess->session_reinstatement, 1); iscsit_stop_time2retain_timer(sess); spin_unlock_bh(&se_tpg->session_lock); Including the dump again with two additional messages at the top that provide important pointers to the source of the problem. [ 3633.644221] isert: isert_disconnected_handler: conn ffff8801e5a96000 teminating in state 5 [ 3633.665332] 1 connection(s) still exist for iSCSI session to iqn.2015-04.io.flashgrid:ora03 [ 3633.666229] ------------[ cut here ]------------ [ 3633.667025] kernel BUG at drivers/target/iscsi/iscsi_target.c:4472! [ 3633.667187] invalid opcode: 0000 [#1] SMP [ 3633.667187] Modules linked in: dm_queue_length sd_mod sg target_core_pscsi target_core_file target_core_iblock nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver dm_multipath dm_mod nfs fscache xprtrdma(OE) ib_isert(OE) iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic ib_iser(OE) libiscsi scsi_transport_iscsi ib_srpt(OE) ib_srp(OE) scsi_transport_srp(OE) ib_ipoib(OE) rdma_ucm(OE) ib_ucm(OE) ib_uverbs(OE) ib_umad(OE) rdma_cm(OE) ib_cm(OE) iw_cm(OE) ib_sa(OE) ib_mad(OE) intel_rapl crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ppdev parport_pc i2c_piix4 parport pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c ata_generic mlx5_ib(OE) ib_core(OE) ib_addr(OE) pata_acpi cirrus syscopyarea xen_blkfront sysfillrect xen_netfront [ 3633.667187] sysimgblt mlx5_core(OE) drm_kms_helper vxlan ttm crct10dif_pclmul crct10dif_common ip6_udp_tunnel crc32c_intel udp_tunnel mlx_compat(OE) drm ptp ata_piix nvme serio_raw libata pps_core i2c_core floppy [ 3633.675826] CPU: 2 PID: 25112 Comm: kworker/u31:5 Tainted: G OE ------------ 3.10.0-327.10.1.el7.x86_64 #1 [ 3633.675826] Hardware name: Xen HVM domU, BIOS 4.3.1OVM 01/24/2016 [ 3633.675826] Workqueue: isert_comp_wq isert_do_control_comp [ib_isert] [ 3633.675826] task: ffff8800c8abdc00 ti: ffff8801ca0b0000 task.ti: ffff8801ca0b0000 [ 3633.679905] RIP: 0010:[<ffffffffa05e488b>] [<ffffffffa05e488b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod] [ 3633.679905] RSP: 0018:ffff8801ca0b3d78 EFLAGS: 00010246 [ 3633.679905] RAX: 000000000000004f RBX: ffff8801fa0f3800 RCX: 0000000000000000 [ 3633.683363] RDX: 0000000000000000 RSI: ffff88020fc4d6c8 RDI: 0000000000000246 [ 3633.683363] RBP: ffff8801ca0b3d90 R08: 0000000000000086 R09: 0000000000004b99 [ 3633.683363] R10: 00000000000003ff R11: 0000000000000002 R12: 0000000000000000 [ 3633.683363] R13: ffff8800d6888000 R14: ffff8800e0ddf400 R15: 00000000000003e0 [ 3633.683363] FS: 0000000000000000(0000) GS:ffff88020fc40000(0000) knlGS:0000000000000000 [ 3633.683363] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3633.689750] CR2: 00007fe1a3a9e000 CR3: 00000001c0ec9000 CR4: 00000000001406e0 [ 3633.689750] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3633.689750] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 3633.689750] Stack: [ 3633.689750] ffff8800c84c9000 0000000000000000 ffff8801fa0f3800 ffff8801ca0b3da0 [ 3633.689750] ffffffffa05eac52 ffff8801ca0b3db0 ffffffffa056809e ffff8801ca0b3dc0 [ 3633.689750] ffffffffa0569b30 ffff8801ca0b3df0 ffffffffa05e4965 ffff8800c031dd58 [ 3633.689750] Call Trace: [ 3633.689750] [<ffffffffa05eac52>] lio_tpg_close_session+0x12/0x20 [iscsi_target_mod] [ 3633.689750] [<ffffffffa056809e>] target_release_session+0x1e/0x20 [target_core_mod] [ 3633.689750] [<ffffffffa0569b30>] target_put_session+0x40/0x50 [target_core_mod] [ 3633.689750] [<ffffffffa05e4965>] iscsit_logout_post_handler+0xb5/0x200 [iscsi_target_mod] [ 3633.689750] [<ffffffffa05b0e64>] isert_do_control_comp+0xc4/0xf0 [ib_isert] [ 3633.689750] [<ffffffff8109d5db>] process_one_work+0x17b/0x470 [ 3633.689750] [<ffffffff8109e3ab>] worker_thread+0x11b/0x400 [ 3633.689750] [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400 [ 3633.689750] [<ffffffff810a5acf>] kthread+0xcf/0xe0 [ 3633.689750] [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 [ 3633.689750] [<ffffffff81645998>] ret_from_fork+0x58/0x90 [ 3633.689750] [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 [ 3633.689750] Code: 60 0d 60 a0 31 c0 e8 d5 0b d3 e0 e9 12 ff ff ff 8b b7 b8 00 00 00 48 8b 97 f0 01 00 00 31 c0 48 c7 c7 88 82 5f a0 e8 2d a4 04 e1 <0f> 0b 48 c7 c6 60 82 5f a0 48 c7 c7 b0 0d 60 a0 31 c0 e8 9e 0b [ 3633.689750] RIP [<ffffffffa05e488b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod] [ 3633.689750] RSP <ffff8801ca0b3d78> isert state 5 is ISER_CONN_DOWN. Andy, do you have any updates regarding this bug by any chance? The problem happens during simultaneous logouts (no logins) - either from the same initiator system or from different initiator systems. Below are steps for the easiest way to reproduce the problem on one target system + one initiator system. It happens on 3.10.0-327.22.2.el7.x86_64 with either inbox or Mellanox OFED 3.3 IB/iser/isert drivers. The problem does NOT happen on kernel 4.6.2 with inbox drivers. 1) on TARGET SYSTEM create 100 targets on loopback devices: losetup -D for i in {1..100}; do dd if=/dev/zero of=/dev/shm/loop$i.src bs=1M count=10 2>/dev/null losetup /dev/loop$i /dev/shm/loop$i.src done /usr/bin/targetctl restore 100targets.cfg #100targets.cfg attached to bz 2) on INITIATOR SYSTEM create and connect iSER initiators: for t in {1..100}; do iscsiadm -m node --op new -p 192.168.100.1:3260,1 -T iqn.2015-04.io.flashgrid:ora01.cq542000c0400agn-p$t -I iser iscsiadm -m node --login -p 192.168.100.1:3260,1 -T iqn.2015-04.io.flashgrid:ora01.cq542000c0400agn-p$t -I iser done; 3) on TARGET SYSTEM start high priority processes (#processes = 8 X #cores): for i in {1..32}; do nice -n -20 dd if=/dev/zero of=/dev/zero bs=512 & done; 4) on INITIATOR SYSTEM log out all initiators simultaneously: iscsiadm -m node --logout RESULT: the target system crashed. 100% reproducible on a system with 4 cores. [26916.770371] 1 connection(s) still exist for iSCSI session to iqn.2015-04.io.flashgrid:rac2 [26916.771860] ------------[ cut here ]------------ [26916.773268] kernel BUG at drivers/target/iscsi/iscsi_target.c:4472! [26916.774682] invalid opcode: 0000 [#1] SMP [26916.776080] Modules linked in: loop oracleacfs(POE) oracleadvm(POE) oracleoks(POE) dm_queue_length iscsi_tcp libiscsi_tcp target_core_pscsi target_core_file target_core_iblock nfsv3 dm_multipath rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad intel_powerclamp coretemp intel_rapl kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sb_edac mei_me iTCO_wdt ipmi_devintf mei sg edac_core iTCO_vendor_support ipmi_ssif dcdbas lpc_ich pcspkr ipmi_si mfd_core ipmi_msghandler mxm_wmi wmi shpchp acpi_power_meter nfsd auth_rpcgss nfs_acl lockd binfmt_misc grace sunrpc ip_tables xfs sd_mod crc_t10dif crct10dif_generic [26916.785156] mlx5_ib ib_core ib_addr mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit crct10dif_pclmul drm_kms_helper crct10dif_common bnx2x crc32c_intel ttm drm ahci mdio libahci ptp libata i2c_core pps_core nvme megaraid_sas mlx5_core libcrc32c dm_mirror dm_region_hash dm_log dm_mod [26916.789880] CPU: 11 PID: 263167 Comm: kworker/u769:95 Tainted: P OE ------------ 3.10.0-327.22.2.el7.x86_64 #1 [26916.791479] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 2.0.2 03/15/2016 [26916.793084] Workqueue: isert_comp_wq isert_do_control_comp [ib_isert] [26916.794678] task: ffff880c32f4e780 ti: ffff880b708e8000 task.ti: ffff880b708e8000 [26916.796279] RIP: 0010:[<ffffffffa07d288b>] [<ffffffffa07d288b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod] [26916.797913] RSP: 0018:ffff880b708ebd78 EFLAGS: 00010246 [26916.799522] RAX: 000000000000004e RBX: ffff880c13a14000 RCX: 0000000000000000 [26916.801131] RDX: 0000000000000000 RSI: ffff880c4c76d6c8 RDI: 0000000000000246 [26916.802728] RBP: ffff880b708ebd90 R08: 0000000000000086 R09: 0000000000000c1b [26916.804308] R10: 00000000000003ff R11: 0000000000000001 R12: 0000000000000000 [26916.805877] R13: ffff880b32149000 R14: ffff880bfe173f00 R15: 0000000000006020 [26916.807432] FS: 0000000000000000(0000) GS:ffff880c4c760000(0000) knlGS:0000000000000000 [26916.808996] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [26916.810538] CR2: 0000000001549200 CR3: 0000000b6fab7000 CR4: 00000000001407e0 [26916.812088] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [26916.813617] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [26916.815129] Stack: [26916.816611] ffff880b7113a800 0000000000000000 ffff880c13a14000 ffff880b708ebda0 [26916.818131] ffffffffa07d8c52 ffff880b708ebdb0 ffffffffa077109e ffff880b708ebdc0 [26916.819637] ffffffffa0772b30 ffff880b708ebdf0 ffffffffa07d2965 ffff880bb2943218 [26916.821130] Call Trace: [26916.822590] [<ffffffffa07d8c52>] lio_tpg_close_session+0x12/0x20 [iscsi_target_mod] [26916.824081] [<ffffffffa077109e>] target_release_session+0x1e/0x20 [target_core_mod] [26916.825557] [<ffffffffa0772b30>] target_put_session+0x40/0x50 [target_core_mod] [26916.827025] [<ffffffffa07d2965>] iscsit_logout_post_handler+0xb5/0x200 [iscsi_target_mod] [26916.828483] [<ffffffffa07090b4>] isert_do_control_comp+0xc4/0xf0 [ib_isert] [26916.829938] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470 [26916.831372] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400 [26916.832798] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400 [26916.834203] [<ffffffff810a5aef>] kthread+0xcf/0xe0 [26916.835594] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 [26916.836980] [<ffffffff816467d8>] ret_from_fork+0x58/0x90 [26916.838351] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 [26916.839710] Code: 60 ed 7e a0 31 c0 e8 45 33 b4 e0 e9 12 ff ff ff 8b b7 b8 00 00 00 48 8b 97 f0 01 00 00 31 c0 48 c7 c7 88 62 7e a0 e8 7d d2 e5 e0 <0f> 0b 48 c7 c6 60 62 7e a0 48 c7 c7 b0 ed 7e a0 31 c0 e8 0e 33 [26916.842617] RIP [<ffffffffa07d288b>] iscsit_close_session+0x20b/0x230 [iscsi_target_mod] [26916.844023] RSP <ffff880b708ebd78> Created attachment 1179494 [details]
cfg for 100 targets
With the following iscsi_target.c patches applied to 3.10.0-327.22.2.el7.x86_64 the problem goes away https://patchwork.kernel.org/patch/6856841/ https://patchwork.kernel.org/patch/6856851/ https://patchwork.kernel.org/patch/6856861/ https://patchwork.kernel.org/patch/6856871/ These patches are the only differences in iscsi_target.c between 3.10.0-327.22.2.el7.x86_64 and vanilla 4.3.6 kernels. *** Bug 1312456 has been marked as a duplicate of this bug. *** The fourth patch should not be needed, because the rhel kernel has had it applied already, and in fact superseded by bd3792205aaeb. Can you please check to see if the first three patches alone resolve the issue, or if the four patches plus the additional commit above resolve this issue? Thanks. The first three patches resolve the issue when used with ib_isert from MOFED 3.3. We actually tested with ib_isert from Mellanox OFED 3.3. So, the fourth patch that applies to ib_isert was not really used in our last round of testing. Patch(es) committed on kernel repository and an interim kernel build is undergoing testing Patch(es) available on kernel-3.10.0-482.el7 Are there download link for kernel-3.10.0-482.el7? With updated kernel to 3.10.0-482.el7 it is not reproducible anymore. (In reply to Vasily Evseenko from comment #21) > Are there download link for kernel-3.10.0-482.el7? I'm afraid not -- first public release of a kernel with these fixes will be in RHEL 7.3 beta, whose release date has not yet been announced. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2574.html |