Description of problem:

Nov 30 08:16:30 i-8D377A5F kernel: tgtd[2168]: segfault at 1af9088 ip 00000000004067bb sp 00007fffc2283f10 error 6 in tgtd (deleted)[400000+3e000]
Nov 30 08:16:31 i-8D377A5F abrt[28913]: File '/usr/sbin/tgtd' seems to be deleted
Nov 30 08:16:31 i-8D377A5F abrtd: Directory 'ccpp-2016-11-30-08:16:31-2168' creation detected
Nov 30 08:16:31 i-8D377A5F abrt[28913]: Saved core dump of pid 2168 (/usr/sbin/tgtd) to /var/spool/abrt/ccpp-2016-11-30-08:16:31-2168 (168697856 bytes)

Version-Release number of selected component (if applicable):
scsi-target-utils-1.0.24-10.el6.x86_64

How reproducible:
The customer only noticed that tgtd's status was "tgtd dead but subsys locked" after a client failed to log in to the SCSI target server.

Additional info:

[tgt@dhcp-192-23 ccpp-2016-11-30-08:16:31-2168]$ gdb coredump /usr/bin/tgtd
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
"/home/tgt/01749253/var/spool/abrt/ccpp-2016-11-30-08:16:31-2168/coredump" is a core file.
Please specify an executable to debug.
/usr/bin/tgtd: No such file or directory.
(gdb) quit
[tgt@dhcp-192-23 ccpp-2016-11-30-08:16:31-2168]$
[tgt@dhcp-192-23 ccpp-2016-11-30-08:16:31-2168]$ gdb /usr/sbin/tgtd coredump
Reading symbols from /usr/sbin/tgtd...Reading symbols from /usr/lib/debug/usr/sbin/tgtd.debug...done.
done.
[New Thread 2168]
[New Thread 2382]
[New Thread 2373]
[New Thread 2384]
[New Thread 2376]
[New Thread 2377]
[New Thread 2372]
[New Thread 2386]
[New Thread 2374]
[New Thread 2378]
[New Thread 2381]
[New Thread 2385]
[New Thread 2379]
[New Thread 2383]
[New Thread 2380]
[New Thread 2375]
[New Thread 2387]
Missing separate debuginfo for
Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/81/a81be2e44c93640adedb62adc93a47f4a09dd1
Reading symbols from /lib64/libaio.so.1.0.1...Reading symbols from /usr/lib/debug/lib64/libaio.so.1.0.1.debug...done.
done.
Loaded symbols for /lib64/libaio.so.1.0.1
Reading symbols from /usr/lib64/libibverbs.so.1.0.0...Reading symbols from /usr/lib/debug/usr/lib64/libibverbs.so.1.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/libibverbs.so.1.0.0
Reading symbols from /usr/lib64/librdmacm.so.1.0.0...Reading symbols from /usr/lib/debug/usr/lib64/librdmacm.so.1.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/librdmacm.so.1.0.0
Reading symbols from /lib64/libpthread-2.12.so...Reading symbols from /usr/lib/debug/lib64/libpthread-2.12.so.debug...done.
[Thread debugging using libthread_db enabled]
done.
Loaded symbols for /lib64/libpthread-2.12.so
Reading symbols from /lib64/libc-2.12.so...Reading symbols from /usr/lib/debug/lib64/libc-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libc-2.12.so
Reading symbols from /lib64/libdl-2.12.so...Reading symbols from /usr/lib/debug/lib64/libdl-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libdl-2.12.so
Reading symbols from /lib64/ld-2.12.so...Reading symbols from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
done.
Loaded symbols for /lib64/ld-2.12.so
Core was generated by `tgtd'.
Program terminated with signal 11, Segmentation fault.
#0  __list_del (conn=0x1ab8328) at ./list.h:82
82          next->prev = prev;
(gdb) where
#0  __list_del (conn=0x1ab8328) at ./list.h:82
#1  list_del (conn=0x1ab8328) at ./list.h:88
#2  iscsi_task_tx_start (conn=0x1ab8328) at iscsi/iscsid.c:1916
#3  iscsi_tx_handler (conn=0x1ab8328) at iscsi/iscsid.c:2191
#4  0x000000000040e3f8 in iscsi_tcp_event_handler (fd=<value optimized out>, events=<value optimized out>, data=0x1ab8328) at iscsi/iscsi_tcp.c:164
#5  0x000000000041a0f9 in event_loop () at tgtd.c:411
#6  0x000000000041a7aa in main (argc=<value optimized out>, argv=<value optimized out>) at tgtd.c:583
(gdb)
#0  __list_del (conn=0x1ab8328) at ./list.h:82
#1  list_del (conn=0x1ab8328) at ./list.h:88
#2  iscsi_task_tx_start (conn=0x1ab8328) at iscsi/iscsid.c:1916
#3  iscsi_tx_handler (conn=0x1ab8328) at iscsi/iscsid.c:2191
#4  0x000000000040e3f8 in iscsi_tcp_event_handler (fd=<value optimized out>, events=<value optimized out>, data=0x1ab8328) at iscsi/iscsi_tcp.c:164
#5  0x000000000041a0f9 in event_loop () at tgtd.c:411
#6  0x000000000041a7aa in main (argc=<value optimized out>, argv=<value optimized out>) at tgtd.c:583
(gdb) f 3
#3  iscsi_tx_handler (conn=0x1ab8328) at iscsi/iscsid.c:2191
2191                ret = iscsi_task_tx_start(conn);
(gdb) l
2186                ddigest = p[ISCSI_PARAM_DATADGST_EN].val & DIGEST_CRC32C;
2187        } else
2188                hdigest = ddigest = 0;
2189
2190        if (conn->state == STATE_SCSI && !conn->tx_task) {
2191                ret = iscsi_task_tx_start(conn);
2192                if (ret)
2193                        goto out;
2194        }
2195
(gdb) p conn
$1 = (struct iscsi_connection *) 0x1ab8328
(gdb) p *conn
$2 = {state = 12, closed = 0, rx_iostate = 1, tx_iostate = 13, refcount = 1, clist = {next = 0x1ab7950, prev = 0x1ab7950}, session = 0x1ab7910, tid = 1, session_param = {{state = 0, val = 8192}, {state = 2, val = 8192}, {state = 2, val = 1}, {state = 2, val = 1}, {state = 2, val = 1}, {state = 2, val = 1}, {state = 2, val = 1}, {state = 2, val = 65536}, {state = 2, val = 262144}, {state = 2, val = 1}, {state = 2, val = 1}, {state = 2, val = 0}, {state = 0, val = 0}, {state = 0, val = 0}, {state = 2, val = 2}, {state = 2, val = 20}, {state = 0, val = 2048}, {state = 0, val = 2048}, {state = 2, val = 1}, {state = 0, val = 0}, {state = 0, val = 262144}, {state = 0, val = 262144}, {state = 0, val = 0}}, initiator = 0x1ab78c0 "iqn.1991-05.com.microsoft:langchaobackup.intranet.gzaic.gov.cn", isid = "@\000\001\067\000", tsih = 143, cid = 1, session_type = 0, auth_method = 0, stat_sn = 3, exp_stat_sn = 2, cmd_sn = 0, exp_cmd_sn = 4523092, max_cmd_sn = 4523093, req = {bhs = {opcode = 1 '\001', flags = 193 '\301', rsvd2 = "\000", hlength = 0 '\000', dlength = "\000\000", lun = "\000\001\000\000\000\000\000", itt = 1409565952, ttt = 1024, statsn = 1409565952, exp_statsn = 33554432, max_statsn = 136, other = "\000\006d\361sp\000\000\002\000\000"}, ahs = 0x1ab38a0, ahssize = 0, data = 0x1afa000, datasize = 0}, req_buffer = 0x1ab38a0, rsp = {bhs = {opcode = 0 '\000', flags = 0 '\000', rsvd2 = "\000", hlength = 0 '\000', dlength = "\000\000", lun = "\000\000\000\000\000\000\000", itt = 0, ttt = 0, statsn = 0, exp_statsn = 0, max_statsn = 0, other = '\000' <repeats 11 times>}, ahs = 0x0, ahssize = 0, data = 0x0, datasize = 0}, rsp_buffer = 0x1ab58b0, rsp_buffer_size = 8192, rx_buffer = 0x1ab8448 "\001\301", tx_buffer = 0x1ab84a0 "", rx_size = 48, tx_size = 48, ttt = 0, text_datasize = 0, text_rsp_buffer = 0x0, rx_task = 0x0, tx_task = 0x0, tx_clist = {next = 0x1aaeba8, prev = 0x1ab88f0}, task_list = {next = 0x1ab8548, prev = 0x1ab8548}, rx_digest = "\000\000\000", tx_digest = "\000\000\000", auth_state = 0, auth = {chap = {digest_alg = 0, id = 0, challenge_size = 0, challenge = 0x0}}, tp = 0x63e7e0, stats = {txdata_octets = 264140, rxdata_octets = 612, noptx_pdus = 0, scsicmd_pdus = 1, tmfcmd_pdus = 0, login_pdus = 0, text_pdus = 0, dataout_pdus = 0, logout_pdus = 0, snack_pdus = 0, noprx_pdus = 0, scsirsp_pdus = 1, tmfrsp_pdus = 0, textrsp_pdus = 0, datain_pdus = 32, logoutrsp_pdus = 0, r2t_pdus = 0, async_pdus = 0, rjt_pdus = 0, digest_err = 0, timeout_err = 0, custom_length = 0, custom = 0x1ab8328}}
(gdb) f 2
#2  iscsi_task_tx_start (conn=0x1ab8328) at iscsi/iscsid.c:1916
1916        list_del(&task->c_list);
(gdb) l
1911        dprintf("found a task %" PRIx64 " %u %u %u\n", task->tag,
1912                ntohl(((struct iscsi_cmd *) (&task->req))->data_length),
1913                task->offset,
1914                task->r2t_count);
1915
1916        list_del(&task->c_list);
1917
1918        switch (task->req.opcode & ISCSI_OPCODE_MASK) {
1919        case ISCSI_OP_SCSI_CMD:
1920                err = iscsi_scsi_cmd_tx_start(task);
(gdb) p task
$3 = (struct iscsi_task *) 0x1aaeb28
(gdb) p *task
$4 = {req = {opcode = 46 '.', flags = 105 'i', rsvd2 = "nt", hlength = 114 'r', dlength = "ane", lun = "t.gzaic.", itt = 779513703, ttt = 28259, statsn = 0, exp_statsn = 0, max_statsn = 417, other = "\000\000\000\000P\360\252\001\000\000\000"}, rsp = {opcode = 216 '\330', flags = 254 '\376', rsvd2 = "8", <incomplete sequence \351>, hlength = 58 ':', dlength = "\000\000", lun = "\000\000\000\000\000\000\000", itt = 0, ttt = 0, statsn = 0, exp_statsn = 0, max_statsn = 27978912, other = "\000\000\000\000\300i\252\001\000\000\000"}, tag = 39969447615725632, conn = 0x1aaeb90, c_hlist = {next = 0x1aaeb90, prev = 0x1}, c_list = {next = 0x1af9080, prev = 0x1ab8538}, c_siblings = {next = 0x1aaebb8, prev = 0x1aaebb8}, flags = 4523092, result = 8192, len = 2, offset = 8192, r2t_count = 2, unsol_count = 1, exp_r2tsn = 2, ahs = 0x200000001, data = 0x200000001, scmd = {c_target = 0x200000001, c_hlist = {next = 0x200000001, prev = 0x200010000}, qlist = {next = 0x200040000, prev = 0x200000001}, dev_id = 8589934593, dev = 0x0, state = 0, data_dir = DATA_NONE, in_sdb = {resid = 2, length = 2, buffer = 20}, out_sdb = {resid = 2048, length = 0, buffer = 8589936640}, cmd_itn_id = 1, offset = 0, tl = 262144, scb = 0x40000 <Address 0x40000 out of bounds>, scb_len = 0, lun = "\000\000\000\000\200\060\253\001", attribute = 0, tag = 336, result = 80, mreq = 0x1aaeaf0, sense_buffer = "-05.com.microsoft:langchaobackup.intranet.gzaic.gov.cn\000\000\360\001\000\000\000\000\000\000`", '\000' <repeats 15 times>, "]\032>X\000\000\000\000\000\355\252\001\000\000\000\000\000\355\252\001\000\000\000\000pH\252\001", '\000' <repeats 28 times>, "0\355\252\001\000\000\000\000\060\355\252\001\000\000\000\000\200\060\253\001\000\000\000\000a\000\000\000\000\000\000\000P\177\253\001\000\000\000\000P\360\252\001", '\000' <repeats 12 times>, "h\355\252\001\000\000\000\000h\355\252\001", '\000' <repeats 20 times>, "!\000\000\000\000\000\000\000\350\376\070\351:\000\000\000\020o\252\001\000\000\000\000\260\002\000", sense_len = 0, bs_list = {next = 0x2b0, prev = 0xc101}, it_nexus = 0x100, itn_lu_info = 0x400d6d74400}, extdata = 0x1aaeb28}
(gdb) bt
#0  __list_del (conn=0x1ab8328) at ./list.h:82
#1  list_del (conn=0x1ab8328) at ./list.h:88
#2  iscsi_task_tx_start (conn=0x1ab8328) at iscsi/iscsid.c:1916
#3  iscsi_tx_handler (conn=0x1ab8328) at iscsi/iscsid.c:2191
#4  0x000000000040e3f8 in iscsi_tcp_event_handler (fd=<value optimized out>, events=<value optimized out>, data=0x1ab8328) at iscsi/iscsi_tcp.c:164
#5  0x000000000041a0f9 in event_loop () at tgtd.c:411
#6  0x000000000041a7aa in main (argc=<value optimized out>, argv=<value optimized out>) at tgtd.c:583
(gdb) f 1
#1  list_del (conn=0x1ab8328) at ./list.h:88
88          __list_del(entry->prev, entry->next);
(gdb) l
83          prev->next = next;
84      }
85
86      static inline void list_del(struct list_head *entry)
87      {
88          __list_del(entry->prev, entry->next);
89          entry->next = entry->prev = NULL;
90      }
91
92      static inline void list_del_init(struct list_head *entry)
(gdb) p *entry
$5 = {next = 0x1af9080, prev = 0x1ab8538}
(gdb) p *0x1af9080
Cannot access memory at address 0x1af9080     <------ invalid address
(gdb) p *0x1ab8538
$6 = 27978664
(gdb)

The code in frame 2 looks to be where the issue lies:

1900    static int iscsi_task_tx_start(struct iscsi_connection *conn)
1901    {
1902            struct iscsi_task *task;
1903            int is_rsp, err = 0;
1904
(gdb)
1905            if (list_empty(&conn->tx_clist))
1906                    goto nodata;
1907
1908            conn_write_pdu(conn);
1909
1910            task = list_first_entry(&conn->tx_clist, struct iscsi_task, c_list);   <---- what is this c_list we pass as the last argument? Is it a global defined somewhere else?
1911            dprintf("found a task %" PRIx64 " %u %u %u\n", task->tag,
1912                    ntohl(((struct iscsi_cmd *) (&task->req))->data_length),
1913                    task->offset,
1914                    task->r2t_count);
(gdb)
1915
1916            list_del(&task->c_list);   <--- task->c_list.next is invalid
1917

80      static inline void __list_del(struct list_head * prev, struct list_head * next)
81      {
82              next->prev = prev;   <--- crashes here
83              prev->next = next;
84      }

Looking at list.h, list_first_entry is a macro that expands to list_entry, which in turn expands to container_of, defined as follows:

#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
        (type *)( (char *)__mptr - offsetof(type,member) );})

#define list_first_entry(ptr, type, member) \
        list_entry((ptr)->next, type, member)

#define list_entry(ptr, type, member) \
        container_of(ptr, type, member)

typeof is explained here: https://gcc.gnu.org/onlinedocs/gcc/Typeof.html

I wonder whether this is all fine and we simply have memory corruption elsewhere in the code, leaving task->c_list.next pointing at invalid memory. I doubt there is a bug in the container_of macro itself, so corruption seems the more likely explanation.

Several details in the core are consistent with that. The faulting address in the kernel log, 0x1af9088, is task->c_list.next (0x1af9080) plus 8, i.e. the prev field of the bogus next node, which is exactly what __list_del() writes at list.h:82; error code 6 means a user-mode write to a page that is not present. The prev side of the link is intact: 0x1ab8538 appears to be &conn->tx_clist (it sits 16 bytes before conn->task_list, which points to itself at 0x1ab8548), and the value stored there, 27978664 = 0x1aaeba8, matches both conn->tx_clist.next and &task->c_list (0x1aaeb28 + 0x80). Only the next pointer is bad. The rest of *task also looks like garbage: task->conn is 0x1aaeb90 rather than the connection 0x1ab8328, req spells out ".intranet.gzaic." in ASCII, and sense_buffer holds "-05.com.microsoft:langchaobackup.intranet.gzaic.gov.cn", fragments of the initiator name. This suggests the task's memory was overwritten, or freed and reused, while the task was still linked on conn->tx_clist.

Could we ask the customer to run tgtd with the same command line under valgrind to spot potential corruption? Or does someone have a better idea for debugging this?
The customer is not ready to run valgrind tests on their reproducer, so I think the best option is for Engineering to run this version of tgtd under valgrind and check for a cause that would explain the corruption.
Red Hat Enterprise Linux 6 transitioned to the Production 3 Phase on May 10, 2017. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle

This issue does not appear to meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com