Bug 848585

Summary: tgtd hangs on recvfrom() after series of failed iscsi creation requests
Product: Red Hat Enterprise Linux 6
Reporter: Kendrick Gay <kgay>
Component: scsi-target-utils
Assignee: Andy Grover <agrover>
Status: CLOSED ERRATA
QA Contact: Bruno Goncalves <bgoncalv>
Severity: high
Priority: urgent
Version: 6.3
CC: agrover, bgoncalv, cww, gborsuk, jherrman, jkurik, vanhoof, wburrows
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Doc Text:
Previously, the Linux SCSI target administration utility, tgtadm, did not correctly handle backing-store errors. As a consequence, calling tgtadm with an invalid backing-store parameter in some cases caused the tgtd daemon to become unresponsive. With this update, the bug in tgtadm has been fixed and tgtd now recovers after an invalid request as intended.
Last Closed: 2014-10-14 08:27:09 UTC
Type: Bug
Bug Blocks: 1093141, 1093142, 1093143    

Description Kendrick Gay 2012-08-15 21:56:25 UTC
Description of problem:
When creating multiple iSCSI logical units in succession with a non-existent backing store specified, tgtd stops responding and tgtadm hangs in recvfrom() on its socket connection to tgtd.

Version-Release number of selected component (if applicable):

scsi-target-utils-1.0.14-4.el6

How reproducible:

Inconsistent - possible race condition.

Steps to Reproduce:
1. Run "tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/fake" in a for loop for X number of times.
2. Await hang.
  
Actual results:

tgtadm hangs in recvfrom(), waiting for a reply from tgtd on the socket.

14:32:49 socket(PF_FILE, SOCK_STREAM, 0) = 3
14:32:49 connect(3, {sa_family=AF_FILE, path="/var/run/tgtd.ipc_abstract_namespace.0"}, 110) = 0
14:32:49 write(3, "\2\0\0\0\0\0\0\0iscsi\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 134) = 134
14:32:49 recvfrom(3,

Expected results:

tgtadm should recover gracefully.

17:42:49 socket(PF_FILE, SOCK_STREAM, 0) = 3
17:42:49 connect(3, {sa_family=AF_FILE, path="/var/run/tgtd.ipc_abstract_namespace.0"}, 110) = 0
17:42:49 write(3, "\2\0\0\0\0\0\0\0iscsi\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 134) = 134
17:42:49 recvfrom(3, "\16\0\0\0\10\0\0\0", 8, MSG_WAITALL, NULL, NULL) = 8
17:42:49 write(2, "tgtadm: invalid request\n", 24) = 24
tgtadm: invalid request
17:42:49 close(3)                       = 0
17:42:49 exit_group(22)                 = ?

Additional info:

Easily reproducible with the following bash script:

------->8------->8------->8-------

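#!/bin/bash
# Usage: ./testtgtadm <iterations>

# Start from a freshly restarted tgtd.
killall -9 tgtd; service tgtd restart

# Repeatedly request a LUN on a non-existent backing store.
# timeout kills any run that is still blocked after 10 seconds.
for i in $(seq 1 "$1"); do
    timeout 10 strace -tv tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 2 -b /dev/fake
done

echo "Done!"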

------->8------->8------->8-------

Since this varies, I generally just run "./testtgtadm 20" or 30, but I've seen it happen with as few as 5 iterations.
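
To tell hung runs apart from runs that fail normally, the loop can check the exit status of timeout: GNU timeout exits with 124 when it has to kill the command, whereas the trace above shows tgtadm exiting with 22 after an "invalid request" reply. A minimal sketch, assuming GNU coreutils timeout; count_hangs and TIMEOUT_SECS are hypothetical names introduced here and are not part of the original script:

```bash
#!/bin/bash
# count_hangs N CMD [ARGS...]
# Run CMD N times under timeout(1) and print how many runs were killed
# for exceeding the limit (GNU timeout reports that as exit status 124).
# TIMEOUT_SECS is a hypothetical override for the 10-second limit.
count_hangs() {
    local n=$1 hangs=0 i
    shift
    for ((i = 0; i < n; i++)); do
        timeout "${TIMEOUT_SECS:-10}" "$@" >/dev/null 2>&1
        if [ $? -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
    done
    echo "$hangs"
}

# Example (needs a running tgtd and an existing target with tid 1):
# count_hangs 30 tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 2 -b /dev/fake
```

A result of 0 after many iterations would suggest the hang did not occur in that run; since the problem is intermittent, it does not prove the bug is absent.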

Comment 2 RHEL Program Management 2012-09-07 05:38:39 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 4 RHEL Program Management 2012-09-18 22:09:41 UTC
(Automated notice, identical to comment 2.)

Comment 7 Andy Grover 2012-10-03 22:58:16 UTC
I tried to reproduce this using the script in the initial bug description, but can't get it to hang. Can you try scsi-target-utils-1.0.24-2 (what's in RHEL 6.3) and see what happens?

BTW I'm getting:
15:52:49 recvfrom(3, "\4\0\0\0\10\0\0\0", 8, MSG_WAITALL, NULL, NULL) = 8
tgtadm: can't find the target

This is error 4, not "invalid request" (error 14, "\16" in octal). Am I doing something wrong?
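
For anyone comparing traces, the 8-byte replies shown by strace can be decoded with a few lines of bash. This is only a sketch: it assumes the reply is two 32-bit little-endian integers, an error code followed by a length, which matches the error numbers quoted above but is not confirmed anywhere in this report. decode_rsp is a hypothetical helper name.

```bash
#!/bin/bash
# decode_rsp 'ESCAPED_BYTES'
# Decode an 8-byte tgtadm reply copied from strace output.
# ASSUMPTION: two 32-bit little-endian integers, error code then length.
decode_rsp() {
    local b err len
    # printf expands the octal escapes (\16 becomes the byte 14);
    # od prints every byte as an unsigned decimal number.
    read -r -a b < <(printf "$1" | od -An -v -tu1)
    err=$(( b[0] | b[1] << 8 | b[2] << 16 | b[3] << 24 ))
    len=$(( b[4] | b[5] << 8 | b[6] << 16 | b[7] << 24 ))
    echo "err=$err len=$len"
}

decode_rsp '\16\0\0\0\10\0\0\0'   # reply from the bug description
decode_rsp '\4\0\0\0\10\0\0\0'    # reply from this comment
```

The byte-wise arithmetic avoids depending on the byte order of the machine running the script.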

Comment 8 Andy Grover 2012-10-09 18:47:15 UTC
OK, I am getting "invalid request" now (I had to create the target first, duh). I still can't reproduce the hang, even after putting sleeps in the backing-store open path to try to widen potential races.
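
For reference, the reproducer only reaches the "invalid request" path when the target behind "--tid 1" already exists. A sketch of creating it by hand, assuming a running tgtd and root privileges; create_target and the TGTADM override are hypothetical conveniences for a dry run, and the IQN in the example is only a sample value:

```bash
#!/bin/bash
# create_target TID IQN
# Create the iSCSI target that the reproducer's "--tid 1" refers to.
# TGTADM can be overridden (for example TGTADM=echo) to print the
# arguments instead of running tgtadm; the override is an illustration
# aid, not a tgtadm feature.
create_target() {
    local tid=$1 iqn=$2
    "${TGTADM:-tgtadm}" --lld iscsi --op new --mode target --tid "$tid" -T "$iqn"
}

# Example (sample IQN; requires a running tgtd):
# create_target 1 iqn.2008-09.com.example:server.target1
```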

Comment 12 Andy Grover 2012-11-05 20:05:57 UTC
I can't work on this issue until I can reproduce it. Can the reporter give me some more hints on how to reproduce it?

Comment 19 Bruno Goncalves 2014-04-16 08:24:34 UTC
I was able to reproduce it using RHEL-6.3 (scsi-target-utils-1.0.24-2.el6.x86_64).

# cat /etc/tgt/targets.conf 
<target iqn.2008-09.com.example:server.target1>
</target>

And using the following script:

# cat testtgtadm.sh 
#!/bin/bash

killall -9 tgtd;service tgtd restart
for i in `seq 1 $1`; do timeout 10 strace -tv tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 2 -b /dev/fake; done

echo "Done!"


And calling it with # ./testtgtadm.sh 200

-------
Result:
16:23:03 munmap(0x7f3090820000, 50255)  = 0
16:23:03 brk(0)                         = 0x14e6000
16:23:03 brk(0x1509000)                 = 0x1509000
16:23:03 socket(PF_FILE, SOCK_STREAM, 0) = 3
16:23:03 connect(3, {sa_family=AF_FILE, path="/var/run/tgtd.ipc_abstract_namespace.0"}, 110) = 0
16:23:03 write(3, "\2\0\0\0\0\0\0\0iscsi\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 134) = 134
16:23:03 recvfrom(3,

Comment 20 Bruno Goncalves 2014-04-16 09:17:07 UTC
Also reproducible on RHEL-6.5 (scsi-target-utils-1.0.24-10.el6.x86_64)

Comment 22 Andy Grover 2014-04-18 01:31:58 UTC
Thanks Bruno, I'll try that!

Comment 25 Andy Grover 2014-04-28 23:28:17 UTC
I can reproduce it and I'm working on fixing it.

Comment 26 Andy Grover 2014-04-29 01:53:27 UTC
Generated a fix and posted it upstream.

Comment 33 Bruno Goncalves 2014-06-26 14:07:58 UTC
I reproduced it with scsi-target-utils-1.0.24-2.el6 and confirmed the fix with scsi-target-utils-1.0.24-13.el6, using the reproducer from comment 19.

Comment 35 errata-xmlrpc 2014-10-14 08:27:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1599.html