Bug 848585 - tgtd hangs on recvfrom() after series of failed iscsi creation requests
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: scsi-target-utils
Version: 6.3
Hardware: All Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Andy Grover
QA Contact: Bruno Goncalves
Keywords: ZStream
Depends On:
Blocks: 1093141 1093142 1093143
Reported: 2012-08-15 17:56 EDT by Kendrick Gay
Modified: 2014-10-14 04:27 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the Linux SCSI target administration utility, tgtadm, did not correctly handle backing-store errors. As a consequence, calling tgtadm with an invalid backing-store parameter in some cases caused the tgtd daemon to become unresponsive. With this update, the bug in tgtadm has been fixed and tgtd now recovers after an invalid request as intended.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-14 04:27:09 EDT
Type: Bug


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2014:1599
Priority: normal
Status: SHIPPED_LIVE
Summary: scsi-target-utils bug fix update
Last Updated: 2014-10-13 21:39:44 EDT

Description Kendrick Gay 2012-08-15 17:56:25 EDT
Description of problem:
When creating multiple iSCSI logical units in succession with a non-existent backing-store, tgtadm will eventually hang in recvfrom() on its socket connection to tgtd, which never sends a reply.

Version-Release number of selected component (if applicable):

scsi-target-utils-1.0.14-4.el6

How reproducible:

Inconsistent - possible race condition.

Steps to Reproduce:
1. Run "tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/fake" in a for loop for X number of times.
2. Await hang.
Actual results:

tgtadm hangs on socket return.

14:32:49 socket(PF_FILE, SOCK_STREAM, 0) = 3
14:32:49 connect(3, {sa_family=AF_FILE, path="/var/run/tgtd.ipc_abstract_namespace.0"}, 110) = 0
14:32:49 write(3, "\2\0\0\0\0\0\0\0iscsi\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 134) = 134
14:32:49 recvfrom(3,

Expected results:

tgtadm should recover gracefully.

17:42:49 socket(PF_FILE, SOCK_STREAM, 0) = 3
17:42:49 connect(3, {sa_family=AF_FILE, path="/var/run/tgtd.ipc_abstract_namespace.0"}, 110) = 0
17:42:49 write(3, "\2\0\0\0\0\0\0\0iscsi\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 134) = 134
17:42:49 recvfrom(3, "\16\0\0\0\10\0\0\0", 8, MSG_WAITALL, NULL, NULL) = 8
17:42:49 write(2, "tgtadm: invalid request\n", 24tgtadm: invalid request
) = 24
17:42:49 close(3)                       = 0
17:42:49 exit_group(22)                 = ?

Additional info:

Easily reproducible with the following bash script:

------->8------->8------->8-------

#!/bin/bash

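# Start from a freshly restarted tgtd.
killall -9 tgtd; service tgtd restart

# Request a LUN backed by a non-existent device "$1" times; timeout
# kills any tgtadm call that hangs for more than 10 seconds.
for i in $(seq 1 "$1"); do
    timeout 10 strace -tv tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 2 -b /dev/fake
done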

echo "Done!"

------->8------->8------->8-------

The number of iterations needed varies, so I generally just run "./testtgtadm 20" or 30, but I've seen it hang after as few as 5 iterations.
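
Because the iteration count varies, it can help to stop at the first hang instead of reading through the strace output. The following is only a sketch: the function name run_until_hang and its arguments are made up for illustration, and it assumes GNU timeout, which exits with status 124 when it has to kill the command. A normal "invalid request" failure from tgtadm exits with 22 (see the strace above) and is not counted as a hang.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to N times under `timeout` and
# stop at the first run that times out.
#   $1 = number of iterations, $2 = timeout in seconds, rest = command
run_until_hang() {
    local iterations="$1" limit="$2"
    shift 2
    local i
    for i in $(seq 1 "$iterations"); do
        timeout "$limit" "$@" >/dev/null 2>&1
        # GNU timeout exits with 124 when it had to kill the command.
        if [ $? -eq 124 ]; then
            echo "hang at iteration $i"
            return 1
        fi
    done
    echo "no hang in $iterations iterations"
    return 0
}

# Example call (needs a running tgtd with target 1 defined):
# run_until_hang 200 10 tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 2 -b /dev/fake
```

The example call is left commented out so the snippet can be sourced on a machine without tgtd.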
Comment 2 RHEL Product and Program Management 2012-09-07 01:38:39 EDT
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.
Comment 4 RHEL Product and Program Management 2012-09-18 18:09:41 EDT
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.
Comment 7 Andy Grover 2012-10-03 18:58:16 EDT
Trying to repro using script in initial bug description, can't get it to hang. Can you try scsi-target-utils-1.0.24-2 (what's in rhel 6.3) and see what happens?

BTW I'm getting:
15:52:49 recvfrom(3, "\4\0\0\0\10\0\0\0", 8, MSG_WAITALL, NULL, NULL) = 8
tgtadm: can't find the target

This is error 4, instead of "invalid request" (error 14, "\16" in octal). Am I doing something wrong?
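
For anyone comparing strace output: strace prints non-printable bytes as octal escapes, so the first byte of the reply has to be converted to decimal to match the error numbers above. A minimal sketch; the helper name octal_to_decimal is made up for illustration, and the messages in the comments are the ones quoted in this bug.

```shell
#!/bin/bash
# Hypothetical helper: convert the digits of a strace octal escape
# (e.g. 16 for "\16") to decimal using bash base-8 arithmetic.
octal_to_decimal() {
    echo $((8#$1))
}

octal_to_decimal 16   # 14: reply seen with "tgtadm: invalid request"
octal_to_decimal 4    # 4: reply seen with "tgtadm: can't find the target"
octal_to_decimal 10   # 8: the "\10" byte later in the same 8-byte reply
```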
Comment 8 Andy Grover 2012-10-09 14:47:15 EDT
OK, I am now getting "invalid request" (I had to create the target first, duh). I still can't reproduce the hang, even after putting sleeps in the backing-store open path to try to widen potential races.
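
A sketch of that precondition: target 1 has to exist before the logical-unit request reaches the point where it fails with "invalid request" rather than "can't find the target". The run and setup_target helpers and the DRY_RUN variable are hypothetical conveniences, not part of tgtadm; the IQN is the one from the targets.conf in comment 19, used here only as an example name.

```shell
#!/bin/bash
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create iSCSI target 1 with the given name ($1) so that later
# logical-unit requests against --tid 1 can find it.
setup_target() {
    run tgtadm --lld iscsi --op new --mode target --tid 1 --targetname "$1"
}

# Example (prints the command instead of contacting tgtd):
# DRY_RUN=1 setup_target iqn.2008-09.com.example:server.target1
```

Defining the target in /etc/tgt/targets.conf, as in comment 19, is an alternative way to meet the same precondition.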
Comment 12 Andy Grover 2012-11-05 15:05:57 EST
I can't work on this issue until I can reproduce it. Can reporter give me some more hints on how to reproduce?
Comment 19 Bruno Goncalves 2014-04-16 04:24:34 EDT
I was able to reproduce it using RHEL-6.3 (scsi-target-utils-1.0.24-2.el6.x86_64).

# cat /etc/tgt/targets.conf 
<target iqn.2008-09.com.example:server.target1>
</target>

And using the following script:

# cat testtgtadm.sh 
#!/bin/bash

killall -9 tgtd;service tgtd restart
for i in `seq 1 $1`; do timeout 10 strace -tv tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 2 -b /dev/fake; done

echo "Done!"


And calling it with # ./testtgtadm.sh 200

-------
Result:
16:23:03 munmap(0x7f3090820000, 50255)  = 0
16:23:03 brk(0)                         = 0x14e6000
16:23:03 brk(0x1509000)                 = 0x1509000
16:23:03 socket(PF_FILE, SOCK_STREAM, 0) = 3
16:23:03 connect(3, {sa_family=AF_FILE, path="/var/run/tgtd.ipc_abstract_namespace.0"}, 110) = 0
16:23:03 write(3, "\2\0\0\0\0\0\0\0iscsi\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 134) = 134
16:23:03 recvfrom(3,
Comment 20 Bruno Goncalves 2014-04-16 05:17:07 EDT
Also reproducible on RHEL-6.5 (scsi-target-utils-1.0.24-10.el6.x86_64)
Comment 22 Andy Grover 2014-04-17 21:31:58 EDT
Thanks Bruno, I'll try that!
Comment 25 Andy Grover 2014-04-28 19:28:17 EDT
I can reproduce it and I'm working on fixing it.
Comment 26 Andy Grover 2014-04-28 21:53:27 EDT
generated a fix and posted to upstream.
Comment 33 Bruno Goncalves 2014-06-26 10:07:58 EDT
I reproduced it with scsi-target-utils-1.0.24-2.el6 and confirmed the fix with scsi-target-utils-1.0.24-13.el6, using the reproducer from comment 19.
Comment 35 errata-xmlrpc 2014-10-14 04:27:09 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1599.html
