Bug 848942 - tgtd fails to start due to possible race condition
Summary: tgtd fails to start due to possible race condition
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: scsi-target-utils
Version: 18
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Andy Grover
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-08-16 21:34 UTC by Rolf Fokkens
Modified: 2014-06-09 13:39 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-05 22:47:00 UTC
Type: Bug
Embargoed:



Description Rolf Fokkens 2012-08-16 21:34:43 UTC
Description of problem:
tgtd fails to start, syslog shows:
Aug 16 23:20:58 home01 tgtadm[19568]: tgtadm: can't send the request to the tgt daemon, Transport endpoint is not connected
Aug 16 23:20:58 home01 tgtd[19567]: tgtd: work_timer_start(146) use timer_fd based scheduler
Aug 16 23:20:58 home01 tgtd[19567]: tgtd: bs_init(313) use signalfd notification
Aug 16 23:20:58 home01 systemd[1]: tgtd.service: control process exited, code=exited status=107
Aug 16 23:20:58 home01 systemd[1]: Unit tgtd.service entered failed state.
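The control process exit status in the log above (107) matches the Linux errno value for ENOTCONN, which is the same "Transport endpoint is not connected" message that tgtadm printed. That tgtadm exits with the errno value is an inference from the log, not something confirmed in this report. A minimal sketch for decoding such a status (the helper name `describe_exit_status` is made up for illustration):

```python
import errno
import os


def describe_exit_status(status):
    """Return (symbolic name, message) for an exit status treated as an errno value."""
    # errno.errorcode maps numeric errno values to their symbolic names.
    name = errno.errorcode.get(status, "UNKNOWN")
    return name, os.strerror(status)


# On Linux, 107 is ENOTCONN ("Transport endpoint is not connected").
print(describe_exit_status(107))
```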

Version-Release number of selected component (if applicable):
scsi-target-utils-1.0.24-1.fc17.x86_64

How reproducible:
100%

Steps to Reproduce:
1. systemctl start tgtd.service
2. Job failed. See system journal and 'systemctl status' for details 
  
Actual results:
Job failed. See system journal and 'systemctl status' for details

Expected results:
Running tgtd

Additional info:
It seems that the ExecStartPost commands in the tgtd.service file are run too soon after tgtd starts. Possible fix, works for me:
--- /tmp/tgtd.service	2012-08-16 23:32:29.754130410 +0200
+++ /lib/systemd/system/tgtd.service	2012-08-16 23:21:48.560706857 +0200
@@ -6,6 +6,7 @@
 EnvironmentFile=/etc/sysconfig/tgtd
 
 ExecStart=/usr/sbin/tgtd -f $TGTD_OPTS
+ExecStartPost=sleep 5
 # Put tgtd into "offline" state until all the targets are configured.
 # We don't want initiators to (re)connect and fail the connection
 # if it's not ready.

Comment 1 Andy Grover 2012-09-28 18:18:16 UTC
This solution works but is not ideal. Opened an upstream bug at https://bugs.freedesktop.org/show_bug.cgi?id=55431 . In the meantime I guess we'll do your proposed fix.
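A common alternative to a fixed delay is to poll until the daemon answers, giving up after a timeout. The sketch below is only an illustration of that idea, not what the package ships. The helper names `wait_until_ready` and `tgtd_answers` are hypothetical, and using `tgtadm --op show --mode target` as the readiness probe is an assumption.

```python
import subprocess
import time


def wait_until_ready(probe, timeout=5.0, interval=0.1):
    """Call probe() repeatedly until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def tgtd_answers(command=("tgtadm", "--op", "show", "--mode", "target")):
    """Hypothetical probe: treat a zero exit status from tgtadm as 'tgtd is ready'."""
    try:
        result = subprocess.run(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # The binary is missing or not executable; report "not ready".
        return False
    return result.returncode == 0
```

Calling `wait_until_ready(tgtd_answers)` returns as soon as the probe succeeds, so a fast start costs far less than five seconds, while a slow start still gets up to the full timeout.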

Comment 2 Roderick Johnstone 2013-01-31 12:10:07 UTC
We are still seeing this on F18 with scsi-target-utils-1.0.32-2.fc18.x86_64.

Jan 31 09:54:19 xxx systemd[1]: Starting tgtd iSCSI target daemon...
Jan 31 09:54:19 xxx tgtd[698]: librdmacm: Warning: couldn't read ABI version.
Jan 31 09:54:19 xxx tgtd[698]: librdmacm: Warning: assuming: 4
Jan 31 09:54:19 xxx tgtd[698]: librdmacm: Fatal: unable to get RDMA device list
Jan 31 09:54:19 xxx tgtd[698]: tgtd: iser_ib_init(3376) Failed to initialize RDMA; load kernel modules?
Jan 31 09:54:19 xxx tgtadm[700]: tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected
Jan 31 09:54:19 xxx tgtd[698]: tgtd: work_timer_start(146) use timer_fd based scheduler
Jan 31 09:54:19 xxx tgtd[698]: tgtd: bs_init(313) use signalfd notification
Jan 31 09:54:19 xxx systemd[1]: tgtd.service: control process exited, code=exited status=107
Jan 31 09:54:21 xxx tgtd[698]: tgtd: iscsi/iser.c:3429: iser_ib_release: Assertion `list_empty(&iser_conn_list)' failed.
Jan 31 09:54:21 xxx systemd[1]: tgtd.service: main process exited, code=killed, status=6/ABRT
Jan 31 09:54:21 xxx systemd[1]: Failed to start tgtd iSCSI target daemon.
Jan 31 09:54:21 xxx systemd[1]: Unit tgtd.service entered failed state

As mentioned above it looks as if the tgtadm commands are trying to run before tgtd is ready.

The sleep 5 fix seems to work, but the path needs to be given explicitly, i.e.
ExecStartPost=/usr/bin/sleep 5

Is there a possibility to get the suggested sleep fix into fedora packages while we are waiting for progress on the upstream bug?
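The need for an explicit path is consistent with systemd versions of that era expecting an absolute path for the executable in Exec* lines. As a rough sketch (the function name `relative_exec_lines` and the set of prefix characters skipped are assumptions made for illustration), a unit file can be scanned for commands that lack one:

```python
# Prefix characters that may precede the executable in an Exec* value
# (an assumption for this sketch; older systemd only knew "-" and "@").
EXEC_PREFIX_CHARS = "-@+!:"


def relative_exec_lines(unit_text):
    """Return Exec* lines whose command does not start with an absolute path."""
    offenders = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("Exec") or "=" not in line:
            continue
        _key, value = line.split("=", 1)
        parts = value.strip().lstrip(EXEC_PREFIX_CHARS).split()
        if parts and not parts[0].startswith("/"):
            offenders.append(line)
    return offenders


unit = """\
ExecStart=/usr/sbin/tgtd -f $TGTD_OPTS
ExecStartPost=sleep 5
ExecStartPost=/usr/bin/sleep 5
"""
print(relative_exec_lines(unit))  # ['ExecStartPost=sleep 5']
```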

Comment 3 Dan Prince 2013-03-14 13:06:26 UTC
Andy, Comment on the upstream ticket says this:

Could you paste your unit file?

Maybe the best would be to split the ExecStartPost out into its own unit and simply order that after your main unit?

----

We require tgtd for OpenStack Cinder and this is currently a show stopper on Fedora 17/18. I see no reason not to package the workaround ('ExecStartPost=/usr/bin/sleep 5') while we wait for upstream to get this fixed (and released). It has just been too long...

Comment 4 Andy Grover 2013-03-21 23:18:18 UTC
Sorry for the delay. Yes, we'll do the workaround right away for F17/18.

Comment 5 Fedora Update System 2013-03-26 00:00:46 UTC
scsi-target-utils-1.0.24-7.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/scsi-target-utils-1.0.24-7.fc17

Comment 6 Fedora Update System 2013-03-26 00:01:04 UTC
scsi-target-utils-1.0.32-4.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/scsi-target-utils-1.0.32-4.fc18

Comment 7 Fedora Update System 2013-04-05 22:55:52 UTC
scsi-target-utils-1.0.32-4.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 8 Fedora Update System 2013-04-05 22:56:53 UTC
scsi-target-utils-1.0.24-7.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 9 Jeff Raber 2013-04-09 20:11:16 UTC
I have scsi-target-utils-1.0.32-4.fc18.x86_64 installed, but am still seeing this issue.  I don't think the sleep is being executed; the service fails very quickly.

# time systemctl start tgtd.service
Job for tgtd.service failed. See 'systemctl status tgtd.service' and 'journalctl -xn' for details.

real	0m0.138s
user	0m0.003s
sys	0m0.013s

Comment 10 Andy Grover 2013-04-10 00:39:14 UTC
Does it work if the line is changed to "/bin/sleep 5"?

Comment 11 Dan Prince 2013-04-10 18:46:52 UTC
Andy:

I think it should be '/bin/sleep 5' yes.

Here is a copy of the unit file I've been using:

https://github.com/redhat-openstack/openstack-puppet/blob/master/modules/cinder/files/tgtd.service#L12

Comment 12 Andy Grover 2013-04-10 19:05:04 UTC
OK respinning pkgs.

Comment 13 Fedora Update System 2013-04-10 19:38:41 UTC
scsi-target-utils-1.0.24-8.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/scsi-target-utils-1.0.24-8.fc17

Comment 14 Fedora Update System 2013-04-10 19:38:53 UTC
scsi-target-utils-1.0.32-5.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/scsi-target-utils-1.0.32-5.fc18

Comment 15 Andy Grover 2013-04-10 19:40:07 UTC
OK, this should be good to go. Turn on the updates-testing repo to get these pkgs before they go stable.

Comment 16 Jeff Raber 2013-04-11 16:53:00 UTC
Works for me.  Thanks!

Comment 17 Fedora Update System 2013-04-11 23:20:53 UTC
Package scsi-target-utils-1.0.24-8.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing scsi-target-utils-1.0.24-8.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-5441/scsi-target-utils-1.0.24-8.fc17
then log in and leave karma (feedback).

Comment 18 Fedora Update System 2013-04-21 03:23:51 UTC
scsi-target-utils-1.0.32-5.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 19 Fedora Update System 2013-04-21 03:25:51 UTC
scsi-target-utils-1.0.24-8.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 20 Krishnaprasad K 2013-07-24 14:27:39 UTC
I still see this issue in Fedora 18 even though sleep 5 is incorporated in /lib/systemd/system/tgtd.service.

<snip>
...
ExecStart=/usr/sbin/tgtd -f $TGTD_OPTS
# see bz 848942. workaround for a race for now.
ExecStartPost=/bin/sleep 5
...
</snip>

scsi-target-utils RPM details
=============================
~# rpm -qa | grep scsi-target-utils
scsi-target-utils-1.0.32-5.fc18.x86_64

Service status
==============
~# systemctl status tgtd.service
tgtd.service - tgtd iSCSI target daemon
          Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled)
          Active: failed (Result: core-dump) since Wed, 2013-07-24 18:43:08 IST; 1h 10min ago
         Process: 3585 ExecStop=/usr/sbin/tgtadm --op delete --mode system (code=exited, status=0/SUCCESS)
         Process: 3582 ExecStop=/usr/sbin/tgt-admin --update ALL -c /dev/null (code=exited, status=0/SUCCESS)
         Process: 3581 ExecStop=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
         Process: 3565 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
         Process: 3557 ExecStartPost=/bin/sleep 10 (code=exited, status=0/SUCCESS)
         Process: 3556 ExecStart=/usr/sbin/tgtd -f $TGTD_OPTS (code=dumped, signal=ABRT)
          CGroup: name=systemd:/system/tgtd.service

Jul 24 18:41:27 dhcp-10-10-2-8.helab.bdc tgtd[3556]: librdmacm: couldn't read ABI version.
Jul 24 18:41:27 dhcp-10-10-2-8.helab.bdc tgtd[3556]: librdmacm: assuming: 4
Jul 24 18:41:27 dhcp-10-10-2-8.helab.bdc tgtd[3556]: tgtd: iser_ib_init(3376) Failed to initialize RDMA; load kernel modules?
Jul 24 18:41:27 dhcp-10-10-2-8.helab.bdc tgtd[3556]: tgtd: work_timer_start(146) use timer_fd based scheduler
Jul 24 18:41:27 dhcp-10-10-2-8.helab.bdc tgtd[3556]: tgtd: bs_init(313) use signalfd notification
Jul 24 18:43:08 dhcp-10-10-2-8.helab.bdc tgtd[3556]: tgtd: iscsi/iser.c:3429: iser_ib_release: Assertion `list_empty(&iser_conn_list)' failed.
Jul 24 18:43:08 dhcp-10-10-2-8.helab.bdc tgtd[3556]: CMA: unable to get RDMA device list
Jul 24 18:43:08 dhcp-10-10-2-8.helab.bdc systemd[1]: tgtd.service: main process exited, code=dumped, status=6/ABRT
Jul 24 18:43:08 dhcp-10-10-2-8.helab.bdc systemd[1]: Failed to start tgtd iSCSI target daemon.
Jul 24 18:43:08 dhcp-10-10-2-8.helab.bdc systemd[1]: Unit tgtd.service entered failed state

Comment 21 Fedora End Of Life 2013-08-01 16:48:43 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 22 Andy Grover 2013-08-02 18:32:15 UTC
Please retry with 1.0.38-1 from updates-testing on Fedora 18. Can you still reproduce the problem?

Comment 23 Fedora End Of Life 2013-12-21 15:04:30 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 24 Fedora End Of Life 2014-02-05 22:47:00 UTC
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 25 Krishnaprasad K 2014-06-09 11:32:37 UTC
Please mark this issue as closed, as it's a WONTFIX for F17.

Comment 26 Rolf Fokkens 2014-06-09 13:39:37 UTC
Currently running F20, scsi-target-utils-1.0.46-1.fc20.x86_64.

The issue seems to be gone.

