Bug 848942
Summary: | tgtd fails to start due to possible race condition | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Rolf Fokkens <rolf> |
Component: | scsi-target-utils | Assignee: | Andy Grover <agrover> |
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 18 | CC: | agrover, dprince, eglynn, eharney, jeff.raber, krishnaprasad_k, mailings, markmc, mchristi, pbrady, rmj, terje.rosten |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2014-02-05 22:47:00 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Rolf Fokkens
2012-08-16 21:34:43 UTC
This solution works but is not ideal. Opened an upstream bug at https://bugs.freedesktop.org/show_bug.cgi?id=55431
In the meantime, I guess we'll do your proposed fix.

We are still seeing this on F18 with scsi-target-utils-1.0.32-2.fc18.x86_64:

Jan 31 09:54:19 xxx systemd[1]: Starting tgtd iSCSI target daemon...
Jan 31 09:54:19 xxx tgtd[698]: librdmacm: Warning: couldn't read ABI version.
Jan 31 09:54:19 xxx tgtd[698]: librdmacm: Warning: assuming: 4
Jan 31 09:54:19 xxx tgtd[698]: librdmacm: Fatal: unable to get RDMA device list
Jan 31 09:54:19 xxx tgtd[698]: tgtd: iser_ib_init(3376) Failed to initialize RDMA; load kernel modules?
Jan 31 09:54:19 xxx tgtadm[700]: tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected
Jan 31 09:54:19 xxx tgtd[698]: tgtd: work_timer_start(146) use timer_fd based scheduler
Jan 31 09:54:19 xxx tgtd[698]: tgtd: bs_init(313) use signalfd notification
Jan 31 09:54:19 xxx systemd[1]: tgtd.service: control process exited, code=exited status=107
Jan 31 09:54:21 xxx tgtd[698]: tgtd: iscsi/iser.c:3429: iser_ib_release: Assertion `list_empty(&iser_conn_list)' failed.
Jan 31 09:54:21 xxx systemd[1]: tgtd.service: main process exited, code=killed, status=6/ABRT
Jan 31 09:54:21 xxx systemd[1]: Failed to start tgtd iSCSI target daemon.
Jan 31 09:54:21 xxx systemd[1]: Unit tgtd.service entered failed state

As mentioned above, it looks as if the tgtadm commands are trying to run before tgtd is ready. The sleep 5 fix seems to work, but the path to sleep must be given explicitly, i.e.:

ExecStartPost=/usr/bin/sleep 5

Is there a possibility of getting the suggested sleep fix into the Fedora packages while we wait for progress on the upstream bug?

Andy, a comment on the upstream ticket says: "Could you paste your unit file? Maybe the best would be to split the ExecStartPost out into its own unit and simply order that after your main unit?"
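The upstream suggestion, moving the post-start tgtadm work into a separate unit ordered after tgtd.service, might look roughly like the sketch below. The unit name tgtd-config.service and its path are hypothetical, and only one tgtadm command is shown: the one that appears as an ExecStartPost entry in the tgtd.service status output quoted in this report; the packaged unit's other post-start commands are not reproduced. One caveat, assuming the packaged unit leaves Type= at its default (simple): systemd treats a simple service as started as soon as the main process is forked, so After= ordering by itself would not wait for tgtd's management socket, and the race could remain unless tgtd signalled readiness (for example via Type=notify, which would need support in tgtd itself).

```ini
# /etc/systemd/system/tgtd-config.service (hypothetical name and path)
# Sketch only: runs the post-start tgtadm step as its own oneshot unit.
[Unit]
Description=Post-start configuration for tgtd (sketch)
# Pull in tgtd, and start only after tgtd's start job has completed.
Requires=tgtd.service
After=tgtd.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Command taken from the ExecStartPost= entries of the packaged tgtd.service.
ExecStart=/usr/sbin/tgtadm --op update --mode sys --name State -v offline
# ...the packaged unit's remaining post-start commands would follow here...

[Install]
WantedBy=multi-user.target
```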
We require tgtd for OpenStack Cinder, and this is currently a showstopper on Fedora 17/18. I see no reason not to package the workaround ('ExecStartPost=/usr/bin/sleep 5') while we wait for upstream to fix it (and release the fix). It has just been too long...

Sorry for the delay. Yes, we'll do the workaround right away for F17/18.

scsi-target-utils-1.0.24-7.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/scsi-target-utils-1.0.24-7.fc17

scsi-target-utils-1.0.32-4.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/scsi-target-utils-1.0.32-4.fc18

scsi-target-utils-1.0.32-4.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

scsi-target-utils-1.0.24-7.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.

I have scsi-target-utils-1.0.32-4.fc18.x86_64 installed, but am still seeing this issue. I don't think the sleep is being executed; the service fails very quickly.

# time systemctl start tgtd.service
Job for tgtd.service failed. See 'systemctl status tgtd.service' and 'journalctl -xn' for details.

real 0m0.138s
user 0m0.003s
sys 0m0.013s

Does it work if the line is changed to "/bin/sleep 5"?

Andy: I think it should be '/bin/sleep 5', yes. Here is a copy of the init script I've been using: https://github.com/redhat-openstack/openstack-puppet/blob/master/modules/cinder/files/tgtd.service#L12

OK, respinning pkgs.

scsi-target-utils-1.0.24-8.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/scsi-target-utils-1.0.24-8.fc17

scsi-target-utils-1.0.32-5.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/scsi-target-utils-1.0.32-5.fc18

OK, should be good to go. Turn on the updates-testing repo to get these pkgs before they go stable.

Works for me. Thanks!
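A fixed sleep trades the race for an arbitrary delay. A common alternative is to poll until the daemon answers before running the remaining tgtadm commands. The drop-in below is a hypothetical sketch, not something shipped in scsi-target-utils, and it assumes that `tgtadm --op show --mode sys` succeeds once tgtd's management socket is accepting requests. Because ExecStartPost= entries in a drop-in are appended to the packaged list, the empty assignment first resets that list, so the packaged unit's own post-start tgtadm/tgt-admin lines (left as a comment here, not reproduced) would have to be listed again after the poll.

```ini
# /etc/systemd/system/tgtd.service.d/wait-ready.conf (hypothetical drop-in)
[Service]
# An empty assignment clears the ExecStartPost= list inherited from tgtd.service.
ExecStartPost=
# Poll once per second until tgtadm can reach tgtd, instead of sleeping a
# fixed 5 seconds. The loop is bounded by the unit's TimeoutStartSec=.
ExecStartPost=/bin/sh -c 'until /usr/sbin/tgtadm --op show --mode sys >/dev/null 2>&1; do sleep 1; done'
# ...re-add the packaged unit's remaining ExecStartPost= lines here...
```

After creating the file, running `systemctl daemon-reload` makes systemd pick up the drop-in. Note that a poll like this only addresses the startup ordering race; it does not by itself address the iser_ib_release assertion seen in the logs.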
Package scsi-target-utils-1.0.24-8.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing scsi-target-utils-1.0.24-8.fc17'
as soon as you are able to.
Please go to the following URL:
https://admin.fedoraproject.org/updates/FEDORA-2013-5441/scsi-target-utils-1.0.24-8.fc17
then log in and leave karma (feedback).

scsi-target-utils-1.0.32-5.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

scsi-target-utils-1.0.24-8.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.

I still see this issue in Fedora 18, even though sleep 5 is incorporated in /lib/systemd/system/tgtd.service:

<snip>
...
ExecStart=/usr/sbin/tgtd -f $TGTD_OPTS
# see bz 848942. workaround for a race for now.
ExecStartPost=/bin/sleep 5
...
</snip>

scsi-target-utils RPM details
=============================
~# rpm -qa | grep scsi-target-utils
scsi-target-utils-1.0.32-5.fc18.x86_64

Service status
==============
~# systemctl status tgtd.service
tgtd.service - tgtd iSCSI target daemon
  Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled)
  Active: failed (Result: core-dump) since Wed, 2013-07-24 18:43:08 IST; 1h 10min ago
  Process: 3585 ExecStop=/usr/sbin/tgtadm --op delete --mode system (code=exited, status=0/SUCCESS)
  Process: 3582 ExecStop=/usr/sbin/tgt-admin --update ALL -c /dev/null (code=exited, status=0/SUCCESS)
  Process: 3581 ExecStop=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
  Process: 3565 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
  Process: 3557 ExecStartPost=/bin/sleep 10 (code=exited, status=0/SUCCESS)
  Process: 3556 ExecStart=/usr/sbin/tgtd -f $TGTD_OPTS (code=dumped, signal=ABRT)
  CGroup: name=systemd:/system/tgtd.service

Jul 24 18:41:27 dhcp-10-10-2-8.helab.bdc tgtd[3556]: librdmacm: couldn't read ABI version.
Jul 24 18:41:27 dhcp-10-10-2-8.helab.bdc tgtd[3556]: librdmacm: assuming: 4
Jul 24 18:41:27 dhcp-10-10-2-8.helab.bdc tgtd[3556]: tgtd: iser_ib_init(3376) Failed to initialize RDMA; load kernel modules?
Jul 24 18:41:27 dhcp-10-10-2-8.helab.bdc tgtd[3556]: tgtd: work_timer_start(146) use timer_fd based scheduler
Jul 24 18:41:27 dhcp-10-10-2-8.helab.bdc tgtd[3556]: tgtd: bs_init(313) use signalfd notification
Jul 24 18:43:08 dhcp-10-10-2-8.helab.bdc tgtd[3556]: tgtd: iscsi/iser.c:3429: iser_ib_release: Assertion `list_empty(&iser_conn_list)' failed.
Jul 24 18:43:08 dhcp-10-10-2-8.helab.bdc tgtd[3556]: CMA: unable to get RDMA device list
Jul 24 18:43:08 dhcp-10-10-2-8.helab.bdc systemd[1]: tgtd.service: main process exited, code=dumped, status=6/ABRT
Jul 24 18:43:08 dhcp-10-10-2-8.helab.bdc systemd[1]: Failed to start tgtd iSCSI target daemon.
Jul 24 18:43:08 dhcp-10-10-2-8.helab.bdc systemd[1]: Unit tgtd.service entered failed state

Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result, we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. Thank you for reporting this bug, and we are sorry it could not be fixed.

Please retry with 1.0.38-1 in updates-testing on Fedora 18. Can you still reproduce?

This message is a reminder that Fedora 18 is nearing its end of life. Approximately 4 (four) weeks from now, Fedora will stop maintaining and issuing updates for Fedora 18. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time, this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue, and we are sorry that we may not be able to fix it before Fedora 18 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result, we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug, and we are sorry it could not be fixed.

Please mark this issue as closed, as it's a WONTFIX for F17. I am currently running F20 with scsi-target-utils-1.0.46-1.fc20.x86_64, and the issue seems to be gone.