Bug 787366 - systemd-sysv hangs starting a legacy init script
Summary: systemd-sysv hangs starting a legacy init script
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 16
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-02-04 13:46 UTC by Tom Horsley
Modified: 2012-02-12 11:28 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-04 16:28:36 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
/etc/rc.d/init.d/nstar.d (2.90 KB, text/plain)
2012-02-04 13:49 UTC, Tom Horsley
no flags Details

Description Tom Horsley 2012-02-04 13:46:06 UTC
Description of problem:

Up until I installed systemd-sysv-37-11.fc16.x86_64 this morning, the nstar.d
init script on my system correctly started the nstar.d daemon on my system.
This morning, when I boot it hangs forever, yet the daemon actually appears
to be running, systemctl is waiting for something that never happens.


Version-Release number of selected component (if applicable):
systemd-sysv-37-11.fc16.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Try to boot with nstar.d enabled
2. Or try service nstar.d start after booting with nstar.d disabled
3.
  
Actual results:
hangs starting daemon

Expected results:
background deamon and keep going

Additional info:
I'll attach this particular legacy init script so someone can tell me what
a fool I am for expecting it to work (even though it has worked forever :-).

Comment 1 Tom Horsley 2012-02-04 13:49:25 UTC
Created attachment 559415 [details]
/etc/rc.d/init.d/nstar.d

If you want, you too can download and install the nstar tools to get all the other binaries and libraries and wot-not that go with this service. See:

http://news.ccur.com/isd_solutions_nightstarlinux.asp?actbut=demo

Comment 2 Jóhann B. Guðmundsson 2012-02-04 14:36:03 UTC
Usually this happens if the legacy sysvinit script is broken anyway I need you to boot with 'log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg' then save the output of the dmesg command and attach
it here

You can also migrate the legacy sysvinit script as in create a new unit in /etc/systemd/system directory named nstard.service should something like

### nstar.d.service ###

[Unit]
Description=NightStar Server
After=network.target

[Service]
Type=forking
PIDFile=/run/nstar.d.pid
ExecStart=/usr/bin/nightstar -d

[Install]
WantedBy=multi-user.target

Then run systemctl daemon-reload and reboot

Comment 3 Tom Horsley 2012-02-04 16:04:52 UTC
Here's the tail of dmesg after manually starting the service (couldn't figure out how to hang at boot and still get dmesg :-), and I think it is now obvious what is going on (though why I just changed, I don't know):

[   86.454933] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.StartUnit() on /org/freedesktop/systemd1
[   86.454948] systemd[1]: Trying to enqueue job nstar.d.service/start/replace
[   86.455176] systemd[1]: Installed new job nstar.d.service/start as 522
[   86.455179] systemd[1]: Enqueued job nstar.d.service/start as 522
[   86.455221] systemd[1]: About to execute: /etc/rc.d/init.d/nstar.d start
[   86.484875] systemd[1]: Forked /etc/rc.d/init.d/nstar.d as 2509
[   86.484894] systemd[1]: nstar.d.service changed dead -> start
[   86.484973] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.GetUnit() on /org/freedesktop/systemd1
[   86.485091] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Properties.Get() on /org/freedesktop/systemd1/unit/nstar_2ed_2eservice
[   86.494249] systemd[1]: Received SIGCHLD from PID 2512 (nstar.d).
[   86.494284] systemd[1]: Got SIGCHLD for process 2512 (nstar.d)
[   86.494335] systemd[1]: Child 2512 died (code=exited, status=0/SUCCESS)
[   86.495361] systemd[1]: Received SIGCHLD from PID 2509 (nstar.d).
[   86.495384] systemd[1]: Got SIGCHLD for process 2509 (nstar.d)
[   86.495411] systemd[1]: Child 2509 died (code=exited, status=0/SUCCESS)
[   86.495414] systemd[1]: Child 2509 belongs to nstar.d.service
[   86.495419] systemd[1]: nstar.d.service: control process exited, code=exited status=0
[   86.495422] systemd[1]: nstar.d.service got final SIGCHLD for state start
[   86.495431] systemd[1]: PID file /var/run/nstar.d.pid or /var/nightstar/nstar.d.pid not readable (yet?) after start.
[   86.495435] systemd[1]: Setting watch for nstar.d.service's PID file /var/run/nstar.d.pid or /var/nightstar/nstar.d.pid

The init script contains this line at the top:

# pidfile:     /var/run/nstar.d.pid or /var/nightstar/nstar.d.pid

I betcha anything if I change it to just:

# pidfile:     /var/run/nstar.d.pid

That it would work fine.

This script is used on a slew of different linux versions and someone probably thought that a shell comment line was actually a comment (how foolish :-).

It does seem like systemd ought to think about timing out if the watch never find the file though. Hanging the whole boot process seems like a bad idea even with a broken init script.

Comment 4 Tom Horsley 2012-02-04 16:12:53 UTC
And fixing the pidfile: comment did indeed fix the startup. No more hang on boot and the nstar.d service is running fine.

Comment 5 Jóhann B. Guðmundsson 2012-02-04 16:28:36 UTC
Due to various reason the default timeout is set to 180s thus users will have to wait for those 3 minutes before their boot continues.

Try the migrated native systemd unit and see if that works ( or tweak it as necessary  ) once you got it working submit it upstream nstar along with the bug report against the sysv initscript not being correct.

Thanks

Comment 6 Michal Schmidt 2012-02-06 11:15:00 UTC
(In reply to comment #4)
> And fixing the pidfile: comment did indeed fix the startup. No more hang on
> boot and the nstar.d service is running fine.

Right, the "# pidfile:" header, if it is provided at all, must be correct. Otherwise systemd will treat the service as taking long to start, until a timeout kicks in and kills it.

(In reply to comment #5)
> Due to various reason the default timeout is set to 180s thus users will have
> to wait for those 3 minutes before their boot continues.

It's 5 minutes for SysV services.

Comment 7 Clemens Eisserer 2012-02-12 10:11:50 UTC
Please re-open.

After updating Fedora-16 with the latest packages, I ran into the same problem.
At boot my notebook was blocked for about 2-4min. In /var/log/boot.log I found the following entry: 

Failed to start LSB: Starts and stops login and scanning of iSCSI
devices.
             [FAILED]
See 'systemctl status iscsi.service' for details.


Quering the status of iscsi however tells me everything went fine:

$ systemctl status iscsi.service
iscsi.service - LSB: Starts and stops login and scanning of iSCSI devices.
         Loaded: loaded (/etc/rc.d/init.d/iscsi)
         Active: active (exited) since Sat, 11 Feb 2012 23:17:58 +0100; 18s ago
        Process: 1731 ExecStart=/etc/rc.d/init.d/iscsi start (code=exited,
status=0/SUCCESS)
         CGroup: name=systemd:/system/iscsi.service


Deleting the init-script from /etc/inid.d/iscsi* solved the problem for me.

This are both systemd and iscsi init script shipped by Fedora, so to be honest, I don't care who's fault it is - it would be great to have it fixed.

Comment 8 Jóhann B. Guðmundsson 2012-02-12 10:19:53 UTC
File a new bug against iscsi-initiator-utils

Thanks

Comment 9 Clemens Eisserer 2012-02-12 11:22:47 UTC
will do so, thanks

Comment 10 Clemens Eisserer 2012-02-12 11:28:38 UTC
Ok, bug 789683 filed against iscsi-initiator-utils.
Would be great if somebody could take care.


Note You need to log in before you can comment on or make changes to this bug.