Bug 1291062 - custom service files half-disappear from systemd until rename: "No such file or directory", cannot start service
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: systemd
Version: 22
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Reported: 2015-12-13 05:45 EST by Trevor Cordes
Modified: 2016-02-02 02:22 EST
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2016-02-02 02:22:59 EST
Type: Bug


Attachments: None
Description Trevor Cordes 2015-12-13 05:45:56 EST
Description of problem:
Starting Dec 12, 2015, a custom systemd service I created on my personal system (defined in a .service file placed directly in /etc/systemd/system/multi-user.target.wants/) suddenly stopped working.  I thought it was very odd.  It happened again the next day on a second, similar machine I administer.  Now I'm worried it's a bug.

The symptom: the service stops, and systemd won't let me restart it.  Status shows an error.  If I run the executable the service file points to by hand, it works fine, but the service itself won't run.  *** HERE's the buggy part: if I rename the .service file a few times, run systemctl daemon-reload a few times, and keep trying to restart the service, it eventually starts under the new name!  If I then rename it back to the original name and try again, it sometimes works!  Non-deterministic!

COMPUTER #2:
#systemctl restart tec-dynamic-ip-update.service
Failed to restart tec-dynamic-ip-update.service: Unit tec-dynamic-ip-update.service failed to load: No such file or directory.
Exit 6

#systemctl -n30 status tec-dynamic-ip-update.service
* tec-dynamic-ip-update.service
   Loaded: not-found (Reason: No such file or directory)
   Active: failed (Result: exit-code) since Sun 2015-12-13 03:46:20 CST; 16min ago
 Main PID: 4639 (code=exited, status=143)

Dec 12 16:46:00 zzz.ca systemd[1]: Started tec dynamic-ip-update.
Dec 12 16:46:00 zzz.ca systemd[1]: Starting tec dynamic-ip-update...
Dec 13 03:46:20 zzz.ca systemd[1]: tec-dynamic-ip-update.service: main process exited, code=exited, status=143/n/a
Dec 13 03:46:20 zzz.ca systemd[1]: Unit tec-dynamic-ip-update.service entered failed state.
Dec 13 03:46:20 zzz.ca systemd[1]: tec-dynamic-ip-update.service failed.
Exit 3

#cd /etc/systemd/system/multi-user.target.wants/
#ll
...
tec-dynamic-ip-update.service

#mv tec-dynamic-ip-update.service tec-dynamic-ip.service
#systemctl daemon-reload
#systemctl restart tec-dynamic-ip.service
[no error!!!!!!]
[service is running ok now!]


Sometimes, when I rename it back to the old file name, it keeps failing.  Then it fails under the new name too, and I have to come up with yet another name before it works again!  Huh??!?!?

Again, this happened on two different machines, for two differently named services with different functions, at two different times about 12 hours apart.  I have never seen this before on any machine.

I just tested the service that failed on COMPUTER #1 earlier, and a simple restart was enough to trigger the bug.  To fix it I had to rename the file 3 times before it finally worked again.  So it looks like I can reproduce this fairly easily.

The error message "No such file or directory" is key, I'm sure... but exactly which file does not exist?  The .service file, the executable it calls, or???  It's ambiguous; the error message should be more precise.


Version-Release number of selected component (if applicable):
systemd-219-26.fc22.i686 (COMPUTER #2)
systemd-219-25.fc22.x86_64 (COMPUTER #1)


How reproducible:
Once it "hits", I get the error every time until I start renaming .service files.  The affected service seems to be left a bit wonky afterwards: I can reproduce the bug on it more easily from then on, especially if I manage to get it working again after renaming it back to the original (desired) name.


Steps to Reproduce:
1. Make a custom .service file for a service you have written yourself and place it directly in /etc/systemd/system/multi-user.target.wants/
2. Wait for a service to randomly die for no reason (or reboot so all services have to come up)
3. Hopefully get lucky and this bug hits
4. systemctl restart x.service

Actual results:
* x.service
   Loaded: not-found (Reason: No such file or directory)
   Active: failed (Result: exit-code) since Sun 2015-12-13 03:46:20 CST; 16min ago
 Main PID: 4639 (code=exited, status=143)

Expected results:
- no output; service starts properly and keeps running


Additional info:
#cat   tec-dynamic.service
[Unit]
Description=tec dynamic-ip-update
Before=network.target
After=other-custom.service
After=syslog.target

[Service]
Type=simple
ExecStart=/usr/local/sbin/tec-systemd-wrapper /usr/local/sbin/dynamic-ip-update /var/log/dynamic-ip-update.log
StandardOutput=null
TimeoutSec=5
Restart=always

[Install]
WantedBy=multi-user.target


#cat /usr/local/sbin/tec-systemd-wrapper
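#!/usr/bin/perl -w
# ya cheesy but systemd (at least at first) didn't give me a log option like inittab gave with a simple > redirect
# so I had to improvise in the meantime just to keep my custom daemons running

# Unbuffered output
$|=1;

$ENV{'SHELL'}='/bin/bash';

# The last argument is the log file; the remaining arguments form the command line
$flog=pop();
# Create the log file if it does not exist yet
if (!-f $flog) {
  open  F,">$flog" or die;
  close F;
}

chmod 0640,$flog or die;
# A single string with shell metacharacters is run via /bin/sh -c, so the redirect works
exec join(' ',@ARGV)." >>$flog 2>&1";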
Comment 1 Jan Synacek 2016-01-08 08:27:17 EST
I don't know about the disappearing stuff, but... You're not supposed to put services directly into /etc/systemd/system/multi-user.target.wants/. Put your service file in /etc/systemd/system/ and run "systemctl enable <service>". Since your service file already contains the correct [Install] section, your service will be correctly enabled. Does this help?
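A quick way to check a machine for the layout described above is to look for regular files, rather than symlinks, inside .wants/ or .requires/ directories; systemd.unit(5) describes those directories as holding symlinks to unit files. The following is a minimal bash sketch, assuming custom units live under /etc/systemd/system. The function name find_nonlink_wants is hypothetical, and its directory argument exists only so it can be pointed at a test tree.

```bash
#!/bin/bash
# Hypothetical helper: list regular files (not symlinks) that sit directly
# inside *.wants/ or *.requires/ directories under a systemd unit directory.
find_nonlink_wants() {
    local unit_dir="${1:-/etc/systemd/system}"
    # Print nothing if the directory does not exist.
    [ -d "$unit_dir" ] || return 0
    # find does not follow symlinks by default, so -type f skips them.
    find "$unit_dir" -mindepth 2 -maxdepth 2 -type f \
        \( -path '*.wants/*' -o -path '*.requires/*' \) -print | sort
}
```

Running find_nonlink_wants with no argument scans /etc/systemd/system. On the box in Comment 3 it would be expected to report /etc/systemd/system/multi-user.target.wants/tec-restarter.service, which ll showed as a regular file.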
Comment 2 Trevor Cordes 2016-01-19 02:50:00 EST
I haven't had this problem again since a couple of days after the report.  Perhaps some update somewhere fixed it.  However, I'm not convinced, so I'm waiting to see whether it comes up again.  If it doesn't hit for a few months, I guess it's fixed.

Next time there is a problem, I will try relocating the service files as you suggest to see if that helps.  Still, systemd shouldn't care whether the entry is a symlink or a real file, but who knows what it's doing under the hood, I guess.
Comment 3 Trevor Cordes 2016-02-02 02:06:47 EST
It just hit again on another box I run.

This time it was RIGHT after a dnf update that updated systemd:
was: systemd-219-26.fc22.i686
now: systemd-219-27.fc22.i686

#ll /etc/systemd/system/multi-user.target.wants/tec-restarter.service
-rw-r--r-- 1 root root 278 Apr  3  2015 /etc/systemd/system/multi-user.target.wants/tec-restarter.service

#systemctl restart tec-restarter.service
Failed to restart tec-restarter.service: Unit tec-restarter.service failed to load: No such file or directory.
Exit 6

#systemctl daemon-reload

#systemctl restart tec-restarter.service
Failed to restart tec-restarter.service: Unit tec-restarter.service failed to load: No such file or directory.
Exit 6

#systemctl status tec-restarter.service
* tec-restarter.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead) since Tue 2016-02-02 00:50:28 CST; 4min 39s ago
 Main PID: 14048 (code=killed, signal=TERM)
Exit 3


Looks like your suggestion fixes the problem!


#mv /etc/systemd/system/multi-user.target.wants/tec-restarter.service /etc/systemd/system

#systemctl enable tec-restarter.service
Created symlink from /etc/systemd/system/multi-user.target.wants/tec-restarter.service to /etc/systemd/system/tec-restarter.service.

#ll /etc/systemd/system/multi-user.target.wants/tec-restarter.service
lrwxrwxrwx 1 root root 41 Feb  2 00:55 /etc/systemd/system/multi-user.target.wants/tec-restarter.service -> /etc/systemd/system/tec-restarter.service

#systemctl restart tec-restarter.service

#systemctl status tec-restarter.service                                        
* tec-restarter.service - tec restarter
   Loaded: loaded (/etc/systemd/system/tec-restarter.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-02-02 00:56:18 CST; 1min 54s ago


So on all my boxes I will move all my custom service files to /etc/systemd/system and use systemctl enable to create the symlinks instead.
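Repeating the per-file fix from the transcript above (move the file out of multi-user.target.wants/, then systemctl enable it) for every custom unit can be scripted. A minimal bash sketch follows, assuming each custom unit is a .service file sitting directly in multi-user.target.wants/ and that its [Install] section has WantedBy=multi-user.target, as in the unit shown in the description, so that systemctl enable recreates the .wants/ entry as a symlink. The function name migrate_wants_files and the SYSTEMCTL override variable are hypothetical, introduced only for this example; run it as root.

```bash
#!/bin/bash
# Hypothetical helper: move regular unit files out of a *.wants/ directory
# into the main unit directory, then enable them so systemctl creates the
# *.wants/ symlink from each unit's [Install] section.
# SYSTEMCTL may be overridden (e.g. SYSTEMCTL=echo) to print the enable
# commands instead of running them; the files are still moved.
migrate_wants_files() {
    local unit_dir="${1:-/etc/systemd/system}"
    local wants_dir="${2:-$unit_dir/multi-user.target.wants}"
    local systemctl="${SYSTEMCTL:-systemctl}"
    local f name
    for f in "$wants_dir"/*.service; do
        # Skip an unmatched glob and entries that are already symlinks.
        [ -f "$f" ] && [ ! -L "$f" ] || continue
        name=${f##*/}
        if [ -e "$unit_dir/$name" ]; then
            echo "skipping $name: $unit_dir/$name already exists" >&2
            continue
        fi
        mv "$f" "$unit_dir/$name" || return 1
        "$systemctl" enable "$name" || return 1
    done
}
```

The sketch stops at the first failed mv or enable and refuses to overwrite a unit file that already exists in the target directory, so a partial run can be inspected and resumed by hand.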

If no one cares that putting those files directly in multi-user.target.wants, instead of using symlinks, results in NON-DETERMINISTIC behaviour, then we can close this bug as NOTABUG.  I say non-deterministic because I have dozens of boxes, each with about a dozen custom service files, and: a) this setup worked great for (how long has systemd been in Fedora, 4 years?); b) this bug rarely pops up, even with the prerequisites replicated many times over; c) renaming the service file while leaving it in multi-user.target.wants usually makes the symptom disappear; and d) the error message systemd gives me is Really Bad (no such file? huh?).  I won't be the first or last person to put service files where they seem to go (remember, I did this before all the nice RH docs about this were in place)!

P.S. /etc/systemd/system is a really ugly place to put them... there should be a nice separate directory instead of cluttering up /etc/systemd/system, which has lots of subdirectories and really shouldn't contain plain files.  Maybe /etc/systemd/system/local or something.  It's as if the systemd designers didn't think anyone would ever write their own service files, even though tons of devs used to have extensive /etc/inittab contents.

Thanks!
