Red Hat Bugzilla – Bug 845553
systemd stops services on suspend
Last modified: 2013-02-13 17:28:25 EST
Since some time back, I've had problems with my HTPC machine not working properly after resume. Some key services get terminated, which generally makes the machine unusable. Since they are terminated cleanly (SIGTERM), and being such a wide variety, I'm suspecting a systemd bug.
The services I've noticed are:
- NFS mount points
Looking in /var/log/messages, there are more though:
Aug 3 14:10:13 hugin NetworkManager: <info> caught signal 15, shutting down normally.
Aug 3 14:10:13 hugin avahi-daemon: Got SIGTERM, quitting.
Aug 3 14:10:14 hugin rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="941" x-info="http://www.rsyslog.com"] exiting on signal 15.
I also have this odd thing:
Aug 3 14:10:13 hugin dbus-daemon: /images (deleted): rcercrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcercercrcrcrcrcrcrcrcrercrcrcercercercercercercercercercercerercercercrcercercercercercercercercercercrcercercercercercercercercercercrcrcrcercercerercercercererercercercercrcrcrcrcrcrcercercercercercercrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcrcerce
This is very major bug as it makes suspend completely unusable on the machine.
Bug 788911 might also be related.
I'm currently running systemd-37-25.fc16.x86_64.
Created attachment 602121 [details]
Here's the entire log, covering both suspend and resume.
Noteworthy is the fact that this line is just before services got killed:
Aug 3 14:10:13 hugin systemd: Reexecuting.
Please boot with the following kernel command line parameters:
systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M
After reproducing the bug, please store the kernel log buffer:
dmesg > dmesg.txt
and attach the resulting file here. Thanks.
Created attachment 602247 [details]
dmesg (failed mount)
Created attachment 602248 [details]
/var/log/messages (failed mount)
dmesg and messages from when it broke the NFS mount points. No other killed services as far as I can tell. Will try to reproduce the fully broken behaviour.
Created attachment 602824 [details]
dmesg (dead NM and rsyslog)
Created attachment 602825 [details]
/var/log/messages (dead NM and rsyslog)
Got a proper failure today. Attached is dmesg and messages.
Those lirc errors indicate there is something wrong with your configuration file and nfs and suspend issue is nothing new and unless it has been fixed nfs shares need to be unmounted cleanly before suspended.
Moving this report against nfs-utils Steve will just bounce it back if this is an systemd issue...
Which logs are we talking about now? The first or second set?
In the second set, both NetworkManager and rsyslog have been killed. That's the more serious issue, and is not something that nfs-utils has any bearing on.
There's an interesting point after the wakeup:
[30784.854221] systemd: Received SIGTERM from PID 29523 (fuser).
Looks like some script calls "fuser -k ..." and this results in various processes being killed.
Hmm So to try to determine what's calling fuser -k Pierre needs to run ps -AwfH > before-suspend.txt then match the pid to that fuser kill?
xbmc is also segfaulting
[30787.388735] xbmc.bin: segfault at 74 ip 000000340be178cd sp 00007fff7d2f63e0 error 4 in libfreetype.so.6.7.1[340be00000+9a000]
(In reply to comment #11)
> Hmm So to try to determine what's calling fuser -k Pierre needs to run ps
> -AwfH > before-suspend.txt then match the pid to that fuser kill?
Isn't fuser a freshly started process? So catching its brief existence with ps seems tricky.
> xbmc is also segfaulting
> [30787.388735] xbmc.bin: segfault at 74 ip 000000340be178cd sp
> 00007fff7d2f63e0 error 4 in libfreetype.so.6.7.1[340be00000+9a000]
xbmc is a temperamental piece of software. It probably crashes when some service it is using disappears.
Seems indeed that somebody calls fuser -k. Have you tried just grepping through your whole file system tree for this?
I am pretty sure this is nothing to do with systemd. We certainly do not invoke fuser -k ever.
The only non-binary I can find that uses fuser is /etc/init.d/functions, which has it in the unmounting code. That code in turn is only used by the netfs service (i.e. unmounting NFS stuff).
I'm not sure how netfs is supposed to be invoked these days, as systemd has both netfs.service, remote-fs.service, as well as individual *.mount entries for everything. And NetworkManager still has its dispatch scripts that mount and unmount things.
To get further, I've replaced fuser with a shell script that logs when it is called, the arguments, and the parent process. Hopefully that can get us closer.
Ok, got the issue again, so I have some more clues:
This is what I got in my new log:
Fri Aug 10 13:04:46 CEST 2012 Invoked by 25485 (netfs): fuser -k -m -15 /images (deleted)
And if I run that command without the "-k" bit, I get a list of a lot of processes, including 1 (systemd). So this seems to be where things go very wrong.
Now two questions pop up:
1. Where does the "(deleted)" come from? Doing umount -l doesn't produce this. And you can't remove the mount point as long as the mount is still active.
2. Should netfs even be used? It isn't on a fresh boot. But once NM's dispatch scripts trigger, it will be activated.
(In reply to comment #15)
> Now two questions pop up:
> 1. Where does the "(deleted)" come from? Doing umount -l doesn't produce
> this. And you can't remove the mount point as long as the mount is still
(deleted) comes from path_with_deleted() in the kernel. Basically, the kernel is building a pathname from a dentry (a path component object). If that dentry has been unhashed (made unavailable for lookups) then it will get " (deleted)" appended when it builds the pathname from it.
> 2. Should netfs even be used? It isn't on a fresh boot. But once NM's
> dispatch scripts trigger, it will be activated.
I'd say no. Tearing down mounts just because the network interface is going down seems wrong to me. It's quite possible (likely even) that these mounts are busy when someone goes to suspend the machine. Having your processes catch SIGKILLs just because they have an open file on a network filesystem is not what most people expect when they suspend their machines.
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '16'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 16's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 16 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to click on
"Clone This Bug" and open it against that version of Fedora.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.