Bug 744621 - systemd apparently doesn't reap zombies
Summary: systemd apparently doesn't reap zombies
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Lennart Poettering
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-10-09 17:50 UTC by Andy Burns
Modified: 2011-10-11 01:09 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-10-11 01:09:44 UTC
Type: ---


Attachments
Systemd unit for mythtvbackend (374 bytes, text/plain)
2011-10-10 07:01 UTC, Jóhann B. Guðmundsson


Links
System ID Private Priority Status Summary Last Updated
FreeDesktop.org 41625 0 None None None Never

Description Andy Burns 2011-10-09 17:50:58 UTC
Description of problem:
=======================

When I have problems (e.g. with a hung mythbackend process), killing it leaves it <defunct>. If I kill the parent process, the zombies end up owned by PID 1 (i.e. systemd), but unlike init, systemd does not appear to reap them.

Example "ps" output is in the freedesktop.org Bugzilla (bug 41625).
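
For readers unfamiliar with the term, the following is a minimal sketch of what "reaping" means. It is not taken from systemd; it assumes Linux with /proc mounted, and the helper names read_state and zombie_then_reap are made up for illustration. A child that has exited stays in state "Z" (shown as <defunct> by ps) until its parent collects the exit status with a wait call.

```python
import os
import time


def read_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field follows the closing parenthesis of the command name.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_then_reap(timeout=5.0):
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers

    # Parent: poll until the exited child shows up in state "Z" (zombie).
    deadline = time.monotonic() + timeout
    state = read_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = read_state(pid)

    # Reaping: collecting the exit status removes the process table entry.
    os.waitpid(pid, 0)
    reaped = not os.path.exists(f"/proc/{pid}")
    return state, reaped
```

When the parent dies first, the orphaned process is re-parented (to PID 1 in the situation described here), and that new parent is then the one expected to make the wait call.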

Version-Release number of selected component (if applicable):
=============================================================
systemd.x86_64                        35-1.fc16                 @fedora
systemd-sysv.x86_64                   35-1.fc16                 @fedora
systemd-units.x86_64                  35-1.fc16                 @fedora

Comment 1 Michal Schmidt 2011-10-09 23:29:37 UTC
systemd reaps zombies.
Do you get a zombie out of 'sleep' if you run it this way?:
( sleep 5 & )

Does 'systemctl' print the list of units normally, or does it say "Connection refused"?
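
To check the result of such a test, one option is to filter "ps" output for zombies whose parent is PID 1. This is only a sketch: the function names find_zombies and list_zombies_now are hypothetical, and it assumes the procps "ps -eo pid,ppid,stat,comm" column layout, where a zombie has a STAT value starting with "Z".

```python
import subprocess


def find_zombies(ps_output, parent_pid=1):
    """Return (pid, command) pairs for zombies whose parent is parent_pid.

    Expects text in the layout of `ps -eo pid,ppid,stat,comm`.
    """
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        # Skip the header line and anything that is not a full process row.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        pid, ppid, stat, comm = fields
        if stat.startswith("Z") and int(ppid) == parent_pid:
            zombies.append((int(pid), comm))
    return zombies


def list_zombies_now():
    """Run ps on the live system and return zombies owned by PID 1."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_zombies(out)
```

Running list_zombies_now() a few seconds after ( sleep 5 & ) has finished should return no "sleep" entry if PID 1 is reaping normally.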

Comment 2 Andy Burns 2011-10-10 06:32:42 UTC
The sleep command above does not result in a zombie, so reaping does normally happen.

The last time I had the problem systemctl did show proper output rather than an error message.

I've only seen this with mythbackend, which is still started as a daemon by a script, not a systemd unit. Could this be significant?  Or is it likely that a V4L device that mythbackend is using (mainline kernel driver, not a tainted proprietary one) has caused the process to lock up in a way that systemd cannot do anything about?

I have little doubt this will re-occur, is there any useful logging I can enable in advance, or collect after it happens?  Other than the mythbackend zombie, the system remains usable, though it won't shut down cleanly due to the zombie.

Comment 3 Jóhann B. Guðmundsson 2011-10-10 07:01:05 UTC
Created attachment 527168 [details]
Systemd unit for mythtvbackend

Migrated this one a while back and it was just collecting dust locally, since I don't have a MythTV setup running yet. I have therefore not properly tested it or filed it upstream.

Anyway, you can try it out: run systemctl stop mythtvbackend.service, drop the file into the /etc/systemd/system/ directory, then run systemctl daemon-reload && systemctl start mythbackend.service. If it works for you, run systemctl enable mythtvbackend.service and do make a note of that here, so I can either improve the unit or file it upstream.

Thanks

Comment 4 Jóhann B. Guðmundsson 2011-10-10 07:04:24 UTC
Btw, you should update your computer, given that the 36.3 release of systemd should already be in stable.

Comment 5 Andy Burns 2011-10-10 07:18:05 UTC
(In reply to comment #3)

> Migrated this one a while back and it was just collecting dust locally

Thanks, I was just considering the possibility of writing a unit for mythtv, but testing yours should be a gentler introduction to the internals of systemd.

Comment 6 Michal Schmidt 2011-10-10 08:46:44 UTC
(In reply to comment #2)
> I have little doubt this will re-occur, is there any useful logging I can
> enable in advance, or collect after it happens?

Yes. Please boot with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg". When a zombie appears, note its PID and save the log using "dmesg > dmesg.txt". Look for the "Got SIGCHLD for process" messages in the log and what follows them, or just attach the whole log here and tell us the PIDs of the zombies.
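
The log search described above could be scripted roughly as follows. This is a sketch, not a systemd tool: reap_events and missing_reaps are hypothetical names, the two message patterns are copied from debug output of the kind quoted in this bug, and treating a "Child N died" line as evidence that systemd handled the exit is an assumption.

```python
import re

# Patterns based on systemd debug messages as they appear in dmesg output.
SIGCHLD_RE = re.compile(r"Got SIGCHLD for process (\d+) \((.+?)\)")
DIED_RE = re.compile(r"Child (\d+) died \((.*)\)")


def reap_events(log_text):
    """Map each PID seen in the log to its command name and exit description."""
    events = {}
    for line in log_text.splitlines():
        m = SIGCHLD_RE.search(line)
        if m:
            events.setdefault(int(m.group(1)), {"comm": m.group(2), "died": None})
            continue
        m = DIED_RE.search(line)
        if m:
            entry = events.setdefault(int(m.group(1)), {"comm": None, "died": None})
            entry["died"] = m.group(2)
    return events


def missing_reaps(log_text, zombie_pids):
    """Return the zombie PIDs for which the log has no 'Child N died' line."""
    events = reap_events(log_text)
    return [pid for pid in zombie_pids
            if events.get(pid, {}).get("died") is None]
```

For example, missing_reaps(open("dmesg.txt").read(), [1350]) would return [1350] only if the saved log contains no "Child 1350 died" message.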

(In reply to comment #3)
> Created attachment 527168 [details]
> Systemd unit for mythtvbackend

Creating and testing systemd units is commendable, but let's not muddy the issue here. For all we know, using the native unit might hide the bug and we don't want bugs hidden, but fixed.

Comment 7 Andy Burns 2011-10-10 09:50:21 UTC
(In reply to comment #6)

> let's not muddy the
> issue here. For all we know, using the native unit might hide the bug and we
> don't want bugs hidden, but fixed.

@Michal 

Sorry, I didn't read ahead. I have already installed Johann's unit and enabled the extra systemd logging. mythbackend starts OK; now I'll attempt to provoke it into becoming a zombie. If it fails to take the bait, I promise I'll revert to the SysV script to hunt the bug.

@Johann

I changed the environment variables from /etc/mythtv to /var/lib/mythtv

Environment=MYTHCONFDIR=/var/lib/mythtv
Environment=HOME=/var/lib/mythtv

I install mythtv from ATRPMS, which might explain why my path differs from the one used by other 3rd-party repos.

[ 2560.559176] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.StartUnit() on /org/freedesktop/systemd1
[ 2560.559192] systemd[1]: Trying to enqueue job mythbackend.service/start/replace
[ 2560.559325] systemd[1]: Installed new job mythbackend.service/start as 476
[ 2560.559334] systemd[1]: Enqueued job mythbackend.service/start as 476
[ 2560.559392] systemd[1]: About to execute: /usr/bin/mythbackend --daemon --logfile /var/log/mythtv/mythbackend.log --pidfile /run/mythbackend.pid
[ 2560.560740] systemd[1]: Forked /usr/bin/mythbackend as 1350
[ 2560.560862] systemd[1]: mythbackend.service changed dead -> start
[ 2560.567224] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.GetUnit() on /org/freedesktop/systemd1
[ 2560.567352] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Properties.Get() on /org/freedesktop/systemd1/unit/mythbackend_2eservice
[ 2560.627586] systemd[1]: Received SIGCHLD from PID 1350 (mythbackend).
[ 2560.627634] systemd[1]: Got SIGCHLD for process 1350 (mythbackend)
[ 2560.627698] systemd[1]: Child 1350 died (code=exited, status=0/SUCCESS)
[ 2560.627708] systemd[1]: Child 1350 belongs to mythbackend.service
[ 2560.627719] systemd[1]: mythbackend.service: control process exited, code=exited status=0
[ 2560.627725] systemd[1]: mythbackend.service got final SIGCHLD for state start
[ 2560.627868] systemd[1]: mythbackend.service changed start -> running
[ 2560.627882] systemd[1]: Job mythbackend.service/start finished, result=done

Comment 8 Lennart Poettering 2011-10-11 01:09:44 UTC
Closing, since this was already discussed in https://bugs.freedesktop.org/show_bug.cgi?id=41625

