Bug 846150 - Systemd doesn't recognize service has died
Summary: Systemd doesn't recognize service has died
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 17
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-08-07 00:04 UTC by Matthaus Owens
Modified: 2013-08-01 18:08 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-08-01 18:08:07 UTC
Type: Bug
Embargoed:


Description Matthaus Owens 2012-08-07 00:04:44 UTC
Description of problem:
When using Type=forking in a systemd service file, if the service's main process is killed with kill after the service has started, systemctl status $service still reports it as active, and systemctl is-active $service also returns active.

The same unit syntax seems to work fine for sendmail but fails for mcollective, so I'm not sure whether it is related to how Ruby forks.

Version-Release number of selected component (if applicable):
systemd version 44

How reproducible:

Steps to Reproduce:
1. Start mcollective.service (systemctl start mcollective.service)
2. Kill the mcollective main process by its PID (kill <pid>)
3. systemctl status mcollective.service returns active
4. systemctl is-active mcollective.service returns active
  
Actual results:
systemctl claims mcollective is active, even though its process is not in the process list

Expected results:
systemctl recognizes that mcollective is dead

Additional info:
mcollective.service unit file used:
[Unit]
Description=The Marionette Collective
After=network.target

[Service]
Type=forking
StandardOutput=syslog
StandardError=syslog
ExecStart=/usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid
ExecReload=/bin/kill -USR1 $MAINPID
PIDFile=/var/run/mcollective.pid

[Install]
WantedBy=multi-user.target


Console output:
[root@fc17 ~]# systemctl start mcollective.service
[root@fc17 ~]# systemctl status mcollective.service
mcollective.service - The Marionette Collective
	  Loaded: loaded (/usr/lib/systemd/system/mcollective.service; disabled)
	  Active: active (running) since Mon, 06 Aug 2012 09:36:28 -0700; 5s ago
	 Process: 1111 ExecStart=/usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid (code=exited, status=0/SUCCESS)
	Main PID: 1119 (ruby)
	  CGroup: name=systemd:/system/mcollective.service
		  └ 1119 ruby /usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid

[root@fc17 ~]# ps -ef | grep mcollective
root      1119     1  0 09:36 ?        00:00:00 ruby /usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid
root      1137   775  0 09:36 pts/0    00:00:00 grep --color=auto mcollective
[root@fc17 ~]# kill 1119
[root@fc17 ~]# systemctl status mcollective.service
mcollective.service - The Marionette Collective
	  Loaded: loaded (/usr/lib/systemd/system/mcollective.service; disabled)
	  Active: active (running) since Mon, 06 Aug 2012 09:36:28 -0700; 24s ago
	 Process: 1111 ExecStart=/usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid (code=exited, status=0/SUCCESS)
	Main PID: 1119 (code=killed, signal=TERM)
	  CGroup: name=systemd:/system/mcollective.service

[root@fc17 ~]# ps -ef | grep mcollective
root      1144   775  0 09:36 pts/0    00:00:00 grep --color=auto mcollective
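
With Type=forking, systemd considers the unit started when the ExecStart= process exits, and it then reads the file named in PIDFile= to find the main PID, so a daemon should have written that file before its parent exits. The Python sketch below shows that ordering: the child writes the PID file atomically and only then signals the parent through a pipe. It is not how mcollectived is actually implemented (this report does not show that), and spawn_daemon is a hypothetical helper.

```python
import os
import tempfile


def spawn_daemon(pidfile, work):
    """Fork a child that writes `pidfile` *before* the parent is released.

    Hypothetical helper. Returns the child's PID in the parent; a real
    Type=forking daemon would call os._exit(0) right after that, and its
    child would also chdir("/") and redirect stdin/stdout/stderr.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: becomes the long-running "main process".
        os.close(read_fd)
        status = 1
        try:
            os.setsid()  # leave the parent's session and process group
            # Write the PID file atomically (temp file + rename) so a reader
            # such as systemd never sees a half-written file.
            directory = os.path.dirname(os.path.abspath(pidfile))
            fd, tmp = tempfile.mkstemp(dir=directory)
            os.fchmod(fd, 0o644)
            with os.fdopen(fd, "w") as f:
                f.write(f"{os.getpid()}\n")
            os.replace(tmp, pidfile)
            # Only now tell the parent it may exit.
            os.write(write_fd, b"1")
            os.close(write_fd)
            work()
            status = 0
        finally:
            os._exit(status)
    # Parent: block until the child reports readiness; EOF means it died first.
    os.close(write_fd)
    ready = os.read(read_fd, 1)
    os.close(read_fd)
    if ready != b"1":
        raise RuntimeError("child exited before writing its PID file")
    return pid
```

In a real daemon the parent would follow `spawn_daemon(...)` with `os._exit(0)`, so by the time systemd sees ExecStart= exit, the PID file already names the surviving child.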

Comment 1 Michal Schmidt 2012-08-08 12:34:13 UTC
(In reply to comment #0)
> systemd version 44

In future bug reports, please give the exact package versions (rpm -q systemd mcollective).
I could reproduce the bug with:
systemd-44-17.fc17.x86_64
mcollective-2.0.0-3.fc17.noarch

Debug output when it happens:

systemd[1]: mcollective.service got final SIGCHLD for state start
systemd[1]: Main PID loaded: 15434
systemd[1]: mcollective.service: Supervising process 15434 which is not our child. We'll most li
systemd[1]: mcollective.service changed start -> running
systemd[1]: Job mcollective.service/start finished, result=done
systemd[1]: Got SIGCHLD for process 15431 (ruby)
systemd[1]: Child 15431 died (code=exited, status=0/SUCCESS)
...
systemd[1]: Received SIGCHLD from PID 15434 (ruby).
systemd[1]: Got SIGCHLD for process 15434 (ruby)
systemd[1]: Child 15434 died (code=exited, status=1/FAILURE)
systemd[1]: Child 15434 belongs to mcollective.service
systemd[1]: mcollective.service: main process exited, code=exited, status=1

The state of the service stays active.
The bug is not deterministic. I needed several tries to reproduce it.
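
Because the failure is intermittent, a small Python loop that repeats the steps from the description until the stale state appears may save manual retries. This is a sketch assuming a systemd host, root privileges, and that `systemctl show --property=...` prints one KEY=VALUE line per property; the helper names and the 2-second settle delays are illustrative choices, not taken from this report.

```python
import os
import signal
import subprocess
import time


def parse_show(text):
    """Parse `systemctl show` output (one KEY=VALUE per line) into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def systemctl_show(unit):
    """Query ActiveState and MainPID of `unit` (needs a systemd host)."""
    out = subprocess.run(
        ["systemctl", "show", "--property=ActiveState", "--property=MainPID", unit],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_show(out)


def pid_alive(pid):
    """True if a process with this PID exists; signal 0 only performs the check."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def is_stale(props, alive=pid_alive):
    """The reported bug: the unit says "active" but its main process is gone.

    MainPID=0 also counts as gone here, which suits a forking daemon but
    would misfire on e.g. a oneshot unit with RemainAfterExit=yes.
    """
    pid = int(props.get("MainPID", "0"))
    return props.get("ActiveState") == "active" and (pid == 0 or not alive(pid))


def reproduce(unit="mcollective.service", attempts=10, settle=2.0):
    """Repeat steps 1-4 of the report; return the attempt that hit the bug, or None."""
    for attempt in range(1, attempts + 1):
        subprocess.run(["systemctl", "start", unit], check=True)
        time.sleep(settle)  # assumed delay for the daemon to write its PID file
        pid = int(systemctl_show(unit).get("MainPID", "0"))
        if pid:
            os.kill(pid, signal.SIGTERM)  # step 2: plain `kill <pid>`
            time.sleep(settle)  # assumed delay for systemd to handle SIGCHLD
            if is_stale(systemctl_show(unit)):
                return attempt  # leave the unit as-is for inspection
        subprocess.run(["systemctl", "stop", unit], check=False)
    return None
```

Run it as root, for example `reproduce("mcollective.service", attempts=20)`; a non-None result means that attempt left the unit active with a dead main PID.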

Comment 2 Michal Schmidt 2012-08-08 12:43:32 UTC
The log continues with:

systemd[1]: Accepted connection on private bus.
systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/sys
systemd[1]: mcollective.service: cgroup is empty

but even then the service remains in the active state.

Comment 3 Jóhann B. Guðmundsson 2013-06-15 16:49:10 UTC
Unit status output still shows the service as active after its PID was killed, with systemd-204-6.fc19.x86_64:

# systemctl status mcollective.service
mcollective.service - The Marionette Collective
   Loaded: loaded (/usr/lib/systemd/system/mcollective.service; disabled)
   Active: active (running) (Result: signal) since Sat 2013-06-15 16:35:58 GMT; 5min ago

^^^ still shows active 

  Process: 8761 ExecStart=/usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid (code=exited, status=0/SUCCESS)
 Main PID: 8768 (code=killed, signal=KILL) <-- so we know that the main PID has been killed
   CGroup: name=systemd:/system/mcollective.service

Jun 15 16:35:58 localhost.localdomain systemd[1]: PID file /var/run/mcollective.pid not readable (yet?) after start.
Jun 15 16:35:58 localhost.localdomain systemd[1]: mcollective.service: Supervising process 8768 which is not our child. We'll most likely not notice when it exits.

^^^ Informational message warning that we might not notice when it exits

Jun 15 16:35:58 localhost.localdomain systemd[1]: Started The Marionette Collective.
Jun 15 16:36:21 localhost.localdomain systemd[1]: mcollective.service: main process exited, code=killed, status=9/KILL <-- so we do know that it has exited

So there still seems to be something to fix here...
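
The "Supervising process ... which is not our child" warning presumably reflects a comparison between the PID read from the PID file and systemd's own process. A comparable check can be made from userspace by reading the parent PID from /proc/<pid>/stat, as in the Python sketch below (Linux only). parent_pid and is_child_of are hypothetical helpers, not systemd's code. A forked daemon is normally reparented (usually to PID 1) only once its parent exits, so the answer can depend on when the check runs.

```python
import os


def parent_pid(pid):
    """Return the parent PID of `pid`, read from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split after the *last* ')'. The next fields are: state, ppid, ...
    fields = data[data.rindex(")") + 2:].split()
    return int(fields[1])


def is_child_of(pid, supervisor_pid):
    """True if `pid` is currently a direct child of `supervisor_pid`."""
    return parent_pid(pid) == supervisor_pid
```

For example, in the console output of the description, ps shows PPID 1 for process 1119, so `is_child_of(1119, 1)` would have been true at that moment.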

Comment 4 Fedora End Of Life 2013-07-04 06:32:52 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Fedora End Of Life 2013-08-01 18:08:13 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

