1353039 – execstoppost commands are not run after a start failure

Bug 1353039 - execstoppost commands are not run after a start failure

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	systemd
Sub Component:
Version:	24
Hardware:	Unspecified
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	systemd-maint
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-07-05 21:01 UTC by Bruno Wolff III
Modified:	2017-08-08 15:23 UTC
CC List:	10 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-08-08 15:23:54 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
systemctl show output (4.87 KB, text/plain) 2016-07-05 21:02 UTC, Bruno Wolff III
Wireguard service file (440 bytes, text/plain) 2016-07-14 17:09 UTC, Bruno Wolff III

Description Bruno Wolff III 2016-07-05 21:01:12 UTC

When starting a service that fails, according to the documentation execstoppost commands are supposed to be run, but this isn't happening.

Reproducible: Always

Steps to Reproduce:
1. Set up a service that will fail and that has an ExecStopPost= command.
2. Start the service.
3. Show the service (systemctl show).
Actual Results:  
No commands were run after the first ExecStart= command failed.

Expected Results:  
After the first execstart failure, the execstoppost command should have been run.

Comment 1 Bruno Wolff III 2016-07-05 21:02:46 UTC

Created attachment 1176661
systemctl show output

Comment 2 Susant Sahani 2016-07-12 08:38:03 UTC

Doesn't the man page say that 'ExecStartPost=' runs only if the service has started successfully?

https://www.freedesktop.org/software/systemd/man/systemd.service.html
ExecStartPost= commands are only run after the service has started successfully.

Comment 3 Jan Synacek 2016-07-12 10:12:33 UTC

> When starting a service that fails, according to the documentation execstoppost
> commands are supposed to be run,

Actually, as Susant says, it's the opposite.

Comment 4 Jan Synacek 2016-07-12 10:13:42 UTC

I take that back, that's for ExecStartPost, not ExecStopPost.

Comment 5 Bruno Wolff III 2016-07-12 10:58:45 UTC

From that same page:
Note that if any of the commands specified in ExecStartPre=, ExecStart=, or ExecStartPost= fail (and are not prefixed with "-", see above) or time out before the service is fully up, execution continues with commands specified in ExecStopPost=, the commands in ExecStop= are skipped.

I am using a command in ExecStopPost= to clean things up if the service only partially starts up, but it isn't being run.

Comment 6 Susant Sahani 2016-07-14 04:34:25 UTC

My bad, I misread it. I tested with ExecStopPost=. Is this the way you are testing? I am not able to reproduce it.

For example 

------------
[Unit]
Description=exec stop post

[Service]
ExecStart=/usr/sbin/nginx
ExecStopPost=/usr/bin/echo "ExecStopPost"

[Install]
WantedBy=multi-user.target
------------

Jul 14 10:01:40 rawhide nginx[2119]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Jul 14 10:01:40 rawhide nginx[2119]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Jul 14 10:01:40 rawhide nginx[2119]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Jul 14 10:01:41 rawhide nginx[2119]: nginx: [emerg] still could not bind()
Jul 14 10:01:41 rawhide systemd[1]: Received SIGCHLD from PID 2119 (nginx).
Jul 14 10:01:41 rawhide systemd[1]: Child 2119 (nginx) died (code=exited, status=1/FAILURE)
Jul 14 10:01:41 rawhide systemd[1]: test.service: Child 2119 belongs to test.service
Jul 14 10:01:41 rawhide systemd[1]: test.service: Main process exited, code=exited, status=1/FAILURE
Jul 14 10:01:41 rawhide systemd[1]: test.service: About to execute: /usr/bin/echo ExecStopPost
Jul 14 10:01:41 rawhide systemd[1]: test.service: Forked /usr/bin/echo as 2121
Jul 14 10:01:41 rawhide systemd[1]: test.service: Changed running -> stop-post
Jul 14 10:01:41 rawhide systemd[1]: Sent message type=signal sender=n/a destination=n/a o
Jul 14 10:01:41 rawhide systemd[2121]: test.service: Executing: /usr/bin/echo ExecStopPost
Jul 14 10:01:41 rawhide echo[2121]: ExecStopPost <=======================================
Jul 14 10:01:41 rawhide systemd[1]: Received SIGCHLD from PID 2121 (echo).
Jul 14 10:01:41 rawhide systemd[1]: Child 2121 (echo) died (code=exited, status=0/SUCCESS)
Jul 14 10:01:41 rawhide systemd[1]: test.service: Child 2121 belongs to test.service
Jul 14 10:01:41 rawhide systemd[1]: test.service: Control process exited, code=exited status=0
Jul 14 10:01:41 rawhide systemd[1]: test.service: Got final SIGCHLD for state stop-post.
Jul 14 10:01:41 rawhide systemd[1]: test.service: Changed stop-post -> failed

Jul 14 10:01:41 rawhide systemd[1]: test.service: Unit entered failed state.
Jul 14 10:01:41 rawhide systemd[1]: test.service: Failed with result 'exit-code'

Comment 7 Jan Synacek 2016-07-14 06:07:15 UTC

I can't reproduce this either.

$ rpm -q systemd
systemd-229-8.fc24.x86_64

$ cat /etc/systemd/system/test.service
[Unit]
Description=test
[Service]
ExecStart=/usr/bin/iamnothere.exe
ExecStopPost=/usr/bin/echo "ExecStopPost"

$ systemctl start test.service

$ systemctl status test.service
● test.service - test
   Loaded: loaded (/etc/systemd/system/test.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2016-07-14 07:59:52 CEST; 8s ago
  Process: 4544 ExecStopPost=/usr/bin/echo ExecStopPost (code=exited, status=0/SUCCESS)
  Process: 4540 ExecStart=/usr/bin/iamnothere.exe (code=exited, status=203/EXEC)
 Main PID: 4540 (code=exited, status=203/EXEC)

Jul 14 07:59:52 jsynacek-ntb systemd[1]: Started test.
Jul 14 07:59:52 jsynacek-ntb systemd[1]: test.service: Main process exited, code=exited, status=203/EXEC
Jul 14 07:59:52 jsynacek-ntb echo[4544]: ExecStopPost
Jul 14 07:59:52 jsynacek-ntb systemd[1]: test.service: Unit entered failed state.
Jul 14 07:59:52 jsynacek-ntb systemd[1]: test.service: Failed with result 'exit-code'.

Could you please provide exact reproducer steps that you use?

Comment 8 Bruno Wolff III 2016-07-14 17:07:59 UTC

I am seeing this on F24. I am also using a oneshot service, which may account for the issue. I'll attach the service file. If you actually want to try WireGuard, it is at wireguard.io, but just replacing the wg command with true or false and tweaking the ip commands to make sense in your environment will probably work for testing this.

Comment 9 Bruno Wolff III 2016-07-14 17:09:58 UTC

Created attachment 1179913
Wireguard service file

Comment 10 Bruno Wolff III 2016-07-14 17:17:25 UTC

It does look like the issue is related to being a oneshot service. Try:
[Unit]
Description=test
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/false
ExecStopPost=/usr/bin/echo "ExecStopPost"
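
For anyone who wants to try this, here is a rough sketch of the steps as a shell snippet. The helper name write_test_unit is made up for illustration; the systemctl and journalctl calls are left as comments because they need root and a machine you don't mind creating a failed unit on.

```shell
#!/bin/sh
# Write the oneshot reproducer from this comment to DIR/test.service.
# write_test_unit is a hypothetical helper name, not part of systemd.
write_test_unit() {
    cat > "$1/test.service" <<'EOF'
[Unit]
Description=test
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/false
ExecStopPost=/usr/bin/echo "ExecStopPost"
EOF
}

# As root, on a scratch machine:
#   write_test_unit /etc/systemd/system
#   systemctl daemon-reload
#   systemctl start test.service    # expected to fail, since ExecStart=/bin/false
#   systemctl status test.service   # check whether an ExecStopPost= process is listed
#   journalctl -u test.service      # check whether echo logged "ExecStopPost"
```

Dropping the Type=oneshot and RemainAfterExit=yes lines from the heredoc should give the behaviour shown in comment 7 instead.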

Comment 11 Bruno Wolff III 2016-07-14 19:36:47 UTC

This might be related to a freedesktop bug where ExecStopPost isn't working with forking units: https://bugs.freedesktop.org/show_bug.cgi?id=78240

Comment 12 Jan Synacek 2016-07-15 06:57:13 UTC

I don't think ExecStopPost is supposed to work with oneshot units. Also, it definitely won't work with RemainAfterExit.

As the manpage says:
ExecStopPost=
    Additional commands that are executed after the service is stopped. This includes cases where the commands configured in ExecStop= were used, where the service does not have any ExecStop= defined, or where the service exited unexpectedly.

Your service has not exited unexpectedly (from systemd's point of view), because it has RemainAfterExit.

Comment 13 Bruno Wolff III 2016-07-15 15:27:30 UTC

At the very least this is a documentation error. The documentation does not mention an exception for oneshot (or forking) services.
Note that my complaint was about what happens when a service fails, not normal startup. systemd was certainly detecting a failure, and RemainAfterExit shouldn't apply because there was no successful startup.

Comment 14 Michal Sekletar 2016-07-18 16:05:55 UTC

ExecStopPost is executed only when the unit transitions out of the running state (even when the unit is killed), or when the main PID exits successfully while the unit is in the activating state and the service is not of the forking type.

The difference is that, for a oneshot service, while the configured ExecStart actions are executing, the service is still in the activating state and not running. For a simple service, however, systemd immediately transitions the service to the running state, systemctl doesn't block, and exit status 0 is returned even when ExecStart fails (e.g. ExecStart=/bin/false). Thus a oneshot service must exit cleanly in order for ExecStopPost to be called. AFAICT, the RemainAfterExit option doesn't have an impact on ExecStopPost.
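
Read literally, the rule above can be written down as a small truth table. The shell function below is only a sketch of that description, not systemd's actual code: the name stop_post_runs and its three arguments are made up for illustration, and it reflects the behaviour reported in this bug against Fedora 24, which other systemd versions may not share.

```shell
#!/bin/sh
# Model of the rule described in this comment (an illustration only).
# Usage: stop_post_runs STATE MAIN_EXIT SVC_TYPE
#   STATE:     unit state when the main process exits (running or activating)
#   MAIN_EXIT: exit status of the main process
#   SVC_TYPE:  simple, oneshot, forking, ...
stop_post_runs() {
    state=$1
    main_exit=$2
    svc_type=$3
    if [ "$state" = running ]; then
        # Leaving the running state always goes through stop-post.
        echo yes
    elif [ "$state" = activating ] && [ "$main_exit" -eq 0 ] && [ "$svc_type" != forking ]; then
        # Clean exit while still activating, and not a forking service.
        echo yes
    else
        echo no
    fi
}

stop_post_runs running 1 simple       # simple unit, ExecStart=/bin/false: prints yes
stop_post_runs activating 1 oneshot   # oneshot unit, ExecStart=/bin/false: prints no
stop_post_runs activating 0 oneshot   # oneshot unit, ExecStart=/bin/true: prints yes
```

The second call is the case from comment 10: the oneshot unit never reaches the running state, so under this rule a failing ExecStart skips ExecStopPost.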

Comment 15 Bruno Wolff III 2016-07-18 17:01:56 UTC

That doesn't seem to match the documentation.
ExecStopPost seems to be the recommended way to do cleanup, and I would think it would be better to have it work as documented rather than change the documentation to note that it doesn't work in some failure cases.

Comment 16 Bruno Wolff III 2016-07-18 18:47:30 UTC

On a related note, it is annoying that oneshot services can't use Restart=on-failure, which could make sense for them.

Comment 17 Fedora End Of Life 2017-07-25 21:36:48 UTC

This message is a reminder that Fedora 24 is nearing its end of life.
Approximately 2 (two) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 24. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '24'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 24 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 18 Fedora End Of Life 2017-08-08 15:23:54 UTC

Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.