When starting a service that fails, according to the documentation ExecStopPost commands are supposed to be run, but this isn't happening.
Reproducible: Always
Steps to Reproduce:
1. Set up a service that will fail and that has an ExecStopPost command.
2. Start the service.
3. Show the service.
Actual Results: No commands were run after the first ExecStart command failed.
Expected Results: After the first ExecStart failure, the ExecStopPost command should have been run.
Created attachment 1176661 [details] systemctl show output
Doesn't the man page say that 'ExecStartPost=' runs only if the service has started successfully?
https://www.freedesktop.org/software/systemd/man/systemd.service.html
"ExecStartPost= commands are only run after the service has started successfully."
> When starting a service that fails, according to the documentation execstoppost
> commands are supposed to be run,
Actually, as Susant says, it's the opposite.
I take that back, that's for ExecStartPost, not ExecStopPost.
From that same page:
"Note that if any of the commands specified in ExecStartPre=, ExecStart=, or ExecStartPost= fail (and are not prefixed with "-", see above) or time out before the service is fully up, execution continues with commands specified in ExecStopPost=, the commands in ExecStop= are skipped."
I am using a command in ExecStopPost to clean things up if the service only starts up partway, but it isn't being run.
My bad, I misread it. I tested with ExecStopPost. Is this the way you are testing? I am not able to reproduce it. For example:
------------
[Unit]
Description=exec stop post
[Service]
ExecStart=/usr/sbin/nginx
ExecStopPost=/usr/bin/echo "ExecStopPost"
[Install]
WantedBy=multi-user.target
--------------------------
Jul 14 10:01:40 rawhide nginx[2119]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Jul 14 10:01:40 rawhide nginx[2119]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Jul 14 10:01:40 rawhide nginx[2119]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Jul 14 10:01:41 rawhide nginx[2119]: nginx: [emerg] still could not bind()
Jul 14 10:01:41 rawhide systemd[1]: Received SIGCHLD from PID 2119 (nginx).
Jul 14 10:01:41 rawhide systemd[1]: Child 2119 (nginx) died (code=exited, status=1/FAILURE)
Jul 14 10:01:41 rawhide systemd[1]: test.service: Child 2119 belongs to test.service
Jul 14 10:01:41 rawhide systemd[1]: test.service: Main process exited, code=exited, status=1/FAILURE
Jul 14 10:01:41 rawhide systemd[1]: test.service: About to execute: /usr/bin/echo ExecStopPost
Jul 14 10:01:41 rawhide systemd[1]: test.service: Forked /usr/bin/echo as 2121
Jul 14 10:01:41 rawhide systemd[1]: test.service: Changed running -> stop-post
Jul 14 10:01:41 rawhide systemd[1]: Sent message type=signal sender=n/a destination=n/a o
Jul 14 10:01:41 rawhide systemd[2121]: test.service: Executing: /usr/bin/echo ExecStopPost
Jul 14 10:01:41 rawhide echo[2121]: ExecStopPost <=======================================
Jul 14 10:01:41 rawhide systemd[1]: Received SIGCHLD from PID 2121 (echo).
Jul 14 10:01:41 rawhide systemd[1]: Child 2121 (echo) died (code=exited, status=0/SUCCESS)
Jul 14 10:01:41 rawhide systemd[1]: test.service: Child 2121 belongs to test.service
Jul 14 10:01:41 rawhide systemd[1]: test.service: Control process exited, code=exited status=0
Jul 14 10:01:41 rawhide systemd[1]: test.service: Got final SIGCHLD for state stop-post.
Jul 14 10:01:41 rawhide systemd[1]: test.service: Changed stop-post -> failed
Jul 14 10:01:41 rawhide systemd[1]: test.service: Unit entered failed state.
Jul 14 10:01:41 rawhide systemd[1]: test.service: Failed with result 'exit-code'
I can't reproduce this either.
$ rpm -q systemd
systemd-229-8.fc24.x86_64
$ cat /etc/systemd/system/test.service
[Unit]
Description=test
[Service]
ExecStart=/usr/bin/iamnothere.exe
ExecStopPost=/usr/bin/echo "ExecStopPost"
$ systemctl start test.service
$ systemctl status test.service
● test.service - test
   Loaded: loaded (/etc/systemd/system/test.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2016-07-14 07:59:52 CEST; 8s ago
  Process: 4544 ExecStopPost=/usr/bin/echo ExecStopPost (code=exited, status=0/SUCCESS)
  Process: 4540 ExecStart=/usr/bin/iamnothere.exe (code=exited, status=203/EXEC)
 Main PID: 4540 (code=exited, status=203/EXEC)
Jul 14 07:59:52 jsynacek-ntb systemd[1]: Started test.
Jul 14 07:59:52 jsynacek-ntb systemd[1]: test.service: Main process exited, code=exited, status=203/EXEC
Jul 14 07:59:52 jsynacek-ntb echo[4544]: ExecStopPost
Jul 14 07:59:52 jsynacek-ntb systemd[1]: test.service: Unit entered failed state.
Jul 14 07:59:52 jsynacek-ntb systemd[1]: test.service: Failed with result 'exit-code'.
Could you please provide the exact reproducer steps that you use?
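For anyone comparing results, a small helper can check a saved status dump for the "Process: ... ExecStopPost=" line seen in the output above. This is only a sketch: the function name exec_stop_post_ran is hypothetical, and it assumes the status output has the same layout as in this comment, which other systemd versions may not share.

```shell
#!/bin/sh
# Hypothetical helper: reads "systemctl status" output on stdin and
# succeeds (exit 0) if a "Process: <pid> ExecStopPost=..." line is present.
# Assumes the line layout shown in the status output quoted in this thread.
exec_stop_post_ran() {
    grep -Eq '^[[:space:]]*Process:[[:space:]]+[0-9]+[[:space:]]+ExecStopPost='
}

# Example use (not run here):
#   systemctl status test.service | exec_stop_post_ran && echo "ExecStopPost ran"
```

The check only looks at the process list in the status output; the journal lines remain the more reliable evidence.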
I am seeing this on f24. I am also using a oneshot service, which may account for the issue. I'll attach the service file. If you actually want to try wireguard, it is at wireguard.io, but just replacing the wg command with true or false and tweaking the ip commands to make sense in your environment will probably work for testing this.
Created attachment 1179913 [details] Wireguard service file
It does look like the issue is related to being a oneshot service. Try:
[Unit]
Description=test
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/false
ExecStopPost=/usr/bin/echo "ExecStopPost"
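To make the oneshot reproducer easier to repeat, the unit above could be written out by a small script like the following. This is a sketch only: the function name write_repro_unit and the file name execstoppost-test.service are hypothetical choices, not anything from the attachments.

```shell
#!/bin/sh
# Hypothetical helper: writes the oneshot reproducer unit from this comment
# into the given directory (default: current directory) and prints its path.
write_repro_unit() {
    repro_dir=${1:-.}
    cat > "$repro_dir/execstoppost-test.service" <<'EOF'
[Unit]
Description=test
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/false
ExecStopPost=/usr/bin/echo "ExecStopPost"
EOF
    printf '%s\n' "$repro_dir/execstoppost-test.service"
}

# Example use as root (not run here):
#   write_repro_unit /etc/systemd/system
#   systemctl daemon-reload
#   systemctl start execstoppost-test.service
#   systemctl status execstoppost-test.service
```

If the behaviour reported here is present, the status output would be expected to show the failed ExecStart process but no ExecStopPost process.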
This might be related to a freedesktop bug where ExecStopPost isn't working with forking units. https://bugs.freedesktop.org/show_bug.cgi?id=78240
I don't think ExecStopPost is supposed to work with oneshot units. Also, it definitely won't work with RemainAfterExit. As the manpage says:
ExecStopPost=
"Additional commands that are executed after the service is stopped. This includes cases where the commands configured in ExecStop= were used, where the service does not have any ExecStop= defined, or where the service exited unexpectedly."
Your service has not exited unexpectedly (from systemd's point of view), because it has RemainAfterExit.
At the very least this is a documentation error. The documentation does not mention an exception for oneshot (or forking) services. Note that my complaint was about what happens when a service fails, not about normal startup. systemd was certainly detecting a failure, and RemainAfterExit shouldn't apply because there was no successful startup.
ExecStopPost is executed only when the unit transitions out of the running state (even when the unit is killed), or when the main PID exits successfully while the unit is in the activating state and the service is not of the forking type. The difference is that, for a oneshot service, the service is still in the activating state, not running, while the configured ExecStart actions are executing. For a simple service, however, systemd immediately transitions the service to the running state; systemctl doesn't block and returns exit code 0 even when ExecStart fails (e.g. ExecStart=/bin/false). Thus a oneshot service must exit cleanly in order for ExecStopPost to be called. AFAICT, the RemainAfterExit option doesn't have an impact on ExecStopPost.
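The rule as described in this comment can be written down as a small decision function. This is only a model of the description above, not systemd's actual code; the function name and its three arguments (state, service type, exit status of the main process) are hypothetical.

```shell
#!/bin/sh
# Hypothetical model of the rule as described in this comment.
# Arguments: state (activating|running), type (simple|oneshot|forking),
#            exit status of the main process.
# Prints "yes" if ExecStopPost would be expected to run, otherwise "no".
would_run_exec_stop_post() {
    wr_state=$1
    wr_type=$2
    wr_status=$3
    if [ "$wr_state" = running ]; then
        # Leaving the running state always leads to ExecStopPost.
        echo yes
    elif [ "$wr_state" = activating ] && [ "$wr_type" != forking ] \
            && [ "$wr_status" -eq 0 ]; then
        # Clean exit of the main PID while still activating.
        echo yes
    else
        echo no
    fi
}

# Simple service with ExecStart=/bin/false: already "running" when it fails.
would_run_exec_stop_post running simple 1      # prints "yes"
# Oneshot service with ExecStart=/bin/false: still "activating" when it fails.
would_run_exec_stop_post activating oneshot 1  # prints "no"
```

Under this model the two reproducers in this bug differ only in the state the unit is in when ExecStart fails, which matches the "Changed running -> stop-post" line in the earlier log.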
That doesn't seem to match the documentation. ExecStopPost seems to be the recommended way to do clean up and I would think it would be better to have it work as documented rather than change the documentation to note that it doesn't work in some failure cases.
On a related note, it is annoying that oneshot services can't use Restart=on-failure, which could make sense for them.
This message is a reminder that Fedora 24 is nearing its end of life. Approximately 2 (two) weeks from now Fedora will stop maintaining and issuing updates for Fedora 24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '24'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 24 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.