Bug 1395102
| Summary: | sendmail won't start after system reboot. Dies with a message "service has been restarted too many times" | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Salvatore Bognanni <sbognann> |
| Component: | sendmail | Assignee: | Jaroslav Škarvada <jskarvad> |
| Status: | CLOSED ERRATA | QA Contact: | Roman Žilka <rzilka> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.2 | CC: | amahdal, chorn, psklenar, rzilka |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | sendmail-8.14.7-5.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-08-01 12:42:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Salvatore Bognanni
2016-11-15 07:01:48 UTC
Is this still seen on 7.3? Any idea what is different on your system from other systems who are not experiencing this?

It's probably caused by multiple restarts initiated by the NetworkManager dispatcher. The workaround is to remove the limit on the number of allowed restarts in the service file. But it's not reproducible everywhere, e.g. I wasn't able to reproduce it on my systems.

Unfortunately the version of systemd used in RHEL-7 seems too old, so we have to use the old StartLimitInterval option in the [Service] section instead of the new StartLimitIntervalSec option in the [Unit] section that we used in Fedora. This shouldn't be a problem, because newer systemd keeps the old option for backward compatibility (although it's undocumented). For details see bug 1422771.

(In reply to Christian Horn from comment #3)
> Is this still seen on 7.3?
> Any idea what is different on your system from other systems who are not
> experiencing this?

Yes, it's still happening on 7.3.

Thanks for picking this up!

We have a case with an (according to customer feedback successful) attempt to work around the issue in deploying 2 files as follows:

# head -30 /etc/systemd/system/*.service.d/boot-after-NetworkManager-wait-online.conf
==> /etc/systemd/system/sendmail.service.d/boot-after-NetworkManager-wait-online.conf <==
[Unit]
After=NetworkManager-wait-online.service

[Service]
StartLimitInterval=0

==> /etc/systemd/system/sm-client.service.d/boot-after-NetworkManager-wait-online.conf <==
[Unit]
After=NetworkManager-wait-online.service

[Service]
StartLimitInterval=0

Comments on this approach are welcome.

(In reply to Christian Horn from comment #11)
> Thanks for picking this up!

NP

> We have a case with an (according to customer feedback successful) attempt
> to work around the issue in deploying 2 files as follows:
>
> # head -30 /etc/systemd/system/*.service.d/boot-after-NetworkManager-wait-online.conf
> ==> /etc/systemd/system/sendmail.service.d/boot-after-NetworkManager-wait-online.conf <==
> [Unit]
> After=NetworkManager-wait-online.service
>
> [Service]
> StartLimitInterval=0
>
> ==> /etc/systemd/system/sm-client.service.d/boot-after-NetworkManager-wait-online.conf <==
> [Unit]
> After=NetworkManager-wait-online.service
>
> [Service]
> StartLimitInterval=0
>
> Comments on this approach are welcome.

I think the most important part is:

> [Service]
> StartLimitInterval=0

Our workaround also consists of adding it to both sendmail.service and sm-client.service, as the customer did. I think it should work without the NetworkManager-wait-online dependency; the service will probably just be restarted a few more times, which shouldn't be a problem.

Feel free to test the existing build of sendmail-8.14.7-5.el7 in advance (of course it's unsupported until officially released).

In the future this will require a more robust fix, e.g. a proxy that accumulates the restart attempts and restarts at a maximum rate of e.g. 1 restart per 10 seconds, or a similar approach. But such bigger code changes may be too invasive for a stable RHEL release and need to be checked in Fedora first.

QA: Verified.
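For anyone who wants to try the part of the workaround that was called out as most important, here is a minimal sketch that writes a StartLimitInterval=0 drop-in for both units. The variable DROPIN_ROOT, the function write_dropin and the file name 10-start-limit.conf are assumptions made for illustration; they do not come from the sendmail package. By default the sketch writes into a temporary directory so it can be inspected safely; on a real system DROPIN_ROOT would be /etc/systemd/system.

```shell
#!/bin/sh
# Sketch only: DROPIN_ROOT, write_dropin and 10-start-limit.conf are
# hypothetical names. Defaults to a scratch directory instead of
# /etc/systemd/system so that nothing on the system is changed.
DROPIN_ROOT="${DROPIN_ROOT:-$(mktemp -d)}"

write_dropin() {
    # $1 = unit name, e.g. sendmail.service
    dir="$DROPIN_ROOT/$1.d"
    mkdir -p "$dir" || return 1
    # With the systemd in RHEL 7 the option goes in the [Service] section.
    printf '[Service]\nStartLimitInterval=0\n' > "$dir/10-start-limit.conf"
}

# Both units were given the setting in the workaround discussed in this bug.
for unit in sendmail.service sm-client.service; do
    write_dropin "$unit"
done

echo "drop-ins written under $DROPIN_ROOT"
```

When the files are placed under /etc/systemd/system, `systemctl daemon-reload` is needed before systemd picks them up. The customer's variant additionally sets After=NetworkManager-wait-online.service in a [Unit] section; that line is left out here because it was described as probably not required.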
******************** sendmail-8.14.7-4.el7 (faulty):
# cat /etc/NetworkManager/dispatcher.d/10-sendmail
#!/bin/sh
case "$2" in
up|down|vpn-up|vpn-down)
/bin/systemctl try-restart sendmail.service || :
;;
esac
# while true; do systemctl status sendmail|grep Active:; systemctl try-restart sendmail; sleep 1; done
Active: active (running) since Fri 2017-05-19 04:09:41 EDT; 1min 45s ago
Active: active (running) since Fri 2017-05-19 04:11:27 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:28 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:29 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:30 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:31 EDT; 1s ago
Job for sendmail.service failed because start of the service was attempted too often. See "systemctl status sendmail.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed sendmail.service" followed by "systemctl start sendmail.service" again.
Active: failed (Result: start-limit) since Fri 2017-05-19 04:11:32 EDT; 1s ago
Active: failed (Result: start-limit) since Fri 2017-05-19 04:11:32 EDT; 2s ago
Active: failed (Result: start-limit) since Fri 2017-05-19 04:11:32 EDT; 3s ago
Active: failed (Result: start-limit) since Fri 2017-05-19 04:11:32 EDT; 4s ago
Active: failed (Result: start-limit) since Fri 2017-05-19 04:11:32 EDT; 5s ago
******************** sendmail-8.14.7-5.el7 (fixed):
# cat /etc/NetworkManager/dispatcher.d/10-sendmail
#!/bin/sh
case "$2" in
up|down|vpn-up|vpn-down)
/bin/systemctl --no-block try-restart sendmail.service || :
;;
esac
# while true; do systemctl status sendmail|grep Active:; systemctl try-restart sendmail; sleep 1; done
Active: active (running) since Fri 2017-05-19 03:53:38 EDT; 17min ago
Active: active (running) since Fri 2017-05-19 04:11:26 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:27 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:28 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:29 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:30 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:31 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:32 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:33 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:34 EDT; 1s ago
Active: active (running) since Fri 2017-05-19 04:11:35 EDT; 1s ago
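The idea raised in the comments of limiting restarts to roughly one per 10 seconds could be sketched along the following lines. This is an illustration under stated assumptions, not the shipped fix: MIN_INTERVAL, STAMP_FILE, RESTART_CMD and rate_limited_restart are hypothetical names, and the sketch simply drops attempts that arrive too early instead of accumulating and deferring them, so the last network event in a burst may go unhandled. It also does no locking between concurrent dispatcher runs.

```shell
#!/bin/sh
# Sketch of a rate-limited restart wrapper. All names are hypothetical.
# RESTART_CMD defaults to a harmless echo; in a dispatcher script it might be
# "/bin/systemctl --no-block try-restart sendmail.service".
MIN_INTERVAL="${MIN_INTERVAL:-10}"          # minimum seconds between restarts
STAMP_FILE="${STAMP_FILE:-$(mktemp)}"       # stores the time of the last restart
RESTART_CMD="${RESTART_CMD:-echo restart sendmail.service}"

rate_limited_restart() {
    now=$(date +%s)
    last=$(cat "$STAMP_FILE" 2>/dev/null)
    last=${last:-0}                         # empty or missing stamp: never restarted
    if [ $((now - last)) -lt "$MIN_INTERVAL" ]; then
        # Too soon after the previous restart: drop this attempt.
        echo "skipped"
        return 0
    fi
    echo "$now" > "$STAMP_FILE"
    # Left unquoted on purpose so the command string is split into words.
    $RESTART_CMD
}
```

Calling rate_limited_restart twice in quick succession runs the restart command the first time and prints "skipped" the second time. A real deployment would need a fixed STAMP_FILE path (for example somewhere under /run, which is an assumption) and a way to replay a dropped attempt once the interval has passed.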
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2197