Bug 1310111 - [RFE] Sat6 services to be configured to restart on failure
Status: CLOSED WONTFIX
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Installer
Version: 6.1.6
Hardware: All, OS: Linux
Priority: medium, Severity: medium
Target Milestone: Unspecified
Target Release: --
Assigned To: Katello Bug Bin
QA Contact: Katello QA List
Keywords: FutureFeature, Improvement, Triaged
Depends On:
Blocks:
Reported: 2016-02-19 08:46 EST by Pavel Moravec
Modified: 2017-11-02 13:22 EDT

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-02 13:22:11 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 16938 | None | None | None | 2016-10-14 09:46 EDT

Description Pavel Moravec 2016-02-19 08:46:59 EST
Description of problem:
Request: 
configure Sat6 essential services to automatically restart on failure.

Reasoning:
Sat6 relies on several services, each essential to some part of the product. If such a service fails for whatever reason (say, a segfault), the affected functionality stays down until an administrator intervenes. That often happens only at the end of a long sequence: some service failed -> some functionality doesn't work -> customer is not notified / doesn't check logs -> after some time, they realize the functionality does not work -> they raise a support case with Red Hat -> it takes time for us to identify the cause -> service is restarted.

The resulting functionality downtime and Red Hat support effort are ridiculously high.

(A Sat6 health-check script would alleviate this pain to some extent. But even with that, the request would still be valid: technically, a health-check script is just a different form of logs and doesn't restart the failed service itself.)

On technical level:
- not sure if this is applicable to RHEL6, where manual changes to each and every init script would have to be made. I am OK with doing this for RHEL7 only and updating just the systemd config
- ideally, systemd should be configured to restart any failed/killed/... service several times in a row and then give up - or, optionally, to retry with some nontrivial delay between the attempts
- essential/critical services: basically the services covered by "katello-service status"
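
For illustration only, the restart policy requested above could be expressed as a systemd drop-in. The Python sketch below just renders the drop-in text and the path it would live at; it does not write anything to disk. The directive names (Restart=, RestartSec=, StartLimitInterval=, StartLimitBurst=) are standard systemd [Service] options on RHEL7, but the concrete values, the helper names and the drop-in file name are assumptions, not anything the installer provides.

```python
from pathlib import PurePosixPath

# Hypothetical file name; any *.conf file in a unit's .d directory is read.
DROPIN_NAME = "50-restart-on-failure.conf"


def render_restart_dropin(restart_sec=30, burst=3, interval_sec=600):
    """Return drop-in text: restart on failure, with a delay between
    attempts, and give up after `burst` starts within `interval_sec`."""
    # If the restart delay is too long relative to the interval, the start
    # limit may never be hit and systemd would keep retrying indefinitely.
    if burst * restart_sec >= interval_sec:
        raise ValueError("interval_sec must exceed burst * restart_sec")
    lines = [
        "[Service]",
        "Restart=on-failure",
        f"RestartSec={restart_sec}",
        f"StartLimitInterval={interval_sec}",
        f"StartLimitBurst={burst}",
    ]
    return "\n".join(lines) + "\n"


def dropin_path(unit):
    """Return the path of the administrator drop-in for `unit`."""
    return PurePosixPath("/etc/systemd/system") / f"{unit}.d" / DROPIN_NAME


if __name__ == "__main__":
    print(dropin_path("qdrouterd.service"))
    print(render_restart_dropin())
```

After placing such a file, "systemctl daemon-reload" is needed before systemd picks it up. The values shown (30 s delay, 3 attempts per 10 minutes) are arbitrary placeholders and would need tuning per service.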


Version-Release number of selected component (if applicable):
6.1.6


How reproducible:
100%


Steps to Reproduce:
1. Mimic a service failure by killing it (an example: kill qdrouterd)
2. Wait some time to allow Sat to recover on its own
3. Try the failed functionality (an example: install some errata that relies on qdrouterd)
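
A rough way to script the steps above is sketched below in Python. It is only a proxy: instead of installing errata in step 3, it checks whether systemd reports the unit as active again. The function name, the default wait time and the injectable run/sleep parameters are assumptions made for this sketch; it would have to run as root on the Satellite host.

```python
import subprocess
import time


def service_recovers(unit, wait_sec=60, run=subprocess.run, sleep=time.sleep):
    """Kill `unit`, wait, and report whether systemd shows it active again."""
    # Step 1: mimic a crash by sending SIGKILL to the unit's processes.
    run(["systemctl", "kill", "--signal=SIGKILL", unit], check=True)
    # Step 2: give the system time to recover on its own.
    sleep(wait_sec)
    # Step 3 (proxy): "is-active --quiet" exits 0 only if the unit is active.
    result = run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


if __name__ == "__main__":
    # Example unit taken from the steps above.
    print(service_recovers("qdrouterd.service"))
```

A result of False corresponds to the behaviour reported in this bug; True would only indicate that the unit came back, not that the dependent functionality works.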


Actual results:
Step 3 fails regardless of the delay in step 2.


Expected results:
Step 3 succeeds after some time without any intervention.


Additional info:
Comment 2 Bryan Kearney 2016-07-08 16:41:22 EDT
Per 6.3 planning, moving non-acked bugs out to the backlog.
Comment 4 Stephen Benjamin 2016-10-14 09:45:38 EDT
It's worth noting that the Satellite team itself directly controls very few unit files. Most are shipped with the OS packages (httpd, qpid, tomcat, etc).
Comment 5 Stephen Benjamin 2016-10-14 09:46:04 EDT
Created redmine issue http://projects.theforeman.org/issues/16938 from this bug
Comment 6 Pavel Moravec 2016-10-14 11:26:29 EDT
(In reply to Stephen Benjamin from comment #4)
> It's worth noting that the Satellite team itself directly controls very few
> unit files. Most are shipped with the OS packages (httpd, qpid, tomcat, etc).

... but the installer can configure the other unit files as well.
Comment 7 Stephen Benjamin 2016-10-14 12:49:47 EDT
Restart on failure hides legitimate bugs, and I'm against the idea (in general; I'm sure there's some case where it makes sense). But touching other projects' unit files? Not the installer's responsibility AT ALL, even if it's done via systemd drop-ins.
Comment 8 Bryan Kearney 2017-11-02 13:21:36 EDT
Thank you for your interest in Satellite 6. We have evaluated this request, and we do not expect this to be implemented in the product in the foreseeable future. We are therefore closing this out as WONTFIX. If you have any concerns about this, please feel free to contact Rich Jerrido or Bryan Kearney. Thank you.
Comment 9 Bryan Kearney 2017-11-02 13:22:11 EDT
Thank you for your interest in Satellite 6. We have evaluated this request, and we do not expect this to be implemented in the product in the foreseeable future. We are therefore closing this out as WONTFIX. If you have any concerns about this, please feel free to contact Rich Jerrido or Bryan Kearney. Thank you.
