Bug 1310111 - [RFE] Sat6 services to be configured to restart on failure
Summary: [RFE] Sat6 services to be configured to restart on failure
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Installer
Version: 6.1.6
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: Unspecified
Assignee: Katello Bug Bin
QA Contact: Katello QA List
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-02-19 13:46 UTC by Pavel Moravec
Modified: 2017-11-02 17:22 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-02 17:22:11 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 16938 0 None None None 2016-10-14 13:46:05 UTC

Description Pavel Moravec 2016-02-19 13:46:59 UTC
Description of problem:
Request: 
configure Sat6 essential services to automatically restart on failure.

Reasoning:
Sat6 relies on various services that are essential to different parts of the product. If such a service fails for whatever reason (say, a segfault), the affected functionality stays disabled until an administrator intervenes. That often happens only at the end of this sequence: some service failed -> some functionality doesn't work -> customer is not notified / doesn't check logs -> after some time, they realize the functionality does not work -> they raise a support case with Red Hat -> it takes time for us to identify the cause -> service restarted.

Both the resulting functionality downtime and the amount of Red Hat support intervention are ridiculously high.

(A Sat6 health-check script would alleviate this pain to some extent. But even with that, the request would still be valid: technically, a health-check script is just a different form of logs and doesn't restart the failed service itself.)

On technical level:
- not sure if this is applicable to RHEL6, where manual changes would have to be made to each and every init script. I am OK with doing this for RHEL7 only, by updating the systemd config
- ideally, systemd service should be configured to restart any failed/killed/.. service several times in a row and then give up - or optionally try to restart the service with some nontrivial delay between the attempts
- essential/critical services: basically to cover "katello-service status" services
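
A minimal sketch of what the systemd part could look like on RHEL7, assuming one drop-in file per service. The drop-in file name (restart.conf), the unit name qdrouterd.service and all numeric values are illustrative assumptions, not something the installer does today. On RHEL7's systemd the start-limit options live in the [Service] section; newer systemd versions name and place them differently.

```python
# Sketch only: renders a systemd drop-in that restarts a unit on failure,
# waits between attempts, and gives up after a few attempts in a row.
# Unit names, file name and numeric defaults are assumptions for illustration.


def render_restart_dropin(restart_sec=30, burst=3, interval=600):
    """Return drop-in text for a [Service] section.

    restart_sec: delay in seconds before each restart attempt
    burst:       number of starts allowed within `interval` seconds
    interval:    window in seconds; once `burst` is exceeded, systemd gives up
    """
    return "\n".join([
        "[Service]",
        "Restart=on-failure",
        f"RestartSec={restart_sec}",
        f"StartLimitInterval={interval}",
        f"StartLimitBurst={burst}",
        "",
    ])


def dropin_path(unit):
    """Return the conventional drop-in location for a unit (hypothetical file name)."""
    return f"/etc/systemd/system/{unit}.d/restart.conf"


if __name__ == "__main__":
    # Only prints the result; writing the files is left to whoever applies it.
    for unit in ("qdrouterd.service",):
        print(f"# {dropin_path(unit)}")
        print(render_restart_dropin())
```

After such a file is put in place, `systemctl daemon-reload` is needed for systemd to pick it up. The interval should be larger than the restart delay multiplied by the burst count, otherwise the "give up" limit is never reached.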


Version-Release number of selected component (if applicable):
6.1.6


How reproducible:
100%


Steps to Reproduce:
1. Mimic a service failure by killing it (an example: kill qdrouterd)
2. Wait some time to allow Sat to recover on its own
3. Try the failed functionality (an example: install some errata that relies on qdrouterd)


Actual results:
3. fails regardless of the delay in 2.


Expected results:
3. to succeed after some time without any intervention


Additional info:

Comment 2 Bryan Kearney 2016-07-08 20:41:22 UTC
Per 6.3 planning, moving out non acked bugs to the backlog

Comment 4 Stephen Benjamin 2016-10-14 13:45:38 UTC
It's worth noting, Satellite team itself directly controls very few unit files. Most are shipped with the OS packages (httpd, qpid, tomcat, etc).

Comment 5 Stephen Benjamin 2016-10-14 13:46:04 UTC
Created redmine issue http://projects.theforeman.org/issues/16938 from this bug

Comment 6 Pavel Moravec 2016-10-14 15:26:29 UTC
(In reply to Stephen Benjamin from comment #4)
> It's worth noting, Satellite team itself directly controls very few unit
> files. Most are shipped with the OS packages (httpd, qpid, tomcat, etc).

.. but installer can configure the other unit files as well.

Comment 7 Stephen Benjamin 2016-10-14 16:49:47 UTC
Restart on failure hides legitimate bugs, and I'm against the idea (in general; I'm sure there's some case where it makes sense). But touching other projects' unit files? Not the installer's responsibility AT ALL, even if it's done via systemd drop-ins.

Comment 8 Bryan Kearney 2017-11-02 17:21:36 UTC
Thank you for your interest in Satellite 6. We have evaluated this request, and we do not expect this to be implemented in the product in the foreseeable future. We are therefore closing this out as WONTFIX. If you have any concerns about this, please feel free to contact Rich Jerrido or Bryan Kearney. Thank you.

Comment 9 Bryan Kearney 2017-11-02 17:22:11 UTC
Thank you for your interest in Satellite 6. We have evaluated this request, and we do not expect this to be implemented in the product in the foreseeable future. We are therefore closing this out as WONTFIX. If you have any concerns about this, please feel free to contact Rich Jerrido or Bryan Kearney. Thank you.

