Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
Description of problem:
Request:
Configure Sat6 essential services to restart automatically on failure.
Reasoning:
Sat6 relies on several services, each essential for some part of the product's functionality. If such a service fails for any reason (say, a segfault), that functionality stays unavailable until an administrator intervenes. The intervention often comes only at the end of this sequence: some service fails -> some functionality stops working -> the customer is not notified / doesn't check the logs -> after some time, they realize the functionality does not work -> they raise a support case with Red Hat -> it takes time for us to identify the cause -> the service is restarted.
The resulting functionality downtime and Red Hat support effort are ridiculously high.
(A Sat6 health-check script would alleviate this pain to some extent. But even with one, this request would still be valid: technically, a health-check script is just a different form of logs and does not restart the failed service itself.)
On technical level:
- not sure if this is applicable to RHEL6, where every init script would have to be changed manually. I am OK with doing this for RHEL7 only, by updating the systemd configuration
- ideally, systemd should be configured to restart any failed/killed/... service several times in a row and then give up, or optionally to keep retrying with a nontrivial delay between attempts
- essential/critical services: basically the services covered by "katello-service status"
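A minimal sketch of the systemd approach described above, in Python: it renders a drop-in that sets `Restart=on-failure` with a delay between attempts and a start limit after which systemd gives up, then writes it under `/etc/systemd/system/<unit>.service.d/` for each service. The service list, the file name `restart-on-failure.conf`, and the numeric defaults (30 s delay, at most 3 starts per 600 s) are illustrative assumptions, not values taken from this report or from the Satellite installer.

```python
from pathlib import Path

# Hypothetical example list; the real set would mirror whatever
# "katello-service status" reports on a given Satellite version.
EXAMPLE_SERVICES = ["qdrouterd", "qpidd", "httpd", "tomcat"]


def render_dropin(restart_sec: int = 30, burst: int = 3, interval_sec: int = 600) -> str:
    """Return the text of a systemd drop-in that restarts a unit on failure.

    systemd waits `restart_sec` seconds before each restart and refuses
    further starts once the unit has been started more than `burst` times
    within `interval_sec` seconds (the "give up" behaviour).
    """
    return (
        "[Service]\n"
        "Restart=on-failure\n"
        f"RestartSec={restart_sec}\n"
        # RHEL 7 (systemd 219) reads the start limit from [Service];
        # newer systemd prefers StartLimitIntervalSec= in [Unit].
        f"StartLimitInterval={interval_sec}\n"
        f"StartLimitBurst={burst}\n"
    )


def write_dropins(services, root: Path = Path("/etc/systemd/system"),
                  name: str = "restart-on-failure.conf") -> list:
    """Write the drop-in for every service and return the written paths."""
    written = []
    for svc in services:
        dropin_dir = root / f"{svc}.service.d"
        dropin_dir.mkdir(parents=True, exist_ok=True)
        path = dropin_dir / name
        path.write_text(render_dropin())
        written.append(path)
    return written
```

After writing the files, `systemctl daemon-reload` is needed for systemd to pick them up. Note that with `Restart=on-failure`, systemd treats termination by SIGTERM (the default signal of a plain `kill`) as a clean exit and does not restart the unit, so a reproducer would need something like `kill -9` or a real crash to exercise the restart.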
Version-Release number of selected component (if applicable):
6.1.6
How reproducible:
100%
Steps to Reproduce:
1. Mimic a service failure by killing it (an example: kill qdrouterd)
2. Wait some time to allow Sat to recover
3. Try the failed functionality (for example: install an erratum that relies on qdrouterd)
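Step 2 can be made explicit with a small polling helper, sketched here in Python under the assumption that the service is managed by systemd: `systemctl is-active --quiet <unit>` exits with status 0 only while the unit is active. The helper name `wait_for_recovery` and its timeout and poll defaults are hypothetical, chosen for illustration.

```python
import subprocess
import time


def is_active(unit: str) -> bool:
    # `systemctl is-active --quiet` prints nothing and exits 0 only when the unit is active.
    return subprocess.run(["systemctl", "is-active", "--quiet", unit]).returncode == 0


def wait_for_recovery(unit: str, timeout: float = 120.0, poll: float = 5.0,
                      check=is_active, sleep=time.sleep) -> bool:
    """Poll `unit` until it is active again or `timeout` seconds have elapsed.

    `check` and `sleep` are injectable so the loop can be exercised without systemd.
    """
    waited = 0.0
    while waited <= timeout:
        if check(unit):
            return True
        sleep(poll)
        waited += poll
    return False
```

Without any restart policy, `wait_for_recovery("qdrouterd")` returns `False` after the timeout, which matches the actual result reported below.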
Actual results:
Step 3 fails regardless of the delay in step 2.
Expected results:
Step 3 succeeds after some time without any intervention.
Additional info:
(In reply to Stephen Benjamin from comment #4)
> It's worth noting, Satellite team itself directly controls very few unit
> files. Most are shipped with the OS packages (httpd, qpid, tomcat, etc).
... but the installer can configure the other unit files as well.
Restart on failure hides legitimate bugs, and I'm against the idea in general (I'm sure there's some case where it makes sense). But touching other projects' unit files? That is not the installer's responsibility AT ALL, even via systemd drop-ins.
Thank you for your interest in Satellite 6. We have evaluated this request, and we do not expect this to be implemented in the product in the foreseeable future. We are therefore closing this out as WONTFIX. If you have any concerns about this, please feel free to contact Rich Jerrido or Bryan Kearney. Thank you.