Bug 1512426
| Summary: | Pulp returns HTTP 500 errors during httpd's logrotate | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Beat Rubischon <brubisch> |
| Component: | Pulp | Assignee: | satellite6-bugs <satellite6-bugs> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Katello QA List <katello-qa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | Unspecified | CC: | bbuckingham, cdonnell, jsherril, jturel, michael.orlov, peter.vreman, ttereshc |
| Target Milestone: | Unspecified | Keywords: | Triaged |
| Target Release: | Unused | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-18 20:12:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1516481, 1572298 | | |
| Bug Blocks: | 1122832 | | |

Description
Beat Rubischon
2017-11-13 08:44:51 UTC
Further feedback about the issue. The Actions::Pulp::Repository::DistributorPublish jobs are filed by a cron daily script and - if there are unpublished repos - executed at the same time as the httpd logrotate. A clash is almost unavoidable:

---8<---
# rpm -qf /etc/cron.daily/katello-repository-publish-check
katello-common-3.0.0-26.el7sat.noarch
# cat /etc/cron.daily/katello-repository-publish-check
#!/bin/env bash

# Script for checking for any repositories that have not been published since
# their last sync, and republishing them

foreman-rake katello:publish_unpublished_repositories RAILS_ENV=production >/dev/null 2>&1
---8<---

If logrotate sends SIGHUP to httpd, it's expected that child httpd processes will be terminated and restarted, right? I guess SIGHUP is a default for logrotate, maybe more graceful restart of Apache will be better? SIGUSR1? https://httpd.apache.org/docs/2.4/stopping.html#graceful

I'm not sure what Pulp can do here. Any suggestions from Katello side to handle downtime of Apache?

Tanya, I had a quick check how logrotate restarts Apache:
# less /etc/logrotate.d/httpd
...
postrotate
/bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true
# less /usr/lib/systemd/system/httpd.service
...
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful
So we already have a graceful restart. The running requests should be served out correctly.
From my understanding, Apache usually keeps new connections on hold until the backends are started again; at least I never had an issue with PHP- or FastCGI-based applications during a restart. We might have a situation in WSGI where requests are already handled before the Python application is ready.
But yes, the longer I think about this kind of issue the more I think Katello should do some retry when Pulp returns unexpected error messages.
HTH, Beat
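
The retry idea above could be sketched roughly as follows. This is only an illustration in Python, not Katello's actual code (Katello is a Ruby application). `ServerError`, `fetch_with_retry`, and `make_flaky_fetch` are hypothetical names invented for this sketch. The delays and attempt count are arbitrary assumptions. It assumes the call being retried is an idempotent read, such as polling a task's status.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServerError(Exception):
    """Hypothetical error raised when the backend answers with HTTP 5xx."""

    def __init__(self, status: int) -> None:
        super().__init__(f"backend returned HTTP {status}")
        self.status = status


def fetch_with_retry(
    fetch: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on ServerError with exponential backoff.

    Only suitable for idempotent reads such as polling a task's status.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except ServerError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


def make_flaky_fetch(failures: int) -> Callable[[], dict]:
    """Build a stand-in for a status call that fails `failures` times."""
    remaining = {"count": failures}

    def fetch() -> dict:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise ServerError(500)  # simulates the 500 seen during reload
        return {"state": "finished"}

    return fetch


if __name__ == "__main__":
    delays: list = []
    # Record the delays instead of sleeping so the demo runs instantly.
    result = fetch_with_retry(make_flaky_fetch(2), sleep=delays.append)
    print(result, delays)
```

Limiting the retry to read-only status polling avoids repeating requests that may already have had side effects, and a brief graceful reload of httpd should fall well within a few backoff steps.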
We could add a retry for simple task status fetching (which I believe is the case here), but I'm hesitant to add it for all calls to Pulp. That should at least get us through the restart and not 'block' the task monitoring piece.

The root cause of the WSGI app failure is likely the same as in BZ#1516481. An update of gofer is needed to avoid this failure.

Since most seem to think that this is indeed on the Pulp / gofer side, I have reassigned the component and marked it as dependent on BZ#1516481. Please let me know if it should be a duplicate or something else.

I think this can also be closed as fixed in 6.4. The dependent BZ https://bugzilla.redhat.com/show_bug.cgi?id=1572298 is already closed for 6.4.

Thanks Peter! Closing this one based upon comment 13. The solution for bug 1572298 is indeed in 6.4. If there are any concerns or if the issue re-appears, please feel free to re-open.