Bug 1512426 - Pulp returns HTTP 500 errors during httpd's logrotate
Summary: Pulp returns HTTP 500 errors during httpd's logrotate
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: Unspecified
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Katello QA List
URL:
Whiteboard:
Depends On: 1516481 1572298
Blocks: 1122832
Reported: 2017-11-13 08:44 UTC by Beat Rubischon
Modified: 2023-03-24 13:54 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-18 20:12:04 UTC
Target Upstream Version:
Embargoed:


Description Beat Rubischon 2017-11-13 08:44:51 UTC
Description of problem:

logrotate sends a HUP to httpd on a daily basis to release and recreate log files after rotation. This triggers a restart of all Pulp-related child processes of Apache. During the restart, Katello gets an HTTP 500 error for any new jobs.

Version-Release number of selected component (if applicable):

Satellite 6.2.12

How reproducible:

at random times

Steps to Reproduce:
1. Add sync jobs to Satellite
2. Wait until Actions::Pulp::Repository::DistributorPublish happens at the same time as logrotate

Actual results:

Actions::Pulp::Repository::DistributorPublish fails with error

RestClient::InternalServerError
500 Internal Server Error 

Expected results:

Katello jobs do not fail, or they retry once Pulp is accepting requests again.

Additional info:

Comment 6 Beat Rubischon 2017-11-14 14:48:24 UTC
Further feedback about the issue. The Actions::Pulp::Repository::DistributorPublish jobs are filed by a cron daily script and - if there are unpublished repos - executed at the same time as the httpd logrotate. A clash is almost unavoidable:

---8<---
# rpm -qf /etc/cron.daily/katello-repository-publish-check
katello-common-3.0.0-26.el7sat.noarch
# cat /etc/cron.daily/katello-repository-publish-check 
#!/bin/env bash
# Script for checking for any repositories that have not been published since
#  their last sync, and republishing them

foreman-rake katello:publish_unpublished_repositories RAILS_ENV=production >/dev/null 2>&1
---8<---

Comment 8 Tanya Tereshchenko 2017-11-16 20:55:58 UTC
If logrotate sends SIGHUP to httpd, it's expected that the child httpd processes will be terminated and restarted, right? I guess SIGHUP is the default for logrotate; maybe a more graceful restart of Apache would be better, e.g. SIGUSR1? https://httpd.apache.org/docs/2.4/stopping.html#graceful

I'm not sure what Pulp can do here.
Any suggestions from Katello side to handle downtime of Apache?

Comment 9 Beat Rubischon 2017-11-17 09:28:55 UTC
Tanya, I had a quick check how logrotate restarts Apache:

# less /etc/logrotate.d/httpd 
...
    postrotate
        /bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true

# less /usr/lib/systemd/system/httpd.service
...
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful

So we already have a graceful restart. The running requests should be served out correctly.

From my understanding, Apache usually keeps new connections on hold until the backends are started again; at least I have never had an issue with PHP- or FastCGI-based applications during a restart. With WSGI, we might have a situation where requests are already being handled before the Python application is ready.

But yes, the longer I think about this kind of issue the more I think Katello should do some retry when Pulp returns unexpected error messages.

HTH, Beat

Comment 10 Justin Sherrill 2017-11-27 13:58:27 UTC
We could add a retry for simple task status fetching (which I believe is the case here), but I'm hesitant to add it for all calls to Pulp. That should at least get us through the restart and not 'block' the task monitoring piece.
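
For illustration only, the retry idea discussed above could look roughly like the following sketch. It is not Katello's or Pulp's actual code: `TransientServerError`, `fetch_with_retry`, and the `fetch` callable are hypothetical names standing in for "a status request that may briefly fail with HTTP 500 while httpd reloads", and the attempt count and delay are arbitrary assumptions.

```python
import time


class TransientServerError(Exception):
    """Hypothetical stand-in for an HTTP 5xx response from the backend."""


def fetch_with_retry(fetch, attempts=5, delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are exhausted.

    Only TransientServerError is retried; any other exception propagates
    immediately. The last failure is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except TransientServerError:
            if attempt == attempts:
                raise  # out of attempts: surface the error
            sleep(delay)  # give the backend time to come back up
```

Limiting the retry to an idempotent read such as a status poll, as suggested in this comment, avoids accidentally repeating calls that change state on the server.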

Comment 11 Tanya Tereshchenko 2018-01-09 21:45:23 UTC
The root cause of the WSGI app failure is likely the same as in BZ#1516481.
An update of gofer is needed to avoid this failure.

Comment 12 Jonathon Turel 2018-01-31 19:53:13 UTC
Since most seem to think that this is indeed on the Pulp / gofer side, I have reassigned the component and marked it as dependent on BZ#1516481. Please let me know if it should be a duplicate or something else.

Comment 13 Peter Vreman 2018-10-18 15:12:16 UTC
I think this can also be closed as fixed in 6.4. The dependent BZ https://bugzilla.redhat.com/show_bug.cgi?id=1572298 is already closed for 6.4

Comment 14 Brad Buckingham 2018-10-18 20:12:04 UTC
Thanks Peter!

Closing this one based upon comment 13.  The solution for bug 1572298 is indeed in 6.4.  If there are any concerns or if the issue re-appears, please feel free to re-open.

