Bug 1512426

Summary:	Pulp returns HTTP 500 errors during httpd's logrotate
Product:	Red Hat Satellite	Reporter:	Beat Rubischon <brubisch>
Component:	Pulp	Assignee:	satellite6-bugs <satellite6-bugs>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Katello QA List <katello-qa-list>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	Unspecified	CC:	bbuckingham, cdonnell, jsherril, jturel, michael.orlov, peter.vreman, ttereshc
Target Milestone:	Unspecified	Keywords:	Triaged
Target Release:	Unused
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-10-18 20:12:04 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1516481, 1572298
Bug Blocks:	1122832

Description Beat Rubischon 2017-11-13 08:44:51 UTC

Description of problem:

logrotate sends a HUP to httpd on a daily basis to release and recreate log files after rotation. This triggers a restart of all Pulp-related child processes of Apache; during the restart, Katello gets an HTTP 500 error for any new jobs.

Version-Release number of selected component (if applicable):

Satellite 6.2.12

How reproducible:

at random times

Steps to Reproduce:
1. Add sync jobs to Satellite
2. Wait until Actions::Pulp::Repository::DistributorPublish happens at the same time as logrotate

Actual results:

Actions::Pulp::Repository::DistributorPublish fails with error

RestClient::InternalServerError
500 Internal Server Error 

Expected results:

Katello jobs do not fail or do a retry once Pulp is accepting requests again.

Additional info:

Comment 6 Beat Rubischon 2017-11-14 14:48:24 UTC

Further feedback about the issue: the Actions::Pulp::Repository::DistributorPublish jobs are filed by a daily cron script and, if there are unpublished repos, executed at the same time as the httpd logrotate. A clash is almost unavoidable:

---8<---
# rpm -qf /etc/cron.daily/katello-repository-publish-check
katello-common-3.0.0-26.el7sat.noarch
# cat /etc/cron.daily/katello-repository-publish-check 
#!/bin/env bash
# Script for checking for any repositories that have not been published since
#  their last sync, and republishing them

foreman-rake katello:publish_unpublished_repositories RAILS_ENV=production >/dev/null 2>&1
---8<---

Comment 8 Tanya Tereshchenko 2017-11-16 20:55:58 UTC

If logrotate sends SIGHUP to httpd, it's expected that the child httpd processes will be terminated and restarted, right? I guess SIGHUP is the default for logrotate; maybe a more graceful restart of Apache, such as SIGUSR1, would be better? https://httpd.apache.org/docs/2.4/stopping.html#graceful

I'm not sure what Pulp can do here.
Any suggestions from Katello side to handle downtime of Apache?

Comment 9 Beat Rubischon 2017-11-17 09:28:55 UTC

Tanya, I had a quick look at how logrotate restarts Apache:

---8<---
# less /etc/logrotate.d/httpd
...
    postrotate
        /bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true

# less /usr/lib/systemd/system/httpd.service
...
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful
---8<---

So we already have a graceful restart. Requests that are already running should be served to completion.

From my understanding, Apache usually keeps new connections on hold until the backends are started again; at least I never had an issue with PHP- or FastCGI-based applications during a restart. With WSGI, we might have a situation where requests are handled before the Python application is ready.

But yes, the longer I think about this kind of issue the more I think Katello should do some retry when Pulp returns unexpected error messages.

HTH, Beat

Comment 10 Justin Sherrill 2017-11-27 13:58:27 UTC

We could add a retry for simple task status fetching (which I believe is the case here), but I'm hesitant to add it for all calls to Pulp. That should at least get us through the restart and not 'block' the task monitoring piece.
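
For illustration, the retry idea above could look roughly like the following Ruby sketch. This is not Katello's actual code: `with_retry`, `TransientServerError`, and all parameters are hypothetical names chosen for this example, and `TransientServerError` merely stands in for RestClient::InternalServerError so that the sketch runs without the rest-client gem. It assumes that only an idempotent call, such as fetching a task's status, is wrapped.

```ruby
# Hypothetical stand-in for RestClient::InternalServerError (assumption:
# the real code would rescue the RestClient exception instead).
class TransientServerError < StandardError; end

# Run the block, retrying when it raises TransientServerError.
# Waits base_delay * 2**(attempt - 1) seconds between attempts and
# re-raises the error once max_attempts attempts have failed.
# The sleeper parameter exists only so the delay can be replaced in tests.
def with_retry(max_attempts: 5, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientServerError
    raise if attempt >= max_attempts
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Usage sketch (fetch_task_status is a hypothetical helper):
#
#   status = with_retry(max_attempts: 5, base_delay: 2.0) do
#     fetch_task_status(task_id)
#   end
```

Wrapping only read-only calls keeps the retry safe: repeating a status fetch has no side effects, whereas blindly repeating a call that starts a publish could queue the work twice.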

Comment 11 Tanya Tereshchenko 2018-01-09 21:45:23 UTC

The root cause of the WSGI app failure is likely the same as in BZ#1516481.
An update of gofer is needed to avoid this failure.

Comment 12 Jonathon Turel 2018-01-31 19:53:13 UTC

Since most seem to think that this is indeed on the Pulp / gofer side, I have reassigned the component and marked it as dependent on BZ#1516481. Please let me know if it should instead be a duplicate or something else.

Comment 13 Peter Vreman 2018-10-18 15:12:16 UTC

I think this can also be closed as fixed in 6.4. The BZ it depends on, https://bugzilla.redhat.com/show_bug.cgi?id=1572298, is already closed for 6.4.

Comment 14 Brad Buckingham 2018-10-18 20:12:04 UTC

Thanks Peter!

Closing this one based upon comment 13.  The solution for bug 1572298 is indeed in 6.4.  If there are any concerns or if the issue re-appears, please feel free to re-open.