Bug 796264

Summary: Katello task lookup request times out waiting for RHEL mirrors to promote
Product: Red Hat Satellite
Reporter: Og Maciel <omaciel>
Component: WebUI
Assignee: Justin Sherrill <jsherril>
Status: CLOSED CURRENTRELEASE
QA Contact: Og Maciel <omaciel>
Severity: medium
Priority: medium
Version: 6.0.0
CC: bkearney, jrist, pkilambi
Target Milestone: Unspecified
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-08-22 18:28:46 UTC
Attachments:
* Web UI displaying error (flags: none)
* Traceback (flags: none)

Description Og Maciel 2012-02-22 15:19:09 UTC
Created attachment 565005
Web UI displaying error

Description of problem:

During a repository promotion, Katello's task lookup request times out if the underlying task takes too long to run. I'm not sure what the limit is in Katello, but promoting 3 RHEL repositories takes more than 1-2 hours. The web UI reports that the promotion failed, but pulp.log eventually shows that it succeeded.

Version-Release number of selected component (if applicable):

* candlepin-0.5.20-1.el6.noarch
* candlepin-tomcat6-0.5.20-1.el6.noarch
* katello-0.1.238-4.el6.noarch
* katello-all-0.1.238-4.el6.noarch
* katello-certs-tools-1.0.2-2.el6.noarch
* katello-cli-0.1.54-2.el6.noarch
* katello-cli-common-0.1.54-2.el6.noarch
* katello-common-0.1.238-4.el6.noarch
* katello-configure-0.1.64-5.el6.noarch
* katello-glue-candlepin-0.1.238-4.el6.noarch
* katello-glue-foreman-0.1.238-4.el6.noarch
* katello-glue-pulp-0.1.238-4.el6.noarch
* katello-httpd-ssl-key-pair-1.0-1.noarch
* katello-qpid-broker-key-pair-1.0-1.noarch
* katello-repos-0.1.5-1.el6.noarch
* katello-selinux-0.1.5-2.el6.noarch
* katello-trusted-ssl-cert-1.0-1.noarch
* pulp-0.0.265-1.el6.noarch
* pulp-admin-0.0.265-1.el6.noarch
* pulp-client-lib-0.0.265-1.el6.noarch
* pulp-common-0.0.265-1.el6.noarch
* pulp-selinux-server-0.0.265-1.el6.noarch

How reproducible:


Steps to Reproduce:
1. Created a Seattle organization with Dev1, QA1 and GA1 environments
2. Added a custom provider with a single product and 2 repositories pointing to the el6-se and el6-tools content from the latest puddle
3. Promoted the whole product to all environments, one environment at a time
4. Added a manifest and selected:
* Red Hat Enterprise Linux 6 Server RPMs x86_64 6Server
* Red Hat Enterprise Linux 6 Server RPMs x86_64 6.2
* Red Hat Enterprise Linux 6 Server RPMs x86_64 6.1
5. Synchronized all of the selected RHEL repositories
6. Added a new filter containing httpd and applied it to all available RHEL repositories
7. Created a new promotion changeset and promoted only the RHEL content

Actual results:

After a long time, the web UI showed the following error message (see attached screenshot):

Failed to promote changeset 'promo-1-1'. Check notices for more details

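Expected results:

The web UI waits for the promotion to finish and reports it as successful, matching what pulp.log eventually shows.
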
Additional info:

When this problem happened, the following message was displayed in pulp.log:

pulp.server.api.synchronizers:INFO: synchronizers:829 Running createrepo, this may take a few minutes to complete.

katello/production.log contained the following error (full traceback attached):

Pulp::Task: Request Timeout

Comment 1 Og Maciel 2012-02-22 15:20:18 UTC
Created attachment 565006
Traceback

Comment 2 Og Maciel 2012-02-22 15:23:14 UTC
delayed_jobs.log:

2012-02-22T09:47:51-0500: [Worker(delayed_job host:qetello03.usersys.redhat.com pid:11767)] Changeset#promote_content failed with RestClient::RequestTimeout: Pulp::Task: Request Timeout  (GET /pulp/api/tasks/?state=archived&state=current&id=1cd9785c-5d5b-11e1-959c-5254002b8762) - 0 failed attempts
2012-02-22T09:47:51-0500: [Worker(delayed_job host:qetello03.usersys.redhat.com pid:11767)] PERMANENTLY removing Changeset#promote_content because of 1 consecutive failures.
Pulp::Task: Request Timeout  (GET /pulp/api/tasks/?state=archived&state=current&id=1cd9785c-5d5b-11e1-959c-5254002b8762)
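The request that timed out in the log above is a plain GET against Pulp's task collection, carrying two `state` values and the task `id`. A minimal sketch, assuming nothing beyond the URL shown in the log, that rebuilds this query with Python's standard library to make the repeated `state` parameter explicit; `task_lookup_path` is a hypothetical helper, not a Katello or Pulp function.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def task_lookup_path(task_id: str) -> str:
    """Hypothetical helper: rebuild the task lookup path seen in delayed_jobs.log."""
    # A list of pairs (rather than a dict) lets the `state` key appear twice.
    params = [("state", "archived"), ("state", "current"), ("id", task_id)]
    return "/pulp/api/tasks/?" + urlencode(params)


path = task_lookup_path("1cd9785c-5d5b-11e1-959c-5254002b8762")
print(path)
# parse_qs collects repeated keys into a list of values.
print(parse_qs(urlsplit(path).query)["state"])  # ['archived', 'current']
```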

Comment 3 Justin Sherrill 2012-03-02 20:45:43 UTC
Trying something here:

17578c00fc216f8c84dc0c2ce1f4461fa468876b

Now, if a status check during promotion times out, we wait an extra 50 seconds to give the pulp server some breathing room.

If we still get timeouts after 10 consecutive attempts, the promotion fails. If pulp is still throwing timeouts after 10 minutes, something bad is most likely going on.

Let me know if this improves anything.
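The retry policy described in this comment can be sketched as a polling loop that tolerates status-check timeouts. This is a minimal Python sketch under stated assumptions, not the code from commit 17578c00fc216f8c84dc0c2ce1f4461fa468876b (Katello itself is Ruby): `wait_for_task`, `RequestTimeout`, the terminal state names and the 1-second poll interval are all hypothetical; only the 50-second backoff and the limit of 10 consecutive timeouts come from the comment.

```python
import time
from typing import Callable, Iterable


class RequestTimeout(Exception):
    """Hypothetical stand-in for the HTTP client's timeout error."""


def wait_for_task(
    check_status: Callable[[], str],
    terminal_states: Iterable[str] = ("finished", "error"),  # assumed names
    max_timeouts: int = 10,
    timeout_backoff: float = 50.0,
    poll_interval: float = 1.0,  # assumed; not stated in the bug
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> str:
    """Poll a task until it reaches a terminal state.

    A single timeout is not fatal: back off and try again. Only
    `max_timeouts` timeouts in a row abort the wait; any successful
    status check resets the counter.
    """
    terminal = set(terminal_states)
    consecutive_timeouts = 0
    while True:
        try:
            state = check_status()
        except RequestTimeout:
            consecutive_timeouts += 1
            if consecutive_timeouts >= max_timeouts:
                raise  # give up: the server is probably in trouble
            sleep(timeout_backoff)  # give the server some breathing room
            continue
        consecutive_timeouts = 0
        if state in terminal:
            return state
        sleep(poll_interval)


# Usage: a fake status check that times out twice, reports "running",
# times out once more, then reports "finished".
responses = iter([RequestTimeout(), RequestTimeout(), "running", RequestTimeout(), "finished"])


def fake_check() -> str:
    r = next(responses)
    if isinstance(r, Exception):
        raise r
    return r


waits: list = []
print(wait_for_task(fake_check, sleep=waits.append))  # finished
print(waits)  # [50.0, 50.0, 1.0, 50.0]
```

Resetting the counter on every successful check is what makes the limit about consecutive timeouts: a slow but responsive server (for example one busy running createrepo) never trips it, while a server that stops answering fails the wait after ten timeouts in a row.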

Comment 5 Mike McCune 2012-03-07 23:43:21 UTC
mass move ON_QA after brewing

Comment 6 Og Maciel 2012-03-09 15:40:18 UTC
Verified with:
* candlepin-0.5.24-1.el6.noarch
* candlepin-tomcat6-0.5.24-1.el6.noarch
* katello-0.1.303-1.el6.noarch
* katello-all-0.1.303-1.el6.noarch
* katello-candlepin-cert-key-pair-1.0-1.noarch
* katello-certs-tools-1.0.4-1.el6.noarch
* katello-cli-0.1.102-1.el6.noarch
* katello-cli-common-0.1.102-1.el6.noarch
* katello-common-0.1.303-1.el6.noarch
* katello-configure-0.1.104-1.el6.noarch
* katello-glue-candlepin-0.1.303-1.el6.noarch
* katello-glue-foreman-0.1.303-1.el6.noarch
* katello-glue-pulp-0.1.303-1.el6.noarch
* katello-qpid-broker-key-pair-1.0-1.noarch
* katello-qpid-client-key-pair-1.0-1.noarch
* katello-selinux-0.1.8-1.el6.noarch
* pulp-1.0.0-4.el6.noarch
* pulp-common-1.0.0-4.el6.noarch
* pulp-selinux-server-1.0.0-4.el6.noarch