Bug 773088

Summary: During content sync, UI unresponsive and Pulp::Repository: Request Timeout (GET /pulp/api/repositories/)

Product: Red Hat Satellite
Component: Content Management
Version: 6.0.0
Severity: urgent
Priority: high
Status: CLOSED CURRENTRELEASE
Reporter: James Laska <jlaska>
Assignee: Mike McCune <mmccune>
QA Contact: Katello QA List <katello-qa-list>
CC: cpelland, cstpierr, cwelton, jmatthew, jturner, lzap, mmccune, scollier, tsanders
Target Milestone: Unspecified
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Keywords: Triaged
Doc Type: Bug Fix
Last Closed: 2012-08-22 18:18:11 UTC
Bug Depends On: 784659
Bug Blocks: 747354
Attachments: pulp-logs.tgz

Description James Laska 2012-01-10 22:01:26 UTC
Created attachment 551953 [details]
pulp-logs.tgz

Description of problem:

While syncing "Red Hat Enterprise Linux 6 Server RPMs 6.1 x86_64", along with 3 other smaller repositories, I get periodic Pulp::Repository: Request Timeout errors when attempting to access the katello web UI.

Perhaps anecdotal, but the content sync seems to be taking much longer than I'm accustomed to (on this same hardware).

Version-Release number of selected component (if applicable):
 * candlepin-0.5.8-1.el6.noarch
 * pulp-0.0.256-1.el6.noarch
 * katello-0.1.171-2.el6.noarch


Steps to Reproduce:
1. Import a Red Hat Manifest (manifests_stageSamTest20Nov2011.zip used)
2. Enable RHEL6.1 Server x86_64
3. Add a custom provider and product with several repositories (CloudEngine brew and puddle, and VMWare tools)
4. Start a content sync from the katello web interface
  
Actual results:

The sync appears to be taking much longer than I'm accustomed to on the same hardware+setup.  Additionally, the web interface seems fairly unresponsive and sluggish.  Often, requests are met with a pulp error:

  Pulp::Repository: Request Timeout (GET /pulp/api/repositories/)

Expected results:

 * A responsive web interface even during content sync

Additional info:

 * Attaching the requested log files, which include:
drwxr-xr-x apache/apache     0 2012-01-10 08:53 var/log/pulp/
-rw-r--r-- root/root      2859 2012-01-10 08:53 var/log/pulp/db.log
-rw-r--r-- apache/apache     0 2012-01-10 08:53 var/log/pulp/events.log
-rw-r--r-- apache/apache 1173517 2012-01-10 10:48 var/log/pulp/pulp.log
-rw-r--r-- apache/apache 2486624 2012-01-10 11:52 var/log/pulp/grinder.log
drwxr-xr-x katello/katello     0 2012-01-10 08:54 var/log/katello/
-rw-r--r-- root/root         664 2012-01-10 08:54 var/log/katello/thin-log.5004.log
-rw-r--r-- root/root         664 2012-01-10 08:54 var/log/katello/thin-log.5002.log
-rw-r--r-- katello/root        0 2012-01-10 08:53 var/log/katello/production_delayed_jobs.log
-rw-r--r-- katello/katello   173 2012-01-10 08:54 var/log/katello/delayed_job.log
-rw-r--r-- root/root         664 2012-01-10 08:54 var/log/katello/thin-log.5000.log
-rw-r--r-- root/root         664 2012-01-10 08:54 var/log/katello/thin-log.5001.log
-rw-r--r-- katello/katello    44 2012-01-10 08:54 var/log/katello/jobs-startup.log
-rw-r--r-- root/root         664 2012-01-10 08:54 var/log/katello/thin-log.5003.log
-rw-r--r-- katello/root   348588 2012-01-10 11:52 var/log/katello/production.log
lrwxrwxrwx katello/katello     0 2012-01-10 08:51 var/log/katello/katello-configure -> katello-configure-20120110-085148
drwxr-xr-x katello/katello     0 2012-01-10 08:53 var/log/katello/katello-configure-20120110-085148/
-rw-r--r-- katello/root      204 2012-01-10 08:54 var/log/katello/katello-configure-20120110-085148/db_seed.log
-rw-r--r-- root/root          12 2012-01-10 08:52 var/log/katello/katello-configure-20120110-085148/create-postgresql-katello-user.log
-rw-r--r-- root/root          12 2012-01-10 08:52 var/log/katello/katello-configure-20120110-085148/create-postgresql-candlepin-user.log
-rw-r--r-- root/root          16 2012-01-10 08:52 var/log/katello/katello-configure-20120110-085148/create-postgresql-candlepin-database.log
-rw-r--r-- katello/root    29321 2012-01-10 08:53 var/log/katello/katello-configure-20120110-085148/db_migrate.log
-rw-r--r-- katello/katello     0 2012-01-10 08:51 var/log/katello/katello-configure-20120110-085148/katello-configure.conf
-rw-r--r-- katello/katello 95725 2012-01-10 08:54 var/log/katello/katello-configure-20120110-085148/main.log
-rw-r--r-- katello/root     2143 2012-01-10 08:53 var/log/katello/katello-configure-20120110-085148/db_env.log
-rw-r--r-- root/root         276 2012-01-10 08:53 var/log/katello/katello-configure-20120110-085148/cpsetup.log
-rw-r--r-- root/root          16 2012-01-10 08:52 var/log/katello/katello-configure-20120110-085148/create-postgresql-katello-database.log

Comment 1 Mike McCune 2012-01-11 02:00:59 UTC
FYI, this only appears to happen on single-CPU instances.  I can't reproduce it on 2+ CPU guests or on physical machines with more than one core.

Comment 2 Mike McCune 2012-01-11 02:15:46 UTC
I can reproduce this fairly easily now that I've narrowed it down: it requires a single-CPU machine to error out.

Comment 4 John Matthews 2012-01-11 16:16:16 UTC
Next time this is seen, please grab /var/log/httpd and attach the log files there, along with what's under /var/log/pulp.

Comment 5 John Matthews 2012-01-11 19:38:48 UTC
I'm able to duplicate this in a similar environment on a 2-CPU EC2 Large instance.
Mike told me that Katello is configured to display a timeout if a request takes more than 30 seconds.

Setup: Pulp with 500 repos created.
Synced 4 repos consisting of RHEL 5 and 6, both i386 and x86_64.
Then created 500 dummy repos that are empty.

Next used a script to hit https://127.0.0.1/pulp/api/repositories/ in a loop.

With only 4 repos, this took about 0.4 seconds.
With ~500 repos, it took about 1.2 seconds.

Re-ran this fetch of 'https://127.0.0.1/pulp/api/repositories/' in a loop while syncing 1 repo.
Saw timings between 1.8 seconds on the low side and 4 seconds on the high side.
Once I went to 2-4 repos syncing, I saw timings of around 6 to 20 seconds for that one fetch.
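
For illustration, a minimal sketch of the kind of timing loop described above (the URL is from this comment; the admin credentials, the use of Python's requests library, and the disabled SSL verification are assumptions, not the original test script):

#!/usr/bin/env python3
# Hypothetical sketch: time repeated GETs of the Pulp repositories API.
# Credentials and verify=False are assumed for illustration only.
import time
import requests

URL = "https://127.0.0.1/pulp/api/repositories/"
AUTH = ("admin", "admin")  # assumed credentials

while True:
    start = time.time()
    resp = requests.get(URL, auth=AUTH, verify=False)
    elapsed = time.time() - start
    # Assumes the endpoint returns a JSON list of repositories.
    print("HTTP %s, %d repos, %.2f seconds" % (resp.status_code, len(resp.json()), elapsed))
    time.sleep(1)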

When syncing I tested with clients running a foreground sync.

Final test was:
1 thread looping on https://127.0.0.1/pulp/api/repositories/
4 pulp-admin clients polling repo sync status every 0.25 seconds.

Here's an example of what I saw from the "GET" handler in our repository.py REST API.  I added a start/end timer to the method and logged the elapsed seconds:

2012-01-11 14:29:37,485 13921:140541117327104: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 11.8994979858 seconds
2012-01-11 14:29:49,109 13921:140541287184128: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 10.1845879555 seconds
2012-01-11 14:30:03,038 13921:140541297673984: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 12.9555640221 seconds
2012-01-11 14:30:16,325 13921:140541297673984: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 11.7111308575 seconds
2012-01-11 14:30:33,161 13921:140541085857536: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 15.4414169788 seconds
2012-01-11 14:30:56,569 13921:140541117327104: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 21.3497471809 seconds
2012-01-11 14:31:29,294 13921:140541096347392: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 30.4423940182 seconds
2012-01-11 14:31:52,192 13921:140541096347392: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 19.2887630463 seconds
2012-01-11 14:32:13,752 13921:140541297673984: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 20.3157689571 seconds
2012-01-11 14:32:35,615 1392
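
For reference, a minimal sketch of the kind of start/end timing instrumentation described above; the class, helper function, and logger name are assumptions, and this is not the actual patch to repository.py:

import logging
import time

log = logging.getLogger("pulp.server.webservices.controllers.repositories")

def fetch_repositories():
    # Placeholder for the real repository lookup; assumed for illustration.
    return []

class Repositories(object):
    def GET(self):
        # Time the full handler and log the elapsed seconds, as described above.
        start = time.time()
        repos = fetch_repositories()
        log.info("repositories GET took: %s seconds", time.time() - start)
        return repos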

Comment 6 Mike McCune 2012-01-25 17:38:36 UTC
Added a short-term patch, while we work on this bug, to bump the REST client timeout between Katello and pulp/candlepin from 30 to 120 seconds.

commit 7bdef9d52a938dbcc77510599752ae1a55096e5e
Author: Mike McCune <mmccune>
Date:   Wed Jan 25 09:37:13 2012 -0800

    773088 - short term bump of the REST client timeout to 120
    
    This is a short term fix while we work on the real solution
    to 773088 and the timeout issues we are seeing.  By no means is this the
    right or long term thing to do.
    
    Will move back to 30 after we have fixed the above bug

Comment 7 Mike McCune 2012-01-26 22:21:47 UTC
*** Bug 758502 has been marked as a duplicate of this bug. ***

Comment 8 John Matthews 2012-02-16 14:18:49 UTC
Please make the below change to the Katello installer so that pulp.conf has:
 task_weight: 2
 threads: 5


http://git.fedorahosted.org/git/?p=pulp.git;a=commitdiff;h=3c82b24230f16ccc45757d0325275d290b18de29

index d40f463..6c75b0a 100644
--- a/etc/pulp/pulp.conf
+++ b/etc/pulp/pulp.conf
@@ -79,7 +79,8 @@ lifetime: 180
 config: /etc/pulp/logging/basic.cfg
 
 [yum]
-threads: 15
+task_weight: 2
+threads: 5
 # True/False to flag if we should remove older rpms
 remove_old_versions: false
 # Integer to specify how many old packages to keep.
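
As a quick sanity check after the installer change, something like the following could confirm the new [yum] values; this is a sketch that assumes /etc/pulp/pulp.conf is plain INI-style configuration, as the diff above suggests:

#!/usr/bin/env python3
# Sketch: verify that /etc/pulp/pulp.conf carries the new [yum] settings.
# Assumes an INI-style file readable by configparser.
import configparser

conf = configparser.ConfigParser()
conf.read("/etc/pulp/pulp.conf")

threads = conf.getint("yum", "threads", fallback=None)
task_weight = conf.getint("yum", "task_weight", fallback=None)

if threads == 5 and task_weight == 2:
    print("pulp.conf [yum] settings look correct")
else:
    print("unexpected values: threads=%s task_weight=%s" % (threads, task_weight))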

Comment 10 Mike McCune 2012-02-22 19:03:10 UTC
We made the appropriate sync config changes.  Moving to ON_QA since this was part of the last puddle on Feb 15.

Comment 11 Corey Welton 2012-02-28 20:31:30 UTC
QA Verified - at the very least, the changes have been made to the installer, and I have not (yet) seen timeouts. We can reopen if it shows up again.

Comment 14 Lukas Zapletal 2012-04-06 12:31:31 UTC
*** Bug 809234 has been marked as a duplicate of this bug. ***

Comment 16 Lukas Zapletal 2012-04-10 08:16:58 UTC
Mike says, "To be clear the original bug saw timeouts during rendering of the
sync_mgmt page but the new bug you closed as a dupe is a timeout during the POST call to clone"

So it's the same error message, but a different bug. It's probably not a duplicate, so I'm clearing the link.