Created attachment 551953 [details]
pulp-logs.tgz

Description of problem:
While syncing "Red Hat Enterprise Linux 6 Server RPMs 61 x86_64" along with 3 other smaller repositories, I get periodic pulp::repository request-timeout errors when attempting to access the Katello web UI. Perhaps anecdotally, the content sync also seems to be taking much longer than I'm accustomed to on this same hardware.

Version-Release number of selected component (if applicable):
* candlepin-0.5.8-1.el6.noarch
* pulp-0.0.256-1.el6.noarch
* katello-0.1.171-2.el6.noarch

Steps to Reproduce:
1. Import a Red Hat manifest (manifests_stageSamTest20Nov2011.zip used)
2. Enable RHEL 6.1 Server x86_64
3. Add a custom provider and product with several repositories (CloudEngine brew and puddle, and VMware Tools)
4. Start a content sync from the Katello web interface

Actual results:
The sync appears to take much longer than I'm accustomed to on the same hardware and setup. Additionally, the web interface is fairly unresponsive and sluggish. Often, requests are met with a pulp error:

  Pulp::Repository: Request Timeout (GET /pulp/api/repositories/)

Expected results:
* A responsive web interface, even during a content sync

Additional info:
* Attaching the requested log files, which include:
drwxr-xr-x apache/apache         0 2012-01-10 08:53 var/log/pulp/
-rw-r--r-- root/root           2859 2012-01-10 08:53 var/log/pulp/db.log
-rw-r--r-- apache/apache          0 2012-01-10 08:53 var/log/pulp/events.log
-rw-r--r-- apache/apache    1173517 2012-01-10 10:48 var/log/pulp/pulp.log
-rw-r--r-- apache/apache    2486624 2012-01-10 11:52 var/log/pulp/grinder.log
drwxr-xr-x katello/katello        0 2012-01-10 08:54 var/log/katello/
-rw-r--r-- root/root            664 2012-01-10 08:54 var/log/katello/thin-log.5004.log
-rw-r--r-- root/root            664 2012-01-10 08:54 var/log/katello/thin-log.5002.log
-rw-r--r-- katello/root           0 2012-01-10 08:53 var/log/katello/production_delayed_jobs.log
-rw-r--r-- katello/katello      173 2012-01-10 08:54 var/log/katello/delayed_job.log
-rw-r--r-- root/root            664 2012-01-10 08:54 var/log/katello/thin-log.5000.log
-rw-r--r-- root/root            664 2012-01-10 08:54 var/log/katello/thin-log.5001.log
-rw-r--r-- katello/katello       44 2012-01-10 08:54 var/log/katello/jobs-startup.log
-rw-r--r-- root/root            664 2012-01-10 08:54 var/log/katello/thin-log.5003.log
-rw-r--r-- katello/root      348588 2012-01-10 11:52 var/log/katello/production.log
lrwxrwxrwx katello/katello        0 2012-01-10 08:51 var/log/katello/katello-configure -> katello-configure-20120110-085148
drwxr-xr-x katello/katello        0 2012-01-10 08:53 var/log/katello/katello-configure-20120110-085148/
-rw-r--r-- katello/root         204 2012-01-10 08:54 var/log/katello/katello-configure-20120110-085148/db_seed.log
-rw-r--r-- root/root             12 2012-01-10 08:52 var/log/katello/katello-configure-20120110-085148/create-postgresql-katello-user.log
-rw-r--r-- root/root             12 2012-01-10 08:52 var/log/katello/katello-configure-20120110-085148/create-postgresql-candlepin-user.log
-rw-r--r-- root/root             16 2012-01-10 08:52 var/log/katello/katello-configure-20120110-085148/create-postgresql-candlepin-database.log
-rw-r--r-- katello/root       29321 2012-01-10 08:53 var/log/katello/katello-configure-20120110-085148/db_migrate.log
-rw-r--r-- katello/katello        0 2012-01-10 08:51 var/log/katello/katello-configure-20120110-085148/katello-configure.conf
-rw-r--r-- katello/katello    95725 2012-01-10 08:54 var/log/katello/katello-configure-20120110-085148/main.log
-rw-r--r-- katello/root        2143 2012-01-10 08:53 var/log/katello/katello-configure-20120110-085148/db_env.log
-rw-r--r-- root/root            276 2012-01-10 08:53 var/log/katello/katello-configure-20120110-085148/cpsetup.log
-rw-r--r-- root/root             16 2012-01-10 08:52 var/log/katello/katello-configure-20120110-085148/create-postgresql-katello-database.log
FYI, this only appears to happen on single-CPU instances. I can't get it to reproduce on 2+ CPU guests or on physical machines with more than one core.
I can now reproduce this fairly easily, having narrowed it down: a single CPU is required for it to error out.
Next time this is seen, please grab /var/log/httpd and attach the log files there, along with what's under /var/log/pulp.
I'm able to duplicate a similar environment on a 2-CPU EC2 Large instance. Mike told me that Katello is configured to display a timeout if a request takes more than 30 seconds.

Setup: Pulp with 500 repos created. Synced 4 repos consisting of RHEL 5 & 6, both i386 and x86_64, then created 500 dummy repos that are empty.

Next, I used a script to hit https://127.0.0.1/pulp/api/repositories/ in a loop. With only 4 repos this took about 0.4 seconds; with ~500 repos it took about 1.2 seconds.

I re-ran this fetch of 'https://127.0.0.1/pulp/api/repositories/' in a loop while syncing 1 repo and saw timings between 1.8 seconds on the low side and 4 seconds on the high side. Once I went to 2-4 repos syncing, I saw timings of around 6 to 20 seconds for that one fetch. When syncing, I tested with clients running a foreground sync.

The final test was:
* 1 thread looping on https://127.0.0.1/pulp/api/repositories/
* 4 pulp-admin clients polling repo sync status every 0.25 seconds

Here's an example of what I saw from the "GET" handler in our repository.py REST API. I added a start/end timer to the method and logged the elapsed seconds:
2012-01-11 14:29:37,485 13921:140541117327104: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 11.8994979858 seconds
2012-01-11 14:29:49,109 13921:140541287184128: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 10.1845879555 seconds
2012-01-11 14:30:03,038 13921:140541297673984: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 12.9555640221 seconds
2012-01-11 14:30:16,325 13921:140541297673984: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 11.7111308575 seconds
2012-01-11 14:30:33,161 13921:140541085857536: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 15.4414169788 seconds
2012-01-11 14:30:56,569 13921:140541117327104: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 21.3497471809 seconds
2012-01-11 14:31:29,294 13921:140541096347392: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 30.4423940182 seconds
2012-01-11 14:31:52,192 13921:140541096347392: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 19.2887630463 seconds
2012-01-11 14:32:13,752 13921:140541297673984: pulp.server.webservices.controllers.repositories:INFO: repositories:208 repositories GET took: 20.3157689571 seconds
2012-01-11 14:32:35,615 1392
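The start/end timer described above can be sketched as a small decorator. This is an illustrative reconstruction, not the actual patch to repository.py; the names `log_duration` and `repositories_get` are ours.

```python
import logging
import time
from functools import wraps

log = logging.getLogger("pulp.server.webservices.controllers.repositories")

def log_duration(func):
    """Log how long the wrapped handler takes, mirroring the
    start/end timer added to the GET handler (illustrative only)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            log.info("%s took: %s seconds", func.__name__, time.time() - start)
    return wrapper

@log_duration
def repositories_get():
    # Stand-in for the real GET handler body (DB query, serialization, ...)
    time.sleep(0.01)
    return []
```

Wrapping the handler this way gives the per-request timings shown in the log above without touching the handler's own logic.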
Added a short-term patch, while we work on this bug, to bump the REST client timeout between Katello -> pulp/cp from 30 to 120.

commit 7bdef9d52a938dbcc77510599752ae1a55096e5e
Author: Mike McCune <mmccune>
Date:   Wed Jan 25 09:37:13 2012 -0800

    773088 - short term bump of the REST client timeout to 120

    This is a short term fix while we work on the real solution to 773088
    and the timeout issues we are seeing. By no means is this the right or
    long term thing to do. Will move back to 30 after we have fixed the
    above bug.
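The effect of the patch is to raise the client-side read timeout so slow API responses fail less often while the root cause is fixed. Katello's REST client is Ruby, but the idea can be sketched in Python with the stdlib; the constant and function names here are ours, not Katello's.

```python
import urllib.request

# Short-term workaround (illustrative): timeout raised from 30 to 120
# seconds while the underlying slowness in the API is being fixed.
REST_CLIENT_TIMEOUT = 120  # was 30 before the patch

def fetch_repositories(base_url, timeout=REST_CLIENT_TIMEOUT):
    # A request that exceeds `timeout` raises an exception instead of
    # blocking the caller indefinitely; with timeout=30 the GET timings
    # logged above (10-30+ seconds) frequently tripped this limit.
    with urllib.request.urlopen(base_url + "/pulp/api/repositories/",
                                timeout=timeout) as resp:
        return resp.read()
```

Note the trade-off: a larger timeout hides the symptom (error pages) but the UI still waits on the slow request, which is why this was explicitly labeled a short-term fix.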
*** Bug 758502 has been marked as a duplicate of this bug. ***
Please make the below change to the Katello installer so that pulp.conf has:

task_weight: 2
threads: 5

http://git.fedorahosted.org/git/?p=pulp.git;a=commitdiff;h=3c82b24230f16ccc45757d0325275d290b18de29

index d40f463..6c75b0a 100644
--- a/etc/pulp/pulp.conf
+++ b/etc/pulp/pulp.conf
@@ -79,7 +79,8 @@ lifetime: 180
 config: /etc/pulp/logging/basic.cfg
 
 [yum]
-threads: 15
+task_weight: 2
+threads: 5
 # True/False to flag if we should remove older rpms
 remove_old_versions: false
 # Integer to specify how many old packages to keep.
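For reference, pulp.conf uses INI-style `option: value` syntax, so the new [yum] settings can be read back with Python's stdlib configparser. A minimal sketch, using an inline sample rather than the real /etc/pulp/pulp.conf; the option names come from the diff above, everything else is illustrative:

```python
from configparser import ConfigParser

# Sample mirroring the post-patch [yum] section from the diff above.
SAMPLE = """
[yum]
task_weight: 2
threads: 5
remove_old_versions: false
"""

cfg = ConfigParser()
cfg.read_string(SAMPLE)

task_weight = cfg.getint("yum", "task_weight")  # 2
threads = cfg.getint("yum", "threads")          # 5 (down from 15)

# The intent of the change, per this bug: fewer, heavier-weighted sync
# tasks limit concurrent sync work, leaving capacity for API requests
# like GET /pulp/api/repositories/ to stay responsive.
```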
We made the appropriate sync config changes. Moving to ON_QA, since this was part of the last puddle on Feb 15.
QA verified - at the very least, the changes have been made to the installer, and I have not (yet) seen timeouts. We can reopen if it shows up again.
*** Bug 809234 has been marked as a duplicate of this bug. ***
Mike says, "To be clear, the original bug saw timeouts during rendering of the sync_mgmt page, but the new bug you closed as a dupe is a timeout during the POST call to clone." So it's the same error message, but a different bug. It's probably not a clone; clearing the link.