Description of problem: A GET call to: https://127.0.0.1/pulp/api/repositories/ sometimes takes more than 30 seconds to return This issue is seen if we are syncing 2+ repos at the same time The issue is seen more on single CPU VM's The performance hit happens when we flesh out the returned repositories object...when we are syncing repositories For example when syncing: query to find ~1000 repositories takes under 500ms, but to convert the cursor to a fleshed out lists takes 20s+ at times. When not syncing same query returns same data in .5 seconds, query itself takes .05ms and flesh out takes .5s See for more information https://fedorahosted.org/pulp/wiki/PerformanceTesting/GetRepositories Version-Release number of selected component (if applicable): pulp 0.261 How reproducible: If you are syncing 4 repos and you have 1000+ repos configured you will be able to see these symptoms, but not every request will take a long period of time. Loop a series of GET's for a few minutes and you will see times range from possibly ~4s to 40s.
We noticed an inefficiency after retrieving the repositories data from mongo. For each repository we were doing a separate mongo query to look up data to build a uri_ref for CDS info. The below commit removes that: http://git.fedorahosted.org/git/?p=pulp.git;a=commitdiff;h=a0a7723809fc9857a1d9511ceca5b60846ae3ed4
Our understanding is that the observed behavior is seen from high CPU load. Syncing is taxing the CPU usage heavily. With 15 threads configured and syncing 4 repos we were seeing 1-min load avgs of ~27.0 We brought the sync threads down to 5, syncing 4 repos concurrently seeing 1-min load avgs around 15.0 - 18.0. With this setup looping GET /api/repositories/ is taking 6s - 20s, most of the GET's are taking about 10s. Example of what we are seeing with top: top - 08:45:39 up 29 days, 23:46, 5 users, load average: 16.56, 12.59, 5.90 Tasks: 127 total, 5 running, 122 sleeping, 0 stopped, 0 zombie Cpu(s): 23.9%us, 35.2%sy, 0.0%ni, 0.9%id, 25.7%wa, 0.0%hi, 0.0%si, 14.4%st Mem: 7646264k total, 7601620k used, 44644k free, 599944k buffers Swap: 0k total, 0k used, 0k free, 6051136k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 29255 apache 20 0 1775m 257m 7104 S 56.0 3.4 4:23.72 httpd 29491 apache 20 0 215m 18m 5976 D 6.3 0.2 0:05.71 python 29485 apache 20 0 215m 18m 5972 D 5.3 0.2 0:05.88 python 29493 apache 20 0 215m 18m 5972 R 5.3 0.2 0:05.81 python 29484 apache 20 0 215m 18m 5972 S 4.6 0.2 0:05.49 python 29488 apache 20 0 215m 18m 5976 D 4.3 0.2 0:05.85 python 29446 root 20 0 315m 22m 6888 S 2.6 0.3 0:08.81 pulp-admin 29259 apache 20 0 220m 10m 4696 S 2.0 0.1 0:05.85 httpd 29263 apache 20 0 220m 10m 4696 S 2.0 0.1 0:05.70 httpd 29347 root 20 0 315m 22m 6888 R 2.0 0.3 0:09.61 pulp-admin 29375 root 20 0 315m 22m 6888 S 2.0 0.3 0:09.45 pulp-admin 29415 root 20 0 315m 22m 6888 S 2.0 0.3 0:08.87 pulp-admin 29440 apache 20 0 215m 18m 5976 S 2.0 0.2 0:05.78 python 29442 apache 20 0 215m 18m 5972 S 2.0 0.2 0:05.85 python 37 root 20 0 0 0 0 S 1.7 0.0 5:07.36 kswapd0 29262 apache 20 0 220m 10m 4696 S 1.7 0.1 0:05.75 httpd 29448 apache 20 0 215m 18m 5976 D 1.7 0.2 0:05.96 python 29463 apache 20 0 220m 10m 4692 R 1.7 0.1 0:05.48 httpd 29257 apache 20 0 220m 10m 4696 S 1.3 0.1 0:05.75 httpd 29395 apache 20 0 215m 18m 5976 S 1.3 0.2 0:09.55 python 29441 apache 20 0 215m 18m 5972 S 1.3 0.2 0:05.77 python 29449 apache 20 0 215m 18m 5972 S 1.3 0.2 0:05.98 python 29256 apache 20 0 220m 10m 4696 S 1.0 0.1 0:05.71 httpd
Summary of Changes: 1) Reduced number of mongo queries executed. Prior to below change, we would execute a separate mongo query per repository to determine part of the URI for CDS access. http://git.fedorahosted.org/git/?p=pulp.git;a=commitdiff;h=a0a7723809fc9857a1d9511ceca5b60846ae3ed4 2) Reduced number of worker threads for a sync and increased the weight of sync tasks to reduce the number allowed to run concurrently. This change only allows 2 repo syncs with 5 threads each to run concurrently. http://git.fedorahosted.org/git/?p=pulp.git;a=commitdiff;h=3c82b24230f16ccc45757d0325275d290b18de29
build: 0.266
<jmatthews> ideal way to test this is to load up katello and import one of the manifests so you have a few hundred repos loaded into pulp <jmatthews> then sync 4 repos (only 2 should be syncing and 2 waiting) <jmatthews> while those sync, loop a bash script calling the api/repositories GET and time it
verified [root@pulp-v1-1-server ~]# rpm -q pulp pulp-1.1.4-1.fc16.noarch [root@pulp-v1-1-server ~]# [root@pulp-v1-1-server ~]# time curl -k -u admin:admin https://localhost/pulp/api/repositories/f16/ {"package_count": 3107, "distributionid": ["ks-Fedora-Fedora-16-x86_64"], "consumer_cert": null, "consumer_ca": null, "filters": [], "id": "f16", "preserve_metadata": false, "publish": true, "source": {"url": "http://download.devel.redhat.com/released/F-16/GOLD/Fedora/x86_64/os/", "type": "remote"}, "groupid": [], "files": "/pulp/api/repositories/f16/files/", "relative_path": "released/F-16/GOLD/Fedora/x86_64/os", "sync_schedule": "2012-02-07T08:40:00-05:00/PT5M", "arch": "noarch", "feed_cert": null, "name": "f16", "feed_ca": null, "notes": {}, "last_sync": "2012-04-25T09:11:27-04:00", "content_types": "yum", "clone_ids": [], "checksum_type": "sha256", "sync_options": {"skip": {}}, "_id": "f16", "packages": "/pulp/api/repositories/f16/packages/", "packagegroups": "/pulp/api/repositories/f16/packagegroups/", "packagegroupcategories": "/pulp/api/repositories/f16/packagegroupcategories/", "errata": "/pulp/api/repositories/f16/errata/", "distribution": "/pulp/api/repositories/f16/distribution/", "keys": "/pulp/api/repositories/f16/keys/", "comps": "/pulp/api/repositories/f16/comps/", "uri_ref": "/pulp/api/repositories/f16/", "uri": "https://localhost/pulp/repos/released/F-16/GOLD/Fedora/x86_64/os/", "files_count": 33, "next_scheduled_time": null, "next_scheduled_sync": "2012-04-26T15:20:00Z"} real 0m0.452s user 0m0.015s sys 0m0.026s
Pulp v1.1 Release