Bug 1224497
| Summary: | mongo cursor times out during regenerate_applicability_for_repos | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Chris Roberts <chrobert> |
| Component: | Pulp | Assignee: | satellite6-bugs <satellite6-bugs> |
| Status: | CLOSED ERRATA | QA Contact: | Tazim Kolhar <tkolhar> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 6.0.8 | CC: | aagrawal, aglotov, ahoness, bbuckingham, bkearney, bmbouter, chrobert, cperry, cwelton, daviddavis, dkliban, ggainey, ipanova, mhrivnak, mkalyat, mmccune, mmello, pcreech, pghadge, pmoravec, rchan, tkolhar, ttereshc, xdmoon |
| Target Milestone: | Unspecified | Keywords: | Regression, Reopened, Triaged |
| Target Release: | Unused | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cursor timeouts in MongoDB caused content view operations in Satellite to fail. The cursor batch size has been reduced so that the timeout no longer affects these operations. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-01-03 15:37:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Since this issue was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.

The Pulp upstream bug priority is at High. Updating the external tracker on this bug.

The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

The Pulp upstream bug status is at POST. Updating the external tracker on this bug.

The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

commit 333f0ba8d401e42aefa24d6d76c0aba5435d842c
Author: Dennis Kliban <dkliban>
Date: Wed May 27 21:52:44 2015 -0400
Adds batch size to cursor used to iterate repo profile applicabilities
According to mongo documentation [0] the cursor will initially return about 101
documents or slightly more than 1 megabyte of data. The subsequent fetches return
4 times as much data. The default timeout of mongo cursor (version 2.4) is 600
seconds. This timeout cannot be adjusted until mongodb 2.6. If 400 applicability
calculations need to be performed in 600 seconds, each calculation cannot take
any longer than 1.5 seconds. In my testing I found the calculations to take 12 to
13 seconds. Limiting the batch size to 25 ensures that calculations can take up
to 24 seconds each before the cursor times out.
[0] http://docs.mongodb.org/v2.4/core/cursors/#cursor-batches
https://pulp.plan.io/issues/998
fixes #998
(cherry picked from commit f8644708e1ed15dc2d4b04f4edd77eb7bc873963)
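The arithmetic in the commit message can be checked with a short Python sketch (Python because Pulp itself is written in Python). It assumes, as the commit does, that the server closes a cursor after 600 seconds without a fetch and that every document in a batch is processed before the next fetch. `per_document_budget` is a hypothetical helper name, not part of Pulp.

```python
def per_document_budget(cursor_timeout_s: float, batch_size: int) -> float:
    """Maximum average seconds per document before the cursor idles out.

    Assumes the client processes an entire batch before asking the server
    for the next one, and that the server's idle timer restarts on each fetch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return cursor_timeout_s / batch_size


TIMEOUT_S = 600  # MongoDB 2.4 idle cursor timeout, per the commit message

# 400 is the commit's estimate of documents per batch; 25 and 5 are the
# batch sizes used by the fix and by the later hotfix.
for batch in (400, 25, 5):
    budget = per_document_budget(TIMEOUT_S, batch)
    print(f"batch={batch:>3}: {budget:.1f} s per document")
```

With the 12 to 13 seconds per calculation measured in the commit, a batch of 25 needs at most about 25 × 13 = 325 seconds, inside the 600-second window, while the commit's estimate of 400 documents per batch would need about 5,200 seconds. A batch size of 5 raises the per-document budget to 120 seconds.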
The prior hotfix did not include previous hotfixes for Bug 1171283. The packages have been rebuilt to include the fixes for 1171278 and for this bug, with an incremented version so that they install on systems that already have a prior hotfix.

**** Updated HOTFIX Instructions ****

1) Download the following tar file to your Satellite Server:

    curl --output /var/tmp/1224497-hotfix-2.tar http://people.redhat.com/~mmccune/hotfix/1224497/1224497-hotfix-2.tar

2) Extract the files:

    cd /var/tmp
    tar xvf 1224497-hotfix-2.tar

3) Create the yum repo file /etc/yum.repos.d/hotfix.repo containing:

    [1224497-hotfix]
    name=1224497-hotfix
    baseurl=file:///var/tmp/el6
    enabled=1
    gpgcheck=0

   Update baseurl to match the version of Enterprise Linux you are using (6 or 7).

4) Run:

    yum update pulp-server

   This should install the updated Pulp packages:

    ..
    Resolving Dependencies
    --> Running transaction check
    ---> Package pulp-server.noarch 0:2.4.4-1.el6sat will be updated
    --> Processing Dependency: pulp-server = 2.4.4 for package: pulp-nodes-common-2.4.4-1.el6sat.noarch
    --> Processing Dependency: pulp-server = 2.4.4 for package: pulp-nodes-parent-2.4.4-1.el6sat.noarch
    --> Processing Dependency: pulp-server = 2.4.4 for package: pulp-rpm-plugins-2.4.4-1.1.el6sat.noarch
    --> Processing Dependency: pulp-server = 2.4.4 for package: pulp-puppet-plugins-2.4.4-1.el6sat.noarch
    ---> Package pulp-server.noarch 0:2.4.5-2.2.el6sat will be an update
    --> Processing Dependency: python-pulp-common = 2.4.5 for package: pulp-server-2.4.5-2.2.el6sat.noarch
    --> Running transaction check
    ..

5) Restart the services:

    katello-service restart

Hi, please provide verification steps. Thanks.

This updated set of packages further reduces the cursor batch size to 5 to prevent timeouts.
**** Updated HOTFIX 3 Instructions ****

1) Download the following tar file to your Satellite Server:

    curl --output /var/tmp/1224497-hotfix-3.tar http://people.redhat.com/~mmccune/hotfix/1224497/1224497-hotfix-3.tar

2) Extract the files:

    cd /var/tmp
    tar xvf 1224497-hotfix-3.tar

3) Create the yum repo file /etc/yum.repos.d/hotfix.repo containing:

    [1224497-hotfix]
    name=1224497-hotfix
    baseurl=file:///var/tmp/el6
    enabled=1
    gpgcheck=0

   Update baseurl to match the version of Enterprise Linux you are using (6 or 7).

4) Run:

    yum update pulp-server

   This should install the updated Pulp packages:

    ..
    Resolving Dependencies
    --> Running transaction check
    ---> Package pulp-server.noarch 0:2.4.4-1.el6sat will be updated
    ..

5) Restart the services:

    katello-service restart

VERIFIED:

    # rpm -qa | grep foreman
    ruby193-rubygem-foreman-tasks-0.6.12.7-1.el7sat.noarch
    rubygem-hammer_cli_foreman_docker-0.0.3.6-1.el7sat.noarch
    foreman-libvirt-1.7.2.26-1.el7sat.noarch
    ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
    ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
    ibm-x3755-02.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-1.0-1.noarch
    foreman-gce-1.7.2.26-1.el7sat.noarch
    rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
    foreman-selinux-1.7.2.13-1.el7sat.noarch
    foreman-compute-1.7.2.26-1.el7sat.noarch
    foreman-ovirt-1.7.2.26-1.el7sat.noarch
    rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch
    foreman-postgresql-1.7.2.26-1.el7sat.noarch
    ruby193-rubygem-foreman_docker-1.2.0.14-1.el7sat.noarch
    ruby193-rubygem-foreman_discovery-2.0.0.15-1.el7sat.noarch
    ruby193-rubygem-foreman-redhat_access-0.2.0-6.el7sat.noarch
    rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
    foreman-proxy-1.7.2.5-1.el7sat.noarch
    ibm-x3755-02.ovirt.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
    ibm-x3755-02.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
    foreman-vmware-1.7.2.26-1.el7sat.noarch
    rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
    foreman-1.7.2.26-1.el7sat.noarch
    ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
    foreman-debug-1.7.2.26-1.el7sat.noarch

Steps: published several content views with no problems.

The Pulp upstream bug status is at ON_QA. Updating the external tracker on this bug.

This bug is slated to be released with Satellite 6.1.

This bug was fixed in version 6.1.1 of Satellite, which was released on 12 August 2015.

The Pulp upstream bug status is at VERIFIED. Updating the external tracker on this bug.

The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

What additional problem has occurred?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0052

The same error was seen on Satellite 6.1.8 with this bug fix properly applied. Is there any guarantee that a batch size of 5 will work in _all_ cases? Couldn't some huge errata, for example, require a batch size of 1 (something we can't know in advance)?

It is possible that under heavy load a batch size of 5 is not low enough. The batch size determines how frequently Pulp talks to the database. If for some reason it is taking an extremely long time to calculate each content host's applicability, it's possible that more than 600 seconds have passed after 5 profiles.

All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.
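The concern about batch size 5 under heavy load can be illustrated with a small simulation. This is a simplified model, not Pulp code: it assumes each batch's profiles are processed one after another and the next fetch happens only after the whole batch is done, so the idle time seen by the server is the sum of that batch's calculation times. The helper names and the per-profile durations in the usage lines are hypothetical, chosen only to show the effect.

```python
from typing import Iterable, List


def idle_gaps(durations_s: Iterable[float], batch_size: int) -> List[float]:
    """Seconds the cursor sits idle between fetches, one entry per batch.

    Simplified model: documents in a batch are processed sequentially and the
    next fetch is issued only once the whole batch has been processed.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    items = list(durations_s)
    return [sum(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def cursor_survives(durations_s: Iterable[float], batch_size: int,
                    timeout_s: float = 600) -> bool:
    """True if no idle gap reaches the server's cursor timeout."""
    return all(gap < timeout_s for gap in idle_gaps(durations_s, batch_size))


normal = [13.0] * 12  # hypothetical: close to the 12-13 s measured in the commit
slow = [150.0] * 12   # hypothetical heavy-load case

print(cursor_survives(normal, 25))  # True: one gap of 156 s
print(cursor_survives(slow, 5))     # False: 5 * 150 = 750 s > 600 s
print(cursor_survives(slow, 1))     # True: each gap is 150 s
```

In this model, lowering the batch size only helps while a single profile still finishes within the timeout; if one applicability calculation alone took more than 600 seconds, even a batch size of 1 would not prevent the cursor from expiring.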
Description of problem:
The regenerate_applicability_for_repos task throws OperationFailure("cursor id '179967754771123979' not valid at server",) when multiple repos and thousands of consumers (3000) are involved. This is a symptom of the cursor timing out during the operation.

Version-Release number of selected component (if applicable):
Installed Packages
candlepin-0.9.23.1-1.el6.noarch
candlepin-common-1.0.1-1.el6_5.noarch
candlepin-scl-1-5.el6_4.noarch
candlepin-scl-quartz-2.1.5-5.el6_4.noarch
candlepin-scl-rhino-1.7R3-1.el6_4.noarch
candlepin-scl-runtime-1-5.el6_4.noarch
candlepin-selinux-0.9.23.1-1.el6.noarch
candlepin-tomcat6-0.9.23.1-1.el6.noarch
elasticsearch-0.90.10-6.el6sat.noarch
katello-1.5.0-30.el6sat.noarch
katello-certs-tools-1.5.6-1.el6sat.noarch
katello-default-ca-1.0-1.noarch
katello-installer-0.0.67-1.el6sat.noarch
katello-server-ca-1.0-1.noarch
katello.croberts.org-apache-1.0-1.noarch
katello.croberts.org-foreman-client-1.0-1.noarch
katello.croberts.org-foreman-proxy-1.0-1.noarch
katello.croberts.org-parent-cert-1.0-1.noarch
katello.croberts.org-puppet-client-1.0-1.noarch
katello.croberts.org-qpid-broker-1.0-1.noarch
katello.croberts.org-qpid-client-cert-1.0-1.noarch
mod_wsgi-3.4-1.pulp.el6sat.x86_64
pulp-katello-0.3-4.el6sat.noarch
pulp-nodes-common-2.4.4-1.el6sat.noarch
pulp-nodes-parent-2.4.4-1.el6sat.noarch
pulp-puppet-plugins-2.4.4-1.el6sat.noarch
pulp-puppet-tools-2.4.4-1.el6sat.noarch
pulp-rpm-plugins-2.4.4-1.1.el6sat.noarch
pulp-selinux-2.4.4-1.el6sat.noarch
pulp-server-2.4.4-1.el6sat.noarch
python-gofer-qpid-1.3.0-1.el6sat.noarch
python-isodate-0.5.0-1.pulp.el6sat.noarch
python-kombu-3.0.15-12.pulp.el6sat.noarch
python-pulp-bindings-2.4.4-1.el6sat.noarch
python-pulp-common-2.4.4-1.el6sat.noarch
python-pulp-puppet-common-2.4.4-1.el6sat.noarch
python-pulp-rpm-common-2.4.4-1.1.el6sat.noarch
python-qpid-0.22-14.el6sat.noarch
python-qpid-qmf-0.22-37.el6.x86_64
qpid-cpp-client-0.22-42.el6.x86_64
qpid-cpp-server-0.22-42.el6.x86_64
qpid-cpp-server-linearstore-0.22-42.el6.x86_64
qpid-java-client-0.22-6.el6.noarch
qpid-java-common-0.22-6.el6.noarch
qpid-proton-c-0.7-1.el6.x86_64
qpid-qmf-0.22-37.el6.x86_64
qpid-tools-0.22-12.el6.noarch
ruby193-rubygem-katello-1.5.0-98.el6sat.noarch
rubygem-hammer_cli_katello-0.0.4-14.el6sat.noarch
rubygem-smart_proxy_pulp-1.0.1-1.1.el6sat.noarch

How reproducible:

Steps to Reproduce:
Publish or promote a content view on a Satellite that has 3,000+ Pulp consumers; the operation fails with a MongoDB cursor timeout.

Actual results:
The regenerate_applicability_for_repos task throws OperationFailure("cursor id '179967754771123979' not valid at server",) when multiple repos and thousands of consumers (3000) are involved. This is a symptom of the cursor timing out during the operation.

Expected results:
Content view publish/promote works correctly.