Bug 1224497 - mongo cursor times out during regenerate_applicability_for_repos
Summary: mongo cursor times out during regenerate_applicability_for_repos
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.0.8
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Tazim Kolhar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-05-23 19:10 UTC by Chris Roberts
Modified: 2021-08-30 11:48 UTC
CC List: 24 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
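MongoDB cursor timeouts caused content view publish and promote operations in Satellite to fail. The cursor batch size used when regenerating applicability has been reduced so that the cursor timeout no longer interrupts these operations.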
Clone Of:
Environment:
Last Closed: 2017-01-03 15:37:19 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Pulp Redmine | 998 | 0 | High | CLOSED - CURRENTRELEASE | mongo cursor times out during regenerate_applicability_for_repos | Never
Red Hat Knowledge Base (Solution) | 2094821 | 0 | None | None | None | 2017-02-13 14:41:39 UTC
Red Hat Product Errata | RHBA-2016:0052 | 0 | normal | SHIPPED_LIVE | Satellite 6.1.6 bug fix update | 2016-01-21 12:40:53 UTC

Description Chris Roberts 2015-05-23 19:10:04 UTC
Description of problem:

The regenerate_applicability_for_repos task throws OperationFailure("cursor id '179967754771123979' not valid at server",) when multiple repositories and thousands of consumers (about 3,000) are involved. This is a symptom of the cursor timing out during the operation.
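To illustrate why a slow loop over a cursor produces "cursor id ... not valid at server", here is a toy Python model, not Pulp or pymongo code. It assumes the server discards a cursor once more than 600 seconds (the timeout cited in the fix's commit message) pass between batch fetches, and that every batch has the same fixed size. `run_with_cursor` and `CursorNotFound` are hypothetical names.

```python
class CursorNotFound(Exception):
    """Stand-in for the server's 'cursor id ... not valid at server' error."""


def run_with_cursor(process_times, batch_size, timeout_s=600.0):
    """Process documents batch by batch and return how many were processed.

    process_times[i] is the (hypothetical) seconds spent on document i.
    Simplified model: the server checks, at each fetch of the next batch,
    how long the cursor sat idle since the previous fetch.
    """
    clock = 0.0
    last_fetch = 0.0  # the initial query returns the first batch
    done = 0
    for start in range(0, len(process_times), batch_size):
        if start > 0:
            # Asking for the next batch: the cursor may already be gone.
            if clock - last_fetch > timeout_s:
                raise CursorNotFound("cursor expired after %d documents" % done)
            last_fetch = clock
        for seconds in process_times[start:start + batch_size]:
            clock += seconds
            done += 1
    return done


# 3,000 consumers at 13 s each (the upper end measured in the commit message).
times = [13.0] * 3000
print(run_with_cursor(times, 25))  # 3000
try:
    run_with_cursor(times, 400)
except CursorNotFound as err:
    print(err)  # cursor expired after 400 documents
```

With a batch of 400 the next fetch happens after 400 x 13 s = 5,200 s, far past the 600 s limit; with a batch of 25 it happens after 325 s, so the cursor survives.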

Version-Release number of selected component (if applicable):

Installed Packages

candlepin-0.9.23.1-1.el6.noarch
candlepin-common-1.0.1-1.el6_5.noarch
candlepin-scl-1-5.el6_4.noarch
candlepin-scl-quartz-2.1.5-5.el6_4.noarch
candlepin-scl-rhino-1.7R3-1.el6_4.noarch
candlepin-scl-runtime-1-5.el6_4.noarch
candlepin-selinux-0.9.23.1-1.el6.noarch
candlepin-tomcat6-0.9.23.1-1.el6.noarch
elasticsearch-0.90.10-6.el6sat.noarch
katello-1.5.0-30.el6sat.noarch
katello-certs-tools-1.5.6-1.el6sat.noarch
katello-default-ca-1.0-1.noarch
katello-installer-0.0.67-1.el6sat.noarch
katello-server-ca-1.0-1.noarch
katello.croberts.org-apache-1.0-1.noarch
katello.croberts.org-foreman-client-1.0-1.noarch
katello.croberts.org-foreman-proxy-1.0-1.noarch
katello.croberts.org-parent-cert-1.0-1.noarch
katello.croberts.org-puppet-client-1.0-1.noarch
katello.croberts.org-qpid-broker-1.0-1.noarch
katello.croberts.org-qpid-client-cert-1.0-1.noarch
mod_wsgi-3.4-1.pulp.el6sat.x86_64
pulp-katello-0.3-4.el6sat.noarch
pulp-nodes-common-2.4.4-1.el6sat.noarch
pulp-nodes-parent-2.4.4-1.el6sat.noarch
pulp-puppet-plugins-2.4.4-1.el6sat.noarch
pulp-puppet-tools-2.4.4-1.el6sat.noarch
pulp-rpm-plugins-2.4.4-1.1.el6sat.noarch
pulp-selinux-2.4.4-1.el6sat.noarch
pulp-server-2.4.4-1.el6sat.noarch
python-gofer-qpid-1.3.0-1.el6sat.noarch
python-isodate-0.5.0-1.pulp.el6sat.noarch
python-kombu-3.0.15-12.pulp.el6sat.noarch
python-pulp-bindings-2.4.4-1.el6sat.noarch
python-pulp-common-2.4.4-1.el6sat.noarch
python-pulp-puppet-common-2.4.4-1.el6sat.noarch
python-pulp-rpm-common-2.4.4-1.1.el6sat.noarch
python-qpid-0.22-14.el6sat.noarch
python-qpid-qmf-0.22-37.el6.x86_64
qpid-cpp-client-0.22-42.el6.x86_64
qpid-cpp-server-0.22-42.el6.x86_64
qpid-cpp-server-linearstore-0.22-42.el6.x86_64
qpid-java-client-0.22-6.el6.noarch
qpid-java-common-0.22-6.el6.noarch
qpid-proton-c-0.7-1.el6.x86_64
qpid-qmf-0.22-37.el6.x86_64
qpid-tools-0.22-12.el6.noarch
ruby193-rubygem-katello-1.5.0-98.el6sat.noarch
rubygem-hammer_cli_katello-0.0.4-14.el6sat.noarch
rubygem-smart_proxy_pulp-1.0.1-1.1.el6sat.noarch

How reproducible:


Steps to Reproduce:

Publish or promote a content view on a Satellite that has 3,000+ Pulp consumers. The operation fails with a MongoDB cursor timeout.

Actual results:

The regenerate_applicability_for_repos task throws OperationFailure("cursor id '179967754771123979' not valid at server",) when multiple repositories and thousands of consumers (about 3,000) are involved. This is a symptom of the cursor timing out during the operation.

Expected results:
Content view publish and promote complete successfully.

Comment 1 RHEL Program Management 2015-05-23 19:22:24 UTC
Since this issue was entered in Red Hat Bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

Comment 5 pulp-infra@redhat.com 2015-05-26 12:57:44 UTC
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.

Comment 6 pulp-infra@redhat.com 2015-05-26 12:57:44 UTC
The Pulp upstream bug priority is at High. Updating the external tracker on this bug.

Comment 8 pulp-infra@redhat.com 2015-05-26 17:00:21 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 9 pulp-infra@redhat.com 2015-05-28 02:30:18 UTC
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.

Comment 10 pulp-infra@redhat.com 2015-05-28 15:00:18 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 13 Bryan Kearney 2015-05-31 13:14:39 UTC
commit 333f0ba8d401e42aefa24d6d76c0aba5435d842c
Author: Dennis Kliban <dkliban>
Date:   Wed May 27 21:52:44 2015 -0400

    Adds batch size to cursor used to iterate repo profile applicabilities
    
    According to mongo documentation [0] the cursor will initially return about 101
    documents or slightly more than 1 megabyte of data. The subsequent fetches return
    4 times as much data. The default timeout of mongo cursor (version 2.4) is 600
    seconds. This timeout cannot be adjusted until mongodb 2.6. If 400 applicability
    calculations need to be performed in 600 seconds, each calculation cannot take
    any longer than 1.5 seconds. In my testing I found the calculations to take 12 to
    13 seconds. Limiting the batch size to 25 ensures that calculations can take up
    to 24 seconds each before the cursor times out.
    
    [0] http://docs.mongodb.org/v2.4/core/cursors/#cursor-batches
    
    https://pulp.plan.io/issues/998
    fixes #998
    
    (cherry picked from commit f8644708e1ed15dc2d4b04f4edd77eb7bc873963)
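The arithmetic in the commit message reduces to one division: because the whole batch is processed before the next fetch, the average time per calculation must stay below the cursor timeout divided by the batch size. A minimal Python sketch, assuming the 600-second timeout stated in the commit message and a uniform cost per calculation; the function name is hypothetical.

```python
def max_seconds_per_document(batch_size, timeout_s=600.0):
    """Longest average time per calculation that keeps the gap between
    batch fetches within the cursor idle timeout."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return timeout_s / batch_size


# ~400 documents per later batch (default), 25 (this commit), 5 (later hotfix).
for batch in (400, 25, 5):
    print("batch_size=%3d: %g s per calculation" % (batch, max_seconds_per_document(batch)))
# batch_size=400: 1.5 s per calculation
# batch_size= 25: 24 s per calculation
# batch_size=  5: 120 s per calculation
```

The measured 12 to 13 seconds per calculation fits the 24-second budget of a batch size of 25 with a margin of roughly 1.8x to 2x, which is why further reductions were later considered.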

Comment 15 Mike McCune 2015-06-02 02:36:31 UTC
The prior hotfix did not include the earlier hotfixes for Bug 1171283. The packages have been rebuilt to include the fixes for 1171278 and for this bug, with an increased version number, so they should install on systems that already have the prior hotfixes.

**** Updated HOTFIX Instructions ****

1) Download the following tar file to your Satellite Server:

curl --output /var/tmp/1224497-hotfix-2.tar http://people.redhat.com/~mmccune/hotfix/1224497/1224497-hotfix-2.tar

2) Extract files

cd /var/tmp
tar xvf 1224497-hotfix-2.tar

3) Create the yum repo file /etc/yum.repos.d/hotfix.repo with the following contents:

[1224497-hotfix]
name=1224497-hotfix
baseurl=file:///var/tmp/el6
enabled=1
gpgcheck=0

Update baseurl to match the version of Enterprise Linux you are using (6 or 7).

4) yum update pulp-server

This should install updated Pulp packages:

..
Resolving Dependencies
--> Running transaction check
---> Package pulp-server.noarch 0:2.4.4-1.el6sat will be updated
--> Processing Dependency: pulp-server = 2.4.4 for package: pulp-nodes-common-2.4.4-1.el6sat.noarch
--> Processing Dependency: pulp-server = 2.4.4 for package: pulp-nodes-parent-2.4.4-1.el6sat.noarch
--> Processing Dependency: pulp-server = 2.4.4 for package: pulp-rpm-plugins-2.4.4-1.1.el6sat.noarch
--> Processing Dependency: pulp-server = 2.4.4 for package: pulp-puppet-plugins-2.4.4-1.el6sat.noarch
---> Package pulp-server.noarch 0:2.4.5-2.2.el6sat will be an update
--> Processing Dependency: python-pulp-common = 2.4.5 for package: pulp-server-2.4.5-2.2.el6sat.noarch
--> Running transaction check
..

5) katello-service restart
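The repo file in step 3 differs between Enterprise Linux 6 and 7 only in its baseurl. A small Python sketch that renders it for either version, assuming the extracted tarball provides an el6 or el7 directory under /var/tmp (only el6 is shown in the steps above). The function name is hypothetical, and the code avoids f-strings so it also runs on the Python 2 shipped with EL6 and EL7.

```python
import sys


def hotfix_repo_text(el_major, base_dir="/var/tmp"):
    """Return the contents of /etc/yum.repos.d/hotfix.repo for EL 6 or EL 7.

    Assumption: the extracted hotfix tarball contains base_dir/el<N>.
    """
    if el_major not in (6, 7):
        raise ValueError("the hotfix instructions cover Enterprise Linux 6 and 7 only")
    lines = [
        "[1224497-hotfix]",
        "name=1224497-hotfix",
        "baseurl=file://%s/el%d" % (base_dir, el_major),
        "enabled=1",
        "gpgcheck=0",
    ]
    return "\n".join(lines) + "\n"


# Print the EL7 variant; redirect the output into the repo file as root.
sys.stdout.write(hotfix_repo_text(7))
```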

Comment 16 Tazim Kolhar 2015-06-04 12:17:26 UTC
Hi,

Please provide verification steps.

Thanks

Comment 18 Mike McCune 2015-06-08 13:51:02 UTC
This updated set of packages further reduces the cursor batch size to 5 to prevent timeouts.

**** Updated HOTFIX 3 Instructions ****

1) Download the following tar file to your Satellite Server:

curl --output /var/tmp/1224497-hotfix-3.tar http://people.redhat.com/~mmccune/hotfix/1224497/1224497-hotfix-3.tar

2) Extract files

cd /var/tmp
tar xvf 1224497-hotfix-3.tar

3) Create the yum repo file /etc/yum.repos.d/hotfix.repo with the following contents:

[1224497-hotfix]
name=1224497-hotfix
baseurl=file:///var/tmp/el6
enabled=1
gpgcheck=0

Update baseurl to match the version of Enterprise Linux you are using (6 or 7).

4) yum update pulp-server

This should install updated Pulp packages:

..
Resolving Dependencies
--> Running transaction check
---> Package pulp-server.noarch 0:2.4.4-1.el6sat will be updated
..

5) katello-service restart

Comment 19 Tazim Kolhar 2015-06-10 10:03:11 UTC
VERIFIED:
# rpm -qa | grep foreman
ruby193-rubygem-foreman-tasks-0.6.12.7-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.6-1.el7sat.noarch
foreman-libvirt-1.7.2.26-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
ibm-x3755-02.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-1.0-1.noarch
foreman-gce-1.7.2.26-1.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
foreman-compute-1.7.2.26-1.el7sat.noarch
foreman-ovirt-1.7.2.26-1.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch
foreman-postgresql-1.7.2.26-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.14-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.15-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.0-6.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
foreman-proxy-1.7.2.5-1.el7sat.noarch
ibm-x3755-02.ovirt.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
ibm-x3755-02.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-vmware-1.7.2.26-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-1.7.2.26-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
foreman-debug-1.7.2.26-1.el7sat.noarch

Steps:
Published several content views (CVs) without problems.

Comment 21 pulp-infra@redhat.com 2015-06-17 13:30:19 UTC
The Pulp upstream bug status is at ON_QA. Updating the external tracker on this bug.

Comment 22 Bryan Kearney 2015-08-11 13:23:18 UTC
This bug is slated to be released with Satellite 6.1.

Comment 23 Bryan Kearney 2015-08-12 13:58:10 UTC
This bug was fixed in version 6.1.1 of Satellite which was released on 12 August, 2015.

Comment 24 pulp-infra@redhat.com 2015-09-11 20:00:24 UTC
The Pulp upstream bug status is at VERIFIED. Updating the external tracker on this bug.

Comment 25 pulp-infra@redhat.com 2015-09-11 20:30:27 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

Comment 28 Bryan Kearney 2016-01-04 18:22:32 UTC
What additional work has occurred?

Comment 33 errata-xmlrpc 2016-01-21 07:42:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0052

Comment 34 Pavel Moravec 2016-06-28 10:24:19 UTC
The same error was seen on Satellite 6.1.8 with this bug fix properly applied.

Is there any guarantee that a batch size of 5 will work in _all_ cases? Couldn't, for example, some huge errata require a batch size of 1 (something we can't know in advance)?

Comment 35 Dennis Kliban 2016-06-28 13:43:24 UTC
It is possible that under heavy load a batch size of 5 is not low enough. The batch size determines how frequently Pulp talks to the database. If, for some reason, calculating each content host's applicability takes an extremely long time, more than 600 seconds may pass before all 5 profiles in a batch have been processed.
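The answer above implies that no fixed batch size is safe in every case: the largest batch that fits is the timeout divided by the slowest per-profile time, and once a single calculation exceeds the timeout, even a batch size of 1 fails. A minimal Python sketch of that bound, assuming the 600-second timeout and a known worst-case time per profile (which, as Comment 34 points out, cannot be known in advance). The function name and the `margin` safety factor are hypothetical.

```python
import math


def largest_safe_batch_size(worst_case_s, timeout_s=600.0, margin=0.5):
    """Largest cursor batch size whose worst-case processing time fits within
    margin * timeout_s, or None if not even one profile fits.

    worst_case_s: slowest expected applicability calculation for one profile.
    margin: hypothetical safety factor in (0, 1].
    """
    if worst_case_s <= 0:
        raise ValueError("worst_case_s must be positive")
    if not 0 < margin <= 1:
        raise ValueError("margin must be in (0, 1]")
    size = int(math.floor(timeout_s * margin / worst_case_s))
    return size if size >= 1 else None


print(largest_safe_batch_size(13))               # 23
print(largest_safe_batch_size(150, margin=1.0))  # 4: a batch of 5 would need 750 s
print(largest_safe_batch_size(700, margin=1.0))  # None: one profile alone exceeds 600 s
```

Any batch size greater than 1 therefore only moves the threshold. Batch size 5 tolerates up to 120 s per profile, and batch size 1 tolerates up to 600 s.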

Comment 39 pulp-infra@redhat.com 2017-01-02 07:33:12 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

