Bug 960278 - Request to search units times out while memory consumption is high
Summary: Request to search units times out while memory consumption is high
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Content Management
Version: Nightly
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Unspecified
Assignee: Justin Sherrill
QA Contact: Og Maciel
URL:
Whiteboard:
Depends On:
Blocks: 950743 971468
 
Reported: 2013-05-06 20:59 UTC by Justin Sherrill
Modified: 2019-09-26 15:49 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 971468
Environment:
Last Closed: 2013-07-18 21:19:45 UTC
Target Upstream Version:
Embargoed:


Attachments:

Description Justin Sherrill 2013-05-06 20:59:34 UTC
Description of problem:

In Katello syncing a repo involves a few steps:

1) sync the repo in Pulp
2) when Katello receives the post-sync notification, index all packages:
 a) fetch all unit ids of packages for the repo
 b) in batches of 200 (by default), request the metadata for each set of ids

When syncing more than one large repo (typically four), the first one that finishes will try to index its content and typically times out.  Our rest-client timeout is set to 2 minutes.
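A minimal sketch of the flow above in Ruby, assuming a hypothetical Pulp-style API; the endpoint paths, helper name, and payload shape are illustrative only and not Katello's actual code:

require 'rest-client'
require 'json'

BATCH_SIZE = 200   # default batch size mentioned above
TIMEOUT    = 120   # the 2-minute rest-client timeout mentioned above

# Hypothetical post-sync indexing step: fetch unit ids once, then pull
# metadata for the ids in batches and hand it to the indexer.
def index_repo_packages(pulp_api_url, repo_id)
  api = RestClient::Resource.new(pulp_api_url, timeout: TIMEOUT)

  # a) fetch all unit ids of packages for the repo (endpoint is made up)
  unit_ids = JSON.parse(api["repositories/#{repo_id}/unit_ids/"].get.body)

  # b) request the metadata for each batch of ids (endpoint is made up)
  unit_ids.each_slice(BATCH_SIZE) do |ids|
    response = api['units/search/'].post({ ids: ids }.to_json, content_type: :json)
    metadata = JSON.parse(response.body)
    # ... index `metadata` into the search backend here ...
  end
end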



Version-Release number of selected component (if applicable):
2.1.1-0.9

How reproducible:
Most of the time.


Steps to Reproduce:
1.  Sync 4 large repos at once (RHEL 6.2, 6.3, 6.4, 6Server)
2.  When one of them is done, request 200 packages at a time (you may have to do this multiple times before one times out).

  
Actual results:
Request Times out


Expected results:
Request completes

Comment 1 Justin Sherrill 2013-05-09 02:32:46 UTC
I'm not sure this is totally related to syncing.

If I:

a) sync RHEL 6.1, 6.2, 6.3, 6.4, 6Server
b) copy them all to new repos  
c) copy them all to new repos again
d) within the new repos request packages 200 at a time

Memory usage seems to be around 40% (of 12 GB), and as part of d), one of the requests will time out.

It seems that if I restart Apache and redo d), everything completes.

Comment 3 Mike McCune 2013-06-06 15:12:38 UTC
The effect on Katello is that when we refresh (re-publish) our Content Views, we see a huge spike in memory consumption from httpd.  On an instance I'm testing, I see httpd go up to 4.8G after a republish.

Comment 4 Brad Buckingham 2013-06-11 14:23:24 UTC
This was addressed in the following katello PR:

https://github.com/Katello/katello/pull/2450

Comment 6 Brad Buckingham 2013-06-11 19:46:10 UTC
Mass move to ON_QA

Comment 7 Justin Sherrill 2013-06-12 13:29:16 UTC
I think we probably want to cherry-pick https://github.com/Katello/katello/pull/2479 as well.  This will actually help with the timeout issue.

Comment 9 Brad Buckingham 2013-06-15 00:08:36 UTC
Mass move to ON_QA

Comment 10 Garik Khachikyan 2013-06-18 14:06:58 UTC
Taking QA contact.

Comment 11 Garik Khachikyan 2013-06-19 13:14:44 UTC
Bad news:

The RestClient timeout happens (again) in my scenario when doing the following in parallel:

- sync of RHEL5Server content for org1
- publishing RHEL 6.1; 6.2; 6.3; 6.4; 6Server 64bit content

It just stayed pending for a fair amount of time and then I got:
---
Failed to publish content view 'cvRHEL6' from definition 'cvd-1'.
Request Timeout (RestClient::RequestTimeout)
---

The server itself is strong enough: 12 GB RAM + 16 CPUs.

From Justin I got:
---
i guess pulp was just too overloaded
we've done some stuff to fix this 
but there's not a lot more katello can do
there's really nothing more we can do before mdp1 
well the only thing we could do 
is increase the timeout 
or disable it
---

So I think we had better move this issue to MDP<next> (and erase the 6.0.1 flag)?

I am reopening this issue to put it on the DEV radar for the next drop. Thanks.
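For reference, the two workarounds quoted above ("increase the timeout" or "disable it") would look roughly like this with the rest-client gem; the URL and numbers are illustrative assumptions, not Katello's actual configuration:

require 'rest-client'

# Raise the read timeout well above the current 2 minutes (the value here is arbitrary).
pulp = RestClient::Resource.new('https://localhost/pulp/api/v2/', timeout: 600)

# Or drop the limit entirely; passing nil is the usual way to disable the timeout,
# though the exact semantics depend on the rest-client version in use.
pulp_unbounded = RestClient::Resource.new('https://localhost/pulp/api/v2/', timeout: nil)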

Comment 12 Og Maciel 2013-06-19 18:49:30 UTC
One thing I noticed was that, once Foreman timed out, I was no longer able to log in to Foreman via the web UI. The workaround was to restart the foreman service.

Comment 13 Justin Sherrill 2013-06-19 20:09:46 UTC
We may have a fix for this for MDP1; moving back to 6.0.1.

Comment 14 Justin Sherrill 2013-06-19 20:12:57 UTC
https://github.com/Katello/katello/pull/2531

Comment 15 Og Maciel 2013-06-19 21:07:31 UTC
After promoting a RHEL 5 x86_64 content view, I could no longer log in to or switch to the Foreman UI:

top - 17:06:05 up 1 day,  7:48,  2 users,  load average: 1.00, 0.97, 0.80
Tasks: 218 total,   2 running, 216 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.2%us,  0.2%sy,  0.0%ni, 74.0%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25032 foreman   20   0  519m 154m 7692 R 99.6  1.0  10:38.97 thin
10720 elastics  20   0 2704m 410m  10m S  0.6  2.6  11:56.95 java
    1 root      20   0 19352 1016  784 S  0.0  0.0   0:02.77 init

Comment 16 Og Maciel 2013-06-19 21:33:11 UTC
FYI, this was not using the patch mentioned in comment #14.

Comment 18 Og Maciel 2013-06-20 12:00:34 UTC
[root@cloud-qe-10 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         15940      15544        395          0        453       5127
-/+ buffers/cache:       9964       5976
Swap:         8039        139       7900

Comment 19 Og Maciel 2013-06-20 12:47:02 UTC
Possible culprit for Foreman eating all the CPU: https://bugzilla.redhat.com/show_bug.cgi?id=976362

Comment 20 Brad Buckingham 2013-06-21 12:59:28 UTC
Mass move to ON_QA

Comment 21 Og Maciel 2013-06-22 03:38:39 UTC
* Re-synced RHEL 5 i386 5Server while publishing a content view with RHEL 5 i386 5Server, RHEL 5 x86_64 5Server, RHEL 6 i386 6Server, and RHEL 6 x86_64 6Server

All operations completed

Comment 22 Og Maciel 2013-06-22 03:42:53 UTC
Verified:
* apr-util-ldap-1.3.9-3.el6_0.1.x86_64
* candlepin-0.8.9-1.el6_4.noarch
* candlepin-scl-1-5.el6_4.noarch
* candlepin-scl-quartz-2.1.5-5.el6_4.noarch
* candlepin-scl-rhino-1.7R3-1.el6_4.noarch
* candlepin-scl-runtime-1-5.el6_4.noarch
* candlepin-selinux-0.8.9-1.el6_4.noarch
* candlepin-tomcat6-0.8.9-1.el6_4.noarch
* elasticsearch-0.19.9-8.el6sat.noarch
* foreman-1.1.10014-1.noarch
* foreman-compute-1.1.10014-1.noarch
* foreman-installer-puppet-concat-0-2.d776701.git.0.21ef926.el6sat.noarch
* foreman-installer-puppet-dhcp-0-5.3a4a13c.el6sat.noarch
* foreman-installer-puppet-dns-0-7.fcae203.el6sat.noarch
* foreman-installer-puppet-foreman-0-6.568c5c4.el6sat.noarch
* foreman-installer-puppet-foreman_proxy-0-8.bd1e35d.el6sat.noarch
* foreman-installer-puppet-puppet-0-3.ab46748.el6sat.noarch
* foreman-installer-puppet-tftp-0-5.ea6c5e5.el6sat.noarch
* foreman-installer-puppet-xinetd-0-50a267b8.git.0.44aca6a.el6sat.noarch
* foreman-libvirt-1.1.10014-1.noarch
* foreman-postgresql-1.1.10014-1.noarch
* foreman-proxy-1.1.10003-1.el6sat.noarch
* foreman-proxy-installer-1.0.1-10.f5ae2cd.el6sat.noarch
* katello-1.4.2-17.el6sat.noarch
* katello-all-1.4.2-17.el6sat.noarch
* katello-candlepin-cert-key-pair-1.0-1.noarch
* katello-certs-tools-1.4.2-2.el6sat.noarch
* katello-cli-1.4.2-8.el6sat.noarch
* katello-cli-common-1.4.2-8.el6sat.noarch
* katello-common-1.4.2-17.el6sat.noarch
* katello-configure-1.4.3-16.el6sat.noarch
* katello-configure-foreman-1.4.3-16.el6sat.noarch
* katello-foreman-all-1.4.2-17.el6sat.noarch
* katello-glue-candlepin-1.4.2-17.el6sat.noarch
* katello-glue-elasticsearch-1.4.2-17.el6sat.noarch
* katello-glue-pulp-1.4.2-17.el6sat.noarch
* katello-qpid-broker-key-pair-1.0-1.noarch
* katello-qpid-client-key-pair-1.0-1.noarch
* katello-selinux-1.4.3-3.el6sat.noarch
* openldap-2.4.23-31.el6.x86_64
* pulp-rpm-plugins-2.1.2-1.el6sat.noarch
* pulp-selinux-2.1.2-1.el6sat.noarch
* pulp-server-2.1.2-1.el6sat.noarch
* python-ldap-2.3.10-1.el6.x86_64
* ruby193-rubygem-ldap_fluff-0.2.2-1.el6sat.noarch
* ruby193-rubygem-net-ldap-0.3.1-2.el6sat.noarch
* ruby193-rubygem-runcible-0.4.10-1.el6sat.noarch
* signo-0.0.19-1.el6sat.noarch
* signo-katello-0.0.19-1.el6sat.noarch

Comment 23 Mike McCune 2013-07-18 21:19:45 UTC
Mass move to CLOSED:CURRENTRELEASE since MDP1 has been released.

