525575 – After upgrade to Satellite 5.3.0, Systems and System Groups Overview pages load much slower and/or run into 'service temporary unavailable'

Bug 525575 - After upgrade to Satellite 5.3.0, Systems and System Groups Overview pages load much slower and/or run into 'service temporary unavailable'

Summary: After upgrade to Satellite 5.3.0, Systems and System Groups Overview pages lo...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Satellite 5
Classification:	Red Hat
Component:	Server
Sub Component:
Version:	530
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Justin Sherrill
QA Contact:	Brandon Perkins
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	sat531-blockers 527784 531645
TreeView+	depends on / blocked

Reported:	2009-09-24 21:27 UTC by Xixi
Modified:	2018-11-14 20:29 UTC (History)
CC List:	7 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	531645 (view as bug list)
Environment:
Last Closed:	2009-10-14 07:20:55 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2009:1496	0	normal	SHIPPED_LIVE	Red Hat Network Satellite bug fix update	2009-10-14 07:20:53 UTC

Description Xixi 2009-09-24 21:27:45 UTC

Description of problem:
Reported by multiple customers with similar symptoms after upgrading to sat 5.3.0 such as the following:
* When clicking on rhn/systems/Overview.do, after about 2 minutes the webserver reports "system temporary unavailable".
* After the upgrade it takes lot of time (~2-5 mins) to list all available System Groups. Via the WebUI: System -> Systemgroups. Using the API displays all groups immediately. The relevant page is rhn/systems/SystemGroupList.do. The satellite has ~60 systems subscribed into ~30 groups.
Attached you can find an oracle statspack and a fresh satellite-debug.
* Due to a huge SQL request taking more than half of the machine's computing power, attempts to display all groups on the satellite 5.3 produce a timeout from the JSP server

This is a very important issue because the URL is an entry point for many other important links like kickstart, system groups etc.

Version-Release number of selected component (if applicable):
Red Hat Network (RHN) Satellite 5.3.0

System architecture(s):
arch=x86_64
release=5
flavor=server
base=rhel-x86_64-server-5

How reproducible:
Always.

Steps to Reproduce:
1. Have a satellite with ~11000 systems (or in the thousands) registered and some RHEL base channels sync'd (we should have a customer db-control backup dump soon).
2. Click on https://SATELLITEFQDN/rhn/systems/Overview.do and rhn/systems/SystemGroupList.do
3. Wait.

Actual results:
See load on Satellite server go up, either page takes several minutes to load or end in "temporarily unavailable" error due to timeout.

Expected results:
Page loads in reasonable time (in the order of seconds depending on deployment size, at least comparable than earlier versions - <5.3.0 - of satellite), no timeout/temporarily unavailable errors or high loads.

Additional info:
Already being worked with Engineering.

Comment 2 Xixi 2009-09-24 21:57:30 UTC

Per IRC with Justin, engineering is working on optimizing the query (the final version may be different than proposed), in the meantime, can continue to use proposed query change from GPS consultant as a temporary workaround (great work! :)).

jsherrill_laptop        xixi,  so that kinda makes senese, but i'm surprised it made *that* much of a difference
xixi    looks like it's mainly "rhnServerNeededPackageCache" to "rhnServerNeededErrataCache"
xixi    yeah
jsherrill_laptop        i wonder if it would be faster to just use rhnServerNeededCache
jsherrill_laptop        which is the table that is underneath both those views
xixi    so use the table instead of the view
xixi    jsherrill_laptop: sure we can try that
jsherrill_laptop        yeah,  both those views used to be tables
jsherrill_laptop        but i abstracted it out into one table (rhnServerNeededCache)
xixi    jsherrill_laptop: ok
jsherrill_laptop        and then created two views to 'simulate' the old tables
xixi    jsherrill_laptop: so do we want to have the other affected customers try a modified version as you mentioned above?
jsherrill_laptop        couldn't hurt, and once we have an internal reproducer, we can test the two and see if there's much difference
jsherrill_laptop: yeah we definitely want to try out in-house

Comment 6 Carsten Clasohm 2009-09-25 07:37:02 UTC

(In reply to comment #2)
> jsherrill_laptop        i wonder if it would be faster to just use
> rhnServerNeededCache
> jsherrill_laptop        which is the table that is underneath both those views

I tried this, and it didn't make any difference for the execution time.

The reason why the rhnServerNeededPackageCache view is slow is the "max" and "group by". This forces Oracle to take all package rows for a server, and hold them in memory as a temporary result (or in the case here, in temporary tablespace, because the default PGA was no large enough).

Comment 15 Justin Sherrill 2009-09-30 21:22:26 UTC

spacewalk master 5713ce473236d273c2a4c1608ec3f7f144e9b4e8
satellite.git b4ac81bace69fc1fda5f1ace81db5e622b1b44e6

Comment 20 errata-xmlrpc 2009-10-14 07:20:55 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1496.html

Note You need to log in before you can comment on or make changes to this bug.