Bug 600502 - Very slow performance on Satellite API system.getId call w/ large number (thousands) of systems
Status: CLOSED CURRENTRELEASE
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Server
Version: 530
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Justin Sherrill
QA Contact: Petr Sklenar
Depends On:
Blocks: sat540-blockers sat540-api/sat540-apis
Reported: 2010-06-04 16:46 EDT by Xixi
Modified: 2010-10-28 10:55 EDT
CC List: 5 users

Doc Type: Bug Fix
Last Closed: 2010-10-28 10:55:18 EDT


Attachments: None
Description Xixi 2010-06-04 16:46:33 EDT
Description of problem:
When using a simple API call, system.getId (the customer provides a hostname and wants to retrieve the server ID), it takes a very long time to return results: more than 15 seconds on a Satellite with 2580 systems. On a Satellite with 26921 systems, the call eventually times out with the following traceback:
Traceback (most recent call last):
  File "./systemGetID.py", line 13, in ?
    info = server.system.getId(session, 'rh64.stabletransit.com')
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1096, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1383, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1137, in request
    headers
xmlrpclib.ProtocolError: <ProtocolError for sat53.usersys.redhat.com/rpc/api: 502 Proxy Error>

Version-Release number of selected component (if applicable):
Red Hat Network (RHN) Satellite 5.3.0

How reproducible:
Always.

Steps to Reproduce:
1. Have a Satellite with many system profiles (e.g. 2580 or 26921).
2. Invoke the RHN API call system.getId (server.system.getId from an XML-RPC client) with any system name.
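
The steps above can be scripted. The following is a minimal Python 3 sketch of a reproducer (the customer's systemGetID.py used Python 2.4's xmlrpclib; this uses its Python 3 successor, xmlrpc.client). It times a single system.getId call using the auth.login, system.getId and auth.logout API calls. The helper names time_get_id and reproduce are hypothetical, and the URL and credentials in the usage line are placeholders.

```python
import time
import xmlrpc.client


def time_get_id(server, session, hostname):
    """Call system.getId once and return (matching systems, elapsed seconds).

    `server` is normally an xmlrpc.client.ServerProxy for the Satellite's
    /rpc/api endpoint; any object exposing system.getId works.
    """
    start = time.monotonic()
    # Returns a list of structs, one per system profile with that name.
    matches = server.system.getId(session, hostname)
    elapsed = time.monotonic() - start
    return matches, elapsed


def reproduce(url, login, password, hostname):
    """Log in, time a single system.getId lookup, and always log out again."""
    server = xmlrpc.client.ServerProxy(url)
    session = server.auth.login(login, password)
    try:
        return time_get_id(server, session, hostname)
    finally:
        server.auth.logout(session)
```

Usage (placeholder values): matches, seconds = reproduce("https://satellite.example.com/rpc/api", "admin", "password", "rh64.stabletransit.com"). On an affected 5.3.0 Satellite, seconds would be expected to grow with the number of profiles, and on large Satellites the call may raise xmlrpc.client.ProtocolError (502 Proxy Error) instead of returning.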

Actual results:
The call takes a long time to return and the system load increases; depending on how many systems the Satellite has, it either eventually returns or times out with a 502 Proxy Error.

Expected results:
The API call returns quickly (a few seconds seems reasonable), with no long delay and no traceback or 502 Proxy Error.

Additional info:
Technical analysis follows in the next comments.
Comment 1 Xixi 2010-06-04 16:54:30 EDT
I took a look at the code for the system.getId API call, and it looks inefficient on the Java side even before the database query comes into play: it essentially fetches all systems visible to the user, then loops through the results looking for the one(s) whose name matches. This is not very scalable:
...
       List<SystemOverview> dr = UserManager.visibleSystemsAsDto(loggedInUser);
       List returnList = new ArrayList();

       for (SystemOverview system : dr) {
           if (system.getName().equals(name)) {
               returnList.add(system);
           }
       }
...
A query such as select id from rhnServer where name = <systemname> (or similar) would be much lighter.

In addition, the current implementation uses the system_overview elaborator, which could itself use some optimization, as shown by these more detailed findings from the customer (Satellite with 2580 systems):
"During the script execution, we can see on the Satellite box that the process oraclerhnsat takes one CPU (100%). I have asked one of our DBAs for some help, and after some investigation, it seems that a single SQL query is made several times, with a maximum of 500 result rows, and that the index is fully scanned every time. In our case, this query is made 10 times in one session and one more time in another session. As we currently have 2580 hosts, that makes sense. Moreover, the SQL query is prefixed by /*+ RULE */, which seems to indicate that you are bypassing the standard Oracle optimizer mode.
I have attached two files that are the report from Oracle."

The files are attached, and the query referred to above starts with "SELECT /*+ RULE */ SERVER_ID AS ID, OUTDATED_PACKAGES, SERVER_NAME, security_errata, bug_errata, enhancement_errata, SERVER_ADMINS, GROUP_COUNT, NOTE_COUNT, MODIFIED, CHANNEL_LABELS, CHANNEL_ID, HISTORY_COUNT, LAST_CHECKIN_DAYS_AGO, PENDING_UPDATES, OS, RELEASE, SERVER_ARCH_NAME", which corresponds to the system_overview elaborator in System_queries. So the elaborator itself sounds like it could be optimized, even if the getId API call is reimplemented to use something else. Please let me know if we want a separate BZ for that. Thanks!
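
The DBA's observation (the same query run repeatedly, each returning at most 500 rows) is consistent with the elaborator fetching the overview columns for the visible server IDs in fixed-size chunks. The sketch below assumes a chunk size of 500 and counts how many chunked queries a full elaboration would need; the chunk size is inferred from the DBA report, the helper names are hypothetical, and this is not the actual elaborator code.

```python
from math import ceil

# Assumption: chunk size inferred from the 500-row maximum the DBA observed.
CHUNK_SIZE = 500


def chunks(ids, size=CHUNK_SIZE):
    """Split a list of server IDs into consecutive chunks of at most `size` IDs."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def queries_per_elaboration(system_count, size=CHUNK_SIZE):
    """Number of chunked queries needed to elaborate `system_count` systems."""
    return ceil(system_count / size)
```

Under these assumptions, 2580 systems need 6 chunked queries and 26921 systems need 54 (the sketch does not attempt to reproduce the exact 10 + 1 executions the customer reported). If each chunked query fully scans an index whose size also grows with the number of systems, as the DBA reported, total work would grow roughly with the square of the system count, which would be consistent with slow responses at 2580 systems turning into timeouts at 26921.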
Comment 4 Justin Sherrill 2010-07-19 11:47:32 EDT
fixed: d176769740ee45a6831cb821d83b4ed551a51f14
Comment 7 Clifford Perry 2010-10-28 10:50:28 EDT
The 5.4.0 RHN Satellite and RHN Proxy release has occurred. This issue has been resolved with this release. 


RHEA-2010:0801 - RHN Satellite Server 5.4.0 Upgrade
https://rhn.redhat.com/rhn/errata/details/Details.do?eid=10332

RHEA-2010:0803 - RHN Tools enhancement update
https://rhn.redhat.com/rhn/errata/details/Details.do?eid=10333

RHEA-2010:0802 - RHN Proxy Server 5.4.0 bug fix update
https://rhn.redhat.com/rhn/errata/details/Details.do?eid=10334

RHEA-2010:0800 - RHN Satellite Server 5.4.0
https://rhn.redhat.com/rhn/errata/details/Details.do?eid=10335

Docs are available:

http://docs.redhat.com/docs/en-US/Red_Hat_Network_Satellite/index.html 

Regards,
Clifford
