Bug 481671 - package permissions check takes too long with multiorg 2
package permissions check takes too long with multiorg 2
Status: CLOSED CURRENTRELEASE
Product: Red Hat Satellite 5
Classification: Red Hat
Component: WebUI (Show other bugs)
530
All Linux
urgent Severity high
: ---
: ---
Assigned To: Jeff Ortel
Sayli Karmarkar
:
Depends On:
Blocks: 456998
  Show dependency treegraph
 
Reported: 2009-01-26 18:10 EST by Justin Sherrill
Modified: 2015-03-22 21:09 EDT (History)
4 users (show)

See Also:
Fixed In Version: sat530
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-10 15:48:26 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Oracle trace for before/after benchmarks on rhnAvailableChannels view. (29.31 KB, text/plain)
2009-01-29 14:49 EST, Jeff Ortel
no flags Details

  None (edit)
Description Justin Sherrill 2009-01-26 18:10:53 EST
PackageFactory.lookupByIdAndOrg() used to be a very fast operation.  On a large customer's DB it only takes 0.23 seconds max on 5.2.  (If you need a copy of the db, let me know).


If I upgrade the schema to current git, the operation takes about 11 seconds (6 seconds if i switch it to rule based optimizer).

The query in question is:

SELECT 1
  FROM rhnPackage P
 WHERE p.id = 715
   AND (   P.org_id = 1
        OR EXISTS (SELECT 1
                     FROM rhnChannelPackage CP,
                          rhnAvailableChannels AC
                    WHERE AC.org_id = 1
                      AND AC.channel_id = CP.channel_id
                      AND CP.package_id = P.id));



This is highly unacceptable as the site nav calls this method multiple times (10-15x) for an acl.  We could change the acl to not do the check, but there are many many other places that call this method, as it is a very common one.  The result of this is that the package details page doesn't even load with the current schema, it times out.  From the explain plans it seems to point to the new multiorg tables as the cause. 


Here's 2 tables with my statistics.  All runs were run back to back

On 5.2 schema:

run | time
-----------
1   |  0.43
2   |  0.01
3   |  0.00
4   |  0.00


on current git schema:

run | time
----------
1   |  13.28
2   |  11.11
3   |  11.13
4   |  11.10
Comment 1 Justin Sherrill 2009-01-26 18:11:34 EST
Also, worth noting.  All tests were done on the same exact machine, the only thing i switched out was the /rhnsat directory.
Comment 2 Partha Aji 2009-01-27 20:28:33 EST
Rewrite the query to 


SELECT 1
     FROM rhnPackage P
    WHERE p.id = 715
      AND (P.org_id = 1
           OR 
           Exists (select 1 from rhnOrgChannelTreeView CT  inner join rhnChannelPackage CP on Ct.id = CP.channel_id  where ct.org_id = 1 and CP.package_id = p.id)
           OR
           Exists ( SELECT 1 FROM rhnSharedChannelTreeView CT  inner join rhnChannelPackage CP on Ct.id = CP.channel_id
                        where ct.org_id = 1 and CP.package_id = p.id))


Speeds up the execution time to .09 seconds...
Comment 3 Partha Aji 2009-01-28 09:59:28 EST
Changed the actual query to the one on comment 2...
Comment 4 Jeff Ortel 2009-01-28 10:09:38 EST
In addition to the fix in #2, the rhnAvailableChannels still needs to be tuned.  It is used in other places.
Comment 5 Jeff Ortel 2009-01-29 14:47:35 EST
The performance problem with the view is that the correlated subquery used to get the number of systems registered to a channel also contains a correlated subquery.

-- as it was --

SELECT COUNT(1) 
FROM rhnServer S 
WHERE S.org_id = ct.org_id 
AND EXISTS (SELECT 1 FROM rhnServerChannel 
                    WHERE channel_id = ct.id AND server_id = S.id)

Since correlated subqueries are executed for every row in the result set in the outer query, this multiplies as follows:

The most inner query:

SELECT 1 FROM rhnServerChannel WHERE channel_id = ct.id AND server_id = S.id

in this case is executed for every server.  In this case is 13740.

The next most inner query:

SELECT COUNT(1) 
FROM rhnServer S 
WHERE S.org_id = ct.org_id 
AND EXISTS (<most inner query>)

Is executed for every row in the most outer query's (SELECT FROM rhnOrgChannelTreeView) result set as and in this case was: 100.

So, our most inner query was being executed: 100*13740 = 1,374,000 times.

By simply rewriting as:

(SELECT COUNT(1)
 FROM rhnServer S
 INNER JOIN rhnServerChannel SC ON SC.server_id = S.id
 WHERE SC.channel_id = CT.id AND S.org_id = CT.org_id)

Now we're only executing the above (1) level deep correlated subquery 100 times.

Query time for (select * from rhnAvailableChannels) view on this database was reduced from 11 seconds to 1.27 seconds using the cost based optimizer on 10g.  I believe the time using the rule based optimizer would be around .6 seconds.

See the attached oracle trace file: rhnAvailableChannelsTrace.txt
Comment 6 Jeff Ortel 2009-01-29 14:49:00 EST
Created attachment 330397 [details]
Oracle trace for before/after benchmarks on rhnAvailableChannels view.
Comment 7 John Matthews 2009-07-31 16:24:23 EDT
Moving to RELEASE_PENDING

Tested on a stage satellite with 2k systems registered.
Loaded package details page, it loaded quick, no noticeable delays.

Also executed XMLRPC "system.listSystemsWithPackage" since it is another path that used the call "PackageFactory.lookupByIdAndOrg()"
Comment 8 Brandon Perkins 2009-09-10 15:48:26 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1434.html

Note You need to log in before you can comment on or make changes to this bug.