Bug 1022279 - Errata queries constantly running
Summary: Errata queries constantly running
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Spacewalk
Classification: Community
Component: Server
Version: 2.1
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
Assignee: Michael Mráka
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
Depends On:
Blocks: 1025636 space21
TreeView+ depends on / blocked
 
Reported: 2013-10-22 22:15 UTC by JDavis4102
Modified: 2014-03-04 13:08 UTC (History)
7 users (show)

Fixed In Version: spacewalk-java-2.1.64-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1025636 (view as bug list)
Environment:
Last Closed: 2014-03-04 13:06:43 UTC
Embargoed:


Attachments (Terms of Use)
query and query plan (11.74 KB, text/plain)
2013-10-30 21:41 UTC, daryl herzmann
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1023557 0 high CLOSED Satellite 5.6 postgres takes a long time to satellite-sync for initial sync 2021-02-22 00:41:40 UTC

Internal Links: 1023557

Description JDavis4102 2013-10-22 22:15:54 UTC
Description of problem: Errata tasks in taskomatic never seem to finish and in the DB logs I am seeing the following query all the time. Seems like it never finishes and usually takes up 2 cores and 100% cpu on those two cores.

SELECT DISTINCT S.id AS server_id, S.org_id org_id FROM rhnServer S JOIN (select rhnChannelErrata.errata_id, rhnChannelErrata.channel_id, rhnServerChannel.server_id, rhnErrataPackage.package_id from rhnChannelErrata, rhnErrataPackage, rhnChannelNewestPackage, rhnPackageEVR, rhnServerChannel, rhnServerPackage, rhnPackageUpgradeArchCompat where rhnChannelErrata.errata_id = rhnErrataPackage.errata_id and rhnChannelErrata.channel_id = rhnChannelNewestPackage.channel_id and rhnErrataPackage.package_id = rhnChannelNewestPackage.package_id and rhnChannelErrata.channel_id = rhnServerChannel.channel_id and rhnChannelNewestPackage.name_id = rhnServerPackage.name_id and rhnServerChannel.server_id = rhnServerPackage.server_id and rhnChannelNewestPackage.evr_id = rhnPackageEVR.id and rhnServerPackage.package_arch_id = rhnPackageUpgradeArchCompat.package_arch_id and rhnPackageUpgradeArchCompat.package_upgrade_arch_id = rhnCh


How reproducible:
Not sure

Steps to Reproduce:
Not sure how to reproduce except to have a lot of channels and servers (8800+)

Comment 1 daryl herzmann 2013-10-30 21:41:44 UTC
Created attachment 817650 [details]
query and query plan

Comment 2 daryl herzmann 2013-10-30 21:42:50 UTC
I've attached the offending query (took 1600 seconds) and am experimenting with adding various indices to remove those massive sequence scans!

Comment 3 daryl herzmann 2013-10-31 20:26:56 UTC
Well, even setting "set enable_seqscan='off';" did not help.  Looking at the query, it seems the SNV subquery is materializing a ton of data!  The outer query limit of SNV.channel_id = and SNV.errata_id = does not make it down into the subquery.  For a test, I modified the sub query and added the explicit (and rhnChannelErrata.errata_id = 19411). The query returned in a second or so.... 

I am not sure if that is the right thing to do....

Comment 4 Xixi 2013-11-01 01:45:31 UTC
See also: similar / possibly related bug currently being investigated for Red Hat Satellite 5.6 (the highlighted queries are different): 
Bug 1023557 - Satellite 5.6 postgres takes a long time to satellite-sync

Comment 5 Michael Mráka 2013-11-04 11:59:17 UTC
This seems to be (again) issue with PostgreSQL 8.4 optimizer which can't "unfold" inner sub-queries so it starts execution with  cartesian joins in these sub-queries and then later on filtering out majority of the rows...


From explain plan problematic parts, according to number of rows, are

a)
              ->  Hash  (cost=488538.14..488538.14 rows=31777694 width=60)
                     ->  Hash Join  (cost=31433.37..488538.14 rows=31777694 width=60)
                           Hash Cond: (csc.channel_id = ocnp.channel_id)

which corresponds to the third "not exits" sub-query (-- rule out records where other channel provides the same package name with higher evr),

b)
                           ->  Materialize  (cost=0.00..284594.16 rows=11626006 width=26)
                                 ->  Nested Loop  (cost=0.00..168334.10 rows=11626006 width=26)
                                       ->  Index Scan using rhn_sc_sid_cid_uq on rhnserverchannel csc  (cost=0.00..237.98 rows=3229 width=18)
                                       ->  Index Scan using rhn_cp_cp_uq on rhnchannelpackage cp  (cost=0.00..32.42 rows=1571 width=15)
                                             Index Cond: (cp.channel_id = csc.channel_id)

which corresponds to the first "not exits" sub-query (-- rule out records where channels with lower id exist == get min)

and c)
                     ->  Hash  (cost=130497.67..130497.67 rows=5737384 width=60)
                           ->  Hash Join  (cost=754.39..130497.67 rows=5737384 width=60)
                                 Hash Cond: (sp.package_arch_id = puac.package_upgrade_arch_id)
                                 ->  Hash Join  (cost=750.91..37615.95 rows=946784 width=60)
                                       Hash Cond: (sp.evr_id = sevr.id)

which corresponds to the second "not exits" sub-query (-- rule out records where the server has the same package name with higher evr).

Comment 6 Michael Mráka 2013-11-04 12:03:04 UTC
For a) and b) we can add condition to filter out just necessary channels,
i.e.
  AND CP.CHANNEL_ID = :channel_id
resp.
  AND CSC.CHANNEL_ID = :channel_id

Comment 7 Michael Mráka 2013-11-04 12:09:16 UTC
The changes from comment #6 actually make huge difference on my test system
from original

                                    QUERY PLAN
---------------------------------------------------------------
 Unique  (cost=2172993.26..3044634.48 rows=1 width=18) (actual time=673.244..673.244 rows=0 loops=1)

to new
                                 QUERY PLAN
--------------------------------------------
 Unique  (cost=711198.84..711198.85 rows=1 width=18) (actual time=22.454..22.454 rows=0 loops=1)

which is 3 times lower cost / faster.

Comment 8 Michael Mráka 2013-11-04 12:18:20 UTC
The c) is more tricky because there's no channel_id nor errata_id we can use to downsize query results. So we actually have to add some more tables to filter out redundant packages. E.g.

                -- rule out records where the server has the same package name with higher evr
                select 1 from rhnServerPackage sp, rhnPackageEVR sevr, rhnPackageUpgradeArchCompat puac
+, RHNERRATAPACKAGE JEP, RHNPACKAGE JP
                where rhnServerPackage.server_id = sp.server_id
+AND JEP.ERRATA_ID = :errata_id
+AND JEP.PACKAGE_ID = JP.ID
+AND JP.NAME_ID = SP.NAME_ID
                        and rhnServerPackage.name_id = sp.name_id
                        and sp.evr_id = sevr.id
                        and rhnPackageEVR.evr <= sevr.evr
                        and rhnServerPackage.package_arch_id = puac.package_arch_id
                        and puac.package_upgrade_arch_id = sp.package_arch_id

Comment 9 Michael Mráka 2013-11-04 12:22:40 UTC
After comment #8 change explain plan shows even
                                                                                                                   QUERY PLAN
---------------------------------------------------------------
 Unique  (cost=431174.58..431174.59 rows=1 width=18) (actual time=626.219..626.219 rows=0 loops=1)

which is 5 times better than original one.

Comment 10 Michael Mráka 2013-11-04 12:34:37 UTC
Fixed in spacewalk master with
commit ccd106a5ca81fb7a45ddcbe9c01c6e7dcb4bcb40
    1022279 - modified query to work better with PostgreSQL 8.4 optimizer
commit e431dc48db2d5197a1ef1888c6ef14fcfe869730
    1022279 - added hints for PostgreSQL 8.4 optimizer

Comment 12 JDavis4102 2014-01-15 23:34:13 UTC
Issues seems to not be fixed Still having issues with the following Errata query:

SELECT DISTINCT +
| | S.id AS server_id, S.org_id org_id +
| | FROM rhnServer S +
| | JOIN ( +
| | -- +
| | select rhnChannelErrata.errata_id, rhnChannelErrata.channel_id, rhnServerChannel.server_id, rhnErrataPackage.package_id+
| | from rhnChannelErrata, rhnErrataPackage, rhnChannelNewestPackage, rhnPackageEVR, +
| | rhnServerChannel, rhnServerPackage, rhnPackageUpgradeArchCompat +
| | where rhnChannelErrata.errata_id = rhnErrataPackage.errata_id +
| | -- +
| | and rhnChannelErrata.channel_id = rhnChannelNewestPackage.channel_id +
| | and rhnErrataPackage.package_id = rhnChannelNewestPackage.package_id +
| | -- +
| | and rhnChannelErrata.channel_id = rhnServerChannel.channel_id +
| | and rhnChannelNewestPackage.name_id = rhnServerPackage.name_id +
| | and rhnServerChannel.server_id = rhnServerPackage.server_id +
| | -- +
| | and rhnChannelNewestPackage.evr_id = rhnPackageEVR.id +
| | -- +
| | and rhnServerPackage.package_arch_id = rhnPackageUpgradeArchCompat.package_arch_id +
| | and rhnPackageUpgradeArchCompat.package_upgrade_arch_id = rhnCh

Comment 13 Matej Kollar 2014-01-17 12:15:41 UTC
Switching MODIFIED Spacewalk bugs to ON_QA before 2.1 release.

Comment 14 JDavis4102 2014-01-17 16:48:12 UTC
This bug is not resolved and needs further research. As the issue is not fixed. This bug needs to be reverted back to open and not modified.

Comment 15 Matej Kollar 2014-03-04 13:06:43 UTC
Spacewalk 2.1 has been released.
https://fedorahosted.org/spacewalk/wiki/ReleaseNotes21

Comment 16 Matej Kollar 2014-03-04 13:08:37 UTC
Spacewalk 2.1 has been released.
https://fedorahosted.org/spacewalk/wiki/ReleaseNotes21


Note You need to log in before you can comment on or make changes to this bug.