Red Hat Bugzilla – Bug 1022279
Errata queries constantly running
Last modified: 2014-03-04 08:08:37 EST
Description of problem: Errata tasks in taskomatic never seem to finish and in the DB logs I am seeing the following query all the time. Seems like it never finishes and usually takes up 2 cores and 100% cpu on those two cores.
SELECT DISTINCT S.id AS server_id, S.org_id org_id FROM rhnServer S JOIN (select rhnChannelErrata.errata_id, rhnChannelErrata.channel_id, rhnServerChannel.server_id, rhnErrataPackage.package_id from rhnChannelErrata, rhnErrataPackage, rhnChannelNewestPackage, rhnPackageEVR, rhnServerChannel, rhnServerPackage, rhnPackageUpgradeArchCompat where rhnChannelErrata.errata_id = rhnErrataPackage.errata_id and rhnChannelErrata.channel_id = rhnChannelNewestPackage.channel_id and rhnErrataPackage.package_id = rhnChannelNewestPackage.package_id and rhnChannelErrata.channel_id = rhnServerChannel.channel_id and rhnChannelNewestPackage.name_id = rhnServerPackage.name_id and rhnServerChannel.server_id = rhnServerPackage.server_id and rhnChannelNewestPackage.evr_id = rhnPackageEVR.id and rhnServerPackage.package_arch_id = rhnPackageUpgradeArchCompat.package_arch_id and rhnPackageUpgradeArchCompat.package_upgrade_arch_id = rhnCh
Steps to Reproduce:
Not sure how to reproduce except to have a lot of channels and servers (8800+)
Created attachment 817650 [details]
query and query plan
I've attached the offending query (took 1600 seconds) and am experimenting with adding various indices to remove those massive sequence scans!
Well, even setting "set enable_seqscan='off';" did not help. Looking at the query, it seems the SNV subquery is materializing a ton of data! The outer query limit of SNV.channel_id = and SNV.errata_id = does not make it down into the subquery. For a test, I modified the sub query and added the explicit (and rhnChannelErrata.errata_id = 19411). The query returned in a second or so....
I am not sure if that is the right thing to do....
See also: similar / possibly related bug currently being investigated for Red Hat Satellite 5.6 (the highlighted queries are different):
Bug 1023557 - Satellite 5.6 postgres takes a long time to satellite-sync
This seems to be (again) issue with PostgreSQL 8.4 optimizer which can't "unfold" inner sub-queries so it starts execution with cartesian joins in these sub-queries and then later on filtering out majority of the rows...
From explain plan problematic parts, according to number of rows, are
-> Hash (cost=488538.14..488538.14 rows=31777694 width=60)
-> Hash Join (cost=31433.37..488538.14 rows=31777694 width=60)
Hash Cond: (csc.channel_id = ocnp.channel_id)
which corresponds to the third "not exits" sub-query (-- rule out records where other channel provides the same package name with higher evr),
-> Materialize (cost=0.00..284594.16 rows=11626006 width=26)
-> Nested Loop (cost=0.00..168334.10 rows=11626006 width=26)
-> Index Scan using rhn_sc_sid_cid_uq on rhnserverchannel csc (cost=0.00..237.98 rows=3229 width=18)
-> Index Scan using rhn_cp_cp_uq on rhnchannelpackage cp (cost=0.00..32.42 rows=1571 width=15)
Index Cond: (cp.channel_id = csc.channel_id)
which corresponds to the first "not exits" sub-query (-- rule out records where channels with lower id exist == get min)
-> Hash (cost=130497.67..130497.67 rows=5737384 width=60)
-> Hash Join (cost=754.39..130497.67 rows=5737384 width=60)
Hash Cond: (sp.package_arch_id = puac.package_upgrade_arch_id)
-> Hash Join (cost=750.91..37615.95 rows=946784 width=60)
Hash Cond: (sp.evr_id = sevr.id)
which corresponds to the second "not exits" sub-query (-- rule out records where the server has the same package name with higher evr).
For a) and b) we can add condition to filter out just necessary channels,
AND CP.CHANNEL_ID = :channel_id
AND CSC.CHANNEL_ID = :channel_id
The changes from comment #6 actually make huge difference on my test system
Unique (cost=2172993.26..3044634.48 rows=1 width=18) (actual time=673.244..673.244 rows=0 loops=1)
Unique (cost=711198.84..711198.85 rows=1 width=18) (actual time=22.454..22.454 rows=0 loops=1)
which is 3 times lower cost / faster.
The c) is more tricky because there's no channel_id nor errata_id we can use to downsize query results. So we actually have to add some more tables to filter out redundant packages. E.g.
-- rule out records where the server has the same package name with higher evr
select 1 from rhnServerPackage sp, rhnPackageEVR sevr, rhnPackageUpgradeArchCompat puac
+, RHNERRATAPACKAGE JEP, RHNPACKAGE JP
where rhnServerPackage.server_id = sp.server_id
+AND JEP.ERRATA_ID = :errata_id
+AND JEP.PACKAGE_ID = JP.ID
+AND JP.NAME_ID = SP.NAME_ID
and rhnServerPackage.name_id = sp.name_id
and sp.evr_id = sevr.id
and rhnPackageEVR.evr <= sevr.evr
and rhnServerPackage.package_arch_id = puac.package_arch_id
and puac.package_upgrade_arch_id = sp.package_arch_id
After comment #8 change explain plan shows even
Unique (cost=431174.58..431174.59 rows=1 width=18) (actual time=626.219..626.219 rows=0 loops=1)
which is 5 times better than original one.
Fixed in spacewalk master with
1022279 - modified query to work better with PostgreSQL 8.4 optimizer
1022279 - added hints for PostgreSQL 8.4 optimizer
Issues seems to not be fixed Still having issues with the following Errata query:
SELECT DISTINCT +
| | S.id AS server_id, S.org_id org_id +
| | FROM rhnServer S +
| | JOIN ( +
| | -- +
| | select rhnChannelErrata.errata_id, rhnChannelErrata.channel_id, rhnServerChannel.server_id, rhnErrataPackage.package_id+
| | from rhnChannelErrata, rhnErrataPackage, rhnChannelNewestPackage, rhnPackageEVR, +
| | rhnServerChannel, rhnServerPackage, rhnPackageUpgradeArchCompat +
| | where rhnChannelErrata.errata_id = rhnErrataPackage.errata_id +
| | -- +
| | and rhnChannelErrata.channel_id = rhnChannelNewestPackage.channel_id +
| | and rhnErrataPackage.package_id = rhnChannelNewestPackage.package_id +
| | -- +
| | and rhnChannelErrata.channel_id = rhnServerChannel.channel_id +
| | and rhnChannelNewestPackage.name_id = rhnServerPackage.name_id +
| | and rhnServerChannel.server_id = rhnServerPackage.server_id +
| | -- +
| | and rhnChannelNewestPackage.evr_id = rhnPackageEVR.id +
| | -- +
| | and rhnServerPackage.package_arch_id = rhnPackageUpgradeArchCompat.package_arch_id +
| | and rhnPackageUpgradeArchCompat.package_upgrade_arch_id = rhnCh
Switching MODIFIED Spacewalk bugs to ON_QA before 2.1 release.
This bug is not resolved and needs further research. As the issue is not fixed. This bug needs to be reverted back to open and not modified.
Spacewalk 2.1 has been released.