Description of problem: Registering RHEL5-Server-GOLD to Satellite@PostgreSQL into RHEL5 base channel is very slow. Also adding or removing it's child channels is slow. Maybe it is cases by errata cache (rhn_server.update_needed_cache(systemID) (?))? Version-Release number of selected component (if applicable): Satellite-5.6.0-RHEL6-re20130807.0 How reproducible: always Steps to Reproduce: 1. Have Satellite@PostgreSQL with RHEL5 base channel synced 2. Add "log_min_duration_statement = 500" into DB config 3. Register RHEL5-Server-GOLD (default package set - i.e. desktop... => lot of packages to upgrade) Actual results: It takes too long: Done 3 attempts, and this is an average on "real": Sat560@EmbPostgreSQL: 2m46.001s 2m41.942s 2m32.043s => AVG = 159.995s Sat560@ExtOracleSameSyst: 0m5.540s 0m1.828s 0m1.674s => AVG = 3.014s Expected results: Should be quick as against Satellite@Oracle Additional info: Maybe this is a duplicate of bug 985947.
E.g. when adding/removing custom child channel to such a system you can see this: 2013-08-21 06:46:27.925 EDT LOG: duration: 1736.834 ms execute <unnamed>: select * from rhn_channel.subscribe_server($1, $2, 1, $3) as result 2013-08-21 06:46:27.925 EDT DETAIL: parameters: $1 = '1000010032', $2 = '106', $3 = '1' 2013-08-21 06:46:32.751 EDT LOG: duration: 886.580 ms execute <unnamed>: select * from rhn_channel.unsubscribe_server($1, $2) as result 2013-08-21 06:46:32.751 EDT DETAIL: parameters: $1 = '1000010032', $2 = '106' 2013-08-21 06:50:01.550 EDT LOG: duration: 1259.027 ms execute <unnamed>: select * from rhn_server.update_needed_cache($1) as result 2013-08-21 06:50:01.551 EDT DETAIL: parameters: $1 = '1000010032'
The problem is that the query optimizer in Postgres 8.4 is not smart enough to be optimize away a sub-select in the rhnServerNeededView when you are calling through the rhn_server.update_needed_cache function (which adds yet another level of complexity to the query that the optimizer is trying to execute). rhnServerNeededView: ###################################### CREATE OR REPLACE VIEW rhnServerNeededView ( org_id, server_id, errata_id, package_id, package_name_id, channel_id ) AS SELECT s.org_id, sp.server_id, x.errata_id, up.id, up.name_id, x.channel_id FROM rhnServer s join (SELECT sp_sp.server_id, sp_sp.name_id, sp_sp.package_arch_id, max(sp_pe.evr) AS max_evr FROM rhnServerPackage sp_sp join rhnPackageEvr sp_pe ON sp_pe.id = sp_sp.evr_id GROUP BY sp_sp.server_id, sp_sp.name_id, sp_sp.package_arch_id) sp ON sp.server_id = s.id join rhnPackage up ON up.name_id = sp.name_id join rhnPackageEvr upe ON upe.id = up.evr_id AND sp.max_evr < upe.evr join rhnPackageUpgradeArchCompat puac ON puac.package_arch_id = sp.package_arch_id AND puac.package_upgrade_arch_id = up.package_arch_id join rhnServerChannel sc ON sc.server_id = sp.server_id join rhnChannelPackage cp ON cp.package_id = up.id AND cp.channel_id = sc.channel_id left join (SELECT ep.errata_id, cp.channel_id, ep.package_id FROM rhnChannelErrata cp join rhnErrataPackage ep ON ep.errata_id = cp.errata_id) x ON x.channel_id = sc.channel_id AND x.package_id = cp.package_id ; ###################################### It is not strictly necessary to do the sub-select at the bottom there as a sub-select (there are no aggregate functions or other things that would prevent you from in-lining it), it's just written that way because that is similar to the logic we are trying to implement. However in this case Postgres 8.4's optimizer is not able to optimize it away, so it does that entire join on those two huge tables, only to throw out most of the results when it left joins the results to the results of the rest of the query. It is that behavior that causes the call to update_needed_cache to take the large amount of time it is taking, recorded at over a minute in comment 6 and I've seen it take as long as two minutes. In Oracle and Postgres 9.x this does not appear to be a problem, the optimizer is smart enough to transform the sub-select into joins that take much less time. In order to resolve this problem I have re-written the view to return the same data but through the join method instead of the sub-select method: ###################################### CREATE OR REPLACE VIEW rhnServerNeededView ( org_id, server_id, errata_id, package_id, package_name_id, channel_id ) AS SELECT DISTINCT s.org_id, sp.server_id, ce.errata_id, up.id, up.name_id, ce.channel_id FROM rhnServer s join (SELECT sp_sp.server_id, sp_sp.name_id, sp_sp.package_arch_id, max(sp_pe.evr) AS max_evr FROM rhnServerPackage sp_sp join rhnPackageEvr sp_pe ON sp_pe.id = sp_sp.evr_id GROUP BY sp_sp.server_id, sp_sp.name_id, sp_sp.package_arch_id) sp ON sp.server_id = s.id join rhnPackage up ON up.name_id = sp.name_id join rhnPackageEvr upe ON upe.id = up.evr_id AND sp.max_evr < upe.evr join rhnPackageUpgradeArchCompat puac ON puac.package_arch_id = sp.package_arch_id AND puac.package_upgrade_arch_id = up.package_arch_id join rhnServerChannel sc ON sc.server_id = sp.server_id join rhnChannelPackage cp ON cp.package_id = up.id AND cp.channel_id = sc.channel_id left join rhnErrataPackage ep ON cp.package_id = ep.package_id left join rhnChannelErrata ce ON sc.channel_id = ce.channel_id AND ep.errata_id = ce.errata_id; ###################################### I have tested this with both channels that contain and do not contain errata, and the output is the same as the original view in both cases. However since we have explicitly transformed the query into the join variation the optimizer has less work to do and the query returns quickly, sub-1-second, even on Postgres 8.4. This should enable the server registration to happen quickly like it does on Oracle and Postgres 9. Committing to Spacewalk master: a1f165cd894dd4d5affbc3d62ae49e034f979a6b
As a side note: Since the slow registration behavior comes down to how well the postgres optimizer is able to perform it's job this is not an inherently reproducible problem. There could be any number of factors that would influence the different things the optimizer tries in the limited amount of time it has to come up with the execution plan, including table sizes. This may be a problem on most Postgres 8.4 installs, or only a few of them. It may go away over time as postgres gathers additional statistics and trains its optimizer. It may even happen under postgres 9 if given the right conditions. This update to simplify the query should keep it from happening, but I don't know how reproducible the actual problem is prior to doing the update. The real question is does this updated view definition always return the same data as the old one (and I think it does), if so then we should use the simplified (and faster) definition.
It was discovered that my original fix for this bug was flawed, see bug 1004309. Reverting the original fix in Spacewalk master: 08e13a23fd034569520b55540bf85b22c0f9d67a And fixing it correctly: f70898d4a5a3ea33d4870c02ba3f45e592e8e9c7
*** Bug 1004309 has been marked as a duplicate of this bug. ***
Forgot to update the sha1 hashes: cb150900f64f60e3f5e5df03e2f618eb2709e671
Also need: cd6c31a1be359ce02477da4b2c4535decb13ff68 One file deletion did not make it into the original commit. Also breaking the operations on various entities in the upgrade script into separate files.
Missed another sha1 hash :( b1566cbdfb8171d6ec6ee6979b01543623959c9e
With the release of Red Hat Satellite 5.7 on January 12th 2015 this bug is being moved to a Closed Current Release state. The Satellite 5.7 GA Errata: - https://rhn.redhat.com/errata/RHSA-2015-0033.html Satellite 5.7 Release Notes: - https://access.redhat.com/documentation/en-US/Red_Hat_Satellite/5.7/html-single/Release_Notes/index.html Satellite Customer Portal Blog announcement for release: - https://access.redhat.com/blogs/1169563/posts/1315743 Cliff NOTE: This bug has not been re-verified (moved to RELEASE_PENDING) prior to release. We assume that the bug has indeed been fixed and not regressed since we initially verified it. Please re-open in the future if needed.