+++ This bug was initially created as a clone of Bug #527288 +++

Description of problem:
satellite-sync took 2.25 hours before upgrading from 5.2 to 5.3. Now it takes 4.25 hours. After satellite-sync completes, the system shows lots of activity, with java, oracle and httpd processes monopolizing the CPU. top indicates 98-100% IOWAIT for up to two hours or more after satellite-sync is complete. The hard drive light on the server stays on solid, indicating heavy disk activity.

We addressed some (most?) issues, but the first sync (especially from rhn.redhat.com) is still slower than on Satellite 5.2. We released the errata without this being addressed, and we will focus on this part in this separate bugzilla.
This is also the case when performing the initial satellite-sync while operating disconnected from hosted RHN. I've identified when the problem specifically occurs and how to work around it temporarily.

The main culprit seems to be oracle. Everything works normally until satellite-sync reaches the stage where it reports "Importing *relevant* package metadata." For about 2-3 minutes things work somewhat normally, then oracle takes around 90-100% of the cpu and satellite-sync gets cpu starved. Even on a multi-processor machine these two processes use a single cpu.

Temporary workaround (only needed for the 1st satellite-sync):

1. start satellite-sync to load channel content
2. kill it with ^C when it displays "Importing *relevant* package metadata"
3. restart the Satellite server: rhn-satellite restart
4. start satellite-sync as before (*)

* - satellite-sync will resume where it left off, and satellite-sync will consume around 40-80% of the cpu while oracle will consume around 30%.

A java process comes to life from time to time and consumes a lot of cpu, but oracle seems to be the culprit.
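The temporary workaround could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: it assumes satellite-sync writes the "Importing *relevant* package metadata" message to standard output, it is written for a current Python 3 on an admin host rather than the Python shipped with Satellite 5.3, and the helper names `run_until_marker` and `first_sync_workaround` are hypothetical.

```python
import signal
import subprocess

MARKER = "Importing *relevant* package metadata"


def run_until_marker(cmd, marker=MARKER):
    """Run cmd and send it SIGINT (like ^C) once marker shows up in its output.

    Returns True if the marker was seen, False if the command ended without it.
    Assumes the command prints the marker on stdout.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    seen = False
    for line in proc.stdout:
        print(line, end="")  # pass the command's output through
        if not seen and marker in line:
            seen = True
            proc.send_signal(signal.SIGINT)
    proc.wait()
    return seen


def first_sync_workaround(channel):
    """Interrupt the first sync at the slow step, restart, then sync again."""
    sync_cmd = ["satellite-sync", "-c", channel]
    if run_until_marker(sync_cmd):
        subprocess.run(["rhn-satellite", "restart"], check=True)
        subprocess.run(sync_cmd, check=True)  # should resume where it left off
```

Calling `first_sync_workaround("rhel-i386-server-5")` would follow the same four steps as the manual procedure; if the marker never appears, the restart and second sync are skipped.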
I found another workaround for this bug. After installing an RHN Satellite Server:

1. satellite-sync a custom base channel with a single RPM
2. restart the Satellite Server
3. satellite-sync Red Hat content

In the last step satellite-sync takes most of the cpu and oracle stays around 30%.

Also a clarification on my earlier comment: oracle may not be the original cause of the problem, but it goes wild and consumes too much CPU when the initial satellite-sync is performed, so it is the primary cause of the long sync times.
The following patch should address the issue:

diff --git a/backend/server/importlib/backend.py b/backend/server/importlib/backend.py
index 89d7eac..a56c067 100644
--- a/backend/server/importlib/backend.py
+++ b/backend/server/importlib/backend.py
@@ -84,7 +84,7 @@ class Backend:
     def processCapabilities(self, capabilityHash):
         # First figure out which capabilities are already inserted
         templ = """
-            select id
+            select /*+index(rhnPackageCapability rhn_pkg_cap_name_version_uq)*/ id
               from rhnPackageCapability
              where name = :name
                and version %s"""
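To illustrate what the patch changes, the sketch below fills the `%s` placeholder of the patched query template and prints the resulting statements. The two substitutions (`= :version` and `is null`) are an assumption about how `processCapabilities` uses the template; they are not shown in the diff.

```python
# Query template as it looks after the patch (hint added right after "select").
TEMPL = """
    select /*+index(rhnPackageCapability rhn_pkg_cap_name_version_uq)*/ id
      from rhnPackageCapability
     where name = :name
       and version %s"""

# Assumed substitutions: one statement for capabilities that carry a
# version, one for capabilities whose version is NULL.
sql_with_version = TEMPL % "= :version"
sql_null_version = TEMPL % "is null"

for sql in (sql_with_version, sql_null_version):
    print(sql)
```

The hint only asks Oracle's optimizer to use the rhn_pkg_cap_name_version_uq index for this lookup; the rows returned by the query are unchanged.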
Fixed in Spacewalk master 7de2157c76e530296815fdfbbb4d9db88c7f9597.
Packages spacewalk-backend-0.5.28-38.bz_545389.1.el[45]sat built.
Test runs comparing Sat 5.2.0 and Sat 5.3.0 with applied advisory for sat sync:

satellite-sync -c rhel-i386-server-supplementary-5

Synced to the stage.

i386:
  RHEL4-U8:
    5.2.0: 1 hours, 20 minutes, 8 seconds
    5.3.0: 1 hours, 18 minutes, 43 seconds
  RHEL5-Server-U4:
    5.2.0: 1 hours, 13 minutes, 48 seconds
    5.3.0: 1 hours, 16 minutes, 18 seconds
x86_64:
  RHEL4-U8:
    5.3.0: 0 hours, 28 minutes, 17 seconds
  RHEL5-Server-U4:
    5.2.0: 1 hours, 17 minutes, 34 seconds
    5.3.0: 1 hours, 16 minutes, 38 seconds
s390x:
  RHEL5-Server-U4:
    5.2.0: 1 hours, 47 minutes, 18 seconds
    5.3.0: 1 hours, 53 minutes, 42 seconds
RHTS tests passed for Sat 5.3.0 with applied advisory.

RHEL4: i386, x86_64, s390, s390x
RHEL5: i386, x86_64, s390x

rh-tests-RHN-Satellite-Inter-Satellite-Sync-Sanity-general
rh-tests-RHN-Satellite-Inter-Satellite-Sync-Sanity-crazy-metadata
rh-tests-RHN-Satellite-Inter-Satellite-Sync-Sanity-check-var-satellite
(In reply to comment #8)
> Test runs comparing Sat 5.2.0 and Sat 5.3.0 with applied advisory for sat sync:
>
> satellite-sync -c rhel-i386-server-supplementary-5

Pavel,

isn't rhel-i386-server-supplementary-5 a child channel of rhel-i386-server-5? That would mean that the parent channel has already been synced on the Satellite before, and thus your satellite-sync is not the initial one.

The bugfix (and I believe the bugzilla as well) is about the very first channel to be synced to the Satellite, not about the first sync of any additional channel. While it's a good validation that the first sync of the additional channel is of the same speed as it was on 5.2.0, what we really need is to verify the speed of the very first channel sync on an otherwise empty Satellite, and for that you will need the parent channel, I believe.
(In reply to comment #8)
> Test runs comparing Sat 5.2.0 and Sat 5.3.0 with applied advisory for sat sync:
>
> satellite-sync -c rhel-i386-server-supplementary-5
>
> Synced to the stage.
>
> i386:
> RHEL4-U8:
> 5.2.0: 1 hours, 20 minutes, 8 seconds
> 5.3.0: 1 hours, 18 minutes, 43 seconds
> RHEL5-Server-U4:
> 5.2.0: 1 hours, 13 minutes, 48 seconds
> 5.3.0: 1 hours, 16 minutes, 18 seconds
> x86_64:
> RHEL4-U8:
> 5.3.0: 0 hours, 28 minutes, 17 seconds

Have you got any explanation why this run is significantly faster than the other runs? What is the /var/satellite configuration -- is it NFS mounted, or?

Are these times for the whole satellite-sync runs, or just for the

"Importing *relevant* package metadata."

step?
(In reply to comment #10)
> isn't rhel-i386-server-supplementary-5 a child channel of rhel-i386-server-5?
> That would mean that the parent channel has already been synced on the
> Satellite before, and thus your satellite-sync is not the initial one.
>
> The bugfix (and I believe that the bugzilla as well) is about the very first
> channel to be synced to the Satellite, not about the first sync of any
> additional channel. While it's a good validation that the first sync of the
> additional channel is of the same speed as it was on 5.2.0, what we really need
> is to verify the speed of the very first channel sync on otherwise empty
> Satellite, and for that you will need the parent channel, I believe.

Yes, rhel-i386-server-supplementary-5 is a child channel. I am running a new test on fresh satellites for the relevant RHEL versions and architectures.
(In reply to comment #11)
> (In reply to comment #8)
> Have you got any explanation why this run is significantly faster than the
> other runs? What is the /var/satellite configuration -- is it NFS mounted, or?
>
> Are these times for the whole satellite-sync runs, or just for the
>
> "Importing *relevant* package metadata."
>
> step?

I have no explanation for the faster run. The /var/satellite configuration was the same as on the other satellites, not NFS mounted. The times are for whole satellite-sync runs.
Test runs comparison of Sat 5.2.0 and Sat 5.3.0 with applied advisory:

satellite-sync -c redhat-linux-i386-9

Synced to the stage for a base channel. Times captured are 'whole satellite-sync run'.

s390x:
  RHEL4-U8, 5.3: 1 hours, 24 minutes, 23 seconds
  RHEL5-Server-U4, 5.3: 1 hours, 49 minutes, 56 seconds
  RHEL4-U8, 5.2: 1 hours, 18 minutes, 6 seconds
i386:
  RHEL4-U8, 5.3: 1 hours, 5 minutes, 42 seconds
  RHEL5-Server-U4, 5.3: 0 hours, 48 minutes, 4 seconds
  RHEL4-U8, 5.2: 1 hours, 5 minutes, 26 seconds
  RHEL5-Server-U4, 5.2: 0 hours, 51 minutes, 46 seconds
x86_64:
  RHEL4-U8, 5.3: 0 hours, 38 minutes, 31 seconds
  RHEL5-Server-U4, 5.3: 0 hours, 51 minutes, 3 seconds
  RHEL5-Server-U4, 5.2: 0 hours, 50 minutes, 21 seconds
Test runs comparison of Sat 5.2.0 and Sat 5.3.0 with applied advisory:

satellite-sync -c rhel-i386-server-5.0.z

Synced from the mounted stage export (nfs://dump-new.rhndev.redhat.com/vol/exportstageprod) for a base channel. Times captured are 'Importing *relevant* package metadata'.

s390x:
  RHEL4-U8, 5.3: 0 hours, 31 minutes, 15 seconds
  RHEL5-Server-U4, 5.3: 0 hours, 32 minutes, 50 seconds
  RHEL4-U8, 5.2: 0 hours, 29 minutes, 17 seconds
i386:
  RHEL4-U8, 5.3: 0 hours, 30 minutes, 5 seconds
  RHEL5-Server-U4, 5.3: 0 hours, 23 minutes, 52 seconds
  RHEL4-U8, 5.2: 0 hours, 31 minutes, 42 seconds
  RHEL5-Server-U4, 5.2: 0 hours, 22 minutes, 31 seconds
x86_64:
  RHEL5-Server-U4, 5.3: 0 hours, 37 minutes, 6 seconds
  RHEL5-Server-U4, 5.2: 0 hours, 34 minutes, 6 seconds
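To make the 5.2 versus 5.3 comparison easier to read, the timings can be converted to seconds and expressed as a relative change. The following is a minimal sketch; the helper names `to_seconds` and `percent_change` are hypothetical, and the input pairs are the 'Importing *relevant* package metadata' times listed above for the configurations that have both a 5.2 and a 5.3 run.

```python
import re

DURATION_RE = re.compile(r"(\d+) hours?, (\d+) minutes?, (\d+) seconds?")


def to_seconds(text):
    """Convert '0 hours, 30 minutes, 5 seconds' to a number of seconds."""
    match = DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError("unrecognized duration: %r" % text)
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def percent_change(old, new):
    """Relative change from old to new in percent (positive means slower)."""
    return 100.0 * (to_seconds(new) - to_seconds(old)) / to_seconds(old)


# (5.2 time, 5.3 time) pairs from the results above.
results = {
    "s390x RHEL4-U8": ("0 hours, 29 minutes, 17 seconds",
                       "0 hours, 31 minutes, 15 seconds"),
    "i386 RHEL4-U8": ("0 hours, 31 minutes, 42 seconds",
                      "0 hours, 30 minutes, 5 seconds"),
    "i386 RHEL5-Server-U4": ("0 hours, 22 minutes, 31 seconds",
                             "0 hours, 23 minutes, 52 seconds"),
    "x86_64 RHEL5-Server-U4": ("0 hours, 34 minutes, 6 seconds",
                               "0 hours, 37 minutes, 6 seconds"),
}

for name, (old, new) in results.items():
    print("%-24s %+.1f%%" % (name, percent_change(old, new)))
```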
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0105.html