Bug 545389 - satellite-sync taking 2-3 times longer with 5.3 as it did with 5.2 (first time sync)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Satellite Synchronization
Version: 530
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: ---
Assignee: Jan Pazdziora
QA Contact: Pavel Kralik
URL:
Whiteboard:
Depends On: 527288
Blocks: sat531-blockers
 
Reported: 2009-12-08 12:54 UTC by Miroslav Suchý
Modified: 2013-04-30 23:32 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 527288
Environment:
Last Closed: 2010-02-16 08:39:19 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2010:0105 | 0 | normal | SHIPPED_LIVE | Red Hat Network Satellite bug fix update | 2010-02-16 08:39:14 UTC

Description Miroslav Suchý 2009-12-08 12:54:01 UTC
+++ This bug was initially created as a clone of Bug #527288 +++

Description of problem:
satellite-sync took 2.25 hours before upgrading from 5.2 to 5.3.  Now it takes 4.25 hours.  After satellite-sync completes, the system shows lots of activity, with java, oracle and httpd processes monopolizing the CPU.  top indicates 98-100% IOWAIT for up to two hours or more after satellite-sync is complete.  I can see that the hard drive light on the server is on solid, indicating heavy disk activity.

We addressed some (most?) issues, but the first sync (especially from rhn.redhat.com) is still slower than on Satellite 5.2. We released the errata without this being addressed, and we will focus on this part in this separate bugzilla.

Comment 2 George Hacker 2009-12-30 18:41:14 UTC
This is also the case when performing the initial satellite-sync when operating disconnected from hosted RHN. I've identified when the problem specifically occurs and how to work around it temporarily.

The main culprit seems to be oracle. Everything works normally until satellite-sync reaches the stage where it reports "Importing *relevant* package metadata." For about 2-3 minutes things work somewhat normally, then oracle takes around 90-100% of the cpu and satellite-sync gets cpu starved.  Even on a multi-processor machine, these two processes use a single cpu.

Temporary workaround (only needed for the first satellite-sync):

start satellite-sync to load channel content
kill it with ^C when it displays "Importing *relevant* package metadata"
restart Satellite server: rhn-satellite restart
start satellite-sync as before (*)

* - satellite-sync will resume where it left off and satellite-sync will consume around 40-80% of the cpu while oracle will consume around 30%. A java process comes to life from time to time and consumes a lot of cpu, but oracle seems to be the culprit.
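
The interrupt, restart, and resume steps above could be scripted roughly as follows. This is only a sketch: the helper names `run_until_marker` and `first_sync_workaround` are hypothetical, the commands are passed in by the caller, and it assumes the sync tool writes the marker line to its output promptly (output may be buffered when it is not attached to a terminal).

```python
import signal
import subprocess


def run_until_marker(cmd, marker):
    """Run cmd, echo its output, and send SIGINT (like ^C) once marker appears.

    Returns True if the marker was seen and the process was interrupted,
    False if the process finished on its own without printing the marker.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line-buffered reads on our side of the pipe
    )
    seen = False
    for line in proc.stdout:
        print(line, end="")
        if not seen and marker in line:
            seen = True
            proc.send_signal(signal.SIGINT)
    proc.wait()
    return seen


def first_sync_workaround(sync_cmd, restart_cmd, marker):
    """Interrupt the first sync at the marker, restart services, then resume.

    Returns True if the workaround path was taken, False if the first
    run completed without ever reaching the marker.
    """
    if not run_until_marker(sync_cmd, marker):
        return False
    subprocess.run(restart_cmd, check=True)
    subprocess.run(sync_cmd, check=True)
    return True
```

With the commands from this comment, a call might look like `first_sync_workaround(["satellite-sync"], ["rhn-satellite", "restart"], "Importing *relevant* package metadata")`, with whatever satellite-sync options were used for the original run added to the first list.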

Comment 3 George Hacker 2009-12-31 07:32:54 UTC
I found another workaround for this bug. After installing a RHN Satellite Server:

  satellite-sync a custom base channel with a single RPM
  restart the Satellite Server
  satellite-sync Red Hat content

In the last step satellite-sync takes most of the cpu and oracle stays around 30%.

Also a clarification on my earlier comment - oracle may not be the original cause of the problem, but it goes wild and consumes too much CPU when the initial satellite-sync is performed so it is the primary cause for the long times.

Comment 4 Jan Pazdziora 2010-01-22 14:01:26 UTC
The following patch should address the issue:

diff --git a/backend/server/importlib/backend.py b/backend/server/importlib/backend.py
index 89d7eac..a56c067 100644
--- a/backend/server/importlib/backend.py
+++ b/backend/server/importlib/backend.py
@@ -84,7 +84,7 @@ class Backend:
     def processCapabilities(self, capabilityHash):
         # First figure out which capabilities are already inserted
         templ = """
-            select id
+            select /*+index(rhnPackageCapability rhn_pkg_cap_name_version_uq)*/ id
               from rhnPackageCapability
              where name = :name 
                and version %s"""
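
For illustration, here is how the patched template might expand into concrete statements. The `%s` placeholder is in the original code, but the two substitutions shown (`= :version` and `is null`) are an assumption based on how NULL comparison works in SQL, not something shown in the diff; `build_capability_queries` is a hypothetical helper. The `/*+index(table index_name)*/` comment is an Oracle optimizer hint asking the planner to use the named index.

```python
# Template as it reads after the patch (copied from the diff above).
TEMPL = """
            select /*+index(rhnPackageCapability rhn_pkg_cap_name_version_uq)*/ id
              from rhnPackageCapability
             where name = :name
               and version %s"""


def build_capability_queries(templ=TEMPL):
    """Expand the template into the two assumed lookup variants."""
    # Assumption: "=" never matches NULL in SQL, so a capability without a
    # version needs "is null", while a versioned one binds :version.
    return {
        "with_version": templ % "= :version",
        "without_version": templ % "is null",
    }


for label, sql in build_capability_queries().items():
    print("--", label)
    print(sql)
```

Both variants carry the same hint, so either lookup would be steered toward rhn_pkg_cap_name_version_uq.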

Comment 5 Jan Pazdziora 2010-01-28 14:55:47 UTC
Fixed in Spacewalk master 7de2157c76e530296815fdfbbb4d9db88c7f9597.

Comment 7 Jan Pazdziora 2010-01-28 15:14:27 UTC
Packages spacewalk-backend-0.5.28-38.bz_545389.1.el[45]sat built.

Comment 8 Pavel Kralik 2010-02-03 17:32:50 UTC
Test runs comparing Sat 5.2.0 and Sat 5.3.0 with applied advisory for sat sync:

satellite-sync -c rhel-i386-server-supplementary-5

Synced to the stage.

i386:
        RHEL4-U8:
                5.2.0: 1 hours, 20 minutes, 8 seconds
                5.3.0: 1 hours, 18 minutes, 43 seconds
        RHEL5-Server-U4:
                5.2.0: 1 hours, 13 minutes, 48 seconds
                5.3.0: 1 hours, 16 minutes, 18 seconds
x86_64:
        RHEL4-U8:
                5.3.0: 0 hours, 28 minutes, 17 seconds
        RHEL5-Server-U4:
                5.2.0: 1 hours, 17 minutes, 34 seconds
                5.3.0: 1 hours, 16 minutes, 38 seconds
s390x:
        RHEL5-Server-U4:
                5.2.0: 1 hours, 47 minutes, 18 seconds
                5.3.0: 1 hours, 53 minutes, 42 seconds

Comment 9 Pavel Kralik 2010-02-03 17:38:53 UTC
RHTS tests passed for Sat 5.3.0 with applied advisory.

RHEL4 i386, x86_64, s390, s390x
RHEL5 i386, x86_64, s390x

rh-tests-RHN-Satellite-Inter-Satellite-Sync-Sanity-general
rh-tests-RHN-Satellite-Inter-Satellite-Sync-Sanity-crazy-metadata
rh-tests-RHN-Satellite-Inter-Satellite-Sync-Sanity-check-var-satellite

Comment 10 Jan Pazdziora 2010-02-03 20:31:18 UTC
(In reply to comment #8)
> Test runs comparing Sat 5.2.0 and Sat 5.3.0 with applied advisory for sat sync:
> 
> satellite-sync -c rhel-i386-server-supplementary-5

Pavel,

isn't rhel-i386-server-supplementary-5 a child channel of rhel-i386-server-5? That would mean that the parent channel has already been synced on the Satellite before, and thus your satellite-sync is not the initial one.

The bugfix (and I believe the bugzilla as well) is about the very first channel to be synced to the Satellite, not about the first sync of any additional channel. While it's a good validation that the first sync of the additional channel is of the same speed as it was on 5.2.0, what we really need is to verify the speed of the very first channel sync on an otherwise empty Satellite, and for that you will need the parent channel, I believe.

Comment 11 Jan Pazdziora 2010-02-03 20:35:23 UTC
(In reply to comment #8)
> Test runs comparing Sat 5.2.0 and Sat 5.3.0 with applied advisory for sat sync:
> 
> satellite-sync -c rhel-i386-server-supplementary-5
> 
> Synced to the stage.
> 
> i386:
>         RHEL4-U8:
>                 5.2.0: 1 hours, 20 minutes, 8 seconds
>                 5.3.0: 1 hours, 18 minutes, 43 seconds
>         RHEL5-Server-U4:
>                 5.2.0: 1 hours, 13 minutes, 48 seconds
>                 5.3.0: 1 hours, 16 minutes, 18 seconds
> x86_64:
>         RHEL4-U8:
>                 5.3.0: 0 hours, 28 minutes, 17 seconds

Have you got any explanation why this run is significantly faster than the other runs? What is the /var/satellite configuration -- is it NFS-mounted, or something else?

Are these times for the whole satellite-sync runs, or just for the

  "Importing *relevant* package metadata."

step?

Comment 12 Pavel Kralik 2010-02-05 13:52:53 UTC
(In reply to comment #10)

> isn't rhel-i386-server-supplementary-5 a child channel of rhel-i386-server-5?
> That would mean that the parent channel has already been synced on the
> Satellite before, and thus your satellite-sync is not the initial one.
> 
> The bugfix (and I believe that the bugzilla as well) is about the very first
> channel to be synced to the Satellite, not about the first sync of any
> additional channel. While it's a good validation that the first sync of the
> additional channel is of the same speed as it was on 5.2.0, what we really need
> is to verify the speed of the very first channel sync on otherwise empty
> Satellite, and for that you will need the parent channel, I believe.    

rhel-i386-server-supplementary-5 is indeed a child channel. I am running new tests on fresh Satellites for the relevant RHEL versions and architectures.

Comment 13 Pavel Kralik 2010-02-05 13:56:33 UTC
(In reply to comment #11)
> (In reply to comment #8)

> Have you got any explanation why this run is significantly faster than the
> other runs? What is the /var/satellite configuration -- is it NFS mounted, or?
> 
> Are these times for the whole satellite-sync runs, or just for the
> 
>   "Importing *relevant* package metadata."
> 
> step?    

I have no explanation for the faster run. The /var/satellite configuration was the same as on the other Satellites, not NFS-mounted.

The times are for whole satellite-sync runs.

Comment 14 Pavel Kralik 2010-02-05 14:01:53 UTC
Test runs comparison of Sat 5.2.0 and Sat 5.3.0 with applied advisory:

satellite-sync -c redhat-linux-i386-9

Synced to the stage for a base channel. Times captured are 'whole satellite-sync run'.

s390x:
        RHEL4-U8:
                5.2: 1 hours, 18 minutes, 6 seconds
                5.3: 1 hours, 24 minutes, 23 seconds
        RHEL5-Server-U4:
                5.3: 1 hours, 49 minutes, 56 seconds
i386:
        RHEL4-U8:
                5.2: 1 hours, 5 minutes, 26 seconds
                5.3: 1 hours, 5 minutes, 42 seconds
        RHEL5-Server-U4:
                5.2: 0 hours, 51 minutes, 46 seconds
                5.3: 0 hours, 48 minutes, 4 seconds
x86_64:
        RHEL4-U8:
                5.3: 0 hours, 38 minutes, 31 seconds
        RHEL5-Server-U4:
                5.2: 0 hours, 50 minutes, 21 seconds
                5.3: 0 hours, 51 minutes, 3 seconds
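
To put the paired timings above in perspective, the 5.3/5.2 ratio for each platform that has both measurements can be computed as below. This is a minimal sketch: the durations are copied from this comment, while `to_seconds` and `slowdown_ratios` are hypothetical helper names.

```python
import re

DURATION_RE = re.compile(r"(\d+) hours, (\d+) minutes, (\d+) seconds")


def to_seconds(text):
    """Convert 'H hours, M minutes, S seconds' into a number of seconds."""
    match = DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError("unrecognized duration: %r" % text)
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# (arch, host OS) -> (time on 5.2, time on 5.3); only pairs with both values.
RUNS = {
    ("s390x", "RHEL4-U8"): ("1 hours, 18 minutes, 6 seconds",
                            "1 hours, 24 minutes, 23 seconds"),
    ("i386", "RHEL4-U8"): ("1 hours, 5 minutes, 26 seconds",
                           "1 hours, 5 minutes, 42 seconds"),
    ("i386", "RHEL5-Server-U4"): ("0 hours, 51 minutes, 46 seconds",
                                  "0 hours, 48 minutes, 4 seconds"),
    ("x86_64", "RHEL5-Server-U4"): ("0 hours, 50 minutes, 21 seconds",
                                    "0 hours, 51 minutes, 3 seconds"),
}


def slowdown_ratios(runs):
    """Return the 5.3 time divided by the 5.2 time for every pair."""
    return {key: to_seconds(new) / to_seconds(old)
            for key, (old, new) in runs.items()}


for (arch, host_os), ratio in slowdown_ratios(RUNS).items():
    print("%s %s: 5.3 takes %.2fx the 5.2 time" % (arch, host_os, ratio))
```

All four ratios land within about 10% of 1.0, far from the 2-3 times slowdown in the bug title.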

Comment 15 Pavel Kralik 2010-02-05 14:07:29 UTC
Test runs comparison of Sat 5.2.0 and Sat 5.3.0 with applied advisory:

satellite-sync -c rhel-i386-server-5.0.z

Synced from the mounted stage export (nfs://dump-new.rhndev.redhat.com/vol/exportstageprod) for a base channel. Times captured are 'Importing *relevant* package metadata'.

s390x:
        RHEL4-U8:
                5.2: 0 hours, 29 minutes, 17 seconds
                5.3: 0 hours, 31 minutes, 15 seconds
        RHEL5-Server-U4:
                5.3: 0 hours, 32 minutes, 50 seconds
i386:
        RHEL4-U8:
                5.2: 0 hours, 31 minutes, 42 seconds
                5.3: 0 hours, 30 minutes, 5 seconds
        RHEL5-Server-U4:
                5.2: 0 hours, 22 minutes, 31 seconds
                5.3: 0 hours, 23 minutes, 52 seconds
x86_64:
        RHEL5-Server-U4:
                5.2: 0 hours, 34 minutes, 6 seconds
                5.3: 0 hours, 37 minutes, 6 seconds

Comment 19 errata-xmlrpc 2010-02-16 08:39:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0105.html

