Bug 527288 - satellite-sync taking 2-3 times longer with 5.3 as it did with 5.2
Summary: satellite-sync taking 2-3 times longer with 5.3 as it did with 5.2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Satellite Synchronization
Version: 530
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: ---
Assignee: Miroslav Suchý
QA Contact: Jiri Kastner
URL:
Whiteboard:
Depends On:
Blocks: sat531-blockers 545389
Reported: 2009-10-05 18:45 UTC by Thomas Cameron
Modified: 2018-10-27 15:41 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 545389
Environment:
Last Closed: 2009-12-16 13:46:47 UTC
Target Upstream Version:
Embargoed:


Attachments
sosreport from the Satellite server in question (5.06 MB, application/x-bzip2)
2009-10-05 19:03 UTC, Thomas Cameron


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2009:1683 | 0 | normal | SHIPPED_LIVE | satellite-sync bug fix update | 2009-12-16 13:46:43 UTC

Description Thomas Cameron 2009-10-05 18:45:27 UTC
Description of problem:
satellite-sync took 2.25 hours before upgrading from 5.2 to 5.3.  Now it takes 4.25 hours.  After satellite-sync completes, the system shows lots of activity, with java, oracle and httpd processes monopolizing the CPU.  top indicates 98-100% IOWAIT for up to two hours or more after satellite-sync is complete.  The hard drive light on the server stays on solid, indicating heavy disk activity.

I often see the same heavy disk activity for ~ 1-2 hours if I reboot the guest or even just issue "rhn-satellite restart".
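
One rough way to quantify the IOWAIT figure that top reports is to sample the aggregate "cpu" line of /proc/stat twice and compare the deltas. The sketch below assumes the standard Linux field order (user, nice, system, idle, iowait, ...). The function names and the sample values are illustrative only and were not taken from this server.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line from /proc/stat into a list of ints."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Return the share of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    # Assumed field order: user, nice, system, idle, iowait, ...
    return 100.0 * deltas[4] / total


# Made-up samples for illustration only (not measured on the reported system).
sample_1 = parse_cpu_line("cpu  100 0 50 800 50 0 0 0")
sample_2 = parse_cpu_line("cpu  101 0 51 800 148 0 0 0")
print(round(iowait_percent(sample_1, sample_2), 1))
```

On a live system the two samples would come from reading the first line of /proc/stat a few seconds apart.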

Version-Release number of selected component (if applicable):
Satellite 5.3.0 i386
RHEL 5.4 i386 as a Xen guest
 - 2.0 GB memory
 - NFS mounted /var/satellite
 - everything else (/rhnsat, /opt) all on local disk
 - Xen guest is LVM backed on dom0

RHEL 5.3 i386 as dom0
 - 4.0 GB memory
 - single 3.0 GHz Pentium 4 processor
 - 500 GB IDE drive

How reproducible:
Run satellite-sync

Steps to Reproduce:
1. Upgrade from Satellite 5.2 to 5.3
2. Run satellite-sync
  
Actual results:
It takes ~ 4.25 hours for satellite-sync to complete, then another 1-2 hours of system thrashing.

Expected results:
satellite-sync should only take ~ 2.25 hours and there should be no thrashing afterwards.

Additional info:
This server is synced with RHEL 3, 4 and 5, 32- and 64-bit x86 including all child channels including RHX content.

Comment 1 Thomas Cameron 2009-10-05 19:03:31 UTC
Created attachment 363729
sosreport from the Satellite server in question

Comment 3 George Hacker 2009-10-21 15:54:29 UTC
This behavior occurs when performing a satellite-sync on a freshly installed Satellite Server. Installing the software takes the same amount of time, but importing channel content is where the major difference is seen.

For example, I observed the following times to satellite-sync the rhel-i386-server-5 base channel from a channel dump of RHEL 5.1 vintage (3239 RPMs):

RHN Satellite 5.2 - approximately 55 minutes
RHN Satellite 5.3 - approximately 11 hours 45 minutes

I have performed the imports repeatedly and have seen consistent times needed.
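
For scale, the two import times quoted above work out to roughly a 12.8x slowdown, well beyond the 2-3x in the bug summary. A quick check of that arithmetic (the helper name is just for illustration):

```python
def to_minutes(hours=0, minutes=0):
    """Convert an hours/minutes duration to minutes."""
    return hours * 60 + minutes


sat_52 = to_minutes(minutes=55)            # RHN Satellite 5.2
sat_53 = to_minutes(hours=11, minutes=45)  # RHN Satellite 5.3

slowdown = sat_53 / float(sat_52)
print("%.1fx slower" % slowdown)  # prints "12.8x slower"
```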

Also, the time difference is only seen in the last step, the import of the RPM metadata (the last set of hash marks). Everything up to that point takes approximately the same amount of time. During that step the system consumes 100% CPU, with oracle and satellite-sync at the top of the list.

My hardware specs: 2.8GHz P4 CPU, 2.5GB RAM, 100GB 7200rpm HD

Comment 4 Akash Chandrashekar 2009-10-22 18:38:12 UTC
Currently experiencing issues mentioned above at customer site. 
oracle at 137.6% CPU  --- while Linking packages to channels

Comment 7 Miroslav Suchý 2009-11-12 08:14:52 UTC
Committed to satellite.git as commits:
d885040c3b5535b037e0c6064dbba4eda8718174
304418b5ec472fe816089bd42623b725ac5d8708
3100e1d5792f32a10f4cf9d4bc68b9f28c181db2
62f2ece7878b1957e12f7962bb62befacb0346e8 (this one made huge improvement)
77bd35e711e6297df2c710e1015fed7db3dfa000
181c293d661ee5713bc44f9daee2b0827bfd5c62
956f6dd633974dcf3c83a81a24f91e001561baad
09b46a1d65748fecddeb940fcb4565d0e09d78d4
373ef29d6744769f571c9e024a45d58d7e1fd518
90e4ea0273f3c42e199ab04167f3feb2639e3ce8

Comment 8 Michael Mráka 2009-11-13 08:25:00 UTC
Package spacewalk-backend-0.5.28-35 was pushed to webQA. Moving ON_QA.

Comment 10 Miroslav Suchý 2009-11-20 15:57:22 UTC
A summary of the issue is in #0. We do not know exactly what caused it, beyond the fact that there were a lot of code changes during 530 development and, because performance was not being measured, they added a second here and a second there, which added up to a big slowdown in the end.

Summary how it was fixed:
I ran a profiler on the parts of satellite-sync (mainly the "short step") that showed the biggest slowdown compared to 5.2.
The resulting fix is a set of changes (see #7 for reference) that preserve code semantics and only speed things up (except commit 62f2ece7878b1957e12f7962bb62befacb0346e8, which dealt with what was effectively just a masked sleep()).
There are two types of speedup. The first removes duplicate work (as in commit 09b46a1d65748fecddeb940fcb4565d0e09d78d4). The second changes the code to call a slightly faster but equivalent function; such calls are made millions of times, so the result is a significant gain (an example is commit 373ef29d6744769f571c9e024a45d58d7e1fd518).
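
As an illustration of the approach described above (profile first, then remove duplicate work while keeping the result identical), here is a minimal sketch using Python's standard cProfile and pstats modules. The functions normalize, link_packages_slow and link_packages_fast are hypothetical stand-ins. They are not the code from the referenced commits and only demonstrate the first type of speedup.

```python
import cProfile
import io
import pstats


def normalize(name):
    """Hypothetical cheap helper that gets called a very large number of times."""
    return name.strip().lower()


def link_packages_slow(names):
    # Duplicate work: the same name is normalized again on every occurrence.
    return [normalize(n) for n in names]


def link_packages_fast(names):
    # Same result, but each distinct name is normalized only once.
    cache = {}
    result = []
    for n in names:
        if n not in cache:
            cache[n] = normalize(n)
        result.append(cache[n])
    return result


def profile(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


names = [" Kernel ", "glibc", " Kernel "] * 1000
slow_result, slow_report = profile(link_packages_slow, names)
fast_result, fast_report = profile(link_packages_fast, names)

# The optimization must not change the output, only the time spent.
assert slow_result == fast_result
print(slow_report)
print(fast_report)
```

Comparing the call counts for normalize in the two reports shows where the duplicate work went: 3000 calls in the slow variant versus one per distinct name in the fast one.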

Comment 13 Jiri Kastner 2009-12-01 11:39:36 UTC
doing 5.2 -> 5.3 upgrade tests on same hardware

Comment 24 errata-xmlrpc 2009-12-16 13:46:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1683.html

Comment 26 Thomas Cameron 2009-12-21 22:44:06 UTC
Wow!  I just ran satellite-sync.  On my little Xen guest with 2GB memory on a 3.4GHz Pentium 4 machine, it used to take about 5 and a half hours just to sync with RHN assuming no new packages.

After the updates, it took 0 hours, 57 minutes, 17 seconds!!!

I am speechless!  It is *much* faster than even Satellite 5.2 was!

Great job, Satellite team!

