Description of problem:
satellite-sync took 2.25 hours before upgrading from 5.2 to 5.3. Now it takes 4.25 hours. After satellite-sync completes, the system shows lots of activity, with java, oracle and httpd processes monopolizing the CPU. top indicates 98-100% IOWAIT for up to two hours or more after satellite-sync is complete. The hard drive light on the server stays on solid, indicating heavy disk activity. I often see the heavy disk activity for ~ 1-2 hours if I reboot the guest or even just issue "rhn-satellite restart".

Version-Release number of selected component (if applicable):
Satellite 5.3.0 i386
RHEL 5.4 i386 as a Xen guest
- 2.0 GB memory
- NFS mounted /var/satellite
- everything else (/rhnsat, /opt) all on local disk
- Xen guest is LVM backed on dom0
RHEL 5.3 i386 as dom0
- 4.0 GB memory
- single 3.0 GHz Pentium 4 processor
- 500 GB IDE drive

How reproducible:
Run satellite-sync

Steps to Reproduce:
1. Upgrade from Satellite 5.2 to 5.3
2. Run satellite-sync

Actual results:
It takes ~ 4.25 hours for satellite-sync to complete, then another 1-2 hours of system thrashing.

Expected results:
satellite-sync should only take ~ 2.25 hours and there should be no thrashing afterwards.

Additional info:
This server is synced with RHEL 3, 4 and 5, 32- and 64-bit x86, including all child channels and RHX content.
Created attachment 363729: sosreport from the Satellite server in question
This behavior occurs when performing a satellite-sync on a freshly installed Satellite Server. Installing the software takes the same amount of time, but importing channel content is where the major difference is seen. For example, I've observed the following times needed to satellite-sync the rhel-i386-server-5 base channel using a channel dump of RHEL 5.1 vintage (3239 RPMS):

RHN Satellite 5.2 - approximately 55 minutes
RHN Satellite 5.3 - approximately 11 hours 45 minutes

I have performed the imports repeatedly and have seen consistent times. Also, the time difference is only seen in the last step, the importing of the RPMS metadata (the last set of hashmarks). Everything else up to that point takes approximately the same amount of time. The system consumes 100% CPU during that step, with oracle and satellite-sync at the top of the list.

My hardware specs: 2.8GHz P4 CPU, 2.5GB RAM, 100GB 7200rpm HD
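To put the two import times above side by side, the slowdown factor can be worked out as in the short sketch below. The function and variable names are illustrative only and are not part of Satellite; the per-RPM figure is a rough average that assumes the whole run time is spread evenly over the 3239 packages, which the report itself says is not the case (the difference is concentrated in the last step).

```python
def to_minutes(hours, minutes):
    """Convert an hours/minutes pair to total minutes."""
    return hours * 60 + minutes


def slowdown_factor(before_minutes, after_minutes):
    """How many times longer the 'after' run took than the 'before' run."""
    return float(after_minutes) / before_minutes


# Times reported above for the rhel-i386-server-5 base channel import.
sat_52_minutes = to_minutes(0, 55)
sat_53_minutes = to_minutes(11, 45)
rpm_count = 3239

factor = slowdown_factor(sat_52_minutes, sat_53_minutes)

# Rough average only: assumes the time is spread evenly over all packages.
seconds_per_rpm_52 = sat_52_minutes * 60.0 / rpm_count
seconds_per_rpm_53 = sat_53_minutes * 60.0 / rpm_count

print("5.3 took about %.1f times as long as 5.2" % factor)
print("average per RPM: %.1f s (5.2) vs %.1f s (5.3)"
      % (seconds_per_rpm_52, seconds_per_rpm_53))
```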
Currently experiencing the issues mentioned above at a customer site: oracle is at 137.6% CPU while "Linking packages to channels".
Committed to satellite.git as commits:
d885040c3b5535b037e0c6064dbba4eda8718174
304418b5ec472fe816089bd42623b725ac5d8708
3100e1d5792f32a10f4cf9d4bc68b9f28c181db2
62f2ece7878b1957e12f7962bb62befacb0346e8 (this one made a huge improvement)
77bd35e711e6297df2c710e1015fed7db3dfa000
181c293d661ee5713bc44f9daee2b0827bfd5c62
956f6dd633974dcf3c83a81a24f91e001561baad
09b46a1d65748fecddeb940fcb4565d0e09d78d4
373ef29d6744769f571c9e024a45d58d7e1fd518
90e4ea0273f3c42e199ab04167f3feb2639e3ce8
Package spacewalk-backend-0.5.28-35 was pushed to webQA. Moving ON_QA.
A summary of the issue is in #0. We do not know exactly what caused it, beyond the fact that there were a lot of code changes during 5.3.0 development and, due to the lack of performance measurement, they added a second here and a second there, which made for a big slowdown in the end.

Summary of how it was fixed: I ran a profiler on the parts of satellite-sync (mainly the "short step") that showed the biggest slowdown from 5.2. The resulting fix is a set of changes (see #7 for reference) that keep the code semantics and only speed things up; the exception is commit 62f2ece7878b1957e12f7962bb62befacb0346e8, which concerned what was effectively just a masked sleep(). There are two types of speed-up. The first removes duplicate work (as in commit 09b46a1d65748fecddeb940fcb4565d0e09d78d4). The second changes code to call a slightly faster but equivalent function; such calls are made millions of times, so the result is a significant gain (an example is commit 373ef29d6744769f571c9e024a45d58d7e1fd518).
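For readers unfamiliar with the approach, here is a minimal sketch of the first type of speed-up (removing duplicate work) measured under Python's standard cProfile module. It is not the actual spacewalk-backend code: normalize, link_packages_slow, link_packages_fast and count_calls are hypothetical names invented for illustration, and the workload is a toy stand-in for linking packages to channels.

```python
import cProfile
import io
import pstats


def normalize(name):
    # Hypothetical helper: cheap on its own, but costly when called very often.
    return name.strip().lower()


def link_packages_slow(packages, channels):
    # Duplicate work: every package name is normalized again for every channel.
    links = []
    for channel in channels:
        for pkg in packages:
            links.append((channel, normalize(pkg)))
    return links


def link_packages_fast(packages, channels):
    # Same result, but each name is normalized once and then reused.
    normalized = [normalize(pkg) for pkg in packages]
    return [(channel, name) for channel in channels for name in normalized]


def count_calls(func, *args):
    """Run func under cProfile; return (result, total number of function calls)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stats = pstats.Stats(profiler, stream=io.StringIO())
    return result, stats.total_calls


if __name__ == "__main__":
    pkgs = [" Package-%d " % i for i in range(1000)]
    chans = ["channel-%d" % i for i in range(20)]
    slow, slow_calls = count_calls(link_packages_slow, pkgs, chans)
    fast, fast_calls = count_calls(link_packages_fast, pkgs, chans)
    print("same output: %s" % (slow == fast))
    print("function calls: %d (slow) vs %d (fast)" % (slow_calls, fast_calls))
```

Both functions return the same list, so the semantics are unchanged; the profiler only shows that the second version makes far fewer calls. In a real run one would sort the pstats output by cumulative time to find the hot spots first.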
doing 5.2 -> 5.3 upgrade tests on same hardware
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-1683.html
Wow! I just ran satellite-sync. On my little Xen guest with 2GB memory on a 3.4GHz Pentium 4 machine, it used to take about 5 and a half hours just to sync with RHN assuming no new packages. After the updates, it took 0 hours, 57 minutes, 17 seconds!!! I am speechless! It is *much* faster than even Satellite 5.2 was! Great job, Satellite team!