Bug 495815

Summary: yum metadata generation in /var/cache/rhn can cause extreme server load
Product: Red Hat Satellite 5
Component: Server
Version: 520
Hardware: All
OS: Linux
Severity: high
Priority: urgent
Status: CLOSED CURRENTRELEASE
Reporter: Xixi <xdmoon>
Assignee: Pradeep Kilambi <pkilambi>
QA Contact: Jeff Ortel <jortel>
CC: bperkins, cperry, jortel, mmraka, xdmoon
Fixed In Version: sat530
Doc Type: Bug Fix
Clone Of: 495814
Bug Blocks: 485807
Last Closed: 2009-09-10 20:35:34 UTC

Description Xixi 2009-04-14 22:30:48 UTC
Cloning for sat 5.3.0 - even though the code may not be necessary in satellite 5.3.0, QA should cover this case to make sure this is not seen in 5.3.0.  (Per mmccune and prad on IRC just now).

+++ This bug was initially created as a clone of Bug #495814 +++

If the cache for a given channel needs to be regenerated in /var/cache/rhn, every client request for that missing metadata kicks off a process to regenerate the files.

This can cause extreme load on the Satellite server, since each request thread is essentially doing the same work.
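Fixes for this class of bug typically take the form of a single-flight guard: the first request that finds the cache missing takes an exclusive lock and regenerates it, while every concurrent request blocks on the lock instead of launching its own regeneration. A minimal sketch of that pattern in Python (not the actual Satellite code path; regenerate_metadata() is a hypothetical stand-in):

import fcntl
import os

def get_channel_metadata(channel, cache_dir="/var/cache/rhn"):
    path = os.path.join(cache_dir, channel)
    if os.path.isdir(path):                   # fast path: cache already built
        return path
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)      # only one regenerator at a time
        if not os.path.isdir(path):           # losers of the race re-check
            regenerate_metadata(channel, path)    # hypothetical helper
    return path

The re-check after acquiring the lock is what keeps the pile-up from turning into N identical regenerations.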

In order to reproduce this issue I wrote a simple multi-threaded Python utility that spawns multiple yum requests against an RHN Satellite server.  The client spins up 10 threads, each running:

yum clean all && yum search zsh

with separate --installroot parameters to allow simultaneous execution.
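The actual utility is linked in step 3 below; purely as an illustration, a minimal load generator in the same spirit might look like this (the thread count comes from the report, but the /tmp install-root layout is an assumption, not the contents of yum-load-test.py):

import subprocess
import threading
import time

THREADS = 10  # the report uses 10 threads per client

def hammer(n):
    # A private --installroot per thread keeps the runs from
    # serializing on yum's per-root lock.
    root = "/tmp/yum-load-%d" % n
    while True:
        subprocess.call(["yum", "--installroot", root, "clean", "all"])
        subprocess.call(["yum", "--installroot", root, "search", "zsh"])

for n in range(THREADS):
    t = threading.Thread(target=hammer, args=(n,))
    t.daemon = True           # worker threads exit with the main thread
    t.start()

try:
    while True:
        time.sleep(60)        # run until interrupted with Ctrl-C
except KeyboardInterrupt:
    pass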

After setting up 2 RHEL5 clients, each running my load simulator, I was
quickly able to drive my Satellite to a load of *40-80*, with it
eventually ceasing to be accessible.

** Steps to reproduce the yum 'metadata storm' on a 5.2 Satellite:

1) Register at least 2 RHEL5 clients to your Satellite

2) Make sure your RHEL5 channel is populated and synced

3) Check out: 
http://svn.rhndev.redhat.com/viewcvs/trunk/eng/scripts/load-testing/yum-load-test.py

4) On each RHEL5 client as root execute: 'python yum-load-test.py'

5) On your RHN Satellite server run: 'rm -rf /var/cache/rhn/'

6) Wait. This will cause each client request to start regeneration of
the metadata for the rhel5 channel.  As these requests pile up, the
server is quickly brought to its knees (a small load-watching sketch
follows after this list).

The more clients you have the quicker it will die.
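To watch the storm build during step 6, the load average can be polled from the Python standard library; a small sketch, roughly equivalent to the 'sar -q 30' used in the verification below (os.getloadavg() is available on Linux):

import os
import time

# Print the 1/5/15-minute load averages every 30 seconds.
while True:
    print("load: %.2f %.2f %.2f" % os.getloadavg())
    time.sleep(30)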

Comment 1 Xixi 2009-04-14 22:42:46 UTC
bug 495814 for sat52maint
bug 495816 for sat51maint
bug 495815 for sat530-triage

Comment 3 Jeff Ortel 2009-06-29 21:27:50 UTC
1) Registered (2) systems and subscribed to fully sync'd RHEL 5 channel.
2) Started python script http://svn.rhndev.redhat.com/viewcvs/trunk/eng/scripts/load-testing/yum-load-test.py on both systems.
3) rm -rf /var/cache/rhn/ on satellite.
4) Waited while /var/cache/rhn/repodata/ was being regenerated.
5) Accessed the Satellite web UI over the next 20 minutes; the Satellite remained accessible.

Satellite does not seem to die.

Comment 4 Michael Mráka 2009-08-14 12:07:02 UTC
Verified in stage -> RELEASE_PENDING.

* registered 2 rhel5 clients
* started yum-load-test.py
* removed files from /var/cache/rhn/
* load didn't exceed 1.5
# sar -q 30 10
Linux 2.6.9-89.0.3.ELsmp (dell-pem710-01.rhts.eng.bos.redhat.com)       08/14/2009

07:52:25 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
07:52:55 AM        12       454      1.39      1.24      0.68
07:53:25 AM         1       454      1.37      1.24      0.70
07:53:55 AM         1       452      1.22      1.22      0.71
07:54:25 AM         0       452      1.13      1.20      0.72
07:54:55 AM         1       450      1.30      1.23      0.74
07:55:25 AM         0       451      1.49      1.28      0.78
07:55:55 AM         0       449      0.98      1.17      0.75
07:56:25 AM         0       451      0.59      1.06      0.73
07:56:55 AM         0       449      0.36      0.96      0.70
07:57:25 AM         0       451      0.28      0.88      0.69
Average:            2       451      1.01      1.15      0.72

Comment 5 Brandon Perkins 2009-09-10 20:35:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1434.html