Bug 495814

Summary: yum metadata generation in /var/cache/rhn can cause extreme server load
Product: Red Hat Satellite 5
Component: Server
Version: 520
Hardware: All
OS: Linux
Severity: urgent
Priority: urgent
Status: CLOSED CURRENTRELEASE
Reporter: Mike McCune <mmccune>
Assignee: Jan Pazdziora <jpazdziora>
QA Contact: Jan Hutař <jhutar>
CC: jhutar, jkastner, jpazdziora, tao, xdmoon
Fixed In Version: 521
Doc Type: Bug Fix
Clones: 495815, 495816, 498118, 498129
Bug Blocks: 473868, 495816, 498118, 498129
Last Closed: 2009-11-10 08:13:28 UTC

Description Mike McCune 2009-04-14 22:14:50 UTC
If the cache for a given channel needs to be regenerated in /var/cache/rhn, every client request for that new metadata kicks off a process to regenerate the files.

This can cause extreme load on the Satellite server, and each thread is essentially doing the same thing.

In order to reproduce this issue I wrote a simple multi-threaded Python utility to spawn multiple yum requests to a RHN Satellite server.  This client spins up 10 threads, each doing:

yum clean all && yum search zsh

with separate --installroot parameters to allow simultaneous execution.
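
For the record, here is a minimal sketch of what such a load generator could look like. This is an illustration only, not the actual yum-load-test.py linked in the steps below; the thread count and paths merely mirror the description above.

#!/usr/bin/env python
# Hypothetical load generator along the lines described above -- NOT the
# real yum-load-test.py. Assumes a client registered to the Satellite,
# run as root.
import os
import subprocess
import threading

NUM_THREADS = 10

def hammer(thread_id):
    # A private --installroot per thread lets the yum runs execute
    # simultaneously without fighting over the RPM database lock.
    root = "/tmp/yum-load-%d" % thread_id
    if not os.path.isdir(root):
        os.makedirs(root)
    while True:
        # 'clean all' throws away local metadata, so every iteration
        # forces a fresh metadata fetch from the Satellite.
        subprocess.call(["yum", "--installroot=%s" % root, "clean", "all"])
        subprocess.call(["yum", "--installroot=%s" % root, "search", "zsh"])

threads = [threading.Thread(target=hammer, args=(i,)) for i in range(NUM_THREADS)]
for t in threads:
    t.daemon = True
    t.start()
for t in threads:
    t.join()  # runs until interrupted with Ctrl-C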

After setting up 2 RHEL5 clients each with my load simulator, I was quickly able to get my Satellite to reach a load of *40-80*, with it eventually ceasing to be accessible.

** Steps to reproduce the yum 'metadata storm' on a 5.2 Satellite:

1) Register at least 2 RHEL5 clients to your Satellite

2) Make sure your RHEL5 channel is populated and synced

3) Check out: 
http://svn.rhndev.redhat.com/viewcvs/trunk/eng/scripts/load-testing/yum-load-test.py

4) On each RHEL5 client as root execute: 'python yum-load-test.py'

5) On your RHN Satellite server run: 'rm -rf /var/cache/rhn/'

6) Wait. This will cause each client request to start regeneration of the metadata for the RHEL5 channel. As these requests pile up, the server is quickly brought to its knees.

The more clients you have, the quicker it will die.

Comment 2 Xixi 2009-04-14 22:29:32 UTC
(In reply to comment #1)
In the customer's case (comment #1), they don't have to rm /var/cache/rhn/ to see the load spikes; they only need RHEL 5 clients to check in and start spawning cache regeneration.

Comment 3 Xixi 2009-04-14 22:49:24 UTC
bug 495814 for sat52maint
bug 495816 for sat51maint
bug 495815 for sat530-triage

Comment 10 Xixi 2009-04-29 01:18:20 UTC
(In reply to comment #3)
> bug 495814 for sat52maint
> bug 495816 for sat51maint
> bug 495815 for sat530-triage  
bug 498118 for sat 4.2.x
bug 498129 for sat50maint

Comment 23 Jan Pazdziora 2009-10-09 14:33:41 UTC
Packages rhns-5.2.0-23.el[45] built.

Comment 25 Jan Pazdziora 2009-10-16 07:38:50 UTC
Moving ON_QA as the packages were pushed to webqa with composes Satellite-5.2.1-RHEL[45]-re20091014.0.

Comment 31 Jan Pazdziora 2009-11-02 10:46:50 UTC
Please, in /etc/rhn/default/rhn_server.conf, set the use_repo_locking option to 1 and retry.
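
For anyone wondering what the option buys you: the idea is to serialize regeneration so that a single process rebuilds the metadata while concurrent requests block on a lock and then serve the fresh cache, instead of each kicking off its own rebuild. Below is a rough sketch of that pattern -- an illustration of the concept only, not Satellite's actual code; every name and path in it is hypothetical.

import fcntl
import os
import time

LOCK_PATH = "/tmp/repodata.lock"  # hypothetical lock file location
CACHE_TTL = 300                   # illustrative freshness window, seconds

def cache_is_fresh(cache_file):
    # Fresh means the file exists and was written within the TTL.
    try:
        return (time.time() - os.path.getmtime(cache_file)) < CACHE_TTL
    except OSError:
        return False

def get_metadata(cache_file, regenerate):
    # Take an exclusive lock: the first request in does the rebuild;
    # everyone queued behind it wakes up to find a fresh cache and
    # skips the work instead of regenerating it again.
    fd = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another process rebuilds
        if not cache_is_fresh(cache_file):
            regenerate(cache_file)
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)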

Comment 34 errata-xmlrpc 2009-11-10 08:13:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1564.html

Comment 35 Jan Pazdziora 2009-11-10 11:13:45 UTC
The errata we've released is actually the errata marking the Satellite 5.2.1 release, so I am changing the resolution to CURRENTRELEASE.