Bug 495814

Summary: yum metadata generation in /var/cache/rhn can cause extreme server load
Product: Red Hat Satellite 5
Component: Server
Version: 520
Hardware: All
OS: Linux
Severity: urgent
Priority: urgent
Status: CLOSED CURRENTRELEASE
Reporter: Mike McCune <mmccune>
Assignee: Jan Pazdziora <jpazdziora>
QA Contact: Jan Hutař <jhutar>
CC: jhutar, jkastner, jpazdziora, tao, xdmoon
Fixed In Version: 521
Doc Type: Bug Fix
Clones: 495815, 495816, 498118, 498129
Bug Blocks: 473868, 495816, 498118, 498129
Last Closed: 2009-11-10 08:13:28 UTC

Description Mike McCune 2009-04-14 22:14:50 UTC
If the cache for a given channel needs to be regenerated in /var/cache/rhn, every client request for that new metadata kicks off a process to regenerate the files.

This can cause extreme load on the Satellite server, and each thread is essentially doing the same thing.

In order to reproduce this issue I wrote a simple multi-threaded Python utility to spawn multiple yum requests to a RHN Satellite server.  This client spins up 10 threads, each doing:

yum clean all && yum search zsh

with separate --installroot parameters to allow simultaneous execution.
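
For the record, here is a minimal sketch of what such a load generator could look like. This is an illustration only, not the actual yum-load-test.py linked in the steps below; the thread count and paths merely mirror the description above.

#!/usr/bin/env python
# Hypothetical load generator along the lines described above -- NOT the
# real yum-load-test.py. Assumes a client registered to the Satellite,
# run as root.
import os
import subprocess
import threading

NUM_THREADS = 10

def hammer(thread_id):
    # A private --installroot per thread lets the yum runs execute
    # simultaneously without fighting over the RPM database lock.
    root = "/tmp/yum-load-%d" % thread_id
    if not os.path.isdir(root):
        os.makedirs(root)
    while True:
        # 'clean all' throws away local metadata, so every iteration
        # forces a fresh metadata fetch from the Satellite.
        subprocess.call(["yum", "--installroot=%s" % root, "clean", "all"])
        subprocess.call(["yum", "--installroot=%s" % root, "search", "zsh"])

threads = [threading.Thread(target=hammer, args=(i,)) for i in range(NUM_THREADS)]
for t in threads:
    t.daemon = True
    t.start()
for t in threads:
    t.join()  # runs until interrupted with Ctrl-C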

After setting up 2 RHEL5 clients each with my load simulator, I was quickly able to get my Satellite to reach a load of *40-80*, with it eventually ceasing to be accessible.

** Steps to reproduce the yum 'metadata storm' on a 5.2 Satellite:

1) Register at least 2 RHEL5 clients to your Satellite

2) Make sure your RHEL5 channel is populated and synced

3) Check out: 
http://svn.rhndev.redhat.com/viewcvs/trunk/eng/scripts/load-testing/yum-load-test.py

4) On each RHEL5 client as root execute: 'python yum-load-test.py'

5) On your RHN Satellite server run: 'rm -rf /var/cache/rhn/'

6) Wait. This will cause each client request to start regeneration of the metadata for the RHEL5 channel. As these requests pile up, the server is quickly brought to its knees.

The more clients you have, the quicker it will die.

Comment 2 Xixi 2009-04-14 22:29:32 UTC
(In reply to comment #1)
In the customer's case (comment #1), they don't have to rm /var/cache/rhn/ to see the load spikes; they only need RHEL 5 clients to check in and start spawning cache regeneration.

Comment 3 Xixi 2009-04-14 22:49:24 UTC
bug 495814 for sat52maint
bug 495816 for sat51maint
bug 495815 for sat530-triage

Comment 10 Xixi 2009-04-29 01:18:20 UTC
(In reply to comment #3)
> bug 495814 for sat52maint
> bug 495816 for sat51maint
> bug 495815 for sat530-triage  
bug 498118 for sat 4.2.x
bug 498129 for sat50maint

Comment 23 Jan Pazdziora 2009-10-09 14:33:41 UTC
Packages rhns-5.2.0-23.el[45] built.

Comment 25 Jan Pazdziora 2009-10-16 07:38:50 UTC
Moving ON_QA as the packages were pushed to webqa with composes Satellite-5.2.1-RHEL[45]-re20091014.0.

Comment 31 Jan Pazdziora 2009-11-02 10:46:50 UTC
Please, in /etc/rhn/default/rhn_server.conf, set the use_repo_locking option to 1 and retry.
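
For anyone wondering what the option buys you: the idea is to serialize regeneration so that a single process rebuilds the metadata while concurrent requests block on a lock and then serve the fresh cache, instead of each kicking off its own rebuild. Below is a rough sketch of that pattern -- an illustration of the concept only, not Satellite's actual code; every name and path in it is hypothetical.

import fcntl
import os
import time

LOCK_PATH = "/tmp/repodata.lock"  # hypothetical lock file location
CACHE_TTL = 300                   # illustrative freshness window, seconds

def cache_is_fresh(cache_file):
    # Fresh means the file exists and was written within the TTL.
    try:
        return (time.time() - os.path.getmtime(cache_file)) < CACHE_TTL
    except OSError:
        return False

def get_metadata(cache_file, regenerate):
    # Take an exclusive lock: the first request in does the rebuild;
    # everyone queued behind it wakes up to find a fresh cache and
    # skips the work instead of regenerating it again.
    fd = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another process rebuilds
        if not cache_is_fresh(cache_file):
            regenerate(cache_file)
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)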

Comment 34 errata-xmlrpc 2009-11-10 08:13:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1564.html

Comment 35 Jan Pazdziora 2009-11-10 11:13:45 UTC
The errata we've released is actually the errata marking the Satellite 5.2.1 release, so I am changing the resolution to CURRENTRELEASE.