Bug 1414159

Summary:	candlepin event listener can cause high system load when there are lots of incoming messages
Product:	Red Hat Satellite	Reporter:	Chris Duryee <cduryee>
Component:	Subscription Management	Assignee:	Eric Helms <ehelms>
Status:	CLOSED DUPLICATE	QA Contact:	jcallaha
Severity:	high	Docs Contact:
Priority:	high
Version:	6.2.6	CC:	bbuckingham, bkearney, cduryee, inecas, jcallaha, jhutar, psuriset, sauchter
Target Milestone:	Unspecified	Keywords:	Performance, PrioBumpField, Triaged
Target Release:	Unused
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-07-31 19:33:57 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1479962

Description Chris Duryee 2017-01-17 22:25:11 UTC

Description of problem:

During mass registration of systems, the candlepin event listener can get flooded with updates. When a large number of events come in, it fires many calls to candlepin for updated pool info, which taxes the system and causes performance issues. Eventually, this shows up as lock waits in the PostgreSQL log:

2017-01-17 00:11:20 UTC STATEMENT:  UPDATE "katello_pools" SET "consumed" = $1, "updated_at" = $2 WHERE "katello_pools"."id" = 5
2017-01-17 00:11:23 UTC LOG:  process 44158 acquired ShareLock on transaction 76129721 after 4032.569 ms

There are probably a number of ways to fix this, but one way would be for the listener to read 100 messages at a time, de-duplicate them, then fire events only for the pools that need updating. Additionally, it may make sense to have the listener poll only every 5 seconds, so that more data queues up and can be de-duplicated.
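
The batching idea above could look roughly like the sketch below. It is written in Python purely for illustration (Katello itself is a Ruby application) and is not the actual listener code. Everything named here is hypothetical: the functions drain_batch, unique_pool_ids and run_listener, the refresh_pool callback, and the assumption that each message is a dict carrying a "pool_id" key. An in-process queue.Queue stands in for the real message bus.

```python
import queue
import time
from typing import Callable, Iterable, List

BATCH_SIZE = 100      # "read 100 messages at a time"
POLL_INTERVAL = 5.0   # "poll every 5 seconds"


def drain_batch(events: "queue.Queue[dict]", limit: int = BATCH_SIZE) -> List[dict]:
    """Take up to `limit` queued messages without blocking."""
    batch: List[dict] = []
    while len(batch) < limit:
        try:
            batch.append(events.get_nowait())
        except queue.Empty:
            break  # queue is empty; work with what we have
    return batch


def unique_pool_ids(batch: Iterable[dict]) -> List[str]:
    """Collapse a batch to distinct pool ids, keeping first-seen order.

    Assumes (hypothetically) that every message has a "pool_id" key.
    """
    return list(dict.fromkeys(msg["pool_id"] for msg in batch))


def run_listener(
    events: "queue.Queue[dict]",
    refresh_pool: Callable[[str], None],
    should_stop: Callable[[], bool],
    interval: float = POLL_INTERVAL,
) -> None:
    """Poll on a fixed interval and refresh each affected pool once per batch."""
    while not should_stop():
        for pool_id in unique_pool_ids(drain_batch(events)):
            refresh_pool(pool_id)  # one upstream call per pool, not per event
        time.sleep(interval)
```

With this shape, a batch of 100 events that all touch the same three pools results in three refresh calls instead of 100. The trade-off is that pool data can lag by up to the poll interval.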

Version-Release number of selected component (if applicable): 6.2.6


How reproducible: often


Steps to Reproduce:
1. Register 30K or more systems.
2. Attempt a mass registration at about one system every 2 seconds for 1 hour.

Actual results: DB locks, PostgreSQL slowness, and high candlepin load


Expected results: system should cope with registrations