Bug 1414159

Summary: candlepin event listener can cause high system load when there are lots of incoming messages
Product: Red Hat Satellite
Reporter: Chris Duryee <cduryee>
Component: Subscription Management
Assignee: Eric Helms <ehelms>
Status: CLOSED DUPLICATE
QA Contact: jcallaha
Severity: high
Docs Contact:
Priority: high
Version: 6.2.6
CC: bbuckingham, bkearney, cduryee, inecas, jcallaha, jhutar, psuriset, sauchter
Target Milestone: Unspecified
Keywords: Performance, PrioBumpField, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-31 19:33:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1479962

Description Chris Duryee 2017-01-17 22:25:11 UTC
Description of problem:

During mass registrations of systems, the candlepin event listener can get flooded with updates. When a large number of events arrive, it fires many calls to candlepin for updated pool info, which taxes the system and causes performance problems. Eventually these show up as DB lock waits, for example:

2017-01-17 00:11:20 UTC STATEMENT:  UPDATE "katello_pools" SET "consumed" = $1, "updated_at" = $2 WHERE "katello_pools"."id" = 5
2017-01-17 00:11:23 UTC LOG:  process 44158 acquired ShareLock on transaction 76129721 after 4032.569 ms

There are probably several ways to fix this. One would be for the listener to read messages in batches of 100, de-duplicate them, and then fire one update per pool that actually needs refreshing. It may also make sense to have the listener poll only every 5 seconds, so that more messages queue up between polls and can be de-duplicated together.
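A minimal sketch of the batching idea, in Python rather than the listener's actual code. It assumes each incoming event is a dict carrying a `pool_id` key and that refreshing a pool is a single callable; `drain_batch`, `unique_pool_ids`, `process_once`, `run_listener` and `refresh_pool` are all hypothetical names, not Katello or candlepin APIs.

```python
import queue
import time
from typing import Callable, Iterable, List


def drain_batch(q: "queue.Queue[dict]", max_messages: int = 100) -> List[dict]:
    """Pull up to max_messages events off the queue without blocking."""
    batch: List[dict] = []
    while len(batch) < max_messages:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def unique_pool_ids(events: Iterable[dict]) -> List[str]:
    """De-duplicate pool IDs (hypothetical 'pool_id' key), keeping first-seen order."""
    return list(dict.fromkeys(e["pool_id"] for e in events))


def process_once(q: "queue.Queue[dict]",
                 refresh_pool: Callable[[str], None],
                 max_messages: int = 100) -> int:
    """Read one batch and refresh each affected pool exactly once.

    Returns the number of refreshes fired."""
    pool_ids = unique_pool_ids(drain_batch(q, max_messages))
    for pool_id in pool_ids:
        refresh_pool(pool_id)
    return len(pool_ids)


def run_listener(q: "queue.Queue[dict]",
                 refresh_pool: Callable[[str], None],
                 poll_interval: float = 5.0,
                 max_messages: int = 100) -> None:
    """Poll every poll_interval seconds. Each pass handles at most max_messages
    events, so a sustained rate above max_messages / poll_interval
    (20 events/s with these defaults) lets the backlog grow."""
    while True:
        process_once(q, refresh_pool, max_messages)
        time.sleep(poll_interval)
```

For example, five queued events touching pools `p1, p2, p1, p1, p3` produce three refreshes instead of five. The fixed batch size bounds the work per poll, but the poll interval and batch size together cap throughput, so both would need tuning against the real message rate.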

Version-Release number of selected component (if applicable): 6.2.6


How reproducible: often


Steps to Reproduce:
1. Register 30K or more systems.
2. Run a mass registration at a rate of about one system every 2 seconds for 1 hour.
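At one system every 2 seconds for 1 hour, the reproducer performs 3600 / 2 = 1800 registrations. The rough simulation below compares an unbatched listener (one refresh per event) with de-duplication per 5-second window. It assumes, purely for illustration, that every registration emits 2 events against the same single pool (`pool-a`); the real number of events per registration is not stated in this report, so only the relative reduction is meaningful. `simulate` and `count_refreshes` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def simulate(duration_s: float = 3600, interval_s: float = 2.0,
             pools: Sequence[str] = ("pool-a",),
             events_per_pool: int = 2) -> List[Tuple[float, str]]:
    """Generate (timestamp, pool_id) events: one registration every interval_s
    seconds, each emitting events_per_pool events for every pool (assumed)."""
    events: List[Tuple[float, str]] = []
    t = 0.0
    while t < duration_s:
        for pool in pools:
            events.extend((t, pool) for _ in range(events_per_pool))
        t += interval_s
    return events


def count_refreshes(events: List[Tuple[float, str]], window: float) -> int:
    """Count pool refreshes when events are grouped into fixed time windows and
    de-duplicated by pool ID within each window; window <= 0 means no batching."""
    if window <= 0:
        return len(events)
    return len({(int(t // window), pool_id) for t, pool_id in events})


events = simulate()
print(count_refreshes(events, 0), count_refreshes(events, 5.0))  # 3600 720
```

Under these assumptions the 5-second window cuts refreshes by a factor of five, and the saving grows with the number of events each registration emits for the same pool.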

Actual results: DB lock contention, slow PostgreSQL queries, and high candlepin load


Expected results: the system should handle the registration load without DB lock contention or excessive load