Bug 1327047

Summary: slow, long running PostgreSQL queries from Candlepin due to virt-who submitting Hypervisor/Guest mappings
Product: Red Hat Satellite Reporter: Evgeni Golov <egolov>
Component: CandlepinAssignee: Barnaby Court <bcourt>
Status: CLOSED NOTABUG QA Contact: Katello QA List <katello-qa-list>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.1.8CC: chorn, csnyder, egolov, smeyer
Target Milestone: Unspecified   
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-25 17:48:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1296845    

Description Evgeni Golov 2016-04-14 07:28:28 UTC
Description of problem:
Environment: Satellite 6.1.8 on RHEL 7.2, three virt-who instances (running on the Capsules) report Hypervisor/Guest mappings from 5 VMware vCenters (virt-who-01: 488 hypervisors / 11224 guests + 14 hypervisors / 322 guests, virt-who-02: 399 hypervisors / 8379 guests + 7 hypervisors / 147 guests, virt-who-03: 488 hypervisors / 18468 guests).
PostgreSQL is as installed by katello-installer, the only change was to set max_stack_depth to 3M after we hit https://bugzilla.redhat.com/show_bug.cgi?id=1280346

When the virt-who instances are running, we see three PostgreSQL processes on the Satellite taking one CPU core each. Stopping one virt-who will reduce the number of busy PostgreSQL processes by one.

Looking at pg_stat_activity for such a process yields the following (this particular one was running for ~3minutes):
postgres=# select * from pg_stat_activity where pid = 5345;
-[ RECORD 1 ]----+-
datid            | 16387
datname          | candlepin
pid              | 5345
usesysid         | 16386
usename          | candlepin
application_name |
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 52114
backend_start    | 2016-04-13 15:08:33.113219+02
xact_start       | 2016-04-13 15:53:52.538085+02
query_start      | 2016-04-13 15:53:57.05335+02
state_change     | 2016-04-13 15:53:57.053385+02
waiting          | f
state            | active
query            | select this_.consumer_id as y0_ from cp_consumer_guests this_ inner join cp_consumer gconsumer1_ on this_.consumer_id=gconsumer1_.id inner join cp_guest_ids_checkin checkins2_ on gconsumer1_.id=checkins2_.consumer_id where gconsumer1_.owner_id=$1 and (lower(this_.guest_id)=$2 or lower(this_.guest_id)=$3 or lower(this_.guest_id)=$4 or lower(this_.guest_id)=$5 or lower(this_.guest_id)=$6 or lower(this_.guest_id)=$7 or lower(this_.guest_id)=$8 or lower(this_.guest_id)=$9 or lower(this_.guest_id)=$10 or lower(this_.guest_id)=$11 or lower(this_.guest_id)=$12 or lower(this_.guest_id)=$13 or lower(this_.guest_id)=$14 or lower(this_.guest_id)=$15 or lower(this_.guest_id)=$16 or lower(this_.guest_id)=$17 or lower(this_.guest_id)=$18 or lower(this_.guest_id)=$19 or lower(this_.guest_id)=$20 or lower(this_.guest_id)=$21 or lower(this_.guest_id)=$22 or lower(this_.guest_id)=$23 or lower(this_.guest_id)=$24 or lower(this_.guest_id)=$25 or lower(this_.guest_id)=$26 or lower(this_.guest_id)=$27 or lower(this_.guest_id)=$28

Version-Release number of selected component (if applicable):
6.1.8

How reproducible:
always

Steps to Reproduce:
1. Install Satellite 6.1.8
2. let virt-who report big hypervisor/guest mappings to it

Actual results:
for each virt-who instance we see one highly loaded PostgreSQL process, executing slow queries

Expected results:
better performance :)

Comment 1 Chris Snyder 2016-04-14 20:19:34 UTC
Virt-who checkins are inherently database intensive. How often are the virt-who instances checking in?

Comment 2 Evgeni Golov 2016-04-15 06:16:52 UTC
(In reply to Chris Snyder from comment #1)
> Virt-who checkins are inherently database intensive. How often are the
> virt-who instances checking in?

every couple of minutes - the vCenters are pretty busy (the ESXi backend is event-based, and passing --interval does not help, at least not before https://github.com/virt-who/virt-who/commit/7398610d45c00756ffad9bd810cc2479ad6784e6 is deployed).

Comment 3 Barnaby Court 2016-04-25 17:48:58 UTC
The virt-who patch referenced in Comment 2 is the correct way to deal with this issue at this point.