Bug 1327047 - slow, long running PostgreSQL queries from Candlepin due to virt-who submitting Hypervisor/Guest mappings
Summary: slow, long running PostgreSQL queries from Candlepin due to virt-who submitti...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Candlepin
Version: 6.1.8
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified vote
Target Milestone: Unspecified
Assignee: Barnaby Court
QA Contact: Katello QA List
URL:
Whiteboard:
Depends On:
Blocks: 1296845
TreeView+ depends on / blocked
 
Reported: 2016-04-14 07:28 UTC by Evgeni Golov
Modified: 2019-11-14 07:47 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-25 17:48:58 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Evgeni Golov 2016-04-14 07:28:28 UTC
Description of problem:
Environment: Satellite 6.1.8 on RHEL 7.2, three virt-who instances (running on the Capsules) report Hypervisor/Guest mappings from 5 VMware vCenters (virt-who-01: 488 hypervisors / 11224 guests + 14 hypervisors / 322 guests, virt-who-02: 399 hypervisors / 8379 guests + 7 hypervisors / 147 guests, virt-who-03: 488 hypervisors / 18468 guests).
PostgreSQL is as installed by katello-installer, the only change was to set max_stack_depth to 3M after we hit https://bugzilla.redhat.com/show_bug.cgi?id=1280346

When the virt-who instances are running, we see three PostgreSQL processes on the Satellite taking one CPU core each. Stopping one virt-who will reduce the number of busy PostgreSQL processes by one.

Looking at pg_stat_activity for such a process yields the following (this particular one was running for ~3minutes):
postgres=# select * from pg_stat_activity where pid = 5345;
-[ RECORD 1 ]----+-
datid            | 16387
datname          | candlepin
pid              | 5345
usesysid         | 16386
usename          | candlepin
application_name |
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 52114
backend_start    | 2016-04-13 15:08:33.113219+02
xact_start       | 2016-04-13 15:53:52.538085+02
query_start      | 2016-04-13 15:53:57.05335+02
state_change     | 2016-04-13 15:53:57.053385+02
waiting          | f
state            | active
query            | select this_.consumer_id as y0_ from cp_consumer_guests this_ inner join cp_consumer gconsumer1_ on this_.consumer_id=gconsumer1_.id inner join cp_guest_ids_checkin checkins2_ on gconsumer1_.id=checkins2_.consumer_id where gconsumer1_.owner_id=$1 and (lower(this_.guest_id)=$2 or lower(this_.guest_id)=$3 or lower(this_.guest_id)=$4 or lower(this_.guest_id)=$5 or lower(this_.guest_id)=$6 or lower(this_.guest_id)=$7 or lower(this_.guest_id)=$8 or lower(this_.guest_id)=$9 or lower(this_.guest_id)=$10 or lower(this_.guest_id)=$11 or lower(this_.guest_id)=$12 or lower(this_.guest_id)=$13 or lower(this_.guest_id)=$14 or lower(this_.guest_id)=$15 or lower(this_.guest_id)=$16 or lower(this_.guest_id)=$17 or lower(this_.guest_id)=$18 or lower(this_.guest_id)=$19 or lower(this_.guest_id)=$20 or lower(this_.guest_id)=$21 or lower(this_.guest_id)=$22 or lower(this_.guest_id)=$23 or lower(this_.guest_id)=$24 or lower(this_.guest_id)=$25 or lower(this_.guest_id)=$26 or lower(this_.guest_id)=$27 or lower(this_.guest_id)=$28

Version-Release number of selected component (if applicable):
6.1.8

How reproducible:
always

Steps to Reproduce:
1. Install Satellite 6.1.8
2. let virt-who report big hypervisor/guest mappings to it

Actual results:
for each virt-who instance we see one highly loaded PostgreSQL process, executing slow queries

Expected results:
better performance :)

Comment 1 Chris Snyder 2016-04-14 20:19:34 UTC
Virt-who checkins are inherently database intensive. How often are the virt-who instances checking in?

Comment 2 Evgeni Golov 2016-04-15 06:16:52 UTC
(In reply to Chris Snyder from comment #1)
> Virt-who checkins are inherently database intensive. How often are the
> virt-who instances checking in?

every couple of minutes - the vCenters are pretty busy (the ESXi backend is event-based, and passing --interval does not help, at least not before https://github.com/virt-who/virt-who/commit/7398610d45c00756ffad9bd810cc2479ad6784e6 is deployed).

Comment 3 Barnaby Court 2016-04-25 17:48:58 UTC
The virt-who patch referenced in Comment 2 is the correct way to deal with this issue at this point.


Note You need to log in before you can comment on or make changes to this bug.