1445928 – It is impossible to identify the source process/appliance for each connection in pg_stat_activity

Bug 1445928 - It is impossible to identify the source process/appliance for each connection in pg_stat_activity

Summary: It is impossible to identify the source process/appliance for each connection...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Appliance
Sub Component:
Version:	5.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	GA
Target Release:	5.9.0
Assignee:	Joe Rafaniello
QA Contact:	Tasos Papaioannou
Docs Contact:
URL:
Whiteboard:	appliance:database
Depends On:
Blocks:	1449389 1449390 1458339
TreeView+	depends on / blocked

Reported:	2017-04-26 19:59 UTC by Joe Rafaniello
Modified:	2018-03-06 15:32 UTC (History)
CC List:	5 users (show)
Fixed In Version:	5.9.0.1
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1449389 1449390 1458339 (view as bug list)
Environment:
Last Closed:	2018-03-06 15:32:31 UTC
Category:	---
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Joe Rafaniello 2017-04-26 19:59:03 UTC

Description of problem:  If you have several appliances and an issue occurs in postgresql, it's impossible to track postgresql connections back to the appliance and process that created that connection.  

pg_stat_activity looks something like this:

vmdb_production=# select pid, application_name from pg_stat_activity;
 pid   | application_name
 ------+--------------------------------------------------
 13770 | /var/www/miq/vmdb/lib/workers/bin/evm_server.rb
 13950 | /var/www/miq/vmdb/lib/workers/bin/evm_server.rb
 13957 | /var/www/miq/vmdb/lib/workers/bin/evm_server.rb
 13969 | /var/www/miq/vmdb/lib/workers/bin/evm_server.rb
 13975 | /var/www/miq/vmdb/lib/workers/bin/evm_server.rb
 13984 | /var/www/miq/vmdb/lib/workers/bin/evm_server.rb


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Start several appliances
2. Run this query: select pid, application_name from pg_stat_activity;
3. Cry

Actual results:  There are hundreds of connections with very little identifiable information as to the source other than a possible ip address.


Expected results:  You can customize the application_name in postgresql so each process can have a unique application_name which will make it much easier to find the source of database connection issues.


Additional info:

Comment 2 Joe Rafaniello 2017-04-26 20:00:05 UTC

Upstream PRs:
https://github.com/ManageIQ/manageiq/pull/13856
https://github.com/ManageIQ/manageiq/issues/14904

Comment 3 Joe Rafaniello 2017-04-26 20:00:57 UTC

When we need to do database lock or connection triaging, it's really hard to tell where each connection originated from because we don't specify an application_name. It looks something like this:

vmdb_production=# select pid, application_name from pg_stat_activity;
 pid   | application_name
 ------+--------------------------------------------------
 13770 | /var/www/miq/vmdb/lib/workers/bin/evm_server.rb
 13950 | /var/www/miq/vmdb/lib/workers/bin/evm_server.rb
 13957 | /var/www/miq/vmdb/lib/workers/bin/evm_server.rb
 13969 | /var/www/miq/vmdb/lib/workers/bin/evm_server.rb
 13975 | /var/www/miq/vmdb/lib/workers/bin/evm_server.rb
 13984 | /var/www/miq/vmdb/lib/workers/bin/evm_server.rb
The much debated format of the application_name now looks like this:

MIQ <worker pid> <worker class>[<worker id>], s[<server id>], <zone name>[<zone id>]
OR:
MIQ <server pid> Server[<server id>], <zone name>[<zone id>]

Note, the pg backend pid isn't needed in application_name since the pid column in pg_stat_activity is the backend pid. Plus, each of our processes can have more than one spid.

It now looks like this:

vmdb_development=# select pid, application_name from pg_stat_activity;
 pid  | application_name
------------------+------+-------------+----------------------
 5844 | MIQ 5835 Server[1r2], default[1r1]
 5888 | MIQ 5886 Generic[1r1111], s[1r2], default[1r1]
 5891 | MIQ 5889 Generic[1r1112], s[1r2], default[1r1]
 5894 | MIQ 5892 Priority[1r1113], s[1r2], default[1r1]
 5897 | MIQ 5895 Priority[1r1114], s[1r2], default[1r1]
 5900 | MIQ 5898 Schedule[1r1115], s[1r2], default[1r1]
 5928 | MIQ 5926 EventHandler[1r1116], s[1r2], default[1r1]
 5932 | MIQ 5929 Reporting[1r1117], s[1r2], default[1r1]
 5934 | MIQ 5931 Reporting[1r1118], s[1r2], default[1r1]
 5943 | MIQ 5941 Ui[1r1120], s[1r2], default[1r1]
 5940 | MIQ 5935 Websocket[1r1119], s[1r2], default[1r1]
 5946 | MIQ 5944 WebService[1r1121], s[1r2], default[1r1]
 5964 | MIQ 5935 Websocket[1r1119], s[1r2], default[1r1]
 5965 | MIQ 5941 Ui[1r1120], s[1r2], default[1r1]
 5966 | MIQ 5944 WebService[1r1121], s[1r2], default[1r1]

Comment 4 Joe Rafaniello 2017-04-26 20:42:01 UTC

#13856 and #14904 were manually backported to:
Euwe: https://github.com/ManageIQ/manageiq/issues/14909
Darga: https://github.com/ManageIQ/manageiq/issues/14911

Comment 8 Tasos Papaioannou 2017-10-09 15:35:31 UTC

Verified on 5.9.0.1.

Note You need to log in before you can comment on or make changes to this bug.