1533958 – Replication configuration page does not open when child database is down

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Replication
Sub Component:
Version:	5.8.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	GA
Target Release:	5.10.0
Assignee:	Yuri Rudman
QA Contact:	Tasos Papaioannou
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1550728 1550729

Reported:	2018-01-12 16:05 UTC by Saif Ali
Modified:	2021-09-09 13:02 UTC
CC List:	6 users
Fixed In Version:	5.10.0.0
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1550728 1550729
Environment:
Last Closed:	2019-02-11 14:04:28 UTC
Category:	---
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Description Saif Ali 2018-01-12 16:05:07 UTC

Description of problem:
On the top/master region of a large CFME deployment, going to Region 99 -> Replication to see the list of child databases fails with a "502 Proxy Error" if one of the child databases is down. The page does not open at all. This is what shows up in the logs:

2018-01-08 20:38:05 GMT::5a53d68f.c7b4:@:[51124]:ERROR:  could not connect to the postgresql server in replication mode: timeout expired
2018-01-08 20:38:05 GMT::5a53d68f.c7b4:@:[51124]:DETAIL:  dsn was:   fallback_application_name='/var/www/miq/vmdb/lib/workers/bin/evm_server.rb' dbname='vmdb_production' host='<ipaddr>' user='root' password='<password>' port='5432'
2018-01-08 20:38:05 GMT::5a53d68f.c7b4:@:[51124]:LOG:  apply worker [51124] at slot 4 generation 42479 crashed
2018-01-08 20:38:05 GMT::59d69ddd.d58e:@:[54670]:LOG:  worker process: pglogical apply 16386:2117093528 (PID 51124) exited with exit code 1
2018-01-08 20:38:05 GMT::5a53d6ad.c7cc:@:[51148]:LOG:  starting apply for subscription region_3_subscription
2018-01-08 20:38:06 GMT::5a53d6ad.c7cc:@:[51148]:ERROR:  data stream ended
2018-01-08 20:38:06 GMT::5a53d6ad.c7cc:@:[51148]:LOG:  apply worker [51148] at slot 3 generation 32821 crashed
2018-01-08 20:38:06 GMT::59d69ddd.d58e:@:[54670]:LOG:  worker process: pglogical apply 16386:2404866424 (PID 51148) exited with exit code 1
2018-01-08 20:38:12 GMT::5a53d6b4.c7ce:@:[51150]:LOG:  starting apply for subscription region_3_subscription
2018-01-08 20:38:13 GMT::5a53d6b5.c7d0:@:[51152]:LOG:  starting apply for subscription region_19_subscription
2018-01-08 20:38:13 GMT::5a53d6b4.c7ce:@:[51150]:ERROR:  data stream ended
2018-01-08 20:38:13 GMT::5a53d6b4.c7ce:@:[51150]:LOG:  apply worker [51150] at slot 3 generation 32822 crashed
2018-01-08 20:38:13 GMT::59d69ddd.d58e:@:[54670]:LOG:  worker process: pglogical apply 16386:2404866424 (PID 51150) exited with exit code 1

Version-Release number of selected component (if applicable):
4.5

How reproducible:


The "master" appliance (Region 99), the one that aggregates all the data, works fine on its own. The problem appears when I purposely shut down one of the "lower" appliances (Region 19) and then go to the replication screen on the "master": the page does not open. If I bring the lower region appliance back up, the page works again.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
The replication screen on the master appliance needs to be more forgiving of unreachable databases: it should still open and list the subscriptions when a child database is down.


Additional info:

Comment 2 CFME Bot 2018-01-25 20:03:45 UTC

https://github.com/ManageIQ/manageiq/pull/16889

Comment 3 CFME Bot 2018-01-25 21:12:23 UTC

New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/f40c04332298912c3e4e93036c3725636a3d3759

commit f40c04332298912c3e4e93036c3725636a3d3759
Author:     Yuri Rudman <yrudman>
AuthorDate: Thu Jan 25 14:04:13 2018 -0500
Commit:     Yuri Rudman <yrudman>
CommitDate: Thu Jan 25 15:21:33 2018 -0500

    rescue attempt to get backlog from remote server, it will allow to manage subscription screen even if remote db is offline
    https://bugzilla.redhat.com/show_bug.cgi?id=1533958

 app/models/pglogical_subscription.rb | 3 +++
 1 file changed, 3 insertions(+)
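
The commit message above describes rescuing the attempt to read the backlog from the remote server. As a rough illustration of that idea only (not the actual three-line change), the sketch below uses hypothetical names throughout: RemoteUnreachable, SubscriptionSketch, and replication_rows are stand-ins and do not exist in ManageIQ. It assumes the backlog lookup raises when the remote database is offline, and returns nil in that case so the list of subscriptions can still be built.

```ruby
# Hypothetical stand-in for a driver-level connection failure (e.g. a timeout).
class RemoteUnreachable < StandardError; end

# Hypothetical subscription wrapper; NOT the real PglogicalSubscription model.
class SubscriptionSketch
  attr_reader :name

  # backlog_source is any callable that returns the backlog in bytes,
  # or raises RemoteUnreachable when the remote database cannot be reached.
  def initialize(name, backlog_source)
    @name = name
    @backlog_source = backlog_source
  end

  # Returns the backlog, or nil when the remote database is unreachable,
  # so that one offline child region does not raise out of this method.
  def backlog
    @backlog_source.call
  rescue RemoteUnreachable => err
    warn("#{name}: could not fetch backlog: #{err.message}")
    nil
  end
end

# Builds the rows a replication screen might display (assumed shape).
def replication_rows(subscriptions)
  subscriptions.map { |s| { name: s.name, backlog: s.backlog } }
end

subscriptions = [
  SubscriptionSketch.new("region_3_subscription", -> { 1024 }),
  SubscriptionSketch.new("region_19_subscription",
                         -> { raise RemoteUnreachable, "timeout expired" }),
]

replication_rows(subscriptions).each do |row|
  puts "#{row[:name]}: backlog=#{row[:backlog].inspect}"
end
```

With this shape, the row for an offline region carries a nil backlog instead of aborting the whole page, which matches the behavior the reporter asked for under "Expected results".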

Comment 6 Tasos Papaioannou 2018-06-27 19:05:01 UTC

Verified on 5.10.0.2.
