Bug 1533958

Summary: Replication configuration page does not open when child database is down
Product: Red Hat CloudForms Management Engine Reporter: Saif Ali <saali>
Component: Replication Assignee: Yuri Rudman <yrudman>
Status: CLOSED CURRENTRELEASE QA Contact: Tasos Papaioannou <tpapaioa>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.8.0 CC: cpelland, mteixeira, obarenbo, simaishi, tpapaioa, yrudman
Target Milestone: GA Keywords: TestOnly, ZStream
Target Release: 5.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 5.10.0.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1550728 1550729 Environment:
Last Closed: 2019-02-11 14:04:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1550728, 1550729    

Description Saif Ali 2018-01-12 16:05:07 UTC
Description of problem:
On the top/master region of a large CFME deployment, when you go to Region 99 -> Replication to see the list of child databases, the page does not open if one of them is down and instead returns "502 Proxy Error". This is what shows up in the logs:

2018-01-08 20:38:05 GMT::5a53d68f.c7b4:@:[51124]:ERROR:  could not connect to the postgresql server in replication mode: timeout expired
2018-01-08 20:38:05 GMT::5a53d68f.c7b4:@:[51124]:DETAIL:  dsn was:   fallback_application_name='/var/www/miq/vmdb/lib/workers/bin/evm_server.rb' dbname='vmdb_production' host='<ipaddr>' user='root' password='<password>' port='5432'
2018-01-08 20:38:05 GMT::5a53d68f.c7b4:@:[51124]:LOG:  apply worker [51124] at slot 4 generation 42479 crashed
2018-01-08 20:38:05 GMT::59d69ddd.d58e:@:[54670]:LOG:  worker process: pglogical apply 16386:2117093528 (PID 51124) exited with exit code 1
2018-01-08 20:38:05 GMT::5a53d6ad.c7cc:@:[51148]:LOG:  starting apply for subscription region_3_subscription
2018-01-08 20:38:06 GMT::5a53d6ad.c7cc:@:[51148]:ERROR:  data stream ended
2018-01-08 20:38:06 GMT::5a53d6ad.c7cc:@:[51148]:LOG:  apply worker [51148] at slot 3 generation 32821 crashed
2018-01-08 20:38:06 GMT::59d69ddd.d58e:@:[54670]:LOG:  worker process: pglogical apply 16386:2404866424 (PID 51148) exited with exit code 1
2018-01-08 20:38:12 GMT::5a53d6b4.c7ce:@:[51150]:LOG:  starting apply for subscription region_3_subscription
2018-01-08 20:38:13 GMT::5a53d6b5.c7d0:@:[51152]:LOG:  starting apply for subscription region_19_subscription
2018-01-08 20:38:13 GMT::5a53d6b4.c7ce:@:[51150]:ERROR:  data stream ended
2018-01-08 20:38:13 GMT::5a53d6b4.c7ce:@:[51150]:LOG:  apply worker [51150] at slot 3 generation 32822 crashed
2018-01-08 20:38:13 GMT::59d69ddd.d58e:@:[54670]:LOG:  worker process: pglogical apply 16386:2404866424 (PID 51150) exited with exit code 1

Version-Release number of selected component (if applicable):
4.5

How reproducible:


Here's the thing: the "master" appliance (Region 99), the one that concentrates all the data, is working just fine. The problem occurs when I purposely shut down one of the "lower" appliances (Region 19) and then go to the replication screen on the "master": that's when it does not work. If I bring the lower region appliance back up, it works again.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
The replication screen on the master appliance needs to be more forgiving of unreachable databases. That's the problem.


Additional info:

Comment 3 CFME Bot 2018-01-25 21:12:23 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/f40c04332298912c3e4e93036c3725636a3d3759

commit f40c04332298912c3e4e93036c3725636a3d3759
Author:     Yuri Rudman <yrudman>
AuthorDate: Thu Jan 25 14:04:13 2018 -0500
Commit:     Yuri Rudman <yrudman>
CommitDate: Thu Jan 25 15:21:33 2018 -0500

    rescue attempt to get backlog from remote server, it will allow to manage subscription screen even if remote db is offline
    https://bugzilla.redhat.com/show_bug.cgi?id=1533958

 app/models/pglogical_subscription.rb | 3 +++
 1 file changed, 3 insertions(+)
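
The commit message describes rescuing the backlog lookup so that one offline remote database no longer breaks the whole subscription screen. The following is only a minimal sketch of that idea, not the actual three-line change: `SubscriptionSketch`, `RemoteUnreachable`, and `backlog_report` are hypothetical names invented for illustration, and the real model in app/models/pglogical_subscription.rb may differ in every detail.

```ruby
# Hypothetical error standing in for whatever is raised when a remote
# region's database cannot be reached (assumption, not the real class).
class RemoteUnreachable < StandardError; end

# Illustrative stand-in for a subscription record; NOT the real
# PglogicalSubscription model.
class SubscriptionSketch
  attr_reader :id

  # fetcher is any callable that returns the backlog in bytes, or raises
  # RemoteUnreachable when the remote database is down.
  def initialize(id, fetcher)
    @id = id
    @fetcher = fetcher
  end

  # Returns the backlog, or nil when the remote database is unreachable,
  # so that callers can still list the subscription.
  def backlog
    @fetcher.call
  rescue RemoteUnreachable => e
    warn("Failed to get backlog for #{id}: #{e.message}")
    nil
  end
end

# Builds the data a listing page would need; one failing remote does not
# abort the whole report.
def backlog_report(subscriptions)
  subscriptions.map { |s| [s.id, s.backlog] }.to_h
end

subs = [
  SubscriptionSketch.new("region_3_subscription", -> { 1024 }),
  SubscriptionSketch.new("region_19_subscription",
                         -> { raise RemoteUnreachable, "timeout expired" }),
]
report = backlog_report(subs)
```

In this sketch the unreachable region simply reports a `nil` backlog, which a UI could render as "unknown" while still allowing the subscription to be managed.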

Comment 6 Tasos Papaioannou 2018-06-27 19:05:01 UTC
Verified on 5.10.0.2.