Bug 1418080 - After failing back over to a reintroduced node $APPLIANCE_PG_SERVICE shows as failed and appliance_console info shows Local Database Server: initialized and stopped
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: GA
Target Release: 5.10.0
Assignee: Nick Carboni
QA Contact: Jaroslav Henner
URL:
Whiteboard: HA
Duplicates: 1426721
Depends On:
Blocks:
 
Reported: 2017-01-31 19:21 UTC by luke couzens
Modified: 2019-02-07 23:02 UTC
CC: 8 users

Fixed In Version: 5.10.0.11
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-07 23:02:18 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2019:0212 0 None None None 2019-02-07 23:02:27 UTC

Description luke couzens 2017-01-31 19:21:30 UTC
Description of problem: After failing back over to a reintroduced node, $APPLIANCE_PG_SERVICE shows as failed and appliance_console info shows "Local Database Server: initialized and stopped".


Version-Release number of selected component (if applicable): 5.7.1.0


How reproducible: 100%


Steps to Reproduce:
1. Set up HA following [0]
2. Fail over to the secondary database
3. Reintroduce the primary as a secondary node following [0]
4. Fail back over to the new secondary node

Actual results: Failover works correctly; however, appliance_console info and systemctl status show postgres as not running.


Expected results: $APPLIANCE_PG_SERVICE shows as running and appliance_console info shows the correct status.


Additional info:

[0]https://access.redhat.com/documentation/en/red-hat-cloudforms/4.2/single/configuring-high-availability/

I can see that postgres is running correctly via the following command: ps aux | grep post

Comment 2 Nick Carboni 2017-03-06 22:00:35 UTC
*** Bug 1426721 has been marked as a duplicate of this bug. ***

Comment 3 Nick Carboni 2017-03-13 12:30:49 UTC
Right now the console summary only lists the database as running if it is running under systemd.

In some cases (repmgr) the database is started using pg_ctl directly rather than systemctl. This will cause the console summary to report that the database is not running.

I propose that we use `pg_ctl -D $APPLIANCE_PG_DATA status` to determine the status of the database as that will always be correct.
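A minimal sketch of the proposed check, assuming `APPLIANCE_PG_DATA` points at the appliance's PostgreSQL data directory and `pg_ctl` is on the PATH (the function name `db_status` and the exact status strings are illustrative, not the shipped implementation):

```shell
# Report database status from pg_ctl rather than systemd, so a server
# started directly by repmgr (via pg_ctl) is still detected as running.
# "pg_ctl ... status" exits 0 when the server is running, non-zero otherwise.
db_status() {
  if pg_ctl -D "${APPLIANCE_PG_DATA:?APPLIANCE_PG_DATA not set}" status >/dev/null 2>&1; then
    echo "running"
  else
    echo "initialized and stopped"
  fi
}
```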

Comment 4 Nick Carboni 2017-09-15 18:27:35 UTC
repmgr 3.2 added a configuration option controlling how the PG service is started after a follow. To fix this issue we would have to upgrade repmgr and set the service_*_command options in repmgr.conf.

ref: http://www.repmgr.org/release-notes-3.2.html
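A repmgr.conf fragment along those lines might look like the following. The service_*_command option names are from the repmgr documentation; the systemctl unit name is taken from the sudoers commit below, and the exact values used on the appliance are an assumption:

```
# repmgr.conf (illustrative): have repmgr manage postgres through systemd
# instead of starting it directly with pg_ctl
service_start_command   = 'sudo systemctl start rh-postgresql95-postgresql'
service_stop_command    = 'sudo systemctl stop rh-postgresql95-postgresql'
service_restart_command = 'sudo systemctl restart rh-postgresql95-postgresql'
```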

Comment 7 CFME Bot 2018-07-19 19:49:53 UTC
New commit detected on ManageIQ/manageiq-appliance_console/master:

https://github.com/ManageIQ/manageiq-appliance_console/commit/08dd210add84b0f714fcb89912aaa25ece08761a
commit 08dd210add84b0f714fcb89912aaa25ece08761a
Author:     Nick Carboni <ncarboni>
AuthorDate: Thu Jun 28 16:34:04 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Thu Jun 28 16:34:04 2018 -0400

    Fix the repmgr.conf file for the new version

     - Rename changed keys in the repmgr config file
     - Set commands to use to manage postgres services
     - Add newly required data_directory parameter
     - Add new options to promote and follow commands

    The --log-to-file option ensures that all output gets to the repmgrd
    log file.

    The --upstream-node-id ensures that the correct node is chosen in
    the case that the old primary comes back online after a successful
    failover, but before a follow can be completed.

    https://bugzilla.redhat.com/show_bug.cgi?id=1418080
    https://www.pivotaltracker.com/story/show/135779733

 lib/manageiq/appliance_console/database_replication.rb | 15 +-
 spec/database_replication_spec.rb | 23 +-
 2 files changed, 30 insertions(+), 8 deletions(-)
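As a rough illustration of the two flags the commit message describes (the committed command strings are in the linked commit and are not reproduced here; `%n` is repmgr's placeholder for the upstream node id):

```
# repmgr.conf (illustrative, not the committed values)
promote_command = 'repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command  = 'repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
```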

Comment 9 CFME Bot 2018-08-06 19:56:45 UTC
New commit detected on ManageIQ/manageiq-appliance/master:

https://github.com/ManageIQ/manageiq-appliance/commit/8aba2f893cfbe878eea55c1da0de0b98f60024ab
commit 8aba2f893cfbe878eea55c1da0de0b98f60024ab
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed Aug  1 16:53:52 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Wed Aug  1 16:53:52 2018 -0400

    Bump versions of the console and HA admin gem

    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1544854
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1418080
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1535345
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1586186

    https://www.pivotaltracker.com/story/show/135779733
    https://www.pivotaltracker.com/story/show/141523501
    https://www.pivotaltracker.com/story/show/121849185

 manageiq-appliance-dependencies.rb | 4 +-
 1 file changed, 2 insertions(+), 2 deletions(-)


https://github.com/ManageIQ/manageiq-appliance/commit/92f565db0341156889d1d022f65285de350fa279
commit 92f565db0341156889d1d022f65285de350fa279
Author:     Nick Carboni <ncarboni>
AuthorDate: Fri Jun 29 16:34:53 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Fri Jun 29 16:34:53 2018 -0400

    Add a sudoers include file for repmgr

    This will allow the postgresql user to use systemctl to manage
    the rh-postgresql95-postgresql service when run from repmgrd

    https://www.pivotaltracker.com/story/show/141523501
    https://www.pivotaltracker.com/story/show/135779733
    https://bugzilla.redhat.com/show_bug.cgi?id=1418080

 LINK/etc/sudoers.d/repmgr | 2 +
 1 file changed, 2 insertions(+)
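The diffstat shows a two-line sudoers include; a sketch of what such a file might contain (illustrative only; the actual contents are in the linked commit):

```
# /etc/sudoers.d/repmgr (illustrative): let the postgres user control the
# PostgreSQL unit via systemctl when invoked from repmgrd
Defaults:postgres !requiretty
postgres ALL = NOPASSWD: /usr/bin/systemctl start rh-postgresql95-postgresql, /usr/bin/systemctl stop rh-postgresql95-postgresql, /usr/bin/systemctl restart rh-postgresql95-postgresql
```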

Comment 10 Jaroslav Henner 2018-10-22 10:37:52 UTC
On the node that was originally primary and then failed, after adding it back and failing over the second node, I see this in appliance_console:

Local Database Server:   running (primary)
CFME Version:            5.10.0.19

Comment 12 errata-xmlrpc 2019-02-07 23:02:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0212

