Bug 1418080 - After failing back over to a reintroduced node $APPLIANCE_PG_SERVICE shows as failed and appliance_console info shows Local Database Server: initialized and stopped
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: GA
Target Release: 5.10.0
Assignee: Nick Carboni
QA Contact: Jaroslav Henner
URL:
Whiteboard: HA
Duplicates: 1426721
Depends On:
Blocks:
 
Reported: 2017-01-31 19:21 UTC by luke couzens
Modified: 2019-02-07 23:02 UTC
CC: 8 users

Fixed In Version: 5.10.0.11
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-07 23:02:18 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2019:0212 0 None None None 2019-02-07 23:02:27 UTC

Description luke couzens 2017-01-31 19:21:30 UTC
Description of problem: After failing back over to a reintroduced node, $APPLIANCE_PG_SERVICE shows as failed and appliance_console info shows "Local Database Server: initialized and stopped".


Version-Release number of selected component (if applicable): 5.7.1.0


How reproducible: 100%


Steps to Reproduce:
1. Set up HA following [0]
2. Fail over to the secondary database
3. Reintroduce the primary as a secondary node following [0]
4. Fail back over to the new secondary node

Actual results: Failover works correctly; however, appliance_console info and systemctl status show postgres as not running.


Expected results: $APPLIANCE_PG_SERVICE shows as running and appliance_console info shows the correct status.


Additional info:

[0]https://access.redhat.com/documentation/en/red-hat-cloudforms/4.2/single/configuring-high-availability/

I can see that postgres is running correctly via the following command: ps aux | grep post

Comment 2 Nick Carboni 2017-03-06 22:00:35 UTC
*** Bug 1426721 has been marked as a duplicate of this bug. ***

Comment 3 Nick Carboni 2017-03-13 12:30:49 UTC
Right now the console summary only lists the database as running if it is running under systemd.

In some cases (repmgr) the database is started using pg_ctl directly rather than systemctl. This will cause the console summary to report that the database is not running.

I propose that we use `pg_ctl -D $APPLIANCE_PG_DATA status` to determine the status of the database as that will always be correct.
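A minimal sketch of the proposed check, assuming `APPLIANCE_PG_DATA` points at the appliance's PostgreSQL data directory and `pg_ctl` is on the PATH (the function name `db_status` and the exact status strings are illustrative, not the shipped implementation):

```shell
# Report database status from pg_ctl rather than systemd, so a server
# started directly by repmgr (via pg_ctl) is still detected as running.
# "pg_ctl ... status" exits 0 when the server is running, non-zero otherwise.
db_status() {
  if pg_ctl -D "${APPLIANCE_PG_DATA:?APPLIANCE_PG_DATA not set}" status >/dev/null 2>&1; then
    echo "running"
  else
    echo "initialized and stopped"
  fi
}
```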

Comment 4 Nick Carboni 2017-09-15 18:27:35 UTC
repmgr 3.2 added a configuration option controlling how the PG service is started after a follow. To fix this issue we would have to upgrade repmgr and set the service_*_command options in repmgr.conf.

ref: http://www.repmgr.org/release-notes-3.2.html
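A repmgr.conf fragment along those lines might look like the following. The service_*_command option names are from the repmgr documentation; the systemctl unit name is taken from the sudoers commit below, and the exact values used on the appliance are an assumption:

```
# repmgr.conf (illustrative): have repmgr manage postgres through systemd
# instead of starting it directly with pg_ctl
service_start_command   = 'sudo systemctl start rh-postgresql95-postgresql'
service_stop_command    = 'sudo systemctl stop rh-postgresql95-postgresql'
service_restart_command = 'sudo systemctl restart rh-postgresql95-postgresql'
```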

Comment 7 CFME Bot 2018-07-19 19:49:53 UTC
New commit detected on ManageIQ/manageiq-appliance_console/master:

https://github.com/ManageIQ/manageiq-appliance_console/commit/08dd210add84b0f714fcb89912aaa25ece08761a
commit 08dd210add84b0f714fcb89912aaa25ece08761a
Author:     Nick Carboni <ncarboni>
AuthorDate: Thu Jun 28 16:34:04 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Thu Jun 28 16:34:04 2018 -0400

    Fix the repmgr.conf file for the new version

     - Rename changed keys in the repmgr config file
     - Set commands to use to manage postgres services
     - Add newly required data_directory parameter
     - Add new options to promote and follow commands

    The --log-to-file option ensures that all output gets to the repmgrd
    log file.

    The --upstream-node-id ensures that the correct node is chosen in
    the case that the old primary comes back online after a successful
    failover, but before a follow can be completed.

    https://bugzilla.redhat.com/show_bug.cgi?id=1418080
    https://www.pivotaltracker.com/story/show/135779733

 lib/manageiq/appliance_console/database_replication.rb | 15 +-
 spec/database_replication_spec.rb | 23 +-
 2 files changed, 30 insertions(+), 8 deletions(-)
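As a rough illustration of the two flags the commit message describes (the committed command strings are in the linked commit and are not reproduced here; `%n` is repmgr's placeholder for the upstream node id):

```
# repmgr.conf (illustrative, not the committed values)
promote_command = 'repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command  = 'repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
```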

Comment 9 CFME Bot 2018-08-06 19:56:45 UTC
New commit detected on ManageIQ/manageiq-appliance/master:

https://github.com/ManageIQ/manageiq-appliance/commit/8aba2f893cfbe878eea55c1da0de0b98f60024ab
commit 8aba2f893cfbe878eea55c1da0de0b98f60024ab
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed Aug  1 16:53:52 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Wed Aug  1 16:53:52 2018 -0400

    Bump versions of the console and HA admin gem

    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1544854
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1418080
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1535345
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1586186

    https://www.pivotaltracker.com/story/show/135779733
    https://www.pivotaltracker.com/story/show/141523501
    https://www.pivotaltracker.com/story/show/121849185

 manageiq-appliance-dependencies.rb | 4 +-
 1 file changed, 2 insertions(+), 2 deletions(-)


https://github.com/ManageIQ/manageiq-appliance/commit/92f565db0341156889d1d022f65285de350fa279
commit 92f565db0341156889d1d022f65285de350fa279
Author:     Nick Carboni <ncarboni>
AuthorDate: Fri Jun 29 16:34:53 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Fri Jun 29 16:34:53 2018 -0400

    Add a sudoers include file for repmgr

    This will allow the postgresql user to use systemctl to manage
    the rh-postgresql95-postgresql service when run from repmgrd

    https://www.pivotaltracker.com/story/show/141523501
    https://www.pivotaltracker.com/story/show/135779733
    https://bugzilla.redhat.com/show_bug.cgi?id=1418080

 LINK/etc/sudoers.d/repmgr | 2 +
 1 file changed, 2 insertions(+)
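The diffstat shows a two-line sudoers include; a sketch of what such a file might contain (illustrative only; the actual contents are in the linked commit):

```
# /etc/sudoers.d/repmgr (illustrative): let the postgres user control the
# PostgreSQL unit via systemctl when invoked from repmgrd
Defaults:postgres !requiretty
postgres ALL = NOPASSWD: /usr/bin/systemctl start rh-postgresql95-postgresql, /usr/bin/systemctl stop rh-postgresql95-postgresql, /usr/bin/systemctl restart rh-postgresql95-postgresql
```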

Comment 10 Jaroslav Henner 2018-10-22 10:37:52 UTC
On the node that was originally primary and then failed, after adding it back and failing over the second node, I see this in appliance_console:

Local Database Server:   running (primary)
CFME Version:            5.10.0.19

Comment 12 errata-xmlrpc 2019-02-07 23:02:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0212

