Document URL: https://access.redhat.com/documentation/en/red-hat-cloudforms/4.2/single/configuring-high-availability/

Section Number and Name: 5.1 Reintroducing the failed node

Describe the issue:

1. Commands run as the postgres user should be shown with a $ prompt, since # normally means root. For example:

    # su - postgres
    $ pg_rewind -D ...

2. pg_rewind returns an error if time is not synchronized between the master and standby nodes, so an NTP sync step is needed on all related database nodes (and on the other appliance nodes as well) before running pg_rewind. Edit /etc/ntp.conf with valid NTP server information, then run:

    # systemctl disable chronyd.service
    # systemctl stop chronyd.service
    # systemctl enable ntpd.service
    # systemctl start ntpd.service

3. Step 4, "copy over /var/lib/pgsql/.pgpass": the file also needs to be owned by the postgres user, and its permissions must be 600.

    # chown postgres:postgres /var/lib/pgsql/.pgpass
    # chmod 600 /var/lib/pgsql/.pgpass

4. Step 5, NOTE "If the follow command times out and ... to re-add the node": this operation fails because the new master server already has a record with the same node ID for the failed node (the previous master node). A procedure is needed to clean up the previous node ID, or to forcibly add the failed node with the same node ID. The error message looks like this:

    Configuring Replication Standby Server...
    [2017-02-07 23:51:41] [ERROR] Unable to create node record
    ERROR: duplicate key value violates unique constraint "repl_nodes_pkey"
    DETAIL: Key (id)=(1) already exists.

Suggestions for improvement:

Additional information:
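For item 3, a minimal sketch of how the ownership and mode of the copied file might be verified before re-adding the node. The function name check_pgpass is hypothetical and is not part of the appliance tooling; stat -c assumes GNU coreutils, as shipped on RHEL.

```shell
#!/bin/sh
# Hypothetical helper (not part of CloudForms): report whether a .pgpass
# file exists, has mode 600, and is owned by the expected user.
check_pgpass() {
    file=$1
    owner=$2

    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi

    # stat -c is the GNU coreutils form: %a = octal mode, %U = owner name.
    mode=$(stat -c '%a' "$file")
    user=$(stat -c '%U' "$file")

    if [ "$mode" != "600" ]; then
        echo "bad mode: $mode"
        return 1
    fi
    if [ "$user" != "$owner" ]; then
        echo "bad owner: $user"
        return 1
    fi
    echo "ok"
}
```

On the standby node this could be called as check_pgpass /var/lib/pgsql/.pgpass postgres; anything other than "ok" would suggest running the chown and chmod commands from item 3 first.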
Assigning to Dayle for review. Dayle - would you be able to take a look at the above and incorporate this feedback?
*** Bug 1430409 has been marked as a duplicate of this bug. ***
These changes are now live on the Customer Portal: https://access.redhat.com/documentation/en-us/red_hat_cloudforms/4.2/html-single/configuring_high_availability/ Thanks to Taeho, Nick, and Suyog for your reviews on this one!