Bug 1351753

Summary: Controller node replacement with same hostname and IPs does not work
Product: Red Hat OpenStack
Reporter: Ben Nemec <bnemec>
Component: rhosp-director
Assignee: Angus Thomas <athomas>
Status: CLOSED CURRENTRELEASE
QA Contact: Omri Hochman <ohochman>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.0 (Liberty)
CC: asoni, bnemec, brault, byount, dbecker, dciabrin, dhill, emacchi, mburns, mcornea, morazi, rhel-osp-director-maint, ushkalim
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-06-21 11:55:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                Flags
journal entries from failed galera node    none
mariadb log showing restart loop           none

Description Ben Nemec 2016-06-30 18:04:26 UTC
Created attachment 1174703
journal entries from failed galera node

Description of problem: The documented procedure for replacing a controller node in an HA setup fails if predictable hostnames and IPs are in use and the same hostname and IP are reused for the replacement node.


Version-Release number of selected component (if applicable):


How reproducible: Both times I've attempted to do this it has failed in basically the same way.


Steps to Reproduce:
1. Deploy an HA overcloud with predictable hostnames and IPs.
2. Attempt to replace one of the controllers with another controller using the same hostname and IPs.

Actual results: Galera does not start properly on the new node. MariaDB appears to be stuck in a restart loop, and the journal shows Galera messages such as:

ERROR: MySQL is not running
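
For anyone triaging a similar failure, the symptom above can be checked mechanically against a saved copy of the journal. The following is only a minimal sketch: the function names and the threshold of 3 are hypothetical choices made for illustration, and the only string taken from this report is the error message itself. Nothing is assumed about the rest of the journal line format.

```python
# Message quoted in this bug report. The rest of each journal line
# (timestamp, host, unit name) is deliberately not parsed.
GALERA_ERROR = "ERROR: MySQL is not running"


def count_galera_errors(journal_text: str) -> int:
    """Return the number of lines in the saved journal text that contain the error."""
    return sum(1 for line in journal_text.splitlines() if GALERA_ERROR in line)


def looks_like_restart_loop(journal_text: str, threshold: int = 3) -> bool:
    """Heuristic: treat `threshold` or more occurrences as a likely restart loop.

    The default threshold is an arbitrary assumption, not a value from this report.
    """
    return count_galera_errors(journal_text) >= threshold
```

Passing the text of the attached journal to `looks_like_restart_loop` gives a quick yes/no signal. It does not replace reading the MariaDB log to find out why the node fails to start.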


Expected results: Successful controller node replacement.


Additional info:

Comment 2 Ben Nemec 2016-06-30 18:05:36 UTC
Created attachment 1174704
mariadb log showing restart loop

Comment 3 Emilien Macchi 2016-12-22 19:02:13 UTC
Moving to Pidone since it affects HA/Galera.

Comment 4 Damien Ciabrini 2016-12-23 14:16:48 UTC
Could it be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1326507 ?

Before https://bugzilla.redhat.com/show_bug.cgi?id=1338623 was fixed, the old controller node replacement procedure would fail because an attribute never showed up in Pacemaker, and the resource agent would keep retrying to recover the last sequence number.

Any chance you could retry the latest replacement procedure with up-to-date packages to confirm this is already fixed?

Comment 5 Angela Soni 2017-02-10 18:25:30 UTC
Following up to see whether there are any updates. Could someone act on the above comment and test this with the latest packages?

Comment 6 Angela Soni 2017-03-08 17:53:33 UTC
I have gone through https://bugzilla.redhat.com/show_bug.cgi?id=1326507 and this bug could be a duplicate, but someone from QA needs to verify whether assigning the same hostname and IP as the previous controller node works. Replacing a controller node is not an issue; replacing it with the same hostname and IP is. Can QA test this with the latest packages and see whether it works?

- Angela

Comment 7 Ben Nemec 2017-04-27 16:13:43 UTC
I was finally able to complete a node replacement with the same hostname and IPs, so it appears this is indeed fixed. I did find one additional issue with the documented procedure for OSP 8 and 9, but it is unrelated to reusing the same hostname and IP, so I've opened a new bug for it: https://bugzilla.redhat.com/show_bug.cgi?id=1446307