Bug 1351753

Summary:

Controller node replacement with same hostname and ips does not work

Product:

Red Hat OpenStack

Reporter:

Ben Nemec <bnemec>

Component:

rhosp-director

Assignee:

Angus Thomas <athomas>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Omri Hochman <ohochman>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

8.0 (Liberty)

CC:

asoni, bnemec, brault, byount, dbecker, dciabrin, dhill, emacchi, mburns, mcornea, morazi, rhel-osp-director-maint, ushkalim

Target Milestone:

---

Keywords:

Triaged, ZStream

Target Release:

8.0 (Liberty)

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2017-06-21 11:55:49 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
journal entries from failed galera node	none
mariadb log showing restart loop	none

Description Ben Nemec 2016-06-30 18:04:26 UTC

Created attachment 1174703 [details]
journal entries from failed galera node

Description of problem: Following the documented procedure to replace a controller node in an HA setup fails if predictable hostnames and IPs are in use and the replacement node reuses the same hostname and IP.


Version-Release number of selected component (if applicable):


How reproducible: Both times I've attempted to do this it has failed in basically the same way.


Steps to Reproduce:
1. Deploy HA overcloud with predictable hostnames and ips
2. Attempt to replace one of the controllers with another controller using the same hostname and ips.

Actual results: Galera does not start properly on the new node. MariaDB appears to be in a restart loop, and the journal shows messages from Galera such as:

ERROR: MySQL is not running
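
For anyone triaging a similar failure, here is a minimal sketch of how the repeated message above could be counted in a saved journal excerpt. It is only a heuristic and not part of the replacement procedure: the function names and the threshold of 3 are hypothetical, and it assumes the excerpt is available as one log message per line.

```python
from typing import Iterable

# Message quoted in this report. Everything else here is an assumption.
GALERA_ERROR = "ERROR: MySQL is not running"


def count_galera_errors(lines: Iterable[str]) -> int:
    """Count lines that contain the Galera error quoted in this report."""
    return sum(1 for line in lines if GALERA_ERROR in line)


def looks_like_restart_loop(lines: Iterable[str], threshold: int = 3) -> bool:
    """Heuristic: `threshold` or more occurrences suggest a restart loop.

    The default threshold is arbitrary (hypothetical); tune it to the
    length of the excerpt being inspected.
    """
    return count_galera_errors(lines) >= threshold
```

The functions accept any iterable of strings, so an open file handle for a saved journal excerpt can be passed directly.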


Expected results: Successful controller node replacement.


Additional info:

Comment 2 Ben Nemec 2016-06-30 18:05:36 UTC

Created attachment 1174704 [details]
mariadb log showing restart loop

Comment 3 Emilien Macchi 2016-12-22 19:02:13 UTC

Moving to Pidone since it affects HA/Galera.

Comment 4 Damien Ciabrini 2016-12-23 14:16:48 UTC

Could it be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1326507 ?

Before https://bugzilla.redhat.com/show_bug.cgi?id=1338623 was fixed, the old controller node replacement procedure would fail because an attribute never showed up in Pacemaker, and the resource agent would keep retrying to recover the last sequence number.

Any chance you could retry the latest replacement procedure with up-to-date packages to confirm this is already fixed?

Comment 5 Angela Soni 2017-02-10 18:25:30 UTC

Following up to see if there are any updates. Could someone test this with the latest packages, as requested in the comment above?

Comment 6 Angela Soni 2017-03-08 17:53:33 UTC

I have gone through https://bugzilla.redhat.com/show_bug.cgi?id=1326507 and this bug could be a duplicate, but someone from QA needs to verify whether assigning the same hostname and IP as the previous controller node works. Replacing a controller node is not an issue; replacing it with the same hostname and IP is. Can QA test this with the latest packages and see if it works?

- Angela

Comment 7 Ben Nemec 2017-04-27 16:13:43 UTC

I was finally able to complete a node replacement with the same hostname and IPs, so it appears this is indeed fixed. I did find one additional issue with the documented procedure for OSP 8 and 9, but it is unrelated to reusing the same hostname and IP, so I've opened a new bug for it: https://bugzilla.redhat.com/show_bug.cgi?id=1446307