Created attachment 1174703
journal entries from failed galera node

Description of problem:
Following the documented procedure to replace a controller node in an HA setup fails if predictable hostnames and IPs are in use and the same hostname and IP are reused for the replacement node.

Version-Release number of selected component (if applicable):

How reproducible:
Both times I've attempted this, it has failed in basically the same way.

Steps to Reproduce:
1. Deploy an HA overcloud with predictable hostnames and IPs.
2. Attempt to replace one of the controllers with another controller using the same hostname and IPs.

Actual results:
Galera does not start properly on the new node. MariaDB seems to be in a restart loop, and the journal shows messages from Galera like:
ERROR: MySQL is not running

Expected results:
Successful controller node replacement.

Additional info:
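For anyone trying to confirm the same symptom from an exported journal, here is a minimal sketch that counts how often the Galera message quoted above appears. The function names, the threshold of 3, and the idea of treating repeated occurrences as a restart loop are assumptions made for illustration; they are not part of the documented procedure or of any Galera or Pacemaker tooling.

```python
import re

# Message quoted in this report. Matching on it alone is an assumption;
# a real restart loop may log other messages as well.
ERROR_PATTERN = re.compile(r"ERROR: MySQL is not running")


def count_galera_errors(journal_lines):
    """Return how many journal lines contain the quoted Galera error."""
    return sum(1 for line in journal_lines if ERROR_PATTERN.search(line))


def looks_like_restart_loop(journal_lines, threshold=3):
    """Heuristic only: flag a loop once the error repeats `threshold` times.

    The default threshold is a hypothetical value, not a documented limit.
    """
    return count_galera_errors(journal_lines) >= threshold
```

The input can be any iterable of text lines, for example an open file object for a saved journal export.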
Created attachment 1174704
mariadb log showing restart loop
Moving to Pidone since it affects HA/Galera.
Could this be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1326507? Before https://bugzilla.redhat.com/show_bug.cgi?id=1338623 was fixed, the old controller node replacement procedure would fail because an attribute never showed up in Pacemaker, and the resource agent would keep retrying to recover the last sequence number. Could you retry the latest replacement procedure with up-to-date packages to confirm whether this is already fixed?
Following up: are there any updates? Could someone test this with the latest packages, as requested in the comment above?
I have gone through https://bugzilla.redhat.com/show_bug.cgi?id=1326507 and this bug could be a duplicate, but someone from QA needs to verify whether assigning the same hostname and IP as the previous controller node works. Replacing a controller node is not an issue; replacing it with the same hostname and IP is. Can QA test this with the latest packages and see if it works? - Angela
I was finally able to complete a node replacement with the same hostname and IPs, so it appears this is indeed fixed. I did find one additional issue with the documented procedure for OSP 8 and 9, but it's unrelated to reusing the same hostname and IP, so I've opened a new bug for it: https://bugzilla.redhat.com/show_bug.cgi?id=1446307