Bug 1353915

Summary: [RFE] Add to scripts generated by undercloud install - script to replace failed TripleO QuickStart HA Controller
Product: [Community] RDO Reporter: Boris Derzhavets <bderzhavets>
Component: openstack-tripleo Assignee: James Slagle <jslagle>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Shai Revivo <srevivo>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: trunk CC: amedeo.salvati, chris.brown, jtrowbri, lars
Target Milestone: ---   
Target Release: trunk   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-06-18 12:06:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Boris Derzhavets 2016-07-08 12:03:27 UTC
Description of problem:

Testing procedure below

Originally

[root@overcloud-controller-0 ~(keystone_admin)]# neutron l3-agent-list-hosting-router RouterDSA
+--------------------------------------+------------------------+----------------+-------+----------+
| id                                   | host                   | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------+----------------+-------+----------+
| 744a319a-4ab9-4250-90e1-2a0fc4e7208c | overcloud-controller-1 | True           | :-)   | active   |
| abcdc2b9-2dbb-43d7-b14d-9552723e990c | overcloud-controller-0 | True           | :-)   | standby  |
| 39d660d8-f83f-434b-98d5-e8d45c8a8f54 | overcloud-controller-2 | True           | :-)   | standby  |
+--------------------------------------+------------------------+----------------+-------+----------+

Restart controller_1

+--------------------------------------+------------------------+----------------+-------+----------+
| id                                   | host                   | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------+----------------+-------+----------+
| 744a319a-4ab9-4250-90e1-2a0fc4e7208c | overcloud-controller-1 | True           | xxx   | standby  |
| abcdc2b9-2dbb-43d7-b14d-9552723e990c | overcloud-controller-0 | True           | :-)   | active   |
| 39d660d8-f83f-434b-98d5-e8d45c8a8f54 | overcloud-controller-2 | True           | :-)   | standby  |
+--------------------------------------+------------------------+----------------+-------+----------+

Restart controller_0

[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router RouterDSA
Broadcast message from systemd-journald (Fri 2016-07-08 11:43:25 UTC):

haproxy[3753]: proxy ceilometer has no server available!


Broadcast message from systemd-journald (Fri 2016-07-08 11:43:26 UTC):

haproxy[3753]: proxy nova_ec2 has no server available!

^C
[root@overcloud-controller-0 ~]# .  keystonerc_admin

[root@overcloud-controller-0 ~(keystone_admin)]# neutron l3-agent-list-hosting-router RouterDSA
+--------------------------------------+------------------------+----------------+-------+----------+
| id                                   | host                   | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------+----------------+-------+----------+
| 744a319a-4ab9-4250-90e1-2a0fc4e7208c | overcloud-controller-1 | True           | xxx   | standby  |
| abcdc2b9-2dbb-43d7-b14d-9552723e990c | overcloud-controller-0 | True           | xxx   | standby  |
| 39d660d8-f83f-434b-98d5-e8d45c8a8f54 | overcloud-controller-2 | True           | :-)   | standby  |
+--------------------------------------+------------------------+----------------+-------+----------+

[root@overcloud-controller-0 ~(keystone_admin)]# neutron l3-agent-list-hosting-router RouterDSA
Unable to establish connection to http://10.0.0.4:5000/v2.0/tokens


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install via TripleO QuickStart: HA controllers (3 nodes) + 2 compute nodes
2. As admin, create RouterDSA
3. Reproduce the test above.

Actual results:

HA Cluster goes away.

Expected results:

Bouncing the controller nodes should simply bring them back in sync, in the alive state.

Additional info:

The number of bounced controller nodes may be 2 or 3,
but in the end keystone authorization via 10.0.0.4:5000/v2
fails.
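
To make the check above repeatable, the "alive" column can be read mechanically instead of by eye. The following is only a minimal sketch: `dead_l3_agents` is a hypothetical helper name (not part of neutron or TripleO), and it assumes the table layout shown in the output above, with `host` in the second column and `alive` in the fourth.

```shell
# Hypothetical helper: print the host of every L3 agent that is not
# reported alive (":-)") in a table produced by
#   neutron l3-agent-list-hosting-router <router>
# The table is read from stdin, so saved output can be checked as well.
dead_l3_agents() {
    awk -F'[|]' '
        # Data rows contain "|" separators; border rows ("+---+") do not.
        NF >= 6 && $2 !~ /^ *id *$/ {
            host = $3
            alive = $5
            gsub(/^ +| +$/, "", host)
            gsub(/^ +| +$/, "", alive)
            # Rows with an empty alive cell are wrapped continuation lines.
            if (alive != "" && alive != ":-)") print host
        }'
}

# Example, with admin credentials already sourced:
#   neutron l3-agent-list-hosting-router RouterDSA | dead_l3_agents
```

When the client wraps long cells across two rows, only the first fragment of the host name is printed, so this is best used with unwrapped output.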

Comment 1 Boris Derzhavets 2016-07-10 13:43:46 UTC
Workaround:
If just one controller_(X) was stopped and started, creating a new fake
Router(X) helps:

[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router Router01
+---------------------------------+---------------------------------+----------------+-------+----------+
| id                              | host                            | admin_state_up | alive | ha_state |
+---------------------------------+---------------------------------+----------------+-------+----------+
| 1fd8b44b-265f-                  | overcloud-                      | True           | xxx   | standby  |
| 4e05-a4e3-cf8eb26027bd          | controller-1.localdomain        |                |       |          |
| 2b027242-c6e1-4122-9e01-01fcd3b | overcloud-controller-0          | True           | :-)   | active   |
| 30e3f                           |                                 |                |       |          |
| 377c6968-05ee-457c-             | overcloud-controller-2          | True           | :-)   | standby  |
| acb3-f910a2ce3df5               |                                 |                |       |          |
+---------------------------------+---------------------------------+----------------+-------+----------+


Creating Router02 results in:

[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router Router02
+---------------------------------+---------------------------------+----------------+-------+----------+
| id                              | host                            | admin_state_up | alive | ha_state |
+---------------------------------+---------------------------------+----------------+-------+----------+
| 377c6968-05ee-457c-             | overcloud-controller-2          | True           | :-)   | standby  |
| acb3-f910a2ce3df5               |                                 |                |       |          |
| 2b027242-c6e1-4122-9e01-01fcd3b | overcloud-controller-0          | True           | :-)   | active   |
| 30e3f                           |                                 |                |       |          |
| 1fd8b44b-265f-                  | overcloud-                      | True           | :-)   | standby  |
| 4e05-a4e3-cf8eb26027bd          | controller-1.localdomain        |                |       |          |
+---------------------------------+---------------------------------+----------------+-------+----------+
[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router Router01
+---------------------------------+---------------------------------+----------------+-------+----------+
| id                              | host                            | admin_state_up | alive | ha_state |
+---------------------------------+---------------------------------+----------------+-------+----------+
| 1fd8b44b-265f-                  | overcloud-                      | True           | :-)   | standby  |
| 4e05-a4e3-cf8eb26027bd          | controller-1.localdomain        |                |       |          |
| 2b027242-c6e1-4122-9e01-01fcd3b | overcloud-controller-0          | True           | :-)   | active   |
| 30e3f                           |                                 |                |       |          |
| 377c6968-05ee-457c-             | overcloud-controller-2          | True           | :-)   | standby  |
| acb3-f910a2ce3df5               |                                 |                |       |          |
+---------------------------------+---------------------------------+----------------+-------+----------+
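
The sequence in this comment could be scripted roughly as follows. This is a sketch under assumptions, not a verified fix: `recheck_l3_agents` is a hypothetical wrapper name, it only chains the `neutron` commands already shown in this report plus `neutron router-create`, and it assumes admin credentials have been sourced (for example via `. keystonerc_admin`).

```shell
# Hypothetical wrapper around the workaround described in this comment:
# create a throwaway router, then list the L3 agents hosting it and the
# router that previously showed an agent as "xxx".
# Arguments: <new-router-name> <existing-router-name>
recheck_l3_agents() {
    local new_router=$1 old_router=$2
    # Stop early if the throwaway router cannot be created.
    neutron router-create "$new_router" || return 1
    neutron l3-agent-list-hosting-router "$new_router"
    neutron l3-agent-list-hosting-router "$old_router"
}

# Example:
#   recheck_l3_agents Router02 Router01
```

The throwaway router is left in place on purpose, since the report does not say whether removing it affects the recovered agent state.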

Comment 2 Boris Derzhavets 2016-07-15 20:18:19 UTC
Works only for routers created under the admin account. Creating a new neutron router
for an ordinary tenant only allows switching to the newly created router, which obviously breaks any VM attached to the private network that served as an interface of the neutron router active before `nova stop/start overcloud-controller-(X)`.
If a crashed overcloud-controller-(X) can be recovered by running
a special heat template, please advise.

Comment 3 Boris Derzhavets 2016-07-16 18:47:35 UTC
Attempted to follow http://docs.openstack.org/developer/tripleo-docs/post_deployment/replace_controller.html
It is not quite clear how to update overcloud-deploy.sh to replace the failed
controller node.

Comment 4 Christopher Brown 2017-06-18 12:06:31 UTC
Upstream requested clarification on the LP bug but none was given, so closing as stale.