Bug 1353915 - [RFE] Add to scripts generated by undercloud install - script to replace failed TripleO QuickStart HA Controller
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: RDO
Classification: Community
Component: openstack-tripleo
Version: trunk
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: trunk
Assignee: James Slagle
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-08 12:03 UTC by Boris Derzhavets
Modified: 2017-06-18 12:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-18 12:06:31 UTC




Links
System ID Priority Status Summary Last Updated
Launchpad 1604046 None None None 2016-07-18 16:56:58 UTC

Description Boris Derzhavets 2016-07-08 12:03:27 UTC
Description of problem:

Testing procedure below.

Originally

[root@overcloud-controller-0 ~(keystone_admin)]# neutron l3-agent-list-hosting-router RouterDSA
+--------------------------------------+------------------------+----------------+-------+----------+
| id                                   | host                   | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------+----------------+-------+----------+
| 744a319a-4ab9-4250-90e1-2a0fc4e7208c | overcloud-controller-1 | True           | :-)   | active   |
| abcdc2b9-2dbb-43d7-b14d-9552723e990c | overcloud-controller-0 | True           | :-)   | standby  |
| 39d660d8-f83f-434b-98d5-e8d45c8a8f54 | overcloud-controller-2 | True           | :-)   | standby  |
+--------------------------------------+------------------------+----------------+-------+----------+

Restart controller_1

+--------------------------------------+------------------------+----------------+-------+----------+
| id                                   | host                   | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------+----------------+-------+----------+
| 744a319a-4ab9-4250-90e1-2a0fc4e7208c | overcloud-controller-1 | True           | xxx   | standby  |
| abcdc2b9-2dbb-43d7-b14d-9552723e990c | overcloud-controller-0 | True           | :-)   | active   |
| 39d660d8-f83f-434b-98d5-e8d45c8a8f54 | overcloud-controller-2 | True           | :-)   | standby  |
+--------------------------------------+------------------------+----------------+-------+----------+

Restart controller_0

[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router RouterDSA
Broadcast message from systemd-journald@overcloud-controller-0.localdomain (Fri 2016-07-08 11:43:25 UTC):

haproxy[3753]: proxy ceilometer has no server available!


Broadcast message from systemd-journald@overcloud-controller-0.localdomain (Fri 2016-07-08 11:43:26 UTC):

haproxy[3753]: proxy nova_ec2 has no server available!

^C
[root@overcloud-controller-0 ~]# .  keystonerc_admin

[root@overcloud-controller-0 ~(keystone_admin)]# neutron l3-agent-list-hosting-router RouterDSA
+--------------------------------------+------------------------+----------------+-------+----------+
| id                                   | host                   | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------+----------------+-------+----------+
| 744a319a-4ab9-4250-90e1-2a0fc4e7208c | overcloud-controller-1 | True           | xxx   | standby  |
| abcdc2b9-2dbb-43d7-b14d-9552723e990c | overcloud-controller-0 | True           | xxx   | standby  |
| 39d660d8-f83f-434b-98d5-e8d45c8a8f54 | overcloud-controller-2 | True           | :-)   | standby  |
+--------------------------------------+------------------------+----------------+-------+----------+

[root@overcloud-controller-0 ~(keystone_admin)]# neutron l3-agent-list-hosting-router RouterDSA
Unable to establish connection to http://10.0.0.4:5000/v2.0/tokens


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install via TripleO QuickStart: HA Controller (3 nodes) + 2 Compute nodes.
2. As admin, create RouterDSA.
3. Reproduce the test above.

Actual results:

The HA cluster becomes unavailable: the connection to http://10.0.0.4:5000/v2.0/tokens can no longer be established.

Expected results:

Bouncing controller nodes should simply bring them back in sync, in the alive state.

Additional info:

The number of controller nodes bounced may be 2 or 3,
but in the end keystone authorization via 10.0.0.4:5000/v2
stops working.
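
The state transitions in the tables above can be checked mechanically. The following is a minimal sketch, not part of TripleO or neutron, that parses the pasted output of `neutron l3-agent-list-hosting-router` and reports which agents are alive and whether exactly one live agent is active. The function names `parse_agent_table` and `summarize` are hypothetical, and the "healthy" rule (one active agent that is also alive) is an assumption about what a recovered HA router should look like.

```python
# Sketch only: parse_agent_table and summarize are hypothetical helpers,
# not part of any OpenStack client library.

SAMPLE = """\
+--------------------------------------+------------------------+----------------+-------+----------+
| id                                   | host                   | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------+----------------+-------+----------+
| 744a319a-4ab9-4250-90e1-2a0fc4e7208c | overcloud-controller-1 | True           | xxx   | standby  |
| abcdc2b9-2dbb-43d7-b14d-9552723e990c | overcloud-controller-0 | True           | xxx   | standby  |
| 39d660d8-f83f-434b-98d5-e8d45c8a8f54 | overcloud-controller-2 | True           | :-)   | standby  |
+--------------------------------------+------------------------+----------------+-------+----------+
"""


def parse_agent_table(text):
    """Turn the ASCII table into a list of dicts keyed by column header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts, and broadcast messages
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first "|" line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def summarize(rows):
    """Report alive hosts, active hosts, and an assumed health verdict."""
    alive = [r["host"] for r in rows if r["alive"] == ":-)"]
    active = [r["host"] for r in rows if r["ha_state"] == "active"]
    # Assumption: healthy means exactly one active agent, and it is alive.
    healthy = len(active) == 1 and active[0] in alive
    return {"alive": alive, "active": active, "healthy": healthy}


summary = summarize(parse_agent_table(SAMPLE))
print(summary)
```

The sample is the table captured after restarting controller_0: only overcloud-controller-2 is alive and no agent is active, so the sketch flags the router as unhealthy.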

Comment 1 Boris Derzhavets 2016-07-10 13:43:46 UTC
Workaround:
If just one controller_(X) was stopped and started, then creating a new dummy
router, Router(X), helps.

[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router Router01
+---------------------------------+---------------------------------+----------------+-------+----------+
| id                              | host                            | admin_state_up | alive | ha_state |
+---------------------------------+---------------------------------+----------------+-------+----------+
| 1fd8b44b-265f-                  | overcloud-                      | True           | xxx   | standby  |
| 4e05-a4e3-cf8eb26027bd          | controller-1.localdomain        |                |       |          |
| 2b027242-c6e1-4122-9e01-01fcd3b | overcloud-controller-0          | True           | :-)   | active   |
| 30e3f                           |                                 |                |       |          |
| 377c6968-05ee-457c-             | overcloud-controller-2          | True           | :-)   | standby  |
| acb3-f910a2ce3df5               |                                 |                |       |          |
+---------------------------------+---------------------------------+----------------+-------+----------+


Creating Router02 results in:

[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router Router02
+---------------------------------+---------------------------------+----------------+-------+----------+
| id                              | host                            | admin_state_up | alive | ha_state |
+---------------------------------+---------------------------------+----------------+-------+----------+
| 377c6968-05ee-457c-             | overcloud-controller-2          | True           | :-)   | standby  |
| acb3-f910a2ce3df5               |                                 |                |       |          |
| 2b027242-c6e1-4122-9e01-01fcd3b | overcloud-controller-0          | True           | :-)   | active   |
| 30e3f                           |                                 |                |       |          |
| 1fd8b44b-265f-                  | overcloud-                      | True           | :-)   | standby  |
| 4e05-a4e3-cf8eb26027bd          | controller-1.localdomain        |                |       |          |
+---------------------------------+---------------------------------+----------------+-------+----------+
[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router Router01
+---------------------------------+---------------------------------+----------------+-------+----------+
| id                              | host                            | admin_state_up | alive | ha_state |
+---------------------------------+---------------------------------+----------------+-------+----------+
| 1fd8b44b-265f-                  | overcloud-                      | True           | :-)   | standby  |
| 4e05-a4e3-cf8eb26027bd          | controller-1.localdomain        |                |       |          |
| 2b027242-c6e1-4122-9e01-01fcd3b | overcloud-controller-0          | True           | :-)   | active   |
| 30e3f                           |                                 |                |       |          |
| 377c6968-05ee-457c-             | overcloud-controller-2          | True           | :-)   | standby  |
| acb3-f910a2ce3df5               |                                 |                |       |          |
+---------------------------------+---------------------------------+----------------+-------+----------+
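
The tables in this comment were printed in a narrow terminal, so long ids and host names wrap onto a second row. A minimal sketch of how such output could be folded back into one record per agent follows. The helper name `merge_wrapped_rows` is hypothetical, and the rule it uses is an assumption: a row whose `admin_state_up` cell is empty is treated as the continuation of the row above, and wrapped fragments are joined without a space.

```python
# Sketch only: merge_wrapped_rows is a hypothetical helper. It assumes a row
# with an empty admin_state_up cell continues the previous row.

SAMPLE = """\
+---------------------------------+---------------------------------+----------------+-------+----------+
| id                              | host                            | admin_state_up | alive | ha_state |
+---------------------------------+---------------------------------+----------------+-------+----------+
| 1fd8b44b-265f-                  | overcloud-                      | True           | xxx   | standby  |
| 4e05-a4e3-cf8eb26027bd          | controller-1.localdomain        |                |       |          |
| 2b027242-c6e1-4122-9e01-01fcd3b | overcloud-controller-0          | True           | :-)   | active   |
| 30e3f                           |                                 |                |       |          |
| 377c6968-05ee-457c-             | overcloud-controller-2          | True           | :-)   | standby  |
| acb3-f910a2ce3df5               |                                 |                |       |          |
+---------------------------------+---------------------------------+----------------+-------+----------+
"""


def merge_wrapped_rows(text):
    """Return one dict per agent, joining cells that wrapped across rows."""
    header = None
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border lines start with "+"
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        row = dict(zip(header, cells))
        if records and row["admin_state_up"] == "":
            # Continuation row: append each fragment to the previous record.
            for key, value in row.items():
                records[-1][key] += value
        else:
            records.append(row)
    return records


records = merge_wrapped_rows(SAMPLE)
for record in records:
    print(record["id"], record["host"], record["alive"], record["ha_state"])
```

Joining without a space fits ids and host names, which contain no whitespace; a column with free text would need a different rule.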

Comment 2 Boris Derzhavets 2016-07-15 20:18:19 UTC
This works only for routers created under the admin account. For an ordinary
tenant, creating a new neutron router only allows switching to the newly created router, which obviously breaks any VM attached to the private network that served as an interface of the neutron router that was active before `nova stop/start overcloud-controller-(X)`.
If a crashed overcloud-controller-(X) can be recovered by running
a special heat template, please advise.

Comment 3 Boris Derzhavets 2016-07-16 18:47:35 UTC
Attempted to follow http://docs.openstack.org/developer/tripleo-docs/post_deployment/replace_controller.html
It is not quite clear how to update overcloud-deploy.sh to replace a failed
controller node.

Comment 4 Christopher Brown 2017-06-18 12:06:31 UTC
Upstream requested clarification on the Launchpad bug, but none was given, so closing as stale.

