Description of problem:
Testing procedure below.

Originally:

[root@overcloud-controller-0 ~(keystone_admin)]# neutron l3-agent-list-hosting-router RouterDSA
+--------------------------------------+------------------------+----------------+-------+----------+
| id                                   | host                   | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------+----------------+-------+----------+
| 744a319a-4ab9-4250-90e1-2a0fc4e7208c | overcloud-controller-1 | True           | :-)   | active   |
| abcdc2b9-2dbb-43d7-b14d-9552723e990c | overcloud-controller-0 | True           | :-)   | standby  |
| 39d660d8-f83f-434b-98d5-e8d45c8a8f54 | overcloud-controller-2 | True           | :-)   | standby  |
+--------------------------------------+------------------------+----------------+-------+----------+

Restart controller_1:

+--------------------------------------+------------------------+----------------+-------+----------+
| id                                   | host                   | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------+----------------+-------+----------+
| 744a319a-4ab9-4250-90e1-2a0fc4e7208c | overcloud-controller-1 | True           | xxx   | standby  |
| abcdc2b9-2dbb-43d7-b14d-9552723e990c | overcloud-controller-0 | True           | :-)   | active   |
| 39d660d8-f83f-434b-98d5-e8d45c8a8f54 | overcloud-controller-2 | True           | :-)   | standby  |
+--------------------------------------+------------------------+----------------+-------+----------+

Restart controller_0:

[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router RouterDSA

Broadcast message from systemd-journald (Fri 2016-07-08 11:43:25 UTC):

haproxy[3753]: proxy ceilometer has no server available!

Broadcast message from systemd-journald (Fri 2016-07-08 11:43:26 UTC):

haproxy[3753]: proxy nova_ec2 has no server available!
^C
[root@overcloud-controller-0 ~]# . keystonerc_admin
[root@overcloud-controller-0 ~(keystone_admin)]# neutron l3-agent-list-hosting-router RouterDSA
+--------------------------------------+------------------------+----------------+-------+----------+
| id                                   | host                   | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------+----------------+-------+----------+
| 744a319a-4ab9-4250-90e1-2a0fc4e7208c | overcloud-controller-1 | True           | xxx   | standby  |
| abcdc2b9-2dbb-43d7-b14d-9552723e990c | overcloud-controller-0 | True           | xxx   | standby  |
| 39d660d8-f83f-434b-98d5-e8d45c8a8f54 | overcloud-controller-2 | True           | :-)   | standby  |
+--------------------------------------+------------------------+----------------+-------+----------+
[root@overcloud-controller-0 ~(keystone_admin)]# neutron l3-agent-list-hosting-router RouterDSA
Unable to establish connection to http://10.0.0.4:5000/v2.0/tokens

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install via TripleO Quickstart: HA controllers (3 nodes) + 2 compute nodes.
2. As admin, create RouterDSA.
3. Reproduce the test above.

Actual results:
The HA cluster goes away.

Expected results:
Bounced controller nodes simply come back in sync, in the alive state.

Additional info:
The number of bounced controller nodes may be 2 or 3, but eventually Keystone authorization via 10.0.0.4:5000/v2 crashes.
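The expected result implies a simple invariant for an L3 HA router: at any moment exactly one live agent should report `ha_state` as `active`. A minimal sketch for checking that invariant against pasted `neutron l3-agent-list-hosting-router` output, assuming one table line per agent; `parse_agents` and `ha_health` are hypothetical helpers written for this report, not part of python-neutronclient.

```python
from __future__ import annotations


def parse_agents(table: str) -> list[dict[str, str]]:
    """Parse pasted `neutron l3-agent-list-hosting-router` output into dicts.

    Assumes one table line per agent (no wrapped cells).
    """
    header: list[str] | None = None
    agents: list[dict[str, str]] = []
    for raw in table.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip +---+ borders, shell prompts and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first "|" line is the column header
        else:
            agents.append(dict(zip(header, cells)))
    return agents


def ha_health(agents: list[dict[str, str]]) -> str:
    """Classify the router: exactly one live agent should be 'active'."""
    live = [a for a in agents if a["alive"] == ":-)"]
    # Agents marked dead ('xxx') are ignored even if they claim 'active'.
    active = [a for a in live if a["ha_state"] == "active"]
    if len(active) == 1:
        return "ok"
    if not active:
        return "no-active"  # no live agent is serving the router
    return "multiple-active"  # more than one live agent claims 'active'
```

Applied to the three RouterDSA tables above, this sketch would report `ok`, `ok`, and `no-active`: the failure appears only after the second controller is bounced.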
Workaround: if just one controller_(X) was stopped and started, creating a new fake Router(X) helps.

[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router Router01
+---------------------------------+---------------------------------+----------------+-------+----------+
| id                              | host                            | admin_state_up | alive | ha_state |
+---------------------------------+---------------------------------+----------------+-------+----------+
| 1fd8b44b-265f-                  | overcloud-                      | True           | xxx   | standby  |
| 4e05-a4e3-cf8eb26027bd          | controller-1.localdomain        |                |       |          |
| 2b027242-c6e1-4122-9e01-01fcd3b | overcloud-controller-0          | True           | :-)   | active   |
| 30e3f                           |                                 |                |       |          |
| 377c6968-05ee-457c-             | overcloud-controller-2          | True           | :-)   | standby  |
| acb3-f910a2ce3df5               |                                 |                |       |          |
+---------------------------------+---------------------------------+----------------+-------+----------+

Creating Router02 results in:

[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router Router02
+---------------------------------+---------------------------------+----------------+-------+----------+
| id                              | host                            | admin_state_up | alive | ha_state |
+---------------------------------+---------------------------------+----------------+-------+----------+
| 377c6968-05ee-457c-             | overcloud-controller-2          | True           | :-)   | standby  |
| acb3-f910a2ce3df5               |                                 |                |       |          |
| 2b027242-c6e1-4122-9e01-01fcd3b | overcloud-controller-0          | True           | :-)   | active   |
| 30e3f                           |                                 |                |       |          |
| 1fd8b44b-265f-                  | overcloud-                      | True           | :-)   | standby  |
| 4e05-a4e3-cf8eb26027bd          | controller-1.localdomain        |                |       |          |
+---------------------------------+---------------------------------+----------------+-------+----------+
[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router Router01
+---------------------------------+---------------------------------+----------------+-------+----------+
| id                              | host                            | admin_state_up | alive | ha_state |
+---------------------------------+---------------------------------+----------------+-------+----------+
| 1fd8b44b-265f-                  | overcloud-                      | True           | :-)   | standby  |
| 4e05-a4e3-cf8eb26027bd          | controller-1.localdomain        |                |       |          |
| 2b027242-c6e1-4122-9e01-01fcd3b | overcloud-controller-0          | True           | :-)   | active   |
| 30e3f                           |                                 |                |       |          |
| 377c6968-05ee-457c-             | overcloud-controller-2          | True           | :-)   | standby  |
| acb3-f910a2ce3df5               |                                 |                |       |          |
+---------------------------------+---------------------------------+----------------+-------+----------+
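In a narrow terminal the client wraps long cells onto continuation lines, as in the Router01/Router02 tables above, which makes agent IDs and hostnames awkward to copy or compare. A minimal sketch that rejoins them, assuming a continuation line can be recognized by an empty `admin_state_up` cell and that wrapped values (UUIDs, hostnames) contain no spaces, so fragments are concatenated with no separator; `parse_wrapped_agents` is a hypothetical helper, not part of python-neutronclient.

```python
from __future__ import annotations


def parse_wrapped_agents(table: str) -> list[dict[str, str]]:
    """Parse agent tables whose long cells wrap onto continuation lines.

    Assumptions: a continuation line has an empty admin_state_up cell
    (third column), and wrapped values such as UUIDs and hostnames contain
    no spaces, so the fragments are joined with no separator.
    """
    header: list[str] | None = None
    rows: list[list[str]] = []
    for raw in table.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip +---+ borders, shell prompts and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        elif rows and cells[2] == "":
            # Continuation line: append each fragment to the previous row.
            rows[-1] = [prev + frag for prev, frag in zip(rows[-1], cells)]
        else:
            rows.append(cells)
    if header is None:
        return []
    return [dict(zip(header, row)) for row in rows]
```

On the last Router01 table this yields three agents with full IDs, e.g. `1fd8b44b-265f-4e05-a4e3-cf8eb26027bd` on host `overcloud-controller-1.localdomain`.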
This works only for routers created under the admin account. For an ordinary tenant, creating a new Neutron router only allows switching to the newly created router, which obviously breaks VMs attached to the private network that was served as an interface by the previously active Neutron router before `nova stop/start overcloud-controller-(X)`. If recovering a crashed overcloud-controller-(X) is possible by running a special Heat template, please advise.
Attempted to follow http://docs.openstack.org/developer/tripleo-docs/post_deployment/replace_controller.html, but it is not clear how to update overcloud-deploy.sh to replace the failed controller node.
Upstream requested clarification on the Launchpad bug, but none was given, so closing as stale.