Description of problem:

We are running active-standby Octavia. While sending curl requests to the LB, we killed all haproxy services from within the MASTER amphora:

  systemctl stop system.slice
  ps -ef | grep hapr
  kill -9 PID1 PID2 PID3

Traffic failed for a few seconds and then continued to be successful. Failover occurred, BUT "openstack loadbalancer amphora list" did not show that a failover had taken place:

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+--------+----------------+-------------+
| id                                   | loadbalancer_id                      | status    | role   | lb_network_ip  | ha_ip       |
+--------------------------------------+--------------------------------------+-----------+--------+----------------+-------------+
| 29db41ff-7fcc-4f75-90f0-93dc7df62bcc | 9c43f7ed-a929-4327-8766-9bad1d94f958 | ALLOCATED | MASTER | 192.168.199.58 | 192.168.1.8 |
| f86fe48e-eec5-47d3-ad95-28095714de65 | 9c43f7ed-a929-4327-8766-9bad1d94f958 | ALLOCATED | BACKUP | 192.168.199.54 | 192.168.1.8 |
+--------------------------------------+--------------------------------------+-----------+--------+----------------+-------------+

In addition, none of the Octavia logs contained any information about the haproxy failures.

Version-Release number of selected component (if applicable):
13

How reproducible:
100%

Steps to Reproduce:
1. From the MASTER amphora, run: systemctl stop system.slice
2. From the MASTER amphora, run: ps -ef | grep haproxy, then kill -9 all of the haproxy PIDs
3. Execute "openstack loadbalancer amphora list"
4. Check the logs for information.

Actual results:
The failover (which did occur) is not reflected in the amphora list command ("openstack loadbalancer failover LB" does show the state changes and the new MASTER).
No information in the logs.

Expected results:
We should see the new MASTER.
We should see all information regarding the failure in the logs.

Additional info:
Restarting system.slice does not recover the killed haproxy services.
After debugging with NirM, we saw the following: killing the haproxy service for a listener DOES initiate failover for traffic, BUT the amphora list table does not change. The heartbeats continued to be sent to the Octavia controller, which may be the reason the table did not change. As a next step we killed the amphora agent (preventing heartbeats from being sent); the table changed after some time and the amphora was deleted by housekeeping, BUT a new one failed to be created, and the LB moved to ERROR state. Additional incorrect behavior seen: when we killed the haproxy service, we saw "listener state DOWN" in the logs. Should the listener change state when there is a backup amphora ready to take control? FYI
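The behavior above is consistent with the health manager deciding staleness from heartbeat timestamps: as long as the amphora agent keeps sending heartbeats, the amphora is considered healthy regardless of whether haproxy is running. A minimal sketch of that timestamp check (simplified illustration only, not the actual Octavia implementation; the function names and the 60-second timeout here are assumptions):

```python
import time

# Hypothetical heartbeat store: amphora id -> timestamp of last heartbeat.
last_heartbeat = {}

def record_heartbeat(amphora_id, now=None):
    """Called whenever the amphora agent sends a heartbeat to the controller."""
    last_heartbeat[amphora_id] = now if now is not None else time.time()

def find_stale_amphorae(now, heartbeat_timeout=60):
    """Amphorae whose agent stopped reporting are candidates for failover."""
    return [amp_id for amp_id, seen in last_heartbeat.items()
            if now - seen > heartbeat_timeout]

# Killing haproxy alone does not stop the agent, so heartbeats keep
# arriving and neither amphora is ever reported as stale:
record_heartbeat("master-amp", now=1000)
record_heartbeat("backup-amp", now=1000)
print(find_stale_amphorae(now=1030))   # []

# Killing the agent on the master stops its heartbeats while the backup
# keeps reporting; after the timeout, only the master shows up as stale:
record_heartbeat("backup-amp", now=1090)
print(find_stale_amphorae(now=1100))   # ['master-amp']
```

This would explain both observations: with haproxy dead but the agent alive, the amphora table never changes; with the agent dead, the staleness path fires and housekeeping eventually removes the amphora.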
Additional info - New amphora allocation is not executed at all.
I cannot reproduce this. It might have been fixed in a release newer than the originally reported release.

1. Create a load balancer and a listener

$ openstack loadbalancer create --vip-subnet-id private-subnet --name lb-1
$ openstack loadbalancer listener create --protocol HTTP --protocol-port 80 --name listener-1 lb-1

2. Check that the listener is ACTIVE/ONLINE

$ openstack loadbalancer listener show listener-1
+-----------------------------+--------------------------------------+
| Field                       | Value                                |
+-----------------------------+--------------------------------------+
| admin_state_up              | True                                 |
| connection_limit            | -1                                   |
| created_at                  | 2019-10-01T12:45:48                  |
| default_pool_id             | None                                 |
| default_tls_container_ref   | None                                 |
| description                 |                                      |
| id                          | 28de0348-aaa7-4032-9604-0df9def243e2 |
| insert_headers              | None                                 |
| l7policies                  |                                      |
| loadbalancers               | 8a4f3795-6656-4c45-b80e-aee33fd8cf0f |
| name                        | listener-1                           |
| operating_status            | ONLINE                               |
| project_id                  | 6029ff484d3b42afaf7d3fcf9d4c1392     |
| protocol                    | HTTP                                 |
| protocol_port               | 80                                   |
| provisioning_status         | ACTIVE                               |
| sni_container_refs          | []                                   |
| timeout_client_data         | 50000                                |
| timeout_member_connect      | 5000                                 |
| timeout_member_data         | 50000                                |
| timeout_tcp_inspect         | 0                                    |
| updated_at                  | 2019-10-01T13:06:12                  |
| client_ca_tls_container_ref | None                                 |
| client_authentication       | NONE                                 |
| client_crl_container_ref    | None                                 |
| allowed_cidrs               | None                                 |
+-----------------------------+--------------------------------------+

3. Stop the haproxy systemd service in the amphora

[centos@amphora-4869a150-15a2-4da7-a8c0-ed3c9c6ae9ee ~]$ sudo systemctl stop haproxy-8a4f3795-6656-4c45-b80e-aee33fd8cf0f.service

4. The Octavia Health Manager detected that the number of listeners was inconsistent with its database, triggered an amphora failover, and set the listener operating_status to OFFLINE.
$ openstack loadbalancer listener show listener-1
+-----------------------------+--------------------------------------+
| Field                       | Value                                |
+-----------------------------+--------------------------------------+
| admin_state_up              | True                                 |
| connection_limit            | -1                                   |
| created_at                  | 2019-10-01T12:45:48                  |
| default_pool_id             | None                                 |
| default_tls_container_ref   | None                                 |
| description                 |                                      |
| id                          | 28de0348-aaa7-4032-9604-0df9def243e2 |
| insert_headers              | None                                 |
| l7policies                  |                                      |
| loadbalancers               | 8a4f3795-6656-4c45-b80e-aee33fd8cf0f |
| name                        | listener-1                           |
| operating_status            | OFFLINE                              |
| project_id                  | 6029ff484d3b42afaf7d3fcf9d4c1392     |
| protocol                    | HTTP                                 |
| protocol_port               | 80                                   |
| provisioning_status         | ACTIVE                               |
| sni_container_refs          | []                                   |
| timeout_client_data         | 50000                                |
| timeout_member_connect      | 5000                                 |
| timeout_member_data         | 50000                                |
| timeout_tcp_inspect         | 0                                    |
| updated_at                  | 2019-10-01T13:06:28                  |
| client_ca_tls_container_ref | None                                 |
| client_authentication       | NONE                                 |
| client_crl_container_ref    | None                                 |
| allowed_cidrs               | None                                 |
+-----------------------------+--------------------------------------+

WARNING octavia.controller.healthmanager.health_drivers.update_db [-] Amphora 4869a150-15a2-4da7-a8c0-ed3c9c6ae9ee health message reports 0 listeners when 1 expected
INFO octavia.controller.healthmanager.health_manager [-] Stale amphora's id is: 4869a150-15a2-4da7-a8c0-ed3c9c6ae9ee
INFO octavia.controller.worker.v1.controller_worker [-] Perform failover for an amphora: {'compute_id': u'659dd232-b94c-46b2-927a-fe162317f34d', 'role': 'master_or_backup', 'id': u'4869a150-15a2-4da7-a8c0-ed3c9c6ae9ee', 'lb_network_ip': u'192.168.0.79', 'load_balancer_id': u'8a4f3795-6656-4c45-b80e-aee33fd8cf0f'}
[...]
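The WARNING above comes from the health manager comparing the listener count reported in the amphora's heartbeat against what its database expects. A simplified sketch of that comparison (illustrative only, with assumed names, not the real update_db driver code):

```python
def check_health_message(amphora_id, reported_listeners, expected_listeners):
    """Return a warning string when the heartbeat disagrees with the DB,
    mirroring the log line emitted by the health manager; None if healthy."""
    if reported_listeners != expected_listeners:
        return ("Amphora %s health message reports %d listeners when %d expected"
                % (amphora_id, reported_listeners, expected_listeners))
    return None

# haproxy was stopped, so the agent reports 0 listeners while the
# Octavia DB still expects 1 for this amphora:
msg = check_health_message("4869a150-15a2-4da7-a8c0-ed3c9c6ae9ee", 0, 1)
print(msg)
```

This is the key difference from the originally reported behavior: here the agent is still alive and heartbeating, but the mismatch in listener count is itself enough to flag the amphora and trigger the failover.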
INFO octavia.controller.worker.v1.controller_worker [-] Successfully completed the failover for an amphora: {'compute_id': u'659dd232-b94c-46b2-927a-fe162317f34d', 'role': 'master_or_backup', 'id': u'4869a150-15a2-4da7-a8c0-ed3c9c6ae9ee', 'lb_network_ip': u'192.168.0.79', 'load_balancer_id': u'8a4f3795-6656-4c45-b80e-aee33fd8cf0f'}
INFO octavia.controller.worker.v1.controller_worker [-] Mark ACTIVE in DB for load balancer id: 8a4f3795-6656-4c45-b80e-aee33fd8cf0f
INFO octavia.controller.healthmanager.health_manager [-] Attempted 1 failovers of amphora
INFO octavia.controller.healthmanager.health_manager [-] Failed at 0 failovers of amphora
INFO octavia.controller.healthmanager.health_manager [-] Cancelled 0 failovers of amphora
INFO octavia.controller.healthmanager.health_manager [-] Successfully completed 1 failovers of amphora

5. The listener came back ONLINE and a new amphora was created.

$ openstack loadbalancer listener show listener-1
+-----------------------------+--------------------------------------+
| Field                       | Value                                |
+-----------------------------+--------------------------------------+
| admin_state_up              | True                                 |
| connection_limit            | -1                                   |
| created_at                  | 2019-10-01T12:45:48                  |
| default_pool_id             | None                                 |
| default_tls_container_ref   | None                                 |
| description                 |                                      |
| id                          | 28de0348-aaa7-4032-9604-0df9def243e2 |
| insert_headers              | None                                 |
| l7policies                  |                                      |
| loadbalancers               | 8a4f3795-6656-4c45-b80e-aee33fd8cf0f |
| name                        | listener-1                           |
| operating_status            | ONLINE                               |
| project_id                  | 6029ff484d3b42afaf7d3fcf9d4c1392     |
| protocol                    | HTTP                                 |
| protocol_port               | 80                                   |
| provisioning_status         | ACTIVE                               |
| sni_container_refs          | []                                   |
| timeout_client_data         | 50000                                |
| timeout_member_connect      | 5000                                 |
| timeout_member_data         | 50000                                |
| timeout_tcp_inspect         | 0                                    |
| updated_at                  | 2019-10-01T13:06:32                  |
| client_ca_tls_container_ref | None                                 |
| client_authentication       | NONE                                 |
| client_crl_container_ref    | None                                 |
| allowed_cidrs               | None                                 |
+-----------------------------+--------------------------------------+
$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+--------+---------------+----------+
| id                                   | loadbalancer_id                      | status    | role   | lb_network_ip | ha_ip    |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+----------+
| 702f2042-d670-498d-8958-268ab1f71bab | 8a4f3795-6656-4c45-b80e-aee33fd8cf0f | ALLOCATED | MASTER | 192.168.0.10  | 10.0.0.3 |
| c21d5a35-d2ba-498a-bbc1-193d7fee41fc | 8a4f3795-6656-4c45-b80e-aee33fd8cf0f | ALLOCATED | BACKUP | 192.168.0.91  | 10.0.0.3 |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+----------+
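To verify from a script that the MASTER role actually moved after a failover, the amphora list can be fetched in machine-readable form (e.g. with the openstackclient `-f value` output formatter) and mapped role -> amphora id. A small parsing sketch; the sample output below is illustrative, and whitespace-splitting assumes the default column order shown in the table above:

```python
def roles_from_amphora_list(output):
    """Map role -> amphora id from `openstack loadbalancer amphora list -f value`
    output; columns assumed: id, loadbalancer_id, status, role, lb_network_ip, ha_ip."""
    roles = {}
    for line in output.strip().splitlines():
        amp_id, _lb_id, _status, role, _lb_ip, _ha_ip = line.split()
        roles[role] = amp_id
    return roles

# Illustrative sample matching the table above:
sample = """\
702f2042-d670-498d-8958-268ab1f71bab 8a4f3795-6656-4c45-b80e-aee33fd8cf0f ALLOCATED MASTER 192.168.0.10 10.0.0.3
c21d5a35-d2ba-498a-bbc1-193d7fee41fc 8a4f3795-6656-4c45-b80e-aee33fd8cf0f ALLOCATED BACKUP 192.168.0.91 10.0.0.3
"""
print(roles_from_amphora_list(sample)["MASTER"])  # 702f2042-d670-498d-8958-268ab1f71bab
```

Recording the role map before the test and comparing it afterwards would make the original symptom (failover happened but the table never changed) directly checkable in an automated reproduction.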