Bug 1573881
| Summary: | Active-standby, Killing haproxy services in amphoras - Fail over is not seen | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alexander Stafeyev <astafeye> |
| Component: | openstack-octavia | Assignee: | Carlos Goncalves <cgoncalves> |
| Status: | CLOSED WORKSFORME | QA Contact: | Bruna Bonguardo <bbonguar> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 13.0 (Queens) | CC: | bperkins, cgoncalves, cylopez, fiezzi, ihrachys, lpeer, majopela |
| Target Milestone: | z4 | Keywords: | Triaged, ZStream |
| Target Release: | 14.0 (Rocky) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-01 13:27:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1698576 | | |
Description
Alexander Stafeyev
2018-05-02 12:37:35 UTC
Restarting the system.slice process does not recover the other haproxy services. After debugging with NirM, we saw the following: killing the haproxy service for a listener DOES initiate failover for traffic, BUT the amphora list table is not changed. The heartbeats continued to be sent to the Octavia controller, which may be the reason the table did not change. The next step was to kill the amphora agent (to prevent heartbeats from being sent); the table changed after some time and the amphora was deleted by housekeeping, BUT a new one failed to be created, and the LB moved to ERROR state.

Additional incorrect behavior seen: when we killed the haproxy service we saw "listener state DOWN" in the logs. Should the listener change state when there is a backup amphora ready to take control?

Additional info: new amphora allocation is not executed at all.

I cannot reproduce this. It might have been fixed in a release newer than the originally reported one.
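The heartbeat-driven behavior described in the report can be summed up in a small illustrative sketch. This is not Octavia source code; the timeout value and function names are invented for illustration only:

```python
# Illustrative sketch only, not Octavia's implementation: failover is driven
# by heartbeat staleness. While the amphora agent keeps sending heartbeats,
# the controller does not treat the amphora as stale, even if haproxy is dead.
import time

HEARTBEAT_TIMEOUT = 60.0  # hypothetical threshold, in seconds


def is_amphora_stale(last_heartbeat, now=None):
    """True once no heartbeat has arrived for longer than the timeout."""
    now = time.time() if now is None else now
    return (now - last_heartbeat) > HEARTBEAT_TIMEOUT

# Killing only haproxy: the agent still heartbeats -> never stale -> the
# amphora table does not change. Killing the agent: heartbeats stop -> stale
# after the timeout -> housekeeping/failover kicks in, as described above.
```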
1. Create load balancer and listener
$ openstack loadbalancer create --vip-subnet-id private-subnet --name lb-1
$ openstack loadbalancer listener create --protocol HTTP --protocol-port 80 --name listener-1 lb-1
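For reference, the two CLI calls above can also be scripted with openstacksdk. This is only a sketch: the cloud name "mycloud" is a placeholder for a clouds.yaml entry, and the subnet and resource names mirror the commands above.

```python
# Sketch of step 1 via openstacksdk (placeholder cloud name "mycloud").
import time
import openstack

conn = openstack.connect(cloud="mycloud")

subnet = conn.network.find_subnet("private-subnet")
lb = conn.load_balancer.create_load_balancer(name="lb-1",
                                             vip_subnet_id=subnet.id)

# Naive wait for the LB to leave PENDING_CREATE before adding the listener;
# a real script would bound this and handle an ERROR status.
while conn.load_balancer.get_load_balancer(lb.id).provisioning_status != "ACTIVE":
    time.sleep(5)

listener = conn.load_balancer.create_listener(name="listener-1",
                                              protocol="HTTP",
                                              protocol_port=80,
                                              load_balancer_id=lb.id)
```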
2. Check listener is ACTIVE/ONLINE
$ openstack loadbalancer listener show listener-1
+-----------------------------+--------------------------------------+
| Field | Value |
+-----------------------------+--------------------------------------+
| admin_state_up | True |
| connection_limit | -1 |
| created_at | 2019-10-01T12:45:48 |
| default_pool_id | None |
| default_tls_container_ref | None |
| description | |
| id | 28de0348-aaa7-4032-9604-0df9def243e2 |
| insert_headers | None |
| l7policies | |
| loadbalancers | 8a4f3795-6656-4c45-b80e-aee33fd8cf0f |
| name | listener-1 |
| operating_status | ONLINE |
| project_id | 6029ff484d3b42afaf7d3fcf9d4c1392 |
| protocol | HTTP |
| protocol_port | 80 |
| provisioning_status | ACTIVE |
| sni_container_refs | [] |
| timeout_client_data | 50000 |
| timeout_member_connect | 5000 |
| timeout_member_data | 50000 |
| timeout_tcp_inspect | 0 |
| updated_at | 2019-10-01T13:06:12 |
| client_ca_tls_container_ref | None |
| client_authentication | NONE |
| client_crl_container_ref | None |
| allowed_cidrs | None |
+-----------------------------+--------------------------------------+
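In an automated test, the ACTIVE/ONLINE check above could be done by polling the listener. Again an openstacksdk sketch; the cloud name and the five-minute budget are placeholders.

```python
# Sketch: poll listener-1 until it reports ACTIVE/ONLINE or a timeout expires.
import time
import openstack

conn = openstack.connect(cloud="mycloud")

deadline = time.time() + 300  # arbitrary 5-minute budget
while time.time() < deadline:
    listener = conn.load_balancer.find_listener("listener-1")
    if (listener.provisioning_status == "ACTIVE"
            and listener.operating_status == "ONLINE"):
        break
    time.sleep(10)
else:
    raise RuntimeError("listener-1 did not become ACTIVE/ONLINE in time")
```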
3. Stop haproxy systemd service in the amphora
[centos@amphora-4869a150-15a2-4da7-a8c0-ed3c9c6ae9ee ~]$ sudo systemctl stop haproxy-8a4f3795-6656-4c45-b80e-aee33fd8cf0f.service
4. The Octavia Health Manager detected that the number of listeners reported by the amphora did not match its database, triggered an amphora failover, and set the listener operating_status to OFFLINE.
$ openstack loadbalancer listener show listener-1
+-----------------------------+--------------------------------------+
| Field | Value |
+-----------------------------+--------------------------------------+
| admin_state_up | True |
| connection_limit | -1 |
| created_at | 2019-10-01T12:45:48 |
| default_pool_id | None |
| default_tls_container_ref | None |
| description | |
| id | 28de0348-aaa7-4032-9604-0df9def243e2 |
| insert_headers | None |
| l7policies | |
| loadbalancers | 8a4f3795-6656-4c45-b80e-aee33fd8cf0f |
| name | listener-1 |
| operating_status | OFFLINE |
| project_id | 6029ff484d3b42afaf7d3fcf9d4c1392 |
| protocol | HTTP |
| protocol_port | 80 |
| provisioning_status | ACTIVE |
| sni_container_refs | [] |
| timeout_client_data | 50000 |
| timeout_member_connect | 5000 |
| timeout_member_data | 50000 |
| timeout_tcp_inspect | 0 |
| updated_at | 2019-10-01T13:06:28 |
| client_ca_tls_container_ref | None |
| client_authentication | NONE |
| client_crl_container_ref | None |
| allowed_cidrs | None |
+-----------------------------+--------------------------------------+
WARNING octavia.controller.healthmanager.health_drivers.update_db [-] Amphora 4869a150-15a2-4da7-a8c0-ed3c9c6ae9ee health message reports 0 listeners when 1 expected
INFO octavia.controller.healthmanager.health_manager [-] Stale amphora's id is: 4869a150-15a2-4da7-a8c0-ed3c9c6ae9ee
INFO octavia.controller.worker.v1.controller_worker [-] Perform failover for an amphora: {'compute_id': u'659dd232-b94c-46b2-927a-fe162317f34d', 'role': 'master_or_backup', 'id': u'4869a150-15a2-4da7-a8c0-ed3c9c6ae9ee', 'lb_network_ip': u'192.168.0.79', 'load_balancer_id': u'8a4f3795-6656-4c45-b80e-aee33fd8cf0f'}
[...]
INFO octavia.controller.worker.v1.controller_worker [-] Successfully completed the failover for an amphora: {'compute_id': u'659dd232-b94c-46b2-927a-fe162317f34d', 'role': 'master_or_backup', 'id': u'4869a150-15a2-4da7-a8c0-ed3c9c6ae9ee', 'lb_network_ip': u'192.168.0.79', 'load_balancer_id': u'8a4f3795-6656-4c45-b80e-aee33fd8cf0f'}
INFO octavia.controller.worker.v1.controller_worker [-] Mark ACTIVE in DB for load balancer id: 8a4f3795-6656-4c45-b80e-aee33fd8cf0f
INFO octavia.controller.healthmanager.health_manager [-] Attempted 1 failovers of amphora
INFO octavia.controller.healthmanager.health_manager [-] Failed at 0 failovers of amphora
INFO octavia.controller.healthmanager.health_manager [-] Cancelled 0 failovers of amphora
INFO octavia.controller.healthmanager.health_manager [-] Successfully completed 1 failovers of amphora
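One way a test script could detect that the failover shown in these logs has completed is to wait for the stale amphora ID to drop out of the amphora list. Sketch only: it assumes an openstacksdk release that exposes the admin-only amphora API, the same "mycloud" placeholder as above, and an arbitrary 10-minute budget.

```python
# Sketch: wait for the stale amphora to be replaced after the failover.
import time
import openstack

conn = openstack.connect(cloud="mycloud")
stale_id = "4869a150-15a2-4da7-a8c0-ed3c9c6ae9ee"  # taken from the logs above

deadline = time.time() + 600
while time.time() < deadline:
    active_ids = {amp.id for amp in conn.load_balancer.amphorae()
                  if amp.status == "ALLOCATED"}
    if stale_id not in active_ids:
        break
    time.sleep(15)
else:
    raise RuntimeError("stale amphora was not replaced within the timeout")
```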
5. Listener came back ONLINE and a new amphora was created.
$ openstack loadbalancer listener show listener-1
+-----------------------------+--------------------------------------+
| Field | Value |
+-----------------------------+--------------------------------------+
| admin_state_up | True |
| connection_limit | -1 |
| created_at | 2019-10-01T12:45:48 |
| default_pool_id | None |
| default_tls_container_ref | None |
| description | |
| id | 28de0348-aaa7-4032-9604-0df9def243e2 |
| insert_headers | None |
| l7policies | |
| loadbalancers | 8a4f3795-6656-4c45-b80e-aee33fd8cf0f |
| name | listener-1 |
| operating_status | ONLINE |
| project_id | 6029ff484d3b42afaf7d3fcf9d4c1392 |
| protocol | HTTP |
| protocol_port | 80 |
| provisioning_status | ACTIVE |
| sni_container_refs | [] |
| timeout_client_data | 50000 |
| timeout_member_connect | 5000 |
| timeout_member_data | 50000 |
| timeout_tcp_inspect | 0 |
| updated_at | 2019-10-01T13:06:32 |
| client_ca_tls_container_ref | None |
| client_authentication | NONE |
| client_crl_container_ref | None |
| allowed_cidrs | None |
+-----------------------------+--------------------------------------+
$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+--------+---------------+----------+
| id | loadbalancer_id | status | role | lb_network_ip | ha_ip |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+----------+
| 702f2042-d670-498d-8958-268ab1f71bab | 8a4f3795-6656-4c45-b80e-aee33fd8cf0f | ALLOCATED | MASTER | 192.168.0.10 | 10.0.0.3 |
| c21d5a35-d2ba-498a-bbc1-193d7fee41fc | 8a4f3795-6656-4c45-b80e-aee33fd8cf0f | ALLOCATED | BACKUP | 192.168.0.91 | 10.0.0.3 |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+----------+
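The same verification can be done programmatically by asserting that an ALLOCATED MASTER/BACKUP pair is back. Sketch with the same assumptions as above (recent openstacksdk with the admin-only amphora API, "mycloud" placeholder, and only one load balancer in the environment, as in this reproduction).

```python
# Sketch: assert the active-standby pair is ALLOCATED again after failover.
import openstack

conn = openstack.connect(cloud="mycloud")

allocated = [amp for amp in conn.load_balancer.amphorae()
             if amp.status == "ALLOCATED"]
roles = {amp.role for amp in allocated}

assert roles == {"MASTER", "BACKUP"}, (
    "expected an ALLOCATED MASTER/BACKUP pair, got: %s"
    % [(amp.role, amp.status) for amp in allocated])
```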