Bug 1712448
| Summary: | Cannot delete load balancer that is in PENDING_UPDATE with PENDING_CREATE LISTENER after running into BZ 1693808 | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Andreas Karis <akaris> |
| Component: | openstack-octavia | Assignee: | Carlos Goncalves <cgoncalves> |
| Status: | CLOSED ERRATA | QA Contact: | Bruna Bonguardo <bbonguar> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 13.0 (Queens) | CC: | amuller, astafeye, cgoncalves, gthiemon, ihrachys, lpeer, majopela, michjohn, philippe.cyr, scohen, slinaber, twilson |
| Target Milestone: | z9 | Keywords: | Triaged, ZStream |
| Target Release: | 13.0 (Queens) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-octavia-2.1.2-0.20190921025150.431d9c9.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-11-07 13:51:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Andreas Karis
2019-05-21 14:23:02 UTC
I rebooted my single controller, just to see if this might have an effect, but it has none:

~~~
(overcloud) [stack@undercloud-1 ~]$ openstack loadbalancer listener delete listener1
Load Balancer 134940e3-2efe-4f44-96b8-7682cabb700e is immutable and cannot be updated. (HTTP 409) (Request-ID: req-d2ebc2aa-4d5f-4ba3-ad11-49d6b2e4722f)
(overcloud) [stack@undercloud-1 ~]$ openstack loadbalancer delete lb1
Validation failure: Cannot delete Load Balancer 134940e3-2efe-4f44-96b8-7682cabb700e - it has children (HTTP 400) (Request-ID: req-92a6b543-5f4e-4728-ab03-7a2fed51e54a)
~~~

So far, I found the following workaround, although one should obviously not mess around in the database (and this is not supported):

~~~
[root@overcloud-controller-0 ~]# docker exec -it galera-bundle-docker-0 /bin/bash
()[root@overcloud-controller-0 /]# mysql octavia
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 626
Server version: 10.1.20-MariaDB MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [octavia]> show tables;
+--------------------------+
| Tables_in_octavia        |
+--------------------------+
| alembic_version          |
| algorithm                |
| amphora                  |
| amphora_build_request    |
| amphora_build_slots      |
| amphora_health           |
| amphora_roles            |
| health_monitor           |
| health_monitor_type      |
| l7policy                 |
| l7policy_action          |
| l7rule                   |
| l7rule_compare_type      |
| l7rule_type              |
| lb_topology              |
| listener                 |
| listener_statistics      |
| load_balancer            |
| member                   |
| operating_status         |
| pool                     |
| protocol                 |
| provisioning_status      |
| quotas                   |
| session_persistence      |
| session_persistence_type |
| sni                      |
| vip                      |
| vrrp_auth_method         |
| vrrp_group               |
+--------------------------+
30 rows in set (0.00 sec)

MariaDB [octavia]> select * from listener \G
*************************** 1. row ***************************
         project_id: 522130f5bf8847db8b86eb05d6493cc7
                 id: 4a8f5b7a-b376-4dca-8381-1ac40458b873
               name: listener1
        description: NULL
           protocol: TERMINATED_HTTPS
      protocol_port: 443
   connection_limit: -1
   load_balancer_id: 134940e3-2efe-4f44-96b8-7682cabb700e
 tls_certificate_id: NULL
    default_pool_id: NULL
provisioning_status: PENDING_UPDATE
   operating_status: ONLINE
            enabled: 1
          peer_port: 1025
     insert_headers: NULL
         created_at: 2019-05-21 14:02:40
         updated_at: 2019-05-21 14:03:52
1 row in set (0.00 sec)

MariaDB [octavia]> select * from load_balancer \G
*************************** 1. row ***************************
         project_id: 522130f5bf8847db8b86eb05d6493cc7
                 id: 134940e3-2efe-4f44-96b8-7682cabb700e
               name: lb1
        description: NULL
provisioning_status: PENDING_UPDATE
   operating_status: ONLINE
            enabled: 1
           topology: SINGLE
    server_group_id: NULL
         created_at: 2019-05-21 13:49:54
         updated_at: 2019-05-21 14:03:52
1 row in set (0.01 sec)

MariaDB [octavia]> update listener set provisioning_status = 'ACTIVE' where id = '4a8f5b7a-b376-4dca-8381-1ac40458b873';
Query OK, 1 row affected (0.13 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [octavia]> update loadbalancer set provisioning_status = 'ACTIVE' where id = '134940e3-2efe-4f44-96b8-7682cabb700e';
ERROR 1146 (42S02): Table 'octavia.loadbalancer' doesn't exist

MariaDB [octavia]> update load_balancer set provisioning_status = 'ACTIVE' where id = '134940e3-2efe-4f44-96b8-7682cabb700e';
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0
~~~

Now I can delete the listener and the load balancer:

~~~
(overcloud) [stack@undercloud-1 ~]$ openstack loadbalancer list
+--------------------------------------+------+----------------------------------+-------------+---------------------+----------+
| id                                   | name | project_id                       | vip_address | provisioning_status | provider |
+--------------------------------------+------+----------------------------------+-------------+---------------------+----------+
| 134940e3-2efe-4f44-96b8-7682cabb700e | lb1  | 522130f5bf8847db8b86eb05d6493cc7 | 10.0.0.107  | ACTIVE              | octavia  |
+--------------------------------------+------+----------------------------------+-------------+---------------------+----------+
(overcloud) [stack@undercloud-1 ~]$ openstack loadbalancer listener delete 4a8f5b7a-b376-4dca-8381-1ac40458b873
(overcloud) [stack@undercloud-1 ~]$ openstack loadbalancer delete 134940e3-2efe-4f44-96b8-7682cabb700e
(overcloud) [stack@undercloud-1 ~]$ openstack loadbalancer list

(overcloud) [stack@undercloud-1 ~]$ nova list --all
+--------------------------------------+--------------+----------------------------------+--------+------------+-------------+----------------------------------------------------------------------+
| ID                                   | Name         | Tenant ID                        | Status | Task State | Power State | Networks                                                             |
+--------------------------------------+--------------+----------------------------------+--------+------------+-------------+----------------------------------------------------------------------+
| 0d7c70f5-fbac-4a0d-974e-8e1358ce93dc | cirros-test1 | 522130f5bf8847db8b86eb05d6493cc7 | ACTIVE | -          | Running     | private=192.168.0.21, 2000:192:168:1:f816:3eff:fe0a:b7f3, 10.0.0.111 |
| 01640259-acbf-40dc-97e9-f5f432cf14c6 | rhel-test1   | 522130f5bf8847db8b86eb05d6493cc7 | ACTIVE | -          | Running     | private=192.168.0.9, 2000:192:168:1:f816:3eff:fe34:b706, 10.0.0.119  |
+--------------------------------------+--------------+----------------------------------+--------+------------+-------------+----------------------------------------------------------------------+
(overcloud) [stack@undercloud-1 ~]$ openstack loadbalancer amphora list

(overcloud) [stack@undercloud-1 ~]$
~~~

Also, I'd like to know whether I can perform this database manipulation during a remote session with the customer. I don't see why this would cause issues, and it seems like the only valid workaround for getting rid of the failed load balancer. Thanks, Andreas

Can we get the SOS report?

Yes, a path forward would be to set the provisioning status to ERROR in the database, then delete the resource and attempt to recreate it or trigger a failover.

Andreas, the Common Name needs to be set on the certificate. Patch https://review.opendev.org/#/c/667200/ will validate the content of certificates at the API level, so it will not accept invalid ones. The bug was fixed in stable/queens upstream.
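The database workaround above amounts to flipping `provisioning_status` back to a mutable value so that the API-side immutability guard lets the delete through. Below is a minimal sketch of that interplay, using an in-memory SQLite table as a stand-in for the real `octavia.load_balancer` table; the guard function is a simplification for illustration, not Octavia's actual code:

```python
import sqlite3

# States in which the Octavia API refuses mutations (simplified subset).
IMMUTABLE_STATES = {"PENDING_CREATE", "PENDING_UPDATE", "PENDING_DELETE"}

LB_ID = "134940e3-2efe-4f44-96b8-7682cabb700e"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE load_balancer (id TEXT PRIMARY KEY, provisioning_status TEXT)")
db.execute("INSERT INTO load_balancer VALUES (?, 'PENDING_UPDATE')", (LB_ID,))

def delete_load_balancer(conn, lb_id):
    """Refuse the delete while the row is in an immutable state, as the API does."""
    (status,) = conn.execute(
        "SELECT provisioning_status FROM load_balancer WHERE id = ?", (lb_id,)
    ).fetchone()
    if status in IMMUTABLE_STATES:
        raise RuntimeError(f"Load Balancer {lb_id} is immutable (HTTP 409)")
    conn.execute("DELETE FROM load_balancer WHERE id = ?", (lb_id,))

# While the row is stuck in PENDING_UPDATE, every delete attempt is rejected:
try:
    delete_load_balancer(db, LB_ID)
except RuntimeError as exc:
    print(exc)

# The manual workaround: flip the status in the database, then delete.
# (Setting it to ERROR, as suggested in the comments, unblocks it equally.)
db.execute("UPDATE load_balancer SET provisioning_status = 'ACTIVE' WHERE id = ?", (LB_ID,))
delete_load_balancer(db, LB_ID)
print(db.execute("SELECT COUNT(*) FROM load_balancer").fetchone()[0])  # 0
```

This is why no CLI command can help once the resource is stuck: the check happens before any delete logic runs, and only the database row itself holds the state.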
Reopening: this is not about invalid certificates, but about the fact that if we end up in an invalid state due to https://bugzilla.redhat.com/show_bug.cgi?id=1693808 , listeners are stuck in PENDING_UPDATE and can then never be deleted. I just ran into this again while troubleshooting another environment with https://bugzilla.redhat.com/show_bug.cgi?id=1693808 . The point is, we will end up in these situations again.

Here's the entire sequence that leads to this problem. First, reproduce the same issue as in https://bugzilla.redhat.com/show_bug.cgi?id=1693808 :

~~~
openssl req -new -newkey rsa:4096 -x509 -sha256 -days 3650 -nodes -out server.crt -keyout server.key
openssl pkcs12 -export -inkey server.key -in server.crt -passout pass: -out server.p12
openstack secret store --name='tls_secret1' -t 'application/octet-stream' -e 'base64' --payload="$(base64 < server.p12)"
openstack acl user add -u ca8ac3feedc64c26bde24ef404586422 $(openstack secret list | awk '/ tls_secret1 / {print $2}')
openstack loadbalancer create --name lb1 --vip-subnet-id provider1-subnet
sleep 300
openstack loadbalancer listener create --protocol-port 443 --protocol TERMINATED_HTTPS --name listener1 --default-tls-container=$(openstack secret list | awk '/ tls_secret1 / {print $2}') lb1
~~~

This fails with the same error message as in 1693808:

~~~
Could not retrieve certificate: ['http://10.0.0.15:9311/v1/secrets/7558e3a1-ffc1-43a1-9506-e625be67f038'] (HTTP 400) (Request-ID: req-c6dddf4c-cab9-4950-a791-78e45641ff08)
(overcloud) [stack@undercloud-1 octavia_keys]$
~~~

Given that the above doesn't work, we create a load balancer listener *without* the default-tls-container as a workaround:

~~~
(overcloud) [stack@undercloud-1 octavia_keys]$ openstack loadbalancer listener create --protocol-port 443 --protocol TERMINATED_HTTPS --name listener1 lb1
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | True                                 |
| connection_limit          | -1                                   |
| created_at                | 2019-05-21T14:02:40                  |
| default_pool_id           | None                                 |
| default_tls_container_ref | None                                 |
| description               |                                      |
| id                        | 4a8f5b7a-b376-4dca-8381-1ac40458b873 |
| insert_headers            | None                                 |
| l7policies                |                                      |
| loadbalancers             | 134940e3-2efe-4f44-96b8-7682cabb700e |
| name                      | listener1                            |
| operating_status          | OFFLINE                              |
| project_id                | 522130f5bf8847db8b86eb05d6493cc7     |
| protocol                  | TERMINATED_HTTPS                     |
| protocol_port             | 443                                  |
| provisioning_status       | PENDING_CREATE                       |
| sni_container_refs        | []                                   |
| updated_at                | None                                 |
+---------------------------+--------------------------------------+
(overcloud) [stack@undercloud-1 octavia_keys]$ openstack loadbalancer listener list
+--------------------------------------+-----------------+-----------+----------------------------------+------------------+---------------+----------------+
| id                                   | default_pool_id | name      | project_id                       | protocol         | protocol_port | admin_state_up |
+--------------------------------------+-----------------+-----------+----------------------------------+------------------+---------------+----------------+
| 4a8f5b7a-b376-4dca-8381-1ac40458b873 | None            | listener1 | 522130f5bf8847db8b86eb05d6493cc7 | TERMINATED_HTTPS | 443           | True           |
+--------------------------------------+-----------------+-----------+----------------------------------+------------------+---------------+----------------+
~~~

We now set the default-tls-container:

~~~
(overcloud) [stack@undercloud-1 octavia_keys]$ openstack loadbalancer listener set --default-tls-container-ref $(openstack secret list | awk '/ tls_secret1 / {print $2}') listener1
Could not retrieve certificate: ['http://10.0.0.15:9311/v1/secrets/7558e3a1-ffc1-43a1-9506-e625be67f038'] (HTTP 400) (Request-ID: req-892836f3-6926-42dc-a463-78813888ac4f)
(overcloud) [stack@undercloud-1 octavia_keys]$ openstack loadbalancer list
+--------------------------------------+------+----------------------------------+-------------+---------------------+----------+
| id                                   | name | project_id                       | vip_address | provisioning_status | provider |
+--------------------------------------+------+----------------------------------+-------------+---------------------+----------+
| 134940e3-2efe-4f44-96b8-7682cabb700e | lb1  | 522130f5bf8847db8b86eb05d6493cc7 | 10.0.0.107  | PENDING_UPDATE      | octavia  |
+--------------------------------------+------+----------------------------------+-------------+---------------------+----------+
(overcloud) [stack@undercloud-1 octavia_keys]$ openstack loadbalancer listener list
+--------------------------------------+-----------------+-----------+----------------------------------+------------------+---------------+----------------+
| id                                   | default_pool_id | name      | project_id                       | protocol         | protocol_port | admin_state_up |
+--------------------------------------+-----------------+-----------+----------------------------------+------------------+---------------+----------------+
| 4a8f5b7a-b376-4dca-8381-1ac40458b873 | None            | listener1 | 522130f5bf8847db8b86eb05d6493cc7 | TERMINATED_HTTPS | 443           | True           |
+--------------------------------------+-----------------+-----------+----------------------------------+------------------+---------------+----------------+
(overcloud) [stack@undercloud-1 octavia_keys]$
~~~

And we are stuck in PENDING_UPDATE. These resources can now *never* be deleted via the CLI and are stuck there forever. The only way to fix this is to manipulate the database: https://access.redhat.com/solutions/4251821

Have you tried the patch in your environment? If not, please give it a try. If yes, please share the Octavia service logs (API and worker).

Carlos, either patch is fine. Certificate validation: perfect, I know that's needed. And 1693808 is fixed as well.
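As noted earlier in the thread, the certificate-side root cause is a subject with no Common Name, and the attached patch validates certificate content at the API so such certificates are rejected up front instead of stranding the load balancer. The sketch below illustrates only the idea of that check; the helper names are hypothetical, and it parses an openssl `-subj`-style string rather than the real PKCS12 bundle fetched from Barbican:

```python
# Hypothetical stand-in for the API-level validation added by
# https://review.opendev.org/#/c/667200/ (not Octavia's actual code).

def parse_subject(subject: str) -> dict:
    """Parse an openssl -subj style string, e.g. '/C=US/CN=example.com'."""
    parts = [p for p in subject.split("/") if p]
    return dict(p.split("=", 1) for p in parts)

def validate_listener_certificate(subject: str) -> None:
    """Reject TERMINATED_HTTPS certificates whose subject has no Common Name."""
    if not parse_subject(subject).get("CN", "").strip():
        raise ValueError("Invalid certificate: subject Common Name is not set (HTTP 400)")

# An `openssl req` run with all interactive prompts left blank produces an
# effectively empty subject, which is what triggered this bug:
try:
    validate_listener_certificate("/")
except ValueError as exc:
    print(exc)

validate_listener_certificate("/C=US/CN=www.example.com")  # accepted
```

The point of doing this at the API is that the request fails synchronously with HTTP 400 before any provisioning state changes, so the load balancer never enters PENDING_UPDATE for a doomed update.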
I just know that down the road we will have another bug that leads to the same situation and/or symptoms, and then users have no way of deleting load balancers and have to go into the database. This is the purpose of this request. The patches you mentioned don't fix the problem I'm reporting; they prevent it from happening.

Got it. Just to reinforce: Octavia resources should never get stuck in PENDING_* unless something or someone external, e.g., kills -9 an Octavia controller service (worker, health manager or housekeeping), or reboots a node holding the lock on and working on the resource. Should one find resources in PENDING_* without external interference, that is considered a bug in Octavia and should be addressed like it was in the patch attached to this BZ. The upstream community is working "to move to TaskFlow jobboard / using the persistence and resumption capability". This is a complex feature. Its development started during the Train cycle. We expect it to be available in the U cycle (experimental).

Hi Carlos, I understand what you mean, and you can go ahead and close this BZ if you wish. I just wanted to make my point that once in this PENDING_UPDATE state, the only option is to go into the DB. I agree that it's difficult to fix / work on this when the system is never meant to be in this state. But I wish there were some way for Octavia to "auto heal", i.e., to go into ERROR instead of looping in PENDING_UPDATE forever. - Andreas

Carlos, I see that your patch [1] is not relevant to the pending lbs deletion. What needs to be verified in this bug? Is there a need to manually verify this bug, or do [2][3] cover this case?

[1] https://review.opendev.org/#/c/667200/
[2] https://review.opendev.org/#/c/667200/4/octavia/tests/functional/api/v2/test_listener.py
[3] https://review.opendev.org/#/c/667200/4/octavia/tests/functional/api/v2/test_load_balancer.py

> Is there a need to manually verify this bug or [2][3] cover this case?

The functional tests do not cover the bug that was fixed. We need to test creating a TERMINATED_HTTPS listener with an invalid certificate (e.g. the Common Name value unset). The API should not accept it and should return an error. See the story associated with the patch for more details.

If this bug requires doc text for the errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:3788
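The TaskFlow jobboard work mentioned in the thread aims to persist and resume controller tasks so a crashed worker no longer strands resources in PENDING_*. The following is a toy model of that idea only, not the actual `taskflow` library API; all class and function names are illustrative. Each job step is checkpointed, so a second worker can claim an abandoned job and finish the remaining steps:

```python
import json
import os
import tempfile

class Jobboard:
    """Tiny persistent job store (illustrative): if a worker dies mid-job,
    another worker can claim the job and resume from the last checkpoint."""

    def __init__(self, path):
        self.path = path

    def post(self, job_id, steps):
        self._save(job_id, {"steps": steps, "done": 0})

    def claim_and_run(self, job_id, actions):
        state = self._load(job_id)
        for i in range(state["done"], len(state["steps"])):
            actions[state["steps"][i]]()   # run the step
            state["done"] = i + 1
            self._save(job_id, state)      # checkpoint after each step

    def _save(self, job_id, state):
        with open(os.path.join(self.path, job_id), "w") as f:
            json.dump(state, f)

    def _load(self, job_id):
        with open(os.path.join(self.path, job_id)) as f:
            return json.load(f)

board = Jobboard(tempfile.mkdtemp())
log = []
board.post("update-lb1", ["lock", "update_listener", "unlock"])

# First worker "dies" (simulated by an exception) after the lock step,
# which is exactly the scenario that leaves a resource in PENDING_UPDATE:
try:
    board.claim_and_run("update-lb1", {"lock": lambda: log.append("lock"),
                                       "update_listener": lambda: 1 / 0,
                                       "unlock": lambda: log.append("unlock")})
except ZeroDivisionError:
    pass

# A second worker claims the same job and resumes at the failed step:
board.claim_and_run("update-lb1", {"lock": lambda: log.append("lock"),
                                   "update_listener": lambda: log.append("update"),
                                   "unlock": lambda: log.append("unlock")})
print(log)  # ['lock', 'update', 'unlock']
```

With persistence like this, the "unlock" step eventually runs even after a crash, so the resource would leave its PENDING_* state instead of being stuck forever, which is the auto-heal behavior Andreas asked for.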