Bug 2143732 - Health monitors stuck in PENDING_CREATE
Summary: Health monitors stuck in PENDING_CREATE
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-ovn-octavia-provider
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: beta
Target Release: 17.1
Assignee: Fernando Royo
QA Contact: Omer Schwartz
URL:
Whiteboard:
Depends On: 2159684
Blocks:
 
Reported: 2022-11-17 17:10 UTC by Michał Dulko
Modified: 2023-08-16 01:13 UTC
CC: 4 users

Fixed In Version: python-ovn-octavia-provider-1.0.3-1.20230106061050.b1c8472.el9ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 01:12:44 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 865003 0 None MERGED Add support for HM on fully populated load balancers 2022-11-24 09:53:09 UTC
Red Hat Issue Tracker OSP-20269 0 None None None 2022-11-17 17:11:54 UTC
Red Hat Product Errata RHEA-2023:4577 0 None None None 2023-08-16 01:13:02 UTC

Description Michał Dulko 2022-11-17 17:10:53 UTC
Description of problem:
cloud-provider-openstack uses the fully populated load balancer creation API to create health monitors like the following on OVN LBs:
+---------------------+-----------------------------------------------------------------------------+
| Field               | Value                                                                       |
+---------------------+-----------------------------------------------------------------------------+
| project_id          | 08b7f0b053904f16a7d65496d21f3efc                                            |
| name                | monitor_8082_kube_service_kubernetes_udp-lb-etplocal-ns_udp-lb-etplocal-svc |
| admin_state_up      | True                                                                        |
| pools               | 8925d51a-5aa8-4371-9dbd-4aabff1902ae                                        |
| created_at          | 2022-11-17T17:08:43                                                         |
| provisioning_status | PENDING_CREATE                                                              |
| updated_at          | None                                                                        |
| delay               | 60                                                                          |
| expected_codes      | None                                                                        |
| max_retries         | 5                                                                           |
| http_method         | None                                                                        |
| timeout             | 30                                                                          |
| max_retries_down    | 3                                                                           |
| url_path            | None                                                                        |
| type                | UDP-CONNECT                                                                 |
| id                  | 5e5a460a-3b14-4d9f-bec3-eeb38b8f553a                                        |
| operating_status    | OFFLINE                                                                     |
| http_version        | None                                                                        |
| domain_name         | None                                                                        |
| tags                |                                                                             |
+---------------------+-----------------------------------------------------------------------------+

These become stuck in PENDING_CREATE and are not functional at all. If I delete one, the LB it was tied to becomes stuck in PENDING_UPDATE.
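For reference, the single-call "fully populated" request that cloud-provider-openstack effectively sends can be sketched as below. This is an illustrative reconstruction of the request body shape (field values are placeholders, not taken from a real cloud); the nested healthmonitor object is the part that ended up stuck in PENDING_CREATE:

```python
def build_fully_populated_lb(vip_subnet_id, member_address, member_subnet_id):
    """Return a single-call LB creation body with a nested health monitor.

    Illustrative sketch of the fully populated LB API payload; names and
    port numbers are placeholders.
    """
    return {
        "loadbalancer": {
            "vip_subnet_id": vip_subnet_id,
            "provider": "ovn",
            "name": "lb1",
            "admin_state_up": True,
            "listeners": [{
                "name": "listener1",
                "protocol": "TCP",
                "protocol_port": 80,
                "default_pool": {
                    "name": "tcp_pool",
                    "protocol": "TCP",
                    "lb_algorithm": "SOURCE_IP_PORT",
                    # Health monitor nested in the pool: the object that the
                    # OVN provider previously left in PENDING_CREATE.
                    "healthmonitor": {
                        "delay": 10,
                        "max_retries": 4,
                        "timeout": 5,
                        "type": "TCP",
                    },
                    "members": [{
                        "address": member_address,
                        "protocol_port": 30776,
                        "subnet_id": member_subnet_id,
                    }],
                },
            }],
        }
    }
```

Posting such a body to /v2.0/lbaas/loadbalancers creates the LB, listener, pool, member, and health monitor in one call; the bug only affected the health monitor created this way.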

Version-Release number of selected component (if applicable):
That's 17.0_20220908.1

How reproducible:
Always

Steps to Reproduce:
1. Create monitor as seen above.
2. See it stuck in PENDING_CREATE.
3. Delete it.
4. See LB stuck in PENDING_UPDATE.
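The stuck state in steps 2 and 4 can be detected programmatically by polling provisioning_status with a deadline. A minimal sketch, where `get_status` is a hypothetical callable standing in for an Octavia API client lookup of the monitor's (or LB's) provisioning_status:

```python
import time

def wait_for_active(get_status, timeout=300, interval=5):
    """Poll until provisioning_status leaves a PENDING_* state.

    Returns the final status, or raises TimeoutError if the resource
    stays pending past the deadline (the symptom of this bug).
    """
    deadline = time.monotonic() + timeout
    status = get_status()
    while status in ("PENDING_CREATE", "PENDING_UPDATE", "PENDING_DELETE"):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout}s")
        time.sleep(interval)
        status = get_status()
    return status
```

With the bug present, a call like this against the affected health monitor would raise TimeoutError rather than returning ACTIVE.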

Actual results:
The health monitor is stuck in PENDING_CREATE; deleting it leaves the load balancer stuck in PENDING_UPDATE.

Expected results:
The health monitor becomes ACTIVE and monitors the pool members as configured.

Additional info:

Comment 15 Omer Schwartz 2023-01-24 12:33:36 UTC
QE will verify this fix once it enters a passed_phase2 puddle.

Comment 16 Omer Schwartz 2023-05-03 10:28:35 UTC
I tried running the following on RHOS-17.1-RHEL-9-20230426.n.1 with python-ovn-octavia-provider-1.0.3-1.20230413161016:

(overcloud) [stack@undercloud-0 ~]$ curl -ks -H "x-auth-token: $MY_TOKEN" -H "Content-Type: application/json" -X POST $MY_OCTAVIA_PATH/v2.0/lbaas/loadbalancers -d'
{
    "loadbalancer":{
       "vip_subnet_id":"13c428b0-0292-4e16-aaff-145dcceab7c6",
       "provider": "ovn",
       "name":"lb1",
       "admin_state_up":true,
       "listeners":[
          {
             "name": "listener1",
             "protocol":"TCP",
             "protocol_port":"80",
             "default_pool": {
                "name":"tcp_pool",
                "protocol":"TCP",
                "lb_algorithm":"SOURCE_IP_PORT",
                "healthmonitor":{
                    "delay":"10",
                    "max_retries":"4",
                    "timeout":5,
                    "type":"TCP"
                },
                "members":[
                    {
                    "address":"10.0.0.108",
                    "protocol_port":"30776",
                    "subnet_id":"48c0bbc8-df0b-4fd3-8536-1d8768c09843"
                    }
                ]
             }
          }
       ]
    }
 }'
{"loadbalancer": {"listeners": [{"l7policies": [], "id": "846ef530-0670-4e8d-93b7-3158e4ab009a", "name": "listener1", "description": "", "provisioning_status": "PENDING_CREATE", "operating_status": "OFFLINE", "admin_state_up": true, "protocol": "TCP", "protocol_port": 80, "connection_limit": -1, "default_tls_container_ref": null, "sni_container_refs": [], "project_id": "e8e639900efd450eb50e42c60b62ff43", "default_pool_id": "a1082d46-ef5d-4489-8cf7-5ab49a3d72f5", "insert_headers": {}, "created_at": "2023-05-03T10:16:29", "updated_at": "2023-05-03T10:16:29", "timeout_client_data": 50000, "timeout_member_connect": 5000, "timeout_member_data": 50000, "timeout_tcp_inspect": 0, "tags": [], "client_ca_tls_container_ref": null, "client_authentication": "NONE", "client_crl_container_ref": null, "allowed_cidrs": null, "tls_ciphers": null, "tls_versions": null, "alpn_protocols": null, "tenant_id": "e8e639900efd450eb50e42c60b62ff43"}], "pools": [{"members": [{"id": "8603f176-af49-47e8-ad49-f64701d92767", "name": "", "operating_status": "OFFLINE", "provisioning_status": "PENDING_CREATE", "admin_state_up": true, "address": "10.0.0.108", "protocol_port": 30776, "weight": 1, "backup": false, "subnet_id": "48c0bbc8-df0b-4fd3-8536-1d8768c09843", "project_id": "e8e639900efd450eb50e42c60b62ff43", "created_at": "2023-05-03T10:16:29", "updated_at": null, "monitor_address": null, "monitor_port": null, "tags": [], "tenant_id": "e8e639900efd450eb50e42c60b62ff43"}], "healthmonitor": null, "id": "a1082d46-ef5d-4489-8cf7-5ab49a3d72f5", "name": "tcp_pool", "description": "", "provisioning_status": "PENDING_CREATE", "operating_status": "OFFLINE", "admin_state_up": true, "protocol": "TCP", "lb_algorithm": "SOURCE_IP_PORT", "session_persistence": null, "project_id": "e8e639900efd450eb50e42c60b62ff43", "listeners": [{"id": "846ef530-0670-4e8d-93b7-3158e4ab009a"}], "created_at": "2023-05-03T10:16:29", "updated_at": null, "tags": [], "tls_container_ref": null, "ca_tls_container_ref": null, 
"crl_container_ref": null, "tls_enabled": false, "tls_ciphers": null, "tls_versions": null, "alpn_protocols": null, "tenant_id": "e8e639900efd450eb50e42c60b62ff43"}], "id": "5da70583-0c71-47b7-9c77-c22a69211863", "name": "lb1", "description": "", "provisioning_status": "PENDING_CREATE", "operating_status": "OFFLINE", "admin_state_up": true, "project_id": "e8e639900efd450eb50e42c60b62ff43", "created_at": "2023-05-03T10:16:28", "updated_at": null, "vip_address": "10.0.0.190", "vip_port_id": "1f9ec781-24de-4d84-b1f6-c454afbf158f", "vip_subnet_id": "13c428b0-0292-4e16-aaff-145dcceab7c6", "vip_network_id": "71d5676f-5df9-419b-9896-0cb7cb4a53e8", "provider": "ovn", "flavor_id": null, "vip_qos_policy_id": null, "tags": [], "availability_zone": null, "tenant_id": "e8e639900efd450eb50e42c60b62ff43"}}


(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer status show lb1                                                                                                                    
{
    "loadbalancer": {
        "id": "5da70583-0c71-47b7-9c77-c22a69211863",
        "name": "lb1",
        "operating_status": "ONLINE",
        "provisioning_status": "ACTIVE",
        "listeners": [
            {
                "id": "846ef530-0670-4e8d-93b7-3158e4ab009a",
                "name": "listener1",
                "operating_status": "ONLINE",
                "provisioning_status": "ACTIVE",
                "pools": [
                    {
                        "id": "a1082d46-ef5d-4489-8cf7-5ab49a3d72f5",
                        "name": "tcp_pool",
                        "provisioning_status": "ACTIVE",
                        "operating_status": "ONLINE",
                        "health_monitor": {
                            "id": "41d62b96-9f9a-496b-b605-a1c10d3bf0ec",
                            "name": "",
                            "type": "TCP",
                            "provisioning_status": "ACTIVE",
                            "operating_status": "ONLINE"
                        },
                        "members": [
                            {
                                "id": "8603f176-af49-47e8-ad49-f64701d92767",
                                "name": "",
                                "operating_status": "ONLINE",
                                "provisioning_status": "ACTIVE",
                                "address": "10.0.0.108",
                                "protocol_port": 30776
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

The loadbalancer status show command reports the member with "operating_status": "ONLINE" and "provisioning_status": "ACTIVE", even though no member server exists at "address": "10.0.0.108" (I chose that address arbitrarily). In my opinion it should report ERROR or a similar value when the server does not exist. Curling the LB VIP returns no response, as expected.

That issue is not related to this BZ, though.

Deleting HM works as expected:
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer healthmonitor delete 41d62b96-9f9a-496b-b605-a1c10d3bf0ec

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer status show lb1
{
    "loadbalancer": {
        "id": "5da70583-0c71-47b7-9c77-c22a69211863",
        "name": "lb1",
        "operating_status": "ONLINE",
        "provisioning_status": "ACTIVE",
        "listeners": [
            {
                "id": "846ef530-0670-4e8d-93b7-3158e4ab009a",
                "name": "listener1",
                "operating_status": "ONLINE",
                "provisioning_status": "ACTIVE",
                "pools": [
                    {
                        "id": "a1082d46-ef5d-4489-8cf7-5ab49a3d72f5",
                        "name": "tcp_pool",
                        "provisioning_status": "ACTIVE",
                        "operating_status": "ONLINE",
                        "members": [
                            {
                                "id": "8603f176-af49-47e8-ad49-f64701d92767",
                                "name": "",
                                "operating_status": "NO_MONITOR",
                                "provisioning_status": "ACTIVE",
                                "address": "10.0.0.108",
                                "protocol_port": 30776
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
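The member's switch from ONLINE to NO_MONITOR after the delete is expected: once the pool has no health monitor, Octavia cannot assess member health, so it reports NO_MONITOR rather than ONLINE or ERROR. A simplified sketch of that rule (illustrative model only, not provider driver code):

```python
def member_operating_status(has_health_monitor, last_check_passed=False):
    """Derive a member operating_status in a simplified Octavia-style model.

    Without a health monitor, member health is unknown -> NO_MONITOR.
    With one, the last health-check result decides ONLINE vs ERROR.
    """
    if not has_health_monitor:
        return "NO_MONITOR"
    return "ONLINE" if last_check_passed else "ERROR"
```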

It looks good to me and I am moving the BZ status to verified.

Comment 26 errata-xmlrpc 2023-08-16 01:12:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

