Bug 1874927 - Octavia LBs stuck in PENDING_UPDATE state after compute nodes reboot (Nova port detach failure)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-octavia
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z2
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Gregory Thiemonge
QA Contact: Omer Schwartz
URL:
Whiteboard:
Duplicates: 1874542
Depends On: 1723482
Blocks:
 
Reported: 2020-09-02 15:13 UTC by Gregory Thiemonge
Modified: 2020-10-28 15:39 UTC
CC: 13 users

Fixed In Version: openstack-octavia-5.0.3-0.20200725113403.68c0285.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1723482
Environment:
Last Closed: 2020-10-28 15:39:26 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 739772 0 None MERGED Refactor the failover flows 2020-10-27 23:12:27 UTC
Red Hat Product Errata RHEA-2020:4284 0 None None None 2020-10-28 15:39:55 UTC

Description Gregory Thiemonge 2020-09-02 15:13:00 UTC
+++ This bug was initially created as a clone of Bug #1723482 +++

Description of problem:
Octavia Load Balancers stuck in PENDING_UPDATE state after compute nodes reboot.

Version-Release number of selected component (if applicable):
[2019-06-24 11:24:52] (overcloud) [stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed
14  -p 2019-06-19.2


[2019-06-24 11:30:05] (overcloud) [stack@undercloud-0 ~]$ rpm -qa | grep octavia
python2-octaviaclient-1.6.0-0.20180816134808.64d007f.el7ost.noarch
puppet-octavia-13.3.2-0.20190420064721.29482dd.el7ost.noarch
octavia-amphora-image-x86_64-14.0-20190617.1.el7ost.noarch

How reproducible:
Unclear

Steps to Reproduce:
1) Deploy tripleo + octavia
2) Create an internal tenant network
3) Create 3 load balancers in the internal network (steps 2-3 are sketched below)
4) Increase the memory and vCPU count of the compute nodes (via virsh), one compute node at a time: first compute-0, then compute-1.
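
For reference, steps 2-3 roughly correspond to the commands below (a minimal sketch; the subnet name and CIDR are assumptions inferred from the outputs further down, only the network name int_net_1 and the load balancer names one/two/three actually appear in the transcripts):

$ openstack network create int_net_1
$ openstack subnet create --network int_net_1 --subnet-range 10.0.1.0/24 int_subnet_1
$ openstack loadbalancer create --name one --vip-subnet-id int_subnet_1
$ openstack loadbalancer create --name two --vip-subnet-id int_subnet_1
$ openstack loadbalancer create --name three --vip-subnet-id int_subnet_1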


Actual results:
One of the 3 LBs is ACTIVE and the other two are stuck in PENDING_UPDATE. Two of the amphorae are in ERROR state.


Expected results:
All Load Balancers and Amphorae are ACTIVE and ONLINE.

Steps:
[root@titan10 ~]# virsh shutdown compute-1
Domain compute-1 is being shutdown

[root@titan10 ~]# virsh edit compute-1
Domain compute-1 XML configuration edited. <------ Added more memory and more vcpus.

[root@titan10 ~]# virsh create /etc/libvirt/qemu/compute-1.xml
Domain compute-1 created from /etc/libvirt/qemu/compute-1.xml


[root@titan10 ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 4     undercloud-0                   running
 20    controller-2                   running
 22    controller-1                   running
 24    controller-0                   running
 25    compute-0                      running
 26    compute-1                      running

[root@titan10 ~]# ssh root@undercloud-0

[2019-06-24 09:34:49] (tester) [stack@undercloud-0 ~]$ openstack loadbalancer listener create --name listenerHTTP-one one --protocol HTTP --protocol-port 80
Load Balancer 789510be-eee5-4055-97c8-917e680b8e0e is immutable and cannot be updated. (HTTP 409) (Request-ID: req-19be68a2-9069-4d7d-b53f-2cbd8271475e)
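
(For context: the HTTP 409 means the load balancer is not in a mutable provisioning_status. Normally one would wait for it to return to ACTIVE before issuing further changes, e.g. with a loop like the sketch below; here it never leaves PENDING_UPDATE, which is the bug.)

$ LB_ID=789510be-eee5-4055-97c8-917e680b8e0e
$ until [ "$(openstack loadbalancer show "$LB_ID" -f value -c provisioning_status)" = "ACTIVE" ]; do sleep 10; done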

[2019-06-24 09:35:25] (tester) [stack@undercloud-0 ~]$ openstack loadbalancer list
+--------------------------------------+-------+----------------------------------+-------------+---------------------+----------+
| id                                   | name  | project_id                       | vip_address | provisioning_status | provider |
+--------------------------------------+-------+----------------------------------+-------------+---------------------+----------+
| 789510be-eee5-4055-97c8-917e680b8e0e | one   | 635e7c28cd8e416cbc3225f642b8d28b | 10.0.1.13   | PENDING_UPDATE      | amphora  |
| 56804bfb-aefa-4569-a2ab-54b8fdde7542 | two   | 635e7c28cd8e416cbc3225f642b8d28b | 10.0.1.6    | ACTIVE              | amphora  |
| b94d6658-4266-4051-9674-881d280ac6ea | three | 635e7c28cd8e416cbc3225f642b8d28b | 10.0.1.10   | PENDING_UPDATE      | amphora  |
+--------------------------------------+-------+----------------------------------+-------------+---------------------+----------+

[2019-06-24 09:47:55] (overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+------------+---------------+-----------+
| id                                   | loadbalancer_id                      | status    | role       | lb_network_ip | ha_ip     |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+-----------+
| 848ed1d0-d424-4f36-b7de-acecede5a95b | 789510be-eee5-4055-97c8-917e680b8e0e | ERROR     | STANDALONE | 172.24.0.26   | 10.0.1.13 |
| 0b6d11a7-97ed-46cb-8825-2349be8715ee | b94d6658-4266-4051-9674-881d280ac6ea | ERROR     | STANDALONE | 172.24.0.7    | 10.0.1.10 |
| 3fb72336-568f-46df-bfa3-907e90be55b5 | 56804bfb-aefa-4569-a2ab-54b8fdde7542 | ALLOCATED | STANDALONE | 172.24.0.22   | 10.0.1.6  |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+-----------+

[2019-06-24 09:51:39] (overcloud) [stack@undercloud-0 ~]$ openstack server list --all
+--------------------------------------+----------------------------------------------+---------+-------------------------------------------------------------------------+----------------------------------------+---------------+
| ID                                   | Name                                         | Status  | Networks                                                                | Image                                  | Flavor        |
+--------------------------------------+----------------------------------------------+---------+-------------------------------------------------------------------------+----------------------------------------+---------------+
| bf07e809-25ce-4260-9b59-255e9f43411a | amphora-3fb72336-568f-46df-bfa3-907e90be55b5 | ACTIVE  | lb-mgmt-net=172.24.0.22; int_net_1=2001::f816:3eff:fea1:a23f, 10.0.1.16 | octavia-amphora-14.0-20190617.1.x86_64 | octavia_65    |
+--------------------------------------+----------------------------------------------+---------+-------------------------------------------------------------------------+----------------------------------------+---------------+


Am I missing something? Should I have failed over the LBs before shutting down the compute nodes? What is the best practice for compute node maintenance with Octavia?

Thank you
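
For reference, a possible maintenance flow before rebooting/resizing a compute node is sketched below. It is only a sketch, not an official procedure from this bug: the hostname and the amphora name filter are hypothetical, and the idea is simply to drain the host and fail over the affected load balancers before shutting the node down.

$ HOST=compute-1.localdomain                        # hypothetical hostname
$ openstack compute service set --disable "$HOST" nova-compute
$ openstack server list --all-projects --host "$HOST" --name '^amphora-' -f value -c Name
$ openstack loadbalancer failover <LB_ID>           # repeat for each LB owning an amphora on that host
# wait for each LB to return to ACTIVE before shutting down / resizing the node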

Comment 3 Gregory Thiemonge 2020-09-02 15:17:54 UTC
*** Bug 1874542 has been marked as a duplicate of this bug. ***

Comment 8 Omer Schwartz 2020-09-24 15:08:17 UTC
For verification, I followed these steps:

1) Deployed tripleo + octavia
2) Created an internal tenant network
3) Created 3 load balancers in the internal network
4) Increased the memory and vCPU count of the compute nodes (via virsh), one compute node at a time: first compute-0, then compute-1.


A more detailed version of the steps:

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer list
+--------------------------------------+--------------------------------------+----------------------------------+---------------+---------------------+----------+
| id                                   | name                                 | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+--------------------------------------+----------------------------------+---------------+---------------------+----------+
| 62385ae9-3a52-41d4-87c8-4abd481e6261 | test_tree-loadbalancer-4t7iwpemjra2  | d3f6513b54b34c3bb4bf6ee58f78d030 | 192.168.1.29  | ACTIVE              | amphora  |
| 78c30ea1-c6b9-41e8-9e8d-61497c8f7734 | test_tree2-loadbalancer-atzbw64lfivs | d3f6513b54b34c3bb4bf6ee58f78d030 | 192.168.1.252 | ACTIVE              | amphora  |
| 83c3027c-df67-41d4-b5df-793fe0da370c | test_tree3-loadbalancer-f7arjwyhl2bh | d3f6513b54b34c3bb4bf6ee58f78d030 | 192.168.1.96  | ACTIVE              | amphora  |
+--------------------------------------+--------------------------------------+----------------------------------+---------------+---------------------+----------+

(overcloud) [stack@undercloud-0 ~]$ logout
Connection to undercloud-0 closed.

[root@seal19 ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 4     undercloud-0                   running
 17    compute-1                      running
 18    compute-0                      running
 19    controller-0                   running
 20    controller-2                   running
 21    controller-1                   running

[root@seal19 ~]# virsh dumpxml compute-0 | grep cpu
  <vcpu placement='static'>4</vcpu>
  <cpu mode='host-passthrough' check='none'/>

[root@seal19 ~]# virsh dumpxml compute-1 | grep cpu
  <vcpu placement='static'>4</vcpu>
  <cpu mode='host-passthrough' check='none'/>

[root@seal19 ~]# virsh dumpxml compute-0 | grep emo
  <memory unit='KiB'>12572672</memory>
  <currentMemory unit='KiB'>12572672</currentMemory>

[root@seal19 ~]# virsh dumpxml compute-1 | grep emo
  <memory unit='KiB'>12572672</memory>
  <currentMemory unit='KiB'>12572672</currentMemory>

[root@seal19 ~]# virsh shutdown compute-1
Domain compute-1 is being shutdown

[root@seal19 ~]# virsh edit compute-1
Domain compute-1 XML configuration edited.   <-- I doubled the memory and vcpu

[root@seal19 ~]# virsh create /etc/libvirt/qemu/compute-1.xml
Domain compute-1 created from /etc/libvirt/qemu/compute-1.xml

[root@seal19 ~]# virsh shutdown compute-0
Domain compute-0 is being shutdown

[root@seal19 ~]# virsh edit compute-0
Domain compute-0 XML configuration edited.  <-- I doubled the memory and vcpu

[root@seal19 ~]# virsh create /etc/libvirt/qemu/compute-0.xml
Domain compute-0 created from /etc/libvirt/qemu/compute-0.xml
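
(The actual virsh edit changes are not visible in the transcript. For reference, an equivalent non-interactive way to double the values on a shut-off domain would be something like the sketch below, using the target values listed in comment 9; the transcript instead used virsh edit followed by virsh create from the saved XML.)

$ virsh setmaxmem compute-0 25145344 --config     # size in KiB
$ virsh setmem compute-0 25145344 --config
$ virsh setvcpus compute-0 8 --maximum --config
$ virsh setvcpus compute-0 8 --config
$ virsh start compute-0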

[root@seal19 ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 4     undercloud-0                   running
 19    controller-0                   running
 20    controller-2                   running
 21    controller-1                   running
 22    compute-1                      running
 23    compute-0                      running

[root@seal19 ~]# ssh stack@undercloud-0
Warning: Permanently added 'undercloud-0' (ECDSA) to the list of known hosts.
Activate the web console with: systemctl enable --now cockpit.socket

This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Last login: Thu Sep 24 09:39:20 2020 from 172.16.0.1
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer list
+--------------------------------------+--------------------------------------+----------------------------------+---------------+---------------------+----------+
| id                                   | name                                 | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+--------------------------------------+----------------------------------+---------------+---------------------+----------+
| 62385ae9-3a52-41d4-87c8-4abd481e6261 | test_tree-loadbalancer-4t7iwpemjra2  | d3f6513b54b34c3bb4bf6ee58f78d030 | 192.168.1.29  | ACTIVE              | amphora  |
| 78c30ea1-c6b9-41e8-9e8d-61497c8f7734 | test_tree2-loadbalancer-atzbw64lfivs | d3f6513b54b34c3bb4bf6ee58f78d030 | 192.168.1.252 | ACTIVE              | amphora  |
| 83c3027c-df67-41d4-b5df-793fe0da370c | test_tree3-loadbalancer-f7arjwyhl2bh | d3f6513b54b34c3bb4bf6ee58f78d030 | 192.168.1.96  | ACTIVE              | amphora  |
+--------------------------------------+--------------------------------------+----------------------------------+---------------+---------------------+----------+
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+
| id                                   | loadbalancer_id                      | status    | role       | lb_network_ip | ha_ip         |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+
| fbeac58b-2cfc-4ffd-b93a-cf2fd6832b1d | 78c30ea1-c6b9-41e8-9e8d-61497c8f7734 | ALLOCATED | STANDALONE | 172.24.3.138  | 192.168.1.252 |
| 2d02afbe-cbc7-4fa3-92e2-3b2b00acde32 | 83c3027c-df67-41d4-b5df-793fe0da370c | ALLOCATED | STANDALONE | 172.24.3.215  | 192.168.1.96  |
| bd2156cd-3e8f-4148-86e3-f539696c3617 | 62385ae9-3a52-41d4-87c8-4abd481e6261 | ALLOCATED | STANDALONE | 172.24.3.156  | 192.168.1.29  |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+
(overcloud) [stack@undercloud-0 ~]$ 

The provisioning_status of all 3 LBs is ACTIVE.
The status of all 3 amphorae is ALLOCATED.
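
A compact way to double-check the same result (a convenience sketch, not part of the original verification):

$ openstack loadbalancer list -f value -c provisioning_status | sort | uniq -c
$ openstack loadbalancer amphora list -f value -c status | sort | uniq -c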


The puddle I checked is:
(overcloud) [stack@undercloud-0 virt]$ cat /var/lib/rhos-release/latest-installed
16.1  -p RHOS-16.1-RHEL-8-20200917.n.3

Looks good to me.

Comment 9 Omer Schwartz 2020-09-24 15:12:48 UTC
I forgot to attach the updated compute servers' details, so here they are:

[root@seal19 ~]# virsh dumpxml compute-0 | grep cpu
  <vcpu placement='static'>8</vcpu>
  <cpu mode='host-passthrough' check='none'/>

[root@seal19 ~]# virsh dumpxml compute-1 | grep cpu
  <vcpu placement='static'>8</vcpu>
  <cpu mode='host-passthrough' check='none'/>

[root@seal19 ~]# virsh dumpxml compute-0 | grep emo
  <memory unit='KiB'>25145344</memory>
  <currentMemory unit='KiB'>25145344</currentMemory>

[root@seal19 ~]# virsh dumpxml compute-1 | grep emo
  <memory unit='KiB'>25145344</memory>
  <currentMemory unit='KiB'>25145344</currentMemory>

[root@seal19 ~]#

Comment 15 errata-xmlrpc 2020-10-28 15:39:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4284

