Bug 1874542
| Summary: | Amphora compute resources are not cleaned up from Octavia's database when they're effectively not running | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Andrea Veri <averi> |
| Component: | openstack-octavia | Assignee: | Nate Johnston <njohnston> |
| Status: | CLOSED DUPLICATE | QA Contact: | Bruna Bonguardo <bbonguar> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 16.1 (Train) | CC: | gthiemon, ihrachys, lpeer, majopela, michjohn, scohen |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-09-02 15:17:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Andrea Veri
2020-09-01 14:57:12 UTC
The database state was accurate per the sosreports included on the customer ticket. The root cause of the issue is https://bugzilla.redhat.com/show_bug.cgi?id=1725189, where nova fails to release attached resources from an instance when the compute host is down. This causes issues in Octavia, as it can't detach the network ports from the instance to reallocate those resources to a replacement instance. The indication of this is:

health-manager.log:2020-08-26 11:11:21.079 77 ERROR octavia.controller.worker.v1.controller_worker [-] Amphora e077e480-176e-4563-b385-823ba783fd87 failover exception: Port a8328583-43f8-43ff-b586-82240e1661db failed to detach (device_id 667153b6-beb5-4366-ad4d-4f2e14f66b58) within the required time (300 s).: octavia.network.base.TimeoutException: Port a8328583-43f8-43ff-b586-82240e1661db failed to detach (device_id 667153b6-beb5-4366-ad4d-4f2e14f66b58) within the required time (300 s).

As the compute hosts were being shut down, Octavia was attempting to repair the load balancers but was getting blocked by the above nova behavior.

In response to this issue in nova, the Octavia team has created a workaround in support of https://bugzilla.redhat.com/show_bug.cgi?id=1723482, as it is not clear if/when the nova issue will be resolved. With the associated fix to BZ 1723482, Octavia would have either been successful at repairing the load balancer or, if no compute hosts were left functional, a follow-up load balancer failover (via the API) would have restored full service to the load balancer.

Michael, I ran another round of resiliency testing this morning and I can confirm the bug you mention (https://bugzilla.redhat.com/show_bug.cgi?id=1723482) is exactly what we're hitting:

|__Flow 'octavia-failover-amphora-flow': octavia.network.base.TimeoutException: Port eda7e7c2-d0d7-4291-baea-71b6e1f73b7c failed to detach (device_id 4fdf25b0-5999-4b5b-82a8-8e15ba74ae7a) within the required time (300 s).
2020-09-02 08:51:45.565 77 ERROR octavia.controller.worker.v1.controller_worker octavia.network.base.TimeoutException: Port eda7e7c2-d0d7-4291-baea-71b6e1f73b7c failed to detach (device_id 4fdf25b0-5999-4b5b-82a8-8e15ba74ae7a) within the required time (300 s).
2020-09-02 08:51:45.603 77 ERROR octavia.controller.worker.v1.controller_worker [-] Amphora d4f54a14-1fb0-4adb-b82e-585cc6ebc2fd failover exception: Port eda7e7c2-d0d7-4291-baea-71b6e1f73b7c failed to detach (device_id 4fdf25b0-5999-4b5b-82a8-8e15ba74ae7a) within the required time (300 s).: octavia.network.base.TimeoutException: Port eda7e7c2-d0d7-4291-baea-71b6e1f73b7c failed to detach (device_id 4fdf25b0-5999-4b5b-82a8-8e15ba74ae7a) within the required time (300 s).

Feel free to close this bug and we can continue on https://bugzilla.redhat.com/show_bug.cgi?id=1723482 to avoid duplicates, thanks!

Marked as duplicate of BZ 1874927 (which is similar to 1723482 but for OSP16.1).

*** This bug has been marked as a duplicate of bug 1874927 ***
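
For anyone checking whether their own environment shows the same signature, the failover errors quoted above can be picked out of a health-manager log with a small script. This is only a rough sketch: the regular expression is modelled on the two "failover exception" lines pasted in this report, the function name `find_detach_timeouts` is made up for illustration, and other Octavia releases may word the message differently.

```python
import re

# Assumption: the message layout matches the "failover exception" lines
# quoted in this bug report; adjust the pattern if your release differs.
DETACH_TIMEOUT = re.compile(
    r"Amphora (?P<amphora>[0-9a-f-]{36}) failover exception: "
    r"Port (?P<port>[0-9a-f-]{36}) failed to detach "
    r"\(device_id (?P<device>[0-9a-f-]{36})\) "
    r"within the required time \((?P<timeout>\d+) s\)"
)


def find_detach_timeouts(lines):
    """Return one dict per failover error caused by a port detach timeout."""
    hits = []
    for line in lines:
        match = DETACH_TIMEOUT.search(line)
        if match:
            hit = match.groupdict()
            hit["timeout"] = int(hit["timeout"])  # seconds, as logged
            hits.append(hit)
    return hits


if __name__ == "__main__":
    # Sample line copied from the log excerpt in this report.
    sample = [
        "2020-08-26 11:11:21.079 77 ERROR "
        "octavia.controller.worker.v1.controller_worker [-] "
        "Amphora e077e480-176e-4563-b385-823ba783fd87 failover exception: "
        "Port a8328583-43f8-43ff-b586-82240e1661db failed to detach "
        "(device_id 667153b6-beb5-4366-ad4d-4f2e14f66b58) "
        "within the required time (300 s).",
    ]
    for hit in find_detach_timeouts(sample):
        print(hit["amphora"], hit["port"], hit["device"], hit["timeout"])
```

To scan a real log, pass an open file object instead of the sample list, for example `find_detach_timeouts(open("health-manager.log"))`. The `device` field is the device_id reported in the message, i.e. the instance the port is still attached to.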