Bug 1659274 - Octavia, when used in HA environment (in a Backup-Master setup), drops traffic after the backup VM has been rebooted.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-octavia
Version: 13.0 (Queens)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: Upstream M3
Target Release: ---
Assignee: Nir Magnezi
QA Contact: Alexander Stafeyev
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-13 23:59 UTC by Michele Valsecchi
Modified: 2019-09-10 14:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-30 15:16:56 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3755241 None None None 2019-06-26 05:14:53 UTC

Description Michele Valsecchi 2018-12-13 23:59:58 UTC
Description of problem:
When the backup Amphora VM is rebooted, some traffic is sent to the backup instead of the master, and clients receive 503 errors.

Version-Release number of selected component (if applicable):
RHOSP13

How reproducible:
-

Steps to Reproduce:
1. Restart the backup side of a Master-Backup pair in an HA environment.

The time at which traffic was dropped matches the time at which the backup Amphora moved from STANDALONE to BACKUP. At the same time, the following changes appeared in the ARP table inside the virtual router.
----------
[Tue Dec xx 16:49:48.064 201x] 192.168.10.6 dev qr-b11xxxxx-00 lladdr ab:00:00:00:a0:a0 DELAY
[Tue Dec xx 16:49:48.199 201x] 192.168.10.6 dev qr-b11xxxxx-00 lladdr ab:00:00:00:a0:a0 DELAY
[Tue Dec xx 16:49:48.322 201x] 192.168.10.6 dev qr-b11xxxxx-00 lladdr ab:00:00:00:a0:a0 DELAY
[Tue Dec xx 16:49:48.456 201x] 192.168.10.6 dev qr-b11xxxxx-00 lladdr ab:00:00:00:a0:a0 DELAY
[Tue Dec xx 16:49:48.589 201x] 192.168.10.6 dev qr-b11xxxxx-00 lladdr ab:00:00:00:a0:a0 STALE
[Tue Dec xx 16:49:48.719 201x] 192.168.10.6 dev qr-b11xxxxx-00 lladdr ab:00:00:00:a0:a0 STALE
[Tue Dec xx 16:49:48.856 201x] 192.168.10.6 dev qr-b11xxxxx-00 lladdr ab:00:00:00:a0:a0 DELAY
[Tue Dec xx 16:49:48.989 201x] 192.168.10.6 dev qr-b11xxxxx-00 lladdr ab:00:00:00:a0:a0 DELAY
[Tue Dec xx 16:49:49.120 201x] 192.168.10.6 dev qr-b11xxxxx-00 lladdr ab:00:00:00:a0:a0 DELAY
----------
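
The neighbour-table trace above can be summarised mechanically. The following is a minimal Python sketch, assuming input lines in exactly the format shown (bracketed timestamp, IP, `dev`, interface, `lladdr`, MAC, state). The function name `neigh_transitions` and the regular expression are illustrative only and are not part of any Octavia or Neutron tooling.

```python
import re

# Matches lines such as:
# [Tue Dec xx 16:49:48.064 201x] 192.168.10.6 dev qr-b11xxxxx-00 lladdr ab:00:00:00:a0:a0 DELAY
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<ip>\S+)\s+dev\s+(?P<dev>\S+)"
    r"\s+lladdr\s+(?P<mac>\S+)\s+(?P<state>[A-Z]+)\s*$"
)


def neigh_transitions(lines):
    """Return (timestamp, ip, old, new) tuples, where old and new are
    (mac, state) pairs, for every line whose entry differs from the
    previous entry seen for the same IP."""
    last = {}      # ip -> (mac, state) most recently seen
    changes = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip separators and unrelated lines
        current = (m["mac"], m["state"])
        previous = last.get(m["ip"])
        if previous is not None and previous != current:
            changes.append((m["ts"], m["ip"], previous, current))
        last[m["ip"]] = current
    return changes
```

Applied to the nine lines above, this would report two changes for 192.168.10.6: DELAY to STALE at 16:49:48.589 and STALE back to DELAY at 16:49:48.856, with the MAC address unchanged in both cases.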

2. The problem is visible in the client logs starting at 16:49:48: there are about 5 seconds of network outages.

----------
[Tue Dec xx 16:49:48.064 201x] Tue Dec xx 16:49:47 JST 20xx 503 Service Unavailable
[Tue Dec xx 16:49:48.064 201x] No server is available to handle this request.
[Tue Dec xx 16:49:48.064 201x] 
----------
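
To time the outage window from the client side, a polling loop along the following lines could be used. This is a sketch under stated assumptions, not the client that produced the log above: the helper names `http_status`, `probe` and `failures` are hypothetical, and the URL of the load balancer VIP must be supplied by the tester.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # for example, a 503 returned by HAProxy
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timed out, and so on


def probe(fetch, count, interval=0.1, clock=time.time, sleep=time.sleep):
    """Call fetch() count times; return a list of (timestamp, status) pairs."""
    samples = []
    for _ in range(count):
        samples.append((clock(), fetch()))
        sleep(interval)
    return samples


def failures(samples):
    """Keep only the samples whose status is not 200."""
    return [(ts, status) for ts, status in samples if status != 200]
```

For example, `failures(probe(lambda: http_status("http://203.0.113.10/"), count=600))` would poll a placeholder VIP (203.0.113.10 is a documentation address, not taken from this report) roughly ten times per second while the backup is rebooted. The first and last timestamps in the result bound the period during which requests failed.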


Actual results:
When the backup side of a Master-Backup pair is restarted, the new backup instance does not receive any traffic for a short span of time. After the VM has booted up, HAProxy returns 503 errors.

Expected results:
Based on the specs, all traffic should be sent to the master, with no effect on the client side.

Additional info:
-

