Bug 1869047

Summary: [OSP 13][ML2/OVS] L3 HA router failure and stability issues with large floating IP count
Product: Red Hat OpenStack Reporter: Matt Flusche <mflusche>
Component: openstack-neutron Assignee: Slawek Kaplonski <skaplons>
Status: CLOSED DUPLICATE QA Contact: Eran Kuris <ekuris>
Severity: high Docs Contact:
Priority: high    
Version: 13.0 (Queens) CC: akaiser, amuller, bdobreli, chrisw, jhardee, ralonsoh, rohara, scohen, skaplons
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-09-21 19:55:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Matt Flusche 2020-08-15 21:30:12 UTC
Description of problem:

This specific environment has a single L3 router with about 200 floating IPs (FIPs), alongside about 350 other HA routers. The keepalived processes associated with this router consume a large amount of CPU. At times, the router's keepalived instances transition from active to standby and back roughly every minute across the L3 agents. This router also causes overall instability on the controller nodes (high load average and Pacemaker service failures). Disabling this specific router resolves the observed issues.
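To quantify the active/standby flapping described above, one option is to count VRRP state changes per keepalived instance from the node's logs. The following is a minimal sketch only, assuming log lines of the form `<ISO-8601 timestamp> <host> Keepalived_vrrp[<pid>]: VRRP_Instance(<name>) Entering MASTER STATE`; the exact wording varies between keepalived versions, so the regex may need adjusting. The function names and the 0.5 changes/minute threshold are hypothetical illustrations, not part of Neutron or keepalived.

```python
import re
from collections import defaultdict
from datetime import datetime

# ASSUMPTION: keepalived VRRP messages look like
#   "<ISO-8601 timestamp> <host> Keepalived_vrrp[<pid>]: VRRP_Instance(<name>) Entering MASTER STATE"
# Exact wording differs between keepalived versions; adjust the regex to match real logs.
TRANSITION_RE = re.compile(
    r"^(?P<ts>\S+)\s.*VRRP_Instance\((?P<inst>[^)]+)\)\s+Entering\s+(?P<state>MASTER|BACKUP)\s+STATE"
)


def transition_rates(lines):
    """Return {instance: (state_changes, changes_per_minute)} for each VRRP instance seen."""
    events = defaultdict(list)
    for line in lines:
        m = TRANSITION_RE.match(line)
        if m:
            ts = datetime.fromisoformat(m.group("ts"))
            events[m.group("inst")].append((ts, m.group("state")))

    rates = {}
    for inst, evs in events.items():
        evs.sort()  # order events by timestamp
        # Count only real flips (MASTER->BACKUP or BACKUP->MASTER), not repeated states.
        changes = sum(1 for (_, a), (_, b) in zip(evs, evs[1:]) if a != b)
        span_min = (evs[-1][0] - evs[0][0]).total_seconds() / 60
        rates[inst] = (changes, changes / span_min if span_min > 0 else 0.0)
    return rates


def flapping_instances(lines, max_per_minute=0.5):
    """Names of instances whose flip rate exceeds a hypothetical threshold."""
    return sorted(
        inst for inst, (_, rate) in transition_rates(lines).items() if rate > max_per_minute
    )
```

For example, an instance that goes BACKUP, MASTER, BACKUP over two minutes shows 2 state changes at 1.0 per minute and would be reported as flapping, while an instance that logs a single MASTER transition would not.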

Version-Release number of selected component (if applicable):
OSP 13

$ grep neutron installed-rpms 
openstack-neutron-12.1.1-6.el7ost.noarch                    Sun Apr 12 08:25:13 2020
openstack-neutron-common-12.1.1-6.el7ost.noarch             Sun Apr 12 08:21:38 2020
openstack-neutron-l2gw-agent-12.0.2-0.20190420004620.270972f.el7ost.noarch Sun Apr 12 08:26:48 2020
openstack-neutron-lbaas-12.0.1-0.20190803015156.b86fcef.el7ost.noarch Sun Apr 12 08:26:08 2020
openstack-neutron-lbaas-ui-4.0.1-0.20190723082436.ccf8621.el7ost.noarch Sun Apr 12 08:26:07 2020
openstack-neutron-linuxbridge-12.1.1-6.el7ost.noarch        Sun Apr 12 08:26:48 2020
openstack-neutron-metering-agent-12.1.1-6.el7ost.noarch     Sun Apr 12 08:26:48 2020
openstack-neutron-ml2-12.1.1-6.el7ost.noarch                Sun Apr 12 08:21:39 2020
openstack-neutron-openvswitch-12.1.1-6.el7ost.noarch        Sun Apr 12 08:26:46 2020
openstack-neutron-sriov-nic-agent-12.1.1-6.el7ost.noarch    Sun Apr 12 08:26:49 2020
puppet-neutron-12.4.1-10.el7ost.noarch                      Sun Apr 12 08:25:11 2020
python2-neutronclient-6.7.0-1.el7ost.noarch                 Thu Apr 18 10:02:11 2019
python2-neutron-lib-1.13.0-2.el7ost.noarch                  Sun Apr 12 08:21:24 2020
python-neutron-12.1.1-6.el7ost.noarch                       Sun Apr 12 08:21:29 2020
python-neutron-lbaas-12.0.1-0.20190803015156.b86fcef.el7ost.noarch Sun Apr 12 08:21:29 2020


How reproducible:
Unknown; observed only in this specific environment.


Additional info:
I'll provide additional details in private comments.

Comment 17 Slawek Kaplonski 2020-09-21 19:55:47 UTC

*** This bug has been marked as a duplicate of bug 1869355 ***