Bug 1565508
Summary: | Scalability problems with security group rules. | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Jiří Mencák <jmencak> | ||||||
Component: | openstack-neutron | Assignee: | Assaf Muller <amuller> | ||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Toni Freger <tfreger> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 11.0 (Ocata) | CC: | amuller, bcafarel, bhaley, chrisw, jlibosva, jmencak, njohnston, srevivo | ||||||
Target Milestone: | --- | Keywords: | ZStream | ||||||
Target Release: | --- | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | aos-scalability-39 | ||||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2019-04-18 15:46:28 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
Description
Jiří Mencák
2018-04-10 07:43:28 UTC
The errors from the OVS agent occur because the agent can't talk to the RabbitMQ bus while the network stack is choked. A few questions:

Can you observe any process, such as ovs-vswitchd, spiking CPU utilization? We've been having issues with that lately.
Do you use l2 population?
What does the host networking on the controllers look like? Is the management network (API calls, rabbitmq) using OVS bridges? You can provide this information by running "ovs-vsctl show". The output of "ip a" would be helpful too.

(In reply to jmencak from comment #0)
> Additional info:
> Workaround: By making the security group rules very permissive, I was
> able to go beyond ~2100 VMs in the same environment. There is also a high
> number of iptables rules on the computes.

Does this mean the compute nodes also suffer networking issues, or is it just an observation?

Created attachment 1420702 [details]
ovs-vsctl show on the controller-0

Adding the "ovs-vsctl show" output from controller-0. I had to reinstall OpenStack, but the deployment should be exactly the same. Whether the compute nodes suffer from the same problem still needs to be verified, but the ssh disconnects from the controllers were more frequent.
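
To put a number on the "high number of iptables rules" observation on the computes, the following is a minimal sketch that counts rules per chain in `iptables-save` output. The function name, the `neutron-` chain prefix default, and the sample chain names are assumptions made for illustration and are not taken from this report; adjust the prefix to whatever chains actually appear on the node.

```python
from collections import Counter


def count_rules_per_chain(iptables_save_output, chain_prefix="neutron-"):
    """Count '-A <chain> ...' rule lines per chain in iptables-save text.

    chain_prefix is an assumption: only chains whose name starts with it
    are counted. Pass "" to count every chain.
    """
    counts = Counter()
    for line in iptables_save_output.splitlines():
        parts = line.split()
        # Rule lines in iptables-save output have the form: -A <chain> <rule spec>
        if len(parts) >= 2 and parts[0] == "-A" and parts[1].startswith(chain_prefix):
            counts[parts[1]] += 1
    return counts


# Illustrative sample only; the chain and tap device names are made up.
SAMPLE = """\
*filter
:INPUT ACCEPT [0:0]
:neutron-openvswi-sg-chain - [0:0]
-A INPUT -j neutron-openvswi-INPUT
-A neutron-openvswi-sg-chain -m physdev --physdev-out tapaaaa -j neutron-openvswi-iaaaa
-A neutron-openvswi-iaaaa -m state --state RELATED,ESTABLISHED -j RETURN
-A neutron-openvswi-iaaaa -p tcp -m tcp --dport 22 -j RETURN
COMMIT
"""

counts = count_rules_per_chain(SAMPLE)
total = sum(counts.values())
for chain, n in counts.most_common():
    print(f"{n:6d}  {chain}")
print(f"{total:6d}  total")
```

On a real compute node the input could come from `subprocess.run(["iptables-save"], capture_output=True, text=True).stdout` (root privileges are normally required). Comparing the totals before and after loosening the security group rules would show whether the workaround actually reduces the rule count.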
Picking this back up: can you try this on Rocky? There have been multiple improvements in security group efficiency in recent months. Thanks!

Closing as per comment #8. There have been many changes in recent releases related to optimization (including security groups), so currently supported releases should behave much better in this situation.