Bug 1827795 - Using security rules with remote security group may cause deployment to fail
Summary: Using security rules with remote security group may cause deployment to fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.z
Assignee: Martin André
QA Contact: David Sanz
URL:
Whiteboard:
Depends On: 1825973
Blocks:
Reported: 2020-04-24 19:52 UTC by Martin André
Modified: 2020-05-13 11:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The OpenStack IPI installer creates security groups whose rules use `remote_group_id` to specify allowed traffic origins. Consequence: Using `remote_group_id` in security rules is very inefficient: it triggers a lot of computation by the OVS agent to generate the flows and can exceed the time allocated for flow generation. In such cases, especially in environments already under stress, master nodes may be unable to communicate with worker nodes, causing the deployment to fail. Fix: Use IP prefixes instead of `remote_group_id` to whitelist traffic origins. Result: The reduced load on Neutron resources should lower the occurrence of timeouts.
Clone Of:
Environment:
Last Closed: 2020-05-13 11:07:19 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Github openshift installer pull 3505 None closed Bug 1827795: OpenStack: Replace remote_group_id with remote_ip_prefix 2020-05-08 14:07:26 UTC
Red Hat Product Errata RHBA-2020:2023 None None None 2020-05-13 11:07:40 UTC

Description Martin André 2020-04-24 19:52:09 UTC
This bug was initially created as a copy of Bug #1825973

I am copying this bug because: 



+++ This bug was initially created as a clone of Bug #1825460 +++

+++ This bug was initially created as a clone of Bug #1824287 +++

Using security groups as the source or destination of a security rule on OpenStack is very resource intensive. This can lead to network traffic performance issues with OpenStack Neutron.
The degraded network traffic can lead to installation failure, where the bootstrap process times out because pods cannot access resources through the OpenShift SDN internal network.
For example, some pods are unable to successfully resolve IP addresses because they can't reach the internal DNS service of the cluster.

Communication between pods is intermittent and leads to cascading failures.

--- Additional comment from Martin André on 2020-04-16 07:50:21 UTC ---

Using `remote_group_id` in security rules is very inefficient: it triggers a lot of computation by the OVS agent to generate the flows and can exceed the time allocated for flow generation. In such cases, especially in environments already under stress, master nodes may be unable to communicate with worker nodes, causing the deployment to fail.

We're seeing this behavior in MOC, the cloud we're using for our CI.

The workaround is to use the more efficient remote_ip_prefix rather than remote_group_id when creating security rules.

This was already done for openshift-ansible in the past: https://bugzilla.redhat.com/show_bug.cgi?id=1703947
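
As a rough illustration of the workaround, the sketch below rewrites a security group rule so that it matches on a CIDR instead of a remote group. It is not the installer's actual change (that is in the linked pull request). The helper name `to_prefix_rule`, the placeholder group ID, and the `10.0.0.0/16` machine network are assumptions made for the example. Only the `remote_group_id` and `remote_ip_prefix` attribute names come from this report.

```python
import ipaddress


def to_prefix_rule(rule, machine_cidr):
    """Return a copy of `rule` that matches on a CIDR instead of a remote group.

    Hypothetical helper for illustration only; `rule` is a plain dict shaped
    like a Neutron security group rule.
    """
    # Validate the CIDR up front; ip_network raises ValueError if it is malformed.
    network = ipaddress.ip_network(machine_cidr)

    # Copy every attribute except the remote group reference.
    converted = {k: v for k, v in rule.items() if k != "remote_group_id"}

    # Only rules that referenced a remote group get the prefix substituted.
    if rule.get("remote_group_id") is not None:
        converted["remote_ip_prefix"] = str(network)
    return converted


# Example rule allowing API traffic from members of another security group.
# The group ID is a placeholder, not a real resource.
rule = {
    "direction": "ingress",
    "protocol": "tcp",
    "port_range_min": 6443,
    "port_range_max": 6443,
    "remote_group_id": "hypothetical-master-sg-id",
}

# Assumed machine network CIDR; substitute the subnet the cluster nodes use.
converted = to_prefix_rule(rule, "10.0.0.0/16")
print(converted)
```

The trade-off is that a prefix-based rule admits any address in the CIDR rather than only the members of a specific group, so the prefix should be kept as narrow as the deployment allows.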

--- Additional comment from errata-xmlrpc on 2020-04-17 16:23:19 UTC ---

This bug has been added to advisory RHBA-2020:51809 by OpenShift Release Team Bot (ocp-build/buildvm.openshift.eng.bos.redhat.com@REDHAT.COM)

--- Additional comment from errata-xmlrpc on 2020-04-17 16:23:22 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2020:51809-02
https://errata.devel.redhat.com/advisory/51809

Comment 3 David Sanz 2020-05-04 09:03:36 UTC
Verified on 4.2.0-0.nightly-2020-05-03-071723

Comment 5 errata-xmlrpc 2020-05-13 11:07:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2023

