Bug 1824287
Summary: | Using security rules with remote security group may cause deployment to fail | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Adolfo Duarte <adduarte> | |
Component: | Installer | Assignee: | Martin André <m.andre> | |
Installer sub component: | OpenShift on OpenStack | QA Contact: | David Sanz <dsanzmor> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | high | |||
Priority: | urgent | CC: | m.andre, pprinett, wsun | |
Version: | 4.5 | |||
Target Milestone: | --- | |||
Target Release: | 4.5.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: The OpenStack IPI installer creates security groups whose rules use `remote_group_id` to specify the allowed traffic origins.
Consequence: Using `remote_group_id` in the security rules is very inefficient: it triggers a lot of computation by the OVS agent to generate the flows and can exceed the time allocated for flow generation. In such cases, especially in environments already under stress, master nodes may be unable to communicate with worker nodes, causing the deployment to fail.
Fix: Use IP prefixes for whitelisting traffic origins instead of `remote_group_id`.
Result: Less load on Neutron resources should reduce the occurrence of timeouts.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1825286 1825460 | Environment: | ||
Last Closed: | 2020-07-13 17:27:56 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1825286, 1825460 |
Description
Adolfo Duarte
2020-04-15 17:53:34 UTC
Using `remote_group_id` in the security rules is very inefficient: it triggers a lot of computation by the OVS agent to generate the flows and can exceed the time allocated for flow generation. In such cases, especially in environments already under stress, master nodes may be unable to communicate with worker nodes, causing the deployment to fail. We're seeing this behavior in MOC, the cloud we're using for our CI.

The workaround is to use the more efficient `remote_ip_prefix` rather than `remote_group_id` when creating security rules. This was already done for openshift-ansible in the past: https://bugzilla.redhat.com/show_bug.cgi?id=1703947

A note for the verifying QE: this bug affects our CI. As a result, we can already prove the effectiveness of the patch: jobs are green again after the merge. We would still need your help for the usual regression / edge case testing. Thank you!

No failure detected on the latest 4.5 nightly after the patch was merged, and the secgroup rules are fine. Marking as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409
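
For readers unfamiliar with the two rule styles, the switch from `remote_group_id` to `remote_ip_prefix` described in this report can be sketched roughly as below. This is not the installer's actual patch. It only rewrites rule dictionaries shaped like Neutron security group rule bodies; the helper name `to_prefix_rules`, the placeholder group ID, the port, and the `10.0.0.0/16` network are all assumptions made for illustration.

```python
import ipaddress


def to_prefix_rules(rule, cidrs):
    """Return copies of `rule` that match on remote_ip_prefix
    instead of remote_group_id, one copy per CIDR.

    Hypothetical helper for illustration only.
    """
    # Drop the group reference; Neutron does not accept both fields together.
    base = {k: v for k, v in rule.items() if k != "remote_group_id"}
    rules = []
    for cidr in cidrs:
        # Raises ValueError for a malformed CIDR or one with host bits set.
        net = ipaddress.ip_network(cidr)
        rules.append({
            **base,
            "ethertype": "IPv4" if net.version == 4 else "IPv6",
            "remote_ip_prefix": str(net),
        })
    return rules


# A rule that admits traffic from members of another security group
# (the ID and port are placeholders, not values from this bug).
group_rule = {
    "direction": "ingress",
    "ethertype": "IPv4",
    "protocol": "tcp",
    "port_range_min": 6443,
    "port_range_max": 6443,
    "remote_group_id": "hypothetical-worker-secgroup-id",
}

# The same rule expressed against an assumed machine network CIDR.
prefix_rules = to_prefix_rules(group_rule, ["10.0.0.0/16"])
print(prefix_rules)
```

The trade-off is that a prefix-based rule admits any address inside the CIDR rather than only members of a specific group, so the prefix should be as narrow as the deployment allows.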