Bug 1824203
Summary: | openshift-sdn does not update egress IPs on the node due to deadlock | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Pablo Alonso Rodriguez <palonsor> | |
Component: | Networking | Assignee: | Juan Luis de Sousa-Valadas <jdesousa> | |
Networking sub component: | openshift-sdn | QA Contact: | huirwang | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | urgent | |||
Priority: | high | CC: | aarapov, aivaras.laimikis, andbartl, bbennett, ckoep, erich, jdesousa, jnordell, openshift-bugs-escalate, pweil, rbost, rsandu, rsunog, travi, tsmetana | |
Version: | 3.11.0 | |||
Target Milestone: | --- | |||
Target Release: | 4.5.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | SDN-CUST-IMPACT | |||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause:
The egressIPTracker had methods that locked eit.mutex and then called evm functions that locked evm.mutex.
The problem was that those evm functions, while holding evm.mutex, had to write to the evm.updates channel, which was unbuffered. The write stayed blocked until eit.setNodeOffline ran, and that function also locked eit.mutex.
Consequence:
When there was a vast number of egress IPs, a deadlock occurred.
Fix:
Removed the deadlock by making the updates channel buffered so that it's a mere notification system which doesn't contain the actual data.
Result:
The deadlock no longer occurs.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1824243 1840215 (view as bug list) | Environment: | ||
Last Closed: | 2020-07-13 17:27:56 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1824243, 1840215 |
Description
Pablo Alonso Rodriguez
2020-04-15 14:18:51 UTC
Assigning to 4.5 and adding a 3.11 clone to track the backport (when ready).

Pablo, I looked at the code and I think it's safe to make the channel buffered, however I'm concerned about two things. When they make these massive egress IP migrations, how many namespaces and how many nodes are we speaking about (output of oc get netnamespace,hostsubnet)? I'm asking so that I can size the buffer accordingly.

The most usual massive egress IP migration scenario would be when one of the nodes is updated or lost and its IPs are moved to another node (either because it is deemed down or because the egress CIDR has been removed to force the IPs to move to other nodes).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
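
For readers unfamiliar with the pattern named in the Fix section of the Doc Text, here is a minimal Go sketch of a buffered channel used purely as a notification, with the actual data kept in a mutex-guarded slice. This is not the openshift-sdn code: the tracker type, its fields, and the queue and drain methods are hypothetical names chosen for illustration, and the channel capacity of 1 is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a hypothetical stand-in for a component that queues work
// under a mutex and wakes up a consumer through a channel.
type tracker struct {
	mu      sync.Mutex
	pending []string      // the actual data, guarded by mu
	updates chan struct{} // carries only "something changed", never the data
}

func newTracker() *tracker {
	// Capacity 1 is enough (assumption): one pending wake-up covers
	// every item queued before the consumer drains.
	return &tracker{updates: make(chan struct{}, 1)}
}

// queue records an item and signals the consumer without ever blocking,
// so it is safe to call while other locks are held.
func (t *tracker) queue(item string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending = append(t.pending, item)
	select {
	case t.updates <- struct{}{}:
	default:
		// A notification is already pending; the consumer will pick
		// up this item on the same drain.
	}
}

// drain returns everything queued so far and resets the list.
func (t *tracker) drain() []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	items := t.pending
	t.pending = nil
	return items
}

func main() {
	t := newTracker()
	// Queue many items with no consumer running. With an unbuffered
	// channel and a blocking send, the first call would hang here.
	for i := 0; i < 1000; i++ {
		t.queue(fmt.Sprintf("egress-ip-%d", i))
	}
	<-t.updates                 // a single wake-up
	fmt.Println(len(t.drain())) // all queued items arrive in one drain
}
```

Because the send sits in a select with a default branch, queue never blocks while holding the mutex, so a consumer that needs the same mutex can always make progress.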