Bug 1824203 - openshift-sdn does not update egress IPs on the node due to deadlock
Summary: openshift-sdn does not update egress IPs on the node due to deadlock
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: Juan Luis de Sousa-Valadas
QA Contact: huirwang
URL:
Whiteboard: SDN-CUST-IMPACT
Depends On:
Blocks: 1824243 1840215
 
Reported: 2020-04-15 14:18 UTC by Pablo Alonso Rodriguez
Modified: 2021-12-17 08:03 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The egressIPTracker had methods that lock eit.mutex and then call evm functions that lock evm.mutex. Meanwhile, code holding evm.mutex had to write to the evm.updates channel, which was unbuffered, so the write blocked until eit.setNodeOffline ran, and that function also locks eit.mutex. Consequence: With a vast number of egress IPs, a deadlock could occur. Fix: The deadlock was removed by making the updates channel buffered, so that it is a mere notification mechanism that does not carry the actual data. Result: The deadlock no longer occurs.
Clone Of:
Clones: 1824243 1840215
Environment:
Last Closed: 2020-07-13 17:27:56 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift sdn pull 132 0 None closed Bug 1824203: Make egressVXLANMonitor updates channel buffered 2021-02-12 18:32:59 UTC
Github openshift sdn pull 139 0 None closed Bug 1824203: Fix egressVXLANMonitor and egressIPTracker deadlock 2021-02-12 18:32:59 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:28:28 UTC

Description Pablo Alonso Rodriguez 2020-04-15 14:18:51 UTC
Description of problem:

In a customer environment, we found that during (usually massive) egress IP migrations (IPs moved from one hostsubnet to another, with or without egress CIDRs involved), some of the hostsubnet changes are not reflected on the node: IPs are not added to or removed from the interfaces, OVS and iptables are not updated, etc.

After examining two coredumps, I found the same deadlock in both of them. I'll post the complete coredump analysis in subsequent internal comments, but a quick summary is:
- A goroutine (either the one processing events from netnamespaces informer or hostsubnets informer) holds a lock on the egress IP tracker but is waiting on a lock on the egress vxlan monitor.
- VXLAN monitor poll goroutine is holding the lock on the egress vxlan monitor but is waiting on a write on the "updates" channel (which is an unbuffered channel, so writes block until the receiver reads).
- Goroutine in charge of reading from "updates" channel is blocked waiting to acquire the lock on the egress IP tracker, so it cannot read from "updates" channel and we are in a deadlock.
- In this deadlock, when a change event on a hostsubnet is processed, the goroutine processing it either blocks waiting to acquire the lock on the egress IP tracker or is itself the goroutine that holds the lock on the egress IP tracker while waiting on the egress vxlan monitor lock (the one from the first point of this list). This, in turn, prevents any other hostsubnet change from being processed, so SDN does not update the node with egress IP changes.
- A possible side effect is that other nodes may be mistakenly considered offline, because the "Ping" goroutines launched by the vxlan monitor poll can also block waiting on the lock on the egress IP tracker. This effect has not been confirmed by the customer.
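
To make the cycle above concrete, here is a minimal Go sketch of the notification-only pattern that the Doc Text describes as the fix. It is not the openshift-sdn code: the `monitor` type and the `report`, `drain`, and `pending` names are hypothetical, and the real change is in the linked pull requests. The sketch assumes the data stays behind the monitor's own mutex, the channel has capacity 1 and carries only a wake-up signal, and the send is non-blocking, so no lock is ever held across a blocking channel write.

```go
package main

import (
	"fmt"
	"sync"
)

// monitor is a hypothetical stand-in for the egress VXLAN monitor.
type monitor struct {
	mu      sync.Mutex
	pending []string      // recorded node changes, guarded by mu
	updates chan struct{} // capacity 1: a wake-up signal, not the data itself
}

func newMonitor() *monitor {
	return &monitor{updates: make(chan struct{}, 1)}
}

// report records a change and signals the reader without ever blocking.
// The mutex is released before the send, and the send falls through to
// default when a signal is already queued.
func (m *monitor) report(node string) {
	m.mu.Lock()
	m.pending = append(m.pending, node)
	m.mu.Unlock()

	select {
	case m.updates <- struct{}{}:
	default: // a signal is already pending; the reader will pick this change up too
	}
}

// drain returns and clears everything recorded so far.
func (m *monitor) drain() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := m.pending
	m.pending = nil
	return out
}

func main() {
	m := newMonitor()
	// Three reports with no reader running: none of them blocks.
	for _, n := range []string{"node-a", "node-b", "node-c"} {
		m.report(n)
	}
	<-m.updates            // a single coalesced signal
	fmt.Println(m.drain()) // prints [node-a node-b node-c]
}
```

With this shape, the reader can take whatever other locks it needs before calling `drain`, because the writer never waits for the reader while holding a lock.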

Please bear with me while I upload the full coredump analysis; I expect everything will be easier to understand from it.

Version-Release number of selected component (if applicable):

3.11.188 (the changes between this version and 3.11.200 would not affect this issue).

How reproducible:

Not consistently. Moving many egress IPs from one hostsubnet to another may make this more likely, but there is no clear pattern. I am still working on a more consistent reproducer.

Steps to Reproduce:
(read above)

Actual results:

Egress IP changes not applied on node.

Expected results:

Egress IP changes applied on node.

Additional info:
(in comments)

(Edits to the problem description consisted only of minor typo fixes.)

Comment 4 Ben Bennett 2020-04-15 15:39:29 UTC
Assigning to 4.5 and adding a 3.11 clone to track the backport (when ready).

Comment 7 Juan Luis de Sousa-Valadas 2020-04-16 15:04:13 UTC
Pablo, I looked at the code and I think it's safe to make the channel buffered; however, I'm concerned about two things.

When they make these massive egress IP migrations, how many namespaces and how many nodes are we talking about (as listed by oc get netnamespace,hostsubnet)? I'm asking so that I can size the buffer accordingly.

Comment 10 Pablo Alonso Rodriguez 2020-04-16 16:34:13 UTC
The most common massive egress IP migration scenario is when one of the nodes is updated or lost and its IPs are moved to another node (either because the node is deemed down or because its egress CIDR has been removed to force the IPs to move to other nodes).

Comment 47 errata-xmlrpc 2020-07-13 17:27:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

