Bug 1191633

Summary: ovs: excessive CPU usage in case of broadcast storm
Product: Red Hat Enterprise Linux 7 Reporter: Jiri Benc <jbenc>
Component: openvswitch Assignee: Jiri Benc <jbenc>
Status: CLOSED WORKSFORME QA Contact: Rick Alongi <ralongi>
Severity: medium Docs Contact:
Priority: low    
Version: 7.1 CC: aloughla, atragler, fdinitto, fleitner, kzhang, majopela, mleitner, network-qe, rkhan
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-09-04 14:17:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1185521, 1191918, 1191922    

Description Jiri Benc 2015-02-11 16:02:01 UTC
Setup: three computers, ovs bridge on each of them with vxlan tunnels in full mesh topology. Send a broadcast packet to the internal port of the bridge.
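
For anyone trying to recreate the setup, the following is a minimal sketch that only prints the ovs-vsctl commands for a full VXLAN mesh; it does not run them. The bridge name br0, the port names vxlan1/vxlan2 and the 192.0.2.x addresses are assumptions for illustration and are not taken from the original reproducer.

```python
def mesh_commands(hosts, bridge="br0"):
    """Return {host_ip: [ovs-vsctl command, ...]} for a full VXLAN mesh.

    Bridge name, port names and addresses are illustrative assumptions.
    """
    plan = {}
    for local in hosts:
        # One bridge per host.
        cmds = [f"ovs-vsctl add-br {bridge}"]
        # One VXLAN port per remote host, so every pair of hosts is linked.
        peers = [h for h in hosts if h != local]
        for i, remote in enumerate(peers, start=1):
            port = f"vxlan{i}"
            cmds.append(
                f"ovs-vsctl add-port {bridge} {port} -- "
                f"set interface {port} type=vxlan options:remote_ip={remote}"
            )
        plan[local] = cmds
    return plan


if __name__ == "__main__":
    # Hypothetical addresses from the documentation range.
    hosts = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    for host, cmds in mesh_commands(hosts).items():
        print(f"# on {host}")
        print("\n".join(cmds))
```

With three hosts, each host gets one bridge and two tunnel ports. A broadcast frame sent to the bridge's internal port is then flooded over both tunnels and can loop around the mesh.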

Originally reproduced with openstack (see bug 1185521 comment 25 and following comments for details).

Happens at least with ovs 2.1.2 and 2.3.1.

The symptom debugged so far is excessive flow dumping:

The leader revalidator thread is dumping flows from the kernel in a tight loop. The other revalidator threads and the ovs-vswitchd thread are spending a lot of time on various locks (which is logical). Flow revalidation should not happen more often than every 500 msec, but that's not the case here for some reason.

It seems the problem is seq_wait in udpif_revalidator which either sets poll_immediate_wake or causes poll_block to return immediately because of latch_wait(&waiter->thread->latch) in seq_wait__.
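
To illustrate the suspected mechanism, here is a toy model in Python. It is not OVS code: the classes Seq and PollLoop, the function count_dumps and the 1 ms dump cost are all hypothetical. It only assumes that a wait on a sequence number wakes the poll loop immediately when the sequence has already moved past the value the waiter recorded, in which case the 500 msec timer is never honored.

```python
class Seq:
    """Toy sequence counter (hypothetical, not the OVS struct seq)."""

    def __init__(self):
        self.value = 0

    def change(self):
        self.value += 1


class PollLoop:
    """Toy poll loop running on a simulated clock in milliseconds."""

    def __init__(self):
        self.now_ms = 0
        self.immediate = False
        self.timeout_ms = None

    def timer_wait(self, ms):
        self.timeout_ms = ms

    def immediate_wake(self):
        self.immediate = True

    def block(self):
        # Sleep for the timer only if nothing requested an immediate wake.
        if not self.immediate and self.timeout_ms is not None:
            self.now_ms += self.timeout_ms
        self.immediate = False
        self.timeout_ms = None


def seq_wait(seq, seen, loop):
    # If the sequence already differs from what the caller saw, do not sleep.
    if seq.value != seen:
        loop.immediate_wake()


def count_dumps(duration_ms, seq_changes_during_dump,
                interval_ms=500, dump_cost_ms=1):
    """Count simulated flow dumps within duration_ms."""
    seq, loop, dumps = Seq(), PollLoop(), 0
    while loop.now_ms < duration_ms:
        seen = seq.value
        loop.now_ms += dump_cost_ms  # assumed cost of one flow dump
        dumps += 1
        if seq_changes_during_dump:
            seq.change()  # something bumps the sequence on every pass
        seq_wait(seq, seen, loop)
        loop.timer_wait(interval_ms)
        loop.block()
    return dumps


if __name__ == "__main__":
    print("quiet sequence:", count_dumps(5000, False))
    print("sequence bumped every pass:", count_dumps(5000, True))
```

In this model the dump rate is bounded by the timer only while the sequence stays quiet. Once it changes on every pass, the loop is limited solely by the cost of the dump itself, which would match the tight loop described above.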

Comment 4 Rick Alongi 2016-02-22 20:39:49 UTC
Reverting the QA Ack until Dev has provided an affirmative Ack.

Comment 5 Jiri Benc 2016-09-04 14:17:15 UTC
Strangely, I cannot reproduce it anymore. The storm itself is easily reproducible but the CPU usage I get is as expected.

Given that nobody is complaining, that this has never been observed during normal operation, and that I can't reproduce it anymore, I'm closing this.

If we hit this again, we'll reopen.