Bug 1093337

Summary: OVS: kernel deadlock while updating flow stats
Product: Red Hat Enterprise Linux 6
Component: kernel
Version: 6.5
Hardware: All
OS: Unspecified
Status: CLOSED NOTABUG
Severity: urgent
Priority: urgent
Target Milestone: pre-dev-freeze
Target Release: 6.6
Reporter: Ofer Blaut <oblaut>
Assignee: Flavio Leitner <fleitner>
QA Contact: Ofer Blaut <oblaut>
CC: apevec, chrisw, dhoward, fleitner, tgraf, yeylon
Doc Type: Bug Fix
Type: Bug
Clones: 1094867
Last Closed: 2014-05-14 14:46:41 UTC

Attachments (description: flags):
  OVS Panic: none
  var-log-messages: none
  answer files i used: none

Description Ofer Blaut 2014-05-01 12:43:13 UTC
Created attachment 891502
OVS Panic

Description of problem:

I was using RHEL 7 + RDO Icehouse 3.

My host got disconnected, and the console displayed the attached panic.

How reproducible:
Unknown; I don't know how to reproduce it. The setup was an all-in-one (AIO) install with Neutron OVS and a VLAN configuration.

Comment 1 Ofer Blaut 2014-05-01 12:51:44 UTC
1. SELinux was in permissive mode.
2. Versions:

[root@cougar16 ~]# rpm -qa | grep kernel
abrt-addon-kerneloops-2.1.11-12.el7.x86_64
kernel-tools-3.10.0-121.el7.x86_64
erlang-kernel-R16B-03.2.el7.x86_64
kernel-tools-libs-3.10.0-121.el7.x86_64
kernel-3.10.0-121.el7.x86_64
[root@cougar16 ~]# rpm -qa | grep openv
openstack-neutron-openvswitch-2014.1-11.el7.noarch
openvswitch-2.0.0-6.el7.x86_64

Comment 2 Ofer Blaut 2014-05-01 13:05:09 UTC
Created attachment 891517
var-log-messages

Comment 3 Ofer Blaut 2014-05-01 13:07:22 UTC
Created attachment 891519
answer files i used

Comment 4 Thomas Graf 2014-05-06 15:58:04 UTC
Fixed upstream:

commit 4f647e0a3c37b8d5086214128614a136064110c3
Author: Flavio Leitner <fbl>
Date:   Thu Mar 27 11:05:34 2014 -0300

    openvswitch: fix a possible deadlock and lockdep warning
    
    There are two problematic situations.
    
    A deadlock can happen when is_percpu is false because it can get
    interrupted while holding the spinlock. Then it executes
    ovs_flow_stats_update() in softirq context which tries to get
    the same lock.
    
    The second situation is that when is_percpu is true, the code
    correctly disables BH but only for the local CPU, so the
    following can happen when locking the remote CPU without
    disabling BH:
    
           CPU#0                            CPU#1
      ovs_flow_stats_get()
       stats_read()
     +->spin_lock remote CPU#1        ovs_flow_stats_get()
     |  <interrupted>                  stats_read()
     |  ...                       +-->  spin_lock remote CPU#0
     |                            |     <interrupted>
     |  ovs_flow_stats_update()   |     ...
     |   spin_lock local CPU#0 <--+     ovs_flow_stats_update()
     +---------------------------------- spin_lock local CPU#1

[...]

@Flavio: I didn't find a RHEL7 clone for this patch yet. Please confirm.
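Both scenarios in the commit message can be read as a cycle in a wait-for graph: a softirq spinning on a lock can only finish once the lock's owner runs again, and that owner is itself stuck if its CPU is busy running a spinning softirq. The C sketch below is a hypothetical userspace model, not kernel code: `cpu_snapshot`, `owner_of()` and `has_deadlock()` are invented names, lock i is assumed to stand for CPU i's per-CPU stats lock, and the model only inspects a frozen snapshot instead of running real threads.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 4

/* Hypothetical snapshot of one CPU at a single instant.
 * Lock i stands for CPU i's per-CPU flow stats lock. */
struct cpu_snapshot {
    int held;          /* lock held by the interrupted process context, -1 if none */
    int softirq_wants; /* lock the softirq on this CPU is spinning on, -1 if none */
};

/* Returns the CPU whose process context holds `lock`, or -1 if it is free. */
static int owner_of(const struct cpu_snapshot *s, int lock)
{
    assert(lock >= 0 && lock < NCPU);
    for (int c = 0; c < NCPU; c++)
        if (s[c].held == lock)
            return c;
    return -1;
}

/* A spinning softirq on a CPU blocks that CPU's process context.
 * Follow "softirq waits for lock -> CPU of the lock owner" edges;
 * arriving back at the starting CPU means nobody can make progress. */
static bool has_deadlock(const struct cpu_snapshot *s)
{
    for (int start = 0; start < NCPU; start++) {
        int cpu = start;
        for (int steps = 0; steps < NCPU; steps++) {
            if (s[cpu].softirq_wants < 0)
                break;                  /* this CPU is not stuck */
            int next = owner_of(s, s[cpu].softirq_wants);
            if (next < 0)
                break;                  /* lock is free: softirq proceeds */
            if (next == start)
                return true;            /* wait-for cycle found */
            cpu = next;
        }
    }
    return false;
}
```

For the diagram's interleaving (CPU#0 holds CPU#1's lock, CPU#1 holds CPU#0's lock, and each softirq wants its local lock) the model reports a cycle; if BH stays disabled while the remote locks are held, no softirq is spinning and the cycle disappears. The is_percpu == false case is a cycle of length one: the softirq wants the lock held by the context it interrupted on the same CPU.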

Comment 5 Thomas Graf 2014-05-06 16:09:31 UTC
Changing product to RHEL 6 and component to kernel.

Comment 6 Thomas Graf 2014-05-06 17:09:23 UTC
Looking at ovs_flow_cmd_fill_info(), I see a spin_lock_bh(&flow->lock) before the stats are accessed, which should ensure that no packets are being processed, so RHEL 6.5 does not seem to be affected.

Flavio, please confirm.
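The argument above rests on what spin_lock_bh() does on the local CPU: softirqs are held off while the lock is held and run after it is released, so the stats update can no longer interrupt the reader that owns the lock. The sketch below is a hypothetical single-CPU userspace model of that effect; `model_lock`, `model_spin_lock()` and `read_stats_with_packet()` are invented names, and a softirq is modeled as a plain function call rather than a real interrupt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one flow stats spinlock (not kernel code). */
struct model_lock {
    int owner;              /* CPU id holding the lock, -1 when free */
};

/* Returns false when the lock is already held by the context this
 * softirq interrupted on the same CPU: that holder cannot resume until
 * the softirq returns, so a real spin_lock() would spin forever. */
static bool model_spin_lock(struct model_lock *l, int cpu)
{
    if (l->owner == cpu)
        return false;
    l->owner = cpu;
    return true;
}

static void model_spin_unlock(struct model_lock *l)
{
    assert(l->owner != -1); /* unlocking a free lock is a bug */
    l->owner = -1;
}

/* Softirq path, standing in for the per-packet stats update. */
static bool softirq_update(struct model_lock *l, int cpu)
{
    if (!model_spin_lock(l, cpu))
        return false;       /* same-CPU deadlock detected */
    model_spin_unlock(l);
    return true;
}

/* Process-context reader. A packet arrives while the lock is held.
 * Without BH disabled the softirq runs immediately on top of the
 * reader; with spin_lock_bh() semantics it is deferred until after
 * the unlock. Returns false if the model detects a deadlock. */
static bool read_stats_with_packet(struct model_lock *l, int cpu,
                                   bool use_spin_lock_bh)
{
    bool ok = true;

    if (!model_spin_lock(l, cpu))
        return false;
    if (!use_spin_lock_bh)
        ok = softirq_update(l, cpu);   /* interrupts the lock holder */
    model_spin_unlock(l);
    if (use_spin_lock_bh)
        ok = softirq_update(l, cpu);   /* deferred softirq runs now */
    return ok;
}
```

With `use_spin_lock_bh` false, the model reports the same-CPU deadlock that the upstream commit describes for is_percpu == false; with it true, the deferred update runs after the unlock and succeeds. Contention from other CPUs is not modeled: there the softirq simply waits until the holder on the other CPU releases the lock.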

Comment 7 Flavio Leitner 2014-05-14 14:46:41 UTC
There are two possible scenarios that can cause the issue. The problem is well described in the upstream commit pointed out in comment #4.

Both issues were introduced when the Open vSwitch kernel module moved from a single stats structure to per-CPU stats because of the megaflow implementation. The commit is below:

commit e298e505700604c97e6a9edb21cebb080bdb91f6
Author: Pravin B Shelar <pshelar>
Date:   Tue Oct 29 17:22:21 2013 -0700

    openvswitch: Per cpu flow stats.
    
    With mega flow implementation ovs flow can be shared between
    multiple CPUs which makes stats updates highly contended
    operation. This patch uses per-CPU stats in cases where a flow
    is likely to be shared (if there is a wildcard in the 5-tuple
    and therefore likely to be spread by RSS). In other situations,
    it uses the current strategy, saving memory and allocation time.
    
    Signed-off-by: Pravin B Shelar <pshelar>
    Signed-off-by: Jesse Gross <jesse>
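The data-layout change described in this commit can be sketched in userspace C. This is a hypothetical model: `model_flow`, `flow_stats_update()` and `flow_stats_get()` are invented names, NCPU is fixed at 4, and all locking is omitted. Updates touch only the current CPU's slot, while a read has to visit every CPU's slot, which is why the reader in the upstream commit quoted in comment #4 ends up taking other CPUs' stats locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPU 4

struct model_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* Hypothetical flow with both layouts; only one is used per flow. */
struct model_flow {
    bool is_percpu;
    struct model_stats single;      /* used when !is_percpu */
    struct model_stats cpu[NCPU];   /* used when is_percpu */
};

/* Update path: a per-CPU flow only touches the current CPU's slot, so
 * CPUs processing the same flow do not contend on one shared entry. */
static void flow_stats_update(struct model_flow *f, int cpu, uint64_t len)
{
    assert(cpu >= 0 && cpu < NCPU);
    struct model_stats *s = f->is_percpu ? &f->cpu[cpu] : &f->single;
    s->packets++;
    s->bytes += len;
}

/* Read path: a per-CPU flow must visit and sum every CPU's slot. */
static struct model_stats flow_stats_get(const struct model_flow *f)
{
    struct model_stats total = {0, 0};

    if (!f->is_percpu)
        return f->single;
    for (int c = 0; c < NCPU; c++) {
        total.packets += f->cpu[c].packets;
        total.bytes += f->cpu[c].bytes;
    }
    return total;
}
```

The commit only uses the per-CPU layout for flows likely to be shared across CPUs (a wildcard in the 5-tuple); that selection policy is not modeled here.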

RHEL-6 (up to 2.6.32-465.el6, May 13 2014) doesn't include the patch above, so it still uses a single stats structure, which is correctly protected by spin_lock_bh(&flow->lock) before the stats are accessed, as Thomas said.

Having said that, I am closing this one as NotABug.

RHEL-7 version of this bug: bz#1094867
Thanks