Bug 1093337
| Summary: | OVS: kernel deadlock while updating flow stats | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Ofer Blaut <oblaut> | ||||||||
| Component: | kernel | Assignee: | Flavio Leitner <fleitner> | ||||||||
| Status: | CLOSED NOTABUG | QA Contact: | Ofer Blaut <oblaut> | ||||||||
| Severity: | urgent | Docs Contact: | |||||||||
| Priority: | urgent | ||||||||||
| Version: | 6.5 | CC: | apevec, chrisw, dhoward, fleitner, tgraf, yeylon | ||||||||
| Target Milestone: | pre-dev-freeze | ||||||||||
| Target Release: | 6.6 | ||||||||||
| Hardware: | All | ||||||||||
| OS: | Unspecified | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | |||||||||||
| : | 1094867 (view as bug list) | Environment: | |||||||||
| Last Closed: | 2014-05-14 14:46:41 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Attachments: |
1. selinux was permissive
2. versions
[root@cougar16 ~]# rpm -qa | grep kernel
abrt-addon-kerneloops-2.1.11-12.el7.x86_64
kernel-tools-3.10.0-121.el7.x86_64
erlang-kernel-R16B-03.2.el7.x86_64
kernel-tools-libs-3.10.0-121.el7.x86_64
kernel-3.10.0-121.el7.x86_64
[root@cougar16 ~]# rpm -qa | grep openv
openstack-neutron-openvswitch-2014.1-11.el7.noarch
openvswitch-2.0.0-6.el7.x86_64
Created attachment 891517 [details]
var-log-messages
Created attachment 891519 [details]
answer files i used
Fixed upstream:
commit 4f647e0a3c37b8d5086214128614a136064110c3
Author: Flavio Leitner <fbl>
Date: Thu Mar 27 11:05:34 2014 -0300
openvswitch: fix a possible deadlock and lockdep warning
There are two problematic situations.
A deadlock can happen when is_percpu is false because it can get
interrupted while holding the spinlock. Then it executes
ovs_flow_stats_update() in softirq context which tries to get
the same lock.
The second situation is that when is_percpu is true, the code
correctly disables BH but only for the local CPU, so the
following can happen when locking the remote CPU without
disabling BH:
CPU#0                            CPU#1
 ovs_flow_stats_get()
  stats_read()
+->spin_lock remote CPU#1        ovs_flow_stats_get()
|  <interrupted>                  stats_read()
|  ...                       +-->  spin_lock remote CPU#0
|                            |     <interrupted>
|  ovs_flow_stats_update()   |     ...
|   spin_lock local CPU#0 <--+     ovs_flow_stats_update()
+---------------------------------- spin_lock local CPU#1
[...]
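The two situations described in the commit message above can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not kernel code: `toy_state`, `toy_reader_lock`, `toy_softirq_waits_for` and the two scenario functions are hypothetical names, and "disabling BH" is reduced to a flag that defers the softirq on that CPU. A softirq that has to wait for a lock whose holder cannot make progress is treated as a deadlock.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 2

/* Toy user-space model (hypothetical, not kernel code). */
struct toy_state {
    int  lock_owner[NCPU];   /* CPU holding stats lock i, or -1 if free */
    bool bh_disabled[NCPU];  /* stands in for local_bh_disable() on that CPU */
};

static void toy_init(struct toy_state *s)
{
    for (int i = 0; i < NCPU; i++) {
        s->lock_owner[i] = -1;
        s->bh_disabled[i] = false;
    }
}

/* Reader path (stats_read) on `cpu` takes stats lock `idx`. */
static void toy_reader_lock(struct toy_state *s, int cpu, int idx,
                            bool disable_bh)
{
    assert(cpu >= 0 && cpu < NCPU && idx >= 0 && idx < NCPU);
    assert(s->lock_owner[idx] == -1);   /* lock must be free in this model */
    if (disable_bh)
        s->bh_disabled[cpu] = true;
    s->lock_owner[idx] = cpu;
}

/*
 * A softirq (the ovs_flow_stats_update path) fires on `cpu` and wants
 * lock `idx`. Returns the CPU it has to wait for, or -1 if it does not
 * block (lock is free, or the softirq is deferred because BH is disabled).
 */
static int toy_softirq_waits_for(const struct toy_state *s, int cpu, int idx)
{
    if (s->bh_disabled[cpu])
        return -1;
    return s->lock_owner[idx];
}

/* Situation 1: one shared lock; the softirq interrupts the holder itself. */
static bool single_lock_deadlocks(bool disable_bh)
{
    struct toy_state s;
    toy_init(&s);
    toy_reader_lock(&s, 0, 0, disable_bh);
    /* The softirq on CPU 0 waits for CPU 0, which it has interrupted. */
    return toy_softirq_waits_for(&s, 0, 0) == 0;
}

/* Situation 2: per-CPU locks; each CPU reads the other CPU's stats. */
static bool percpu_deadlocks(bool disable_bh)
{
    struct toy_state s;
    toy_init(&s);
    toy_reader_lock(&s, 0, 1, disable_bh);  /* CPU#0 locks remote CPU#1 */
    toy_reader_lock(&s, 1, 0, disable_bh);  /* CPU#1 locks remote CPU#0 */
    /* Each softirq wants its local lock, which the other CPU holds. */
    int w0 = toy_softirq_waits_for(&s, 0, 0);
    int w1 = toy_softirq_waits_for(&s, 1, 1);
    return w0 == 1 && w1 == 0;              /* circular wait */
}
```

In this model both scenarios report a deadlock when the reader does not disable BH, and neither does when BH is disabled on the reading CPU before any stats lock is taken, which is the property the upstream commit is described as restoring.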
@Flavio: I didn't find a RHEL7 clone for this patch yet. Please confirm.
Changing product to RHEL6 and component to kernel
Looking at ovs_flow_cmd_fill_info() I see a spin_lock_bh(&flow->lock) before accessing stats, which should ensure that no packets are being processed, so RHEL6.5 does not seem affected. Flavio, please confirm.
There are two possible scenarios that can cause the issue. The problem is well described in the upstream commit pointed to in comment#4. Both issues were introduced when openvswitch (kernel bits) moved from a single stats to per-cpu stats because of mega flow. The commit is below:
commit e298e505700604c97e6a9edb21cebb080bdb91f6
Author: Pravin B Shelar <pshelar>
Date: Tue Oct 29 17:22:21 2013 -0700
openvswitch: Per cpu flow stats.
With mega flow implementation ovs flow can be shared between
multiple CPUs which makes stats updates highly contended operation.
This patch uses per-CPU stats in cases where a flow is likely
to be shared (if there is a wildcard in the 5-tuple and therefore
likely to be spread by RSS). In other situations, it uses the current
strategy, saving memory and allocation time.
Signed-off-by: Pravin B Shelar <pshelar>
Signed-off-by: Jesse Gross <jesse>
RHEL-6 (up to 2.6.32-465.el6/May 13 2014) doesn't include the patch above, so it is using single stats, which correctly uses spin_lock_bh(&flow->lock) before accessing stats, as Thomas said.
Having said that, I am closing this one as NotABug.
RHEL-7 version of this bug: bz#1094867
Thanks
Created attachment 891502 [details]
OVS Panic
Description of problem:
I was using RHEL 7 + RDO Icehouse 3
My host got disconnected and console displayed attached panic
Version-Release number of selected component (if applicable):
How reproducible:
Setup was AIO with neutron OVS and VLAN configuration
don't know how to reproduce
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info: