Bug 1468631 - openvswitch segfaults when changing port VIF MTU and there's traffic flowing
Status: ASSIGNED
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 10.0 (Newton)
Hardware: x86_64 Linux
Priority: urgent, Severity: urgent
Target Milestone: z4
Target Release: 10.0 (Newton)
Assigned To: Aaron Conole
QA Contact: Yariv
Keywords: Triaged, ZStream
Duplicates: 1477785
Reported: 2017-07-07 11:07 EDT by Vincent S. Cojot
Modified: 2017-08-21 13:03 EDT
Fixed In Version: openvswitch-2.6.1-13.git20161206.el7ost
Type: Bug


Description Vincent S. Cojot 2017-07-07 11:07:11 EDT
Description of problem:

It is easy to make openvswitch segfault by changing the MTU of a VIF port.
One of our customers runs OVS with OSP10z3 and DPDK with jumbo frames.
Because VIF ports do not currently inherit the MTU from the OVS bridge, the customer has to run a cron job that sets the MTU on 'vhu*' ports when they come up.
As a result, ovs-vswitchd segfaults very often:

For example: ovs-vsctl set interface vhu2fd7027c-33 mtu_request=9000

Results in:

Jul 05 12:52:46 tkll00p1 kernel: pmd459[6777]: segfault at 44 ip 00007fa334156dff sp 00007fa2517ef4d0 error 4 in ovs-vswitchd[7fa33407b000+3b1000]

Jul 05 12:52:46 tkll00p1 systemd[1]: ovs-vswitchd.service: main process exited, code=killed, status=11/SEGV
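For triage, the kernel segfault line above can be decoded mechanically. The following is a minimal Python sketch, not part of the original report: the function name and the returned keys are made up for illustration. It subtracts the mapping base from the instruction pointer to get an offset inside ovs-vswitchd, and splits the x86 page-fault error code into its low three bits (bit 0: page was present, bit 1: write access, bit 2: user mode).

```python
import re

# Layout of the kernel's "segfault at ..." message as it appears in this report.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

LINE = ("Jul 05 12:52:46 tkll00p1 kernel: pmd459[6777]: segfault at 44 "
        "ip 00007fa334156dff sp 00007fa2517ef4d0 error 4 "
        "in ovs-vswitchd[7fa33407b000+3b1000]")


def decode_segfault(line):
    """Decode one kernel segfault log line into a small dict (illustrative helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)
    return {
        "thread": m["comm"],                  # thread name, here a PMD thread
        "object": m["obj"],                   # mapped object containing ip
        "fault_addr": int(m["addr"], 16),     # address that was accessed
        "offset": ip - base,                  # ip relative to the mapping start
        "in_mapping": 0 <= ip - base < size,  # sanity check on the offset
        "page_present": bool(err & 1),        # bit 0 of the error code
        "write": bool(err & 2),               # bit 1: write (else read)
        "user_mode": bool(err & 4),           # bit 2: fault raised in user mode
    }


if __name__ == "__main__":
    info = decode_segfault(LINE)
    print(info["thread"], hex(info["fault_addr"]), hex(info["offset"]))
```

For the line in this report, the decoded values are a user-mode read of a non-present page at address 0x44, raised from the thread named pmd459 at offset 0xdbdff into the ovs-vswitchd mapping. An address that low is consistent with reading a structure field through a NULL pointer, although only a core dump or matching debuginfo can confirm that.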

In ovsdb-server.log we see a line such as this one:
2017-07-05T17:52:47.975Z|00005|fatal_signal|WARN|terminating with signal 15 (Terminated)
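The customer's cron job itself is not included in this report. As a rough illustration of the workaround described above, the Python sketch below builds one `ovs-vsctl set interface ... mtu_request=...` command per interface whose name starts with `vhu`. The function names, the `dry_run` flag, and the hard-coded interface list are assumptions made for this sketch. In practice the names would have to come from the switch, for example from `ovs-vsctl list-ifaces <bridge>`.

```python
import subprocess


def build_mtu_commands(iface_names, mtu=9000, prefix="vhu"):
    """Return one ovs-vsctl argv list per interface whose name starts with prefix."""
    return [
        ["ovs-vsctl", "set", "interface", name, f"mtu_request={mtu}"]
        for name in iface_names
        if name.startswith(prefix)
    ]


def apply(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # On an affected build this call is what can trigger the crash.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical interface list; only the vhu* entry gets a command.
    apply(build_mtu_commands(["vhu2fd7027c-33", "dpdk0"]))
```

The sketch defaults to a dry run because, on the affected package, setting the MTU while traffic is flowing is exactly the operation that crashes ovs-vswitchd.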

Version-Release number of selected component (if applicable):

openvswitch-2.6.1-10.git20161206.el7fdp.x86_64
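To check whether an installed build predates the build named in the "Fixed In Version" field of this bug (openvswitch-2.6.1-13.git20161206.el7ost), the package strings can be compared roughly as sketched below. This is a simplified illustration with made-up helper names. It compares only the dotted version and the leading integer of the release, and it is not a substitute for RPM's own version comparison. Note also that the two strings carry different dist tags (el7fdp and el7ost), so the result is indicative only.

```python
import re


def build_key(pkg):
    """Split 'name-version-release[.arch]' into (name, version tuple, release number)."""
    name, version, release = pkg.rsplit("-", 2)
    ver = tuple(int(part) for part in version.split("."))
    rel = int(re.match(r"\d+", release).group())  # leading integer of the release
    return name, ver, rel


def is_older(installed, fixed):
    """True if 'installed' sorts before 'fixed' under this simplified comparison."""
    i_name, i_ver, i_rel = build_key(installed)
    f_name, f_ver, f_rel = build_key(fixed)
    if i_name != f_name:
        raise ValueError("different packages")
    return (i_ver, i_rel) < (f_ver, f_rel)


if __name__ == "__main__":
    installed = "openvswitch-2.6.1-10.git20161206.el7fdp.x86_64"
    fixed = "openvswitch-2.6.1-13.git20161206.el7ost"
    print(is_older(installed, fixed))
```

Under this comparison the reported build (release 10) is older than the build listed as fixed (release 13).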

How reproducible:

From the field, there seems to be roughly a 50% chance of a segfault each time the MTU is set.



Comment 1 Aaron Conole 2017-07-07 16:17:16 EDT
Most likely this needs commit 546e57d44c473aac2915037f6906c9dd04294105.

Will check to see if that is all.
Comment 2 Aaron Conole 2017-07-07 16:20:29 EDT
Is it also possible to get a crash dump from the customer?  That would confirm this is the issue.
Comment 9 Aaron Conole 2017-08-02 14:03:50 EDT
I've posted an upstream fix for the crash reported:

http://dpdk.org/ml/archives/dev/2017-August/072387.html
Comment 10 Andreas Karis 2017-08-02 14:33:15 EDT
Hi,

How fast are we going to have that downstream? Either as a hotfix or in the repos?

Thanks,

Andreas
Comment 11 Aaron Conole 2017-08-02 15:05:01 EDT
Needs to be accepted upstream first - I don't know how long that will take, usually a few days to a week.
Comment 12 Aaron Conole 2017-08-02 21:08:29 EDT
*** Bug 1477785 has been marked as a duplicate of this bug. ***
Comment 21 Ihar Hrachyshka 2017-08-04 11:26:50 EDT
The fix has been applied in the upstream repository. Please build a new package that includes it.
Comment 37 Jim Sisul 2017-08-09 09:53:17 EDT
The TAM and SA team met with Cisco on 8/8.  Cisco is open to providing us with the necessary hardware, giving us access to a lab with the hardware, or helping us determine whether hardware we already have is functionally equivalent.

Ravi Anan (ravianan@cisco.com) is the Cisco engineer we need to contact about the test environment.

Here are the notes from our meeting:

Current RH Software on Sprint Environment:
--RH OSP 10.z2
--OVS 2.6.1-3 beta (Compiled with DPDK)

Current OVS Deployment: The Open vSwitch version number does not necessarily indicate which version of the DPDK library upstream used to compile it.  Cisco does track which DPDK library versions work with the VIC-1340.  We don't know whether OVS 2.6.1-3 uses a compatible DPDK library.

Action Item: Cisco to confirm which DPDK libraries work (tested?) with the VIC-1340 and to map each DPDK library to an OVS version (or check upstream openvswitch.org?).  Cisco PMD drivers are also upstreamed to openvswitch.org.  Need to confirm which versions of OVS contain the correct DPDK library and the correct PMD drivers.  Need to confirm that we're using a stable branch of the DPDK libraries that will accumulate support patches going forward.

Current RH QE Test Lab:  RH may lack the correct hardware in their lab to test OVS hotfixes.

Action Item:  RH engineering will contact Ravi Anan to determine if our lab hardware is either the same as or compatible with UCS-B200 and VIC-1340 (Jim Sisul will pass contact info to RH QE) and if not, how to go about getting it for testing purposes.
Comment 42 Flavio Leitner 2017-08-21 11:48:55 EDT
Moving back to ASSIGNED.

Due to lack of HW to verify and enable support, the ENIC PMD driver will be disabled for 10z4.

fbl
