1493358 – The nic inside guest changed to down when change mtu many times with ping traffic

Bug 1493358 - The nic inside guest changed to down when change mtu many times with ping traffic

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	openvswitch
Sub Component:
Version:	7.4
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Eelco Chaudron
QA Contact:	liting
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-09-20 03:01 UTC by liting
Modified:	2023-09-14 04:08 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-12-14 14:09:31 UTC
Target Upstream Version:
Embargoed:

Description liting 2017-09-20 03:01:16 UTC

Description of problem:
The NIC inside the guest goes down when the MTU is changed many times while ping traffic is running.

Version-Release number of selected component (if applicable):
openvswitch-2.7.2-8.git20170719.el7fdp.x86_64.rpm 

How reproducible:


Steps to Reproduce:
The test used two directly connected machines: "cisco-c220m3-01.rhts.eng.pek2.redhat.com" and "dell-per730-02.rhts.eng.pek2.redhat.com".
1.
Configure OVS on the cisco machine as follows:
/usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
/usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x2
/usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=1024,0
/usr/bin/ovs-vsctl --timeout 10 set Open_vSwitch . other_config:pmd-cpu-mask=1
ovs-vsctl --timeout 10 set Open_vSwitch . other_config:pmd-cpu-mask=30
systemctl restart openvswitch
/usr/bin/ovs-vsctl --timeout 10 add-br br0 -- set bridge br0 datapath_type=netdev
/usr/bin/ovs-vsctl --timeout 10 add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:0b:00.0 
/usr/bin/ovs-vsctl --timeout 10 add-port br0 dpdkvhostuser0 -- set Interface dpdkvhostuser0 type=dpdkvhostuser 
sudo /usr/bin/ovs-ofctl -O OpenFlow13 --timeout 10 del-flows br0 
/usr/bin/ovs-ofctl -O OpenFlow13 --timeout 10 add-flow br0 idle_timeout=0,in_port=1,action=output:2
/usr/bin/ovs-ofctl -O OpenFlow13 --timeout 10 add-flow br0 idle_timeout=0,in_port=2,action=output:1
chmod 777 /var/run/openvswitch/dpdkvhostuser0
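
Note that pmd-cpu-mask is set twice above, so the later value (30) is the one left in the database. OVS reads these masks as hexadecimal strings. As a rough sketch for checking which cores a mask selects, the helper below (mask_to_cores is a hypothetical name, not part of OVS) decodes a hex mask into a core list:

```shell
#!/bin/sh
# Hypothetical helper (not an OVS tool): print the CPU core numbers
# selected by a hexadecimal mask such as pmd-cpu-mask or dpdk-lcore-mask.
mask_to_cores() {
    # Accept the mask with or without a leading "0x".
    mask=$(( 0x${1#0x} ))
    core=0
    cores=""
    while [ "$mask" -gt 0 ]; do
        # Bit N set means core N is part of the mask.
        if [ $(( mask & 1 )) -eq 1 ]; then
            cores="${cores:+$cores,}$core"
        fi
        mask=$(( mask >> 1 ))
        core=$(( core + 1 ))
    done
    echo "$cores"
}
```

With this sketch, "mask_to_cores 30" prints 4,5 and "mask_to_cores 0x2" prints 1, so in this setup the PMD threads would be pinned to cores 4 and 5 and the DPDK lcore thread to core 1.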

There is a guest running on the cisco machine. Configure the IP address of the guest's eth1 as follows:
ip addr add 192.168.10.1/24 dev eth1

2.
Configure an IP address on the dell02 machine:
ip addr add 192.168.10.2/24 dev p3p1
Then ping the guest on the cisco machine:
ping -n -i 0.001 192.168.10.1

3.
Change the MTU many times on the cisco machine, for example with the following commands:
ovs-vsctl set int dpdkvhostuser0 mtu_request=1900
ovs-vsctl set int dpdkvhostuser0 mtu_request=2000
ovs-vsctl set int dpdkvhostuser0 mtu_request=2200
ovs-vsctl set int dpdkvhostuser0 mtu_request=2300
ovs-vsctl set int dpdkvhostuser0 mtu_request=9000
ovs-vsctl set int dpdkvhostuser0 mtu_request=2000
ovs-vsctl set int dpdkvhostuser0 mtu_request=1500


Actual results:
After the MTU was changed many times with ping traffic running, eth1 inside the guest on the cisco machine went down as shown below, and pings from dell02 to "192.168.10.1" failed.
eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 56:48:4f:53:54:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.1/24 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5448:4fff:fe53:5401/64 scope link 
       valid_lft forever preferred_lft forever

There was no segfault in /var/log/messages.
The OVS service appeared to be working normally, as shown below.
[root@cisco-c220m3-01 openvswitch]# ovs-vsctl show
8ab35f85-52e0-498d-a1fd-3dd3408ba1ce
    Bridge "br0"
        Port "dpdkvhostuser0"
            Interface "dpdkvhostuser0"
                type: dpdkvhostuser
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:0b:00.0"}
        Port "br0"
            Interface "br0"
                type: internal
    ovs_version: "2.7.2"

Running "ip link set eth1 up" did not bring the link back. The guest had to be restarted before eth1 came up again.

I tested this with both an ixgbe NIC and an i40e NIC, and both showed the issue.
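
The NO-CARRIER flag in the eth1 output above is the visible symptom. To record when the link drops during a run, something like the following could be left running inside the guest. This is only a sketch: has_no_carrier and watch_link are hypothetical helper names, and eth1 is assumed to be the guest interface as in this report.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the given `ip link` output line
# carries the NO-CARRIER flag inside its <...> flag list.
has_no_carrier() {
    case "$1" in
        *"<"*NO-CARRIER*">"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: poll the interface once per second and stop
# with a timestamped message the first time carrier is lost.
watch_link() {
    dev="${1:-eth1}"   # assumed guest interface name from this report
    while :; do
        line="$(ip -o link show dev "$dev")" || return 2
        if has_no_carrier "$line"; then
            echo "$(date -u +%FT%TZ) $dev lost carrier: $line"
            return 1
        fi
        sleep 1
    done
}
```

Running "watch_link eth1" in the guest while the MTU is being changed on the host would give a timestamp that can be matched against the ovs-vswitchd log.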

Expected results:
eth1 should stay up when the MTU is changed many times with ping traffic running.

Additional info:

Comment 2 Eelco Chaudron 2017-10-27 11:40:33 UTC

I tried to replicate this issue on two of my setups with the configuration below, but I was not able to see the issue. Can you make your setup available so I can look at it? Also, do you have any vswitchd/system logs? Versions of QEMU etc.? Maybe openvswitch is restarting, which will cause traffic to stop, etc.


INFO ON THE TWO SYSTEMS I TRIED IT ON:
======================================


[wsfd-netdev67:~]$ rpm -q openvswitch kernel qemu-kvm-rhev libvirt
openvswitch-2.7.2-8.git20170719.el7fdp.x86_64
kernel-3.10.0-693.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64
libvirt-3.2.0-14.el7_4.3.x86_64


[wsfd-netdev67:~]$ lshw -c network -businfo
Bus info          Device      Class          Description
========================================================
pci@0000:03:00.0  p6p1        network        MT27520 Family [ConnectX-3 Pro]
pci@0000:01:00.0              network        82599ES 10-Gigabit SFI/SFP+ Network Connection
pci@0000:01:00.1              network        82599ES 10-Gigabit SFI/SFP+ Network Connection



[wsfd-netdev64:~]$ rpm -q openvswitch kernel qemu-kvm-rhev libvirt
openvswitch-2.7.2-8.git20170719.el7fdp.x86_64
kernel-3.10.0-680.el7.gre_test_branch.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
libvirt-3.2.0-9.el7.x86_64

[wsfd-netdev64:~]$ lshw -c network -businfo
Bus info          Device       Class          Description
=========================================================
pci@0000:05:00.0               network        Ethernet Controller XL710 for 40GbE QSFP+
pci@0000:05:00.1               network        Ethernet Controller XL710 for 40GbE QSFP+



for i in {1..100}
do
    echo "========== RUN $i =========="
    ovs-vsctl set int vhost0 mtu_request=1900
    sleep 1
    ovs-vsctl set int vhost0 mtu_request=2000
    sleep 1
    ovs-vsctl set int vhost0 mtu_request=2200
    sleep 1
    ovs-vsctl set int vhost0 mtu_request=2300
    sleep 1
    ovs-vsctl set int vhost0 mtu_request=9000
    sleep 1
    ovs-vsctl set int vhost0 mtu_request=2000
    sleep 1
    ovs-vsctl set int vhost0 mtu_request=1500
    sleep 5
done
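
The versions and logs asked for at the top of this comment could be gathered on the host with something like the sketch below. The function name collect_diag is hypothetical, the interface name defaults to dpdkvhostuser0 from the description, and the log path is assumed to be the default packaging location; adjust both if the setup differs.

```shell
#!/bin/sh
# Hypothetical helper: print package versions, the vhost-user port state
# and the tail of the vswitchd log, each under a "###" header.
collect_diag() {
    iface="${1:-dpdkvhostuser0}"   # port name taken from the description
    # Print the command, then run it; keep going if it fails or is missing.
    run() { echo "### $*"; "$@" 2>&1 || echo "(command failed or unavailable)"; }

    run rpm -q openvswitch kernel qemu-kvm-rhev libvirt
    run ovs-vsctl get Interface "$iface" mtu mtu_request link_state
    # Assumed default log location; a restart of ovs-vswitchd would show here.
    run tail -n 200 /var/log/openvswitch/ovs-vswitchd.log
}
```

For example, "collect_diag dpdkvhostuser0 > diag.txt" run right after the guest link goes down would capture the state to attach to this bug.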

Comment 3 Eelco Chaudron 2017-12-14 14:09:31 UTC

This BZ has been waiting on needinfo/setup access for 1+ month. Closing the BZ for now; please re-open when the setup and additional info are ready.

Comment 4 Red Hat Bugzilla 2023-09-14 04:08:10 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
