Bug 1346232 - [fdBeta] OvS rtnetlink race condition
Summary: [fdBeta] OvS rtnetlink race condition
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Aaron Conole
QA Contact: qding
URL:
Whiteboard:
Depends On:
Blocks: 1353185 1362393 1377912 1397050
 
Reported: 2016-06-14 10:30 UTC by Edward Haas
Modified: 2017-07-13 06:08 UTC (History)

Fixed In Version: openvswitch-2.5.0-19.git20160727.el7fdb
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1397050
Environment:
Last Closed: 2017-01-12 15:42:14 UTC
Target Upstream Version:
Embargoed:


Attachments
dmesg (159.87 KB, text/plain)
2016-06-14 10:30 UTC, Edward Haas
openvswitch and dmesg logs during failure (600.00 KB, application/x-tar)
2016-06-15 07:17 UTC, Edward Haas
Open vSwitch RPM (2.36 MB, application/x-rpm)
2016-07-21 20:18 UTC, Aaron Conole
tnl-ports: fix missing netdev_close (1.54 KB, patch)
2016-10-14 14:07 UTC, Thadeu Lima de Souza Cascardo

Description Edward Haas 2016-06-14 10:30:59 UTC
Created attachment 1167827 [details]
dmesg

Description of problem:
Attempting to create an internal interface and attach it to an OVS switch fails.
The problem has been identified and reproduced only after running some RHEV VDSM functional tests (where virtual interfaces are repeatedly created and destroyed).

Version-Release number of selected component (if applicable):
RHEL 7.2, kernel 3.10.0-327.18.2.el7.x86_64

ovs-vsctl (Open vSwitch) 2.5.0
Compiled Mar 18 2016 15:00:11
DB Schema 7.12.1

How reproducible:
Always on specific hosts following the VDSM functional networking test run.

Steps to Reproduce:
1. Run VDSM network functional tests.
2. Create an ifcfg file to create a bridge:
DEVICE=test-network4
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no

3. Issue: ifup test-network4 ; ifdown test-network4
4. Try to create an OVS bridge with an internal interface:
ovs-vsctl -- add-br vdsmbr_Z7Yaecvr -- add-port  vdsmbr_Z7Yaecvr test-network4 -- set Interface test-network4  type=internal

In order to repeat the test, restart ovs service and redo step 3.


Actual results:


Expected results:
Adding an OVS bridge and creating an internal interface to it should succeed.

Additional info:
- After restarting OVS, the ovs-vsctl command succeeds.
- Also reproduced on CentOS 7.2.
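
Steps 2 to 4 above, together with the `brctl delbr` step that comment 2 adds as step 3.1, can be sketched as a shell script. This is a minimal sketch, not the reporter's actual script: the function names are hypothetical, and the ifcfg path under /etc/sysconfig/network-scripts/ is the usual RHEL 7 location, which the report does not state. It needs root and is meant to be run after the VDSM functional tests (step 1).

```shell
# Hypothetical helper: print the ifcfg contents from step 2 for a device.
ifcfg_for_bridge() {
    cat <<EOF
DEVICE=$1
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no
EOF
}

# Hypothetical helper: run steps 2-4 once. Needs root.
reproduce_once() {
    local dev=$1 ovs_bridge=$2
    # Assumed ifcfg location on RHEL 7 (not stated in the report).
    local ifcfg=/etc/sysconfig/network-scripts/ifcfg-$dev

    # Step 2: create the ifcfg file for the Linux bridge.
    ifcfg_for_bridge "$dev" > "$ifcfg"
    # Step 3: bring the bridge up and down again.
    ifup "$dev"
    ifdown "$dev"
    # Step 3.1 (added in comment 2): delete the Linux bridge.
    brctl delbr "$dev"
    # Step 4: reuse the same name for an OVS internal interface.
    ovs-vsctl -- add-br "$ovs_bridge" \
              -- add-port "$ovs_bridge" "$dev" \
              -- set Interface "$dev" type=internal
}
```

For example, `reproduce_once test-network4 vdsmbr_Z7Yaecvr` mirrors the names used in this report. To repeat the run, restart the OVS service first, as noted above.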

Comment 1 Edward Haas 2016-06-14 10:31:55 UTC
# ovs-vsctl -- add-br vdsmbr -- add-port vdsmbr test-network4 -- set Interface test-network4 type=internal
ovs-vsctl: Error detected while setting up 'test-network4'.  See ovs-vswitchd log for details.
[root@dhcp-0-228 tests]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 52:54:00:a2:0a:80 brd ff:ff:ff:ff:ff:ff
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 52:54:00:bd:d2:21 brd ff:ff:ff:ff:ff:ff
6: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether 92:df:72:ee:45:47 brd ff:ff:ff:ff:ff:ff
7: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether b2:a2:2e:09:56:0e brd ff:ff:ff:ff:ff:ff
188: vdsmbr: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether 4a:30:58:eb:bc:44 brd ff:ff:ff:ff:ff:ff
[root@dhcp-0-228 tests]# dmesg > dmesg.log

Comment 2 Edward Haas 2016-06-14 10:38:19 UTC
(In reply to Edward Haas from comment #0)
> 3. Issue: ifup test-network4 ; ifdown test-network4
3.1. brctl delbr test-network4

> 4. Try to create an OVS bridge with an internal interface:
> ovs-vsctl -- add-br vdsmbr_Z7Yaecvr -- add-port  vdsmbr_Z7Yaecvr
> test-network4 -- set Interface test-network4  type=internal

Comment 4 Aaron Conole 2016-06-14 14:14:10 UTC
From the output:

ovs-vsctl: Error detected while setting up 'test-network4'.  See ovs-vswitchd log for details.

Can you please attach this log file? It should be located in /var/log/openvswitch/.

Comment 5 Edward Haas 2016-06-15 07:17:30 UTC
Created attachment 1168215 [details]
openvswitch and dmesg logs during failure

Following the VDSM functional tests run, the command to add an OVS bridge with an internal port fails:

[root@edwardh-host-1 tests]# ovs-vsctl show
9413d5d0-8531-41c3-abfc-2a4d33c3150a
    ovs_version: "2.5.0"
[root@edwardh-host-1 tests]# ovs-vsctl -- add-br vdsmbr_Z7Yaecvr -- add-port vdsmbr_Z7Yaecvr test-network4 -- set Interface test-network4  type=internal
ovs-vsctl: Error detected while setting up 'test-network4'.  See ovs-vswitchd log for details.


Note: The bridge itself is added; only test-network4 is not.

Comment 6 Petr Horáček 2016-06-28 12:20:13 UTC
Note that this happens only when a dummy interface was attached to the pre-OVS Linux bridge. If a veth was attached instead of a dummy, OVS is able to create an internal interface with the same name as the Linux bridge had.

Comment 8 Aaron Conole 2016-07-21 20:18:39 UTC
Created attachment 1182660 [details]
Open vSwitch RPM

I have backported the commits and built this RPM.  Please test on your Fedora VMs and see if it resolves your issues. If so, I will work with Flavio to make a working official release.

Comment 9 Petr Horáček 2016-07-26 14:28:22 UTC
I installed openvswitch from your RPM, but the problem is still there. Have you tried it on the provided reproducer-VM?

Comment 10 Aaron Conole 2016-07-26 14:47:45 UTC
Strange - it seemed to work earlier... now the problem is back.

Okay, sorry for getting hopes up.

I'll be on PTO for the next two weeks, but will pick this up after that.  I can say for sure that current upstream master seemed to work consistently for me.

Comment 11 Aaron Conole 2016-09-01 13:24:17 UTC
This seems to be caused by the use of netdev_open() in various places in Open vSwitch, instead of using netdev_from_name() to acquire an interface handle.  This results in creating an incorrectly typed netdev, which confuses ovs-vswitchd.

After discussion with Thadeu Cascardo (who has been working on this issue upstream), it is clear that doing this correctly will take quite a bit of work.

Comment 12 Thadeu Lima de Souza Cascardo 2016-09-01 13:34:22 UTC
Hi, Aaron.

This problem is more of a race condition, as you mentioned. However, in Edward's particular case, I was not sure there was really a race, but rather a reference leak.

When you pointed out the multiple addresses patchset from Pravin, it looked like there was indeed a race when you ran your tests that could have caused that reference leak. And I thought I fixed that when I added patch "route-table: flush addresses list when route table is reset" (commit c2a1ceed07cf3f0dff616b047012f9d3c7a879aa).

Have you tried latest master? Do you still reproduce the issue there? Maybe we can attack the symptom more easily?

Thanks.
Cascardo.

Comment 13 Aaron Conole 2016-09-01 13:58:27 UTC
Master wasn't reproducing it when I tried a month ago.  However, that predates the commit you reference; I can try applying that patch and the requisite dependency patches and see if that "resolves" the issue, but I'm not sure.

Comment 14 Yaniv Lavi 2016-09-07 17:13:09 UTC
Any updates? This is a critical issue for RHV OVS support.

Comment 15 Aaron Conole 2016-09-07 18:31:38 UTC
Sorry, I am collecting some more info before requesting the backport.  There are multiple additional commits involved, but I seem to have a fix by using the series I referenced previously, as well as the commit which Thadeu references.  I can provide a private RPM if that helps to get past this issue.

Comment 16 Aaron Conole 2016-09-08 20:33:52 UTC
Please try the RPMs at https://copr.fedorainfracloud.org/coprs/aconole/openvswitch/ and confirm that this resolves your issues.  If so, we may be able to request a backport upstream.

Comment 17 Petr Horáček 2016-09-12 11:23:16 UTC
Thanks a lot Aaron, the openvswitch from your repository fixes our problem.

Comment 18 Aaron Conole 2016-09-30 19:51:07 UTC
After scrubbing through, I think the following is the minimum set of commits needed.  Will ping Thadeu / Flavio.

2b02db1b4cb2152e4aa2ac441bcc984ef3b929e3
a8704b502785a9661721f041b2ee168d7a4eb460
3e6dc8b7a8250d21c3cba65cae482bb1524d89a4
c2a1ceed07cf3f0dff616b047012f9d3c7a879aa

This set passes make check and includes the fix suggested by Thadeu.
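
One way to try this set on a 2.5 tree is to cherry-pick the four commits in a checkout of the upstream Open vSwitch repository. This is only a sketch: the function name is hypothetical, and using the listing order above as the apply order is an assumption that may need adjusting if a pick conflicts.

```shell
# Commits listed in this comment, in the order given.
# Assumption: this is also a workable apply order.
BACKPORT_COMMITS="
2b02db1b4cb2152e4aa2ac441bcc984ef3b929e3
a8704b502785a9661721f041b2ee168d7a4eb460
3e6dc8b7a8250d21c3cba65cae482bb1524d89a4
c2a1ceed07cf3f0dff616b047012f9d3c7a879aa
"

# Hypothetical helper: cherry-pick each commit onto the current branch,
# stopping at the first one that does not apply.
apply_backport() {
    local sha
    for sha in $BACKPORT_COMMITS; do
        # -x records the original commit id in the new commit message.
        if ! git cherry-pick -x "$sha"; then
            echo "cherry-pick of $sha failed; resolve or abort" >&2
            return 1
        fi
    done
}
```

Running `apply_backport` inside the checked-out 2.5 branch and then `make check` would correspond to the test mentioned above.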

Comment 19 Thadeu Lima de Souza Cascardo 2016-09-30 20:17:34 UTC
I had some suspicion that the multiple address patch would be needed, but I was not sure. I am still not sure. We can try to explain why it would be.

But this is a much more palatable patchset to backport than the full IPv6 tunnel support enablement.

If we have an explanation, maybe we can push the backport upstream.

Thanks.
Cascardo.

Comment 20 Thadeu Lima de Souza Cascardo 2016-10-14 14:07:22 UTC
Created attachment 1210548 [details]
tnl-ports: fix missing netdev_close

This patch seems to fix the bug. Aaron has tested it.

This is a much smaller patch to apply to OVS 2.5, instead of backporting a lot of changes that happened on insert_ipdev.

There is a better chance of getting this applied upstream.

Cascardo.

Comment 21 Aaron Conole 2016-10-20 15:46:30 UTC
Thadeu's patch was accepted; I've applied it to release -19.git20160727 of the RPM.

Comment 26 Thadeu Lima de Souza Cascardo 2016-10-27 12:16:44 UTC
One option to test this is to add a veth interface named veth0, set both ends of the veth pair up, add an address to veth0, then remove the ports, and add veth0 as an internal port to OVS.

ovs-vsctl add-br br0
ip link add type veth
ip link set veth1 up
ip link set veth0 up
ip addr add 172.16.99.100/24 dev veth0
ip link del veth0
ovs-vsctl add-port br0 veth0 -- set iface veth0 type=internal

However, as I recall, there might be a race involved. I am not sure whether I had to have two pairs of veths, so that while I was setting one end of one pair down and up, I was adding and removing a new pair, or something like that.

Let me know if you manage to reproduce it with only the above, maybe in a loop, where you remove the internal port before doing it all over again.

Regards.
Cascardo.

Comment 27 Petr Horáček 2016-10-31 10:40:31 UTC
> ovs-vsctl add-port br0 veth0 -- set iface veth0 type=internal

You have to use "interface" instead of "iface" for the table name. However, I was not able to reproduce it with this code.

You don't have to run the VDSM functional tests; you can just execute a few VDSM commands (I'm sorry, I don't know how to reproduce it with brctl/ip/ovs-vsctl yet):

dnf install openvswitch
systemctl start openvswitch
dnf install http://resources.ovirt.org/pub/yum-repo/ovirt-release40.rpm
dnf install vdsm
vdsm-tool configure --force
systemctl start vdsmd
python
>>> from vdsm import vdscli
>>> c = vdscli.connect()
>>> c.setupNetworks({'test-network': {}}, {}, {'connectivityCheck': False})
>>> c.setupNetworks({'test-network': {'remove': True}}, {}, {'connectivityCheck': False})
ovs-vsctl add-br br1 -- add-port br1 test-network -- set Interface test-network type=internal

Hope it helps.

Comment 28 qding 2016-11-01 06:37:15 UTC
Hi Edward, Cascardo and Petr,

Thank you all for the feedback and help.
I reproduced it with the steps in Comment#26, but in a loop. Please help check whether it is the same issue. The script and logs are listed below. Thank you.


[root@dell-per730-04 openvswitch]# cat t
i=0

ovs-vsctl add-br br0
ip link set br0 up

while true
do
	echo -n .

	ip link add type veth
	ip link set veth1 up
	ip link set veth0 up
	ip addr add 172.16.99.100/24 dev veth0
	ip link del veth0

	ovs-vsctl add-port br0 veth0 -- set interface veth0 type=internal
	
	ovs-vsctl del-port br0 veth0

	((i++))
done

[root@dell-per730-04 openvswitch]# bash t
ovs-vsctl: cannot create a bridge named br0 because a bridge named br0 already exists
.........................................................................ovs-vsctl: Error detected while setting up 'veth0'.  See ovs-vswitchd log for details.
................ovs-vsctl: Error detected while setting up 'veth0'.  See ovs-vswitchd log for details.
.............^C
[root@dell-per730-04 openvswitch]# cat /var/log/openvswitch/ovs-vswitchd.log | grep WARN | tail -n2
2016-11-01T06:30:08.092Z|01214|netdev_linux|WARN|ethtool command ETHTOOL_GFLAGS on network device veth0 failed: No such device
2016-11-01T06:30:08.092Z|01215|dpif|WARN|system@ovs-system: failed to add veth0 as port: No such device
[root@dell-per730-04 openvswitch]# ip link show 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 14:18:77:35:5b:1b brd ff:ff:ff:ff:ff:ff
3: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 14:18:77:35:5b:1c brd ff:ff:ff:ff:ff:ff
4: em3: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 14:18:77:35:5b:1d brd ff:ff:ff:ff:ff:ff
5: em4: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 14:18:77:35:5b:1e brd ff:ff:ff:ff:ff:ff
6: p5p1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 90:e2:ba:90:e8:a4 brd ff:ff:ff:ff:ff:ff
7: p5p2: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 90:e2:ba:90:e8:a5 brd ff:ff:ff:ff:ff:ff
8: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT qlen 1
    link/gre 0.0.0.0 brd 0.0.0.0
9: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
10: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT qlen 1000
    link/ether 52:54:00:67:95:57 brd ff:ff:ff:ff:ff:ff
11: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT qlen 1000
    link/ether 52:54:00:67:95:57 brd ff:ff:ff:ff:ff:ff
12: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 2e:7b:53:89:66:dd brd ff:ff:ff:ff:ff:ff
2145: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1000
    link/ether 16:11:5b:b7:a4:48 brd ff:ff:ff:ff:ff:ff
9329: veth2@veth3: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 66:97:67:ac:9b:61 brd ff:ff:ff:ff:ff:ff
9330: veth3@veth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 82:58:ed:46:59:e1 brd ff:ff:ff:ff:ff:ff
[root@dell-per730-04 openvswitch]# ip link show | grep veth
9329: veth2@veth3: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
9330: veth3@veth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
[root@dell-per730-04 openvswitch]# ovs-vsctl show
32e56284-dd34-4b90-995a-47769e40e973
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
    ovs_version: "2.5.0"
[root@dell-per730-04 openvswitch]#
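
The two warnings above name the port that ovs-vswitchd failed to add. For a long loop run, a small filter over the log can list every such failure. This is a sketch that assumes the '|'-separated log line format shown in this comment; the function name is hypothetical.

```shell
# Hypothetical helper: read ovs-vswitchd log lines on stdin and print
# "<timestamp> <port>" for each "failed to add <port> as port" warning.
# Assumed field layout: time|sequence|module|level|message.
failed_ports() {
    awk -F'|' '$3 == "dpif" && $4 == "WARN" && $5 ~ /failed to add .* as port/ {
        msg = $5
        sub(/^.*failed to add /, "", msg)   # drop everything before the port name
        sub(/ as port.*$/, "", msg)         # drop the trailing error text
        print $1, msg
    }'
}
```

It could be used as `failed_ports < /var/log/openvswitch/ovs-vswitchd.log`.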

Comment 29 Petr Horáček 2016-11-08 12:27:04 UTC
Hello,

That does not look like a reproducer of our problem. We end up in a state where, after calling `ovs-vsctl add-port br0 veth0 -- set interface veth0 type=internal`, we get a warning and no veth0 interface is listed in `ip link`, but it is listed in `ovs-vsctl show`.
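
The state described here can be checked for directly after the failing `add-port` call, before any `del-port`. A minimal sketch with a hypothetical function name; it relies on `ip link show dev` failing for a missing device and on `ovs-vsctl port-to-br` failing for a port that is not in the OVS database.

```shell
# Hypothetical helper: report whether a port name is known to the kernel
# and to the OVS database.
port_state() {
    local name=$1 in_kernel=no in_ovs=no
    # Succeeds only if the kernel has a device with this name.
    ip link show dev "$name" >/dev/null 2>&1 && in_kernel=yes
    # Succeeds only if some OVS bridge contains a port with this name.
    ovs-vsctl port-to-br "$name" >/dev/null 2>&1 && in_ovs=yes
    echo "kernel=$in_kernel ovs=$in_ovs"
}
```

With this helper, `port_state veth0` printing `kernel=no ovs=yes` would match the description above.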

In your test it looks more like there is a race and `ip link del veth0` had not completed before `ovs-vsctl add-port br0 veth0 -- set interface veth0 type=internal` was executed.
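
To rule out this kind of race in a loop test, the script could wait until the kernel no longer lists the device before calling `ovs-vsctl add-port`. This is a sketch with a hypothetical function name and arbitrary timing values. It only checks the kernel's view through `ip link`; it does not show whether ovs-vswitchd has already processed the corresponding rtnetlink notification.

```shell
# Hypothetical helper: wait until `ip link` no longer knows the device.
# Returns 0 once the device is gone, 1 if it is still present after
# `tries` checks spaced one second apart (default 5).
wait_link_gone() {
    local name=$1 tries=${2:-5}
    while ip link show dev "$name" >/dev/null 2>&1; do
        tries=$((tries - 1))
        if [ "$tries" -le 0 ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}
```

In the loop, this could be used as `ip link del veth0 && wait_link_gone veth0` ahead of the `add-port` call.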

Could you try the reproducer from Comment#27? I'm sorry it is not pure command line, but I can give you an image of the reproducing VM if that would help.

Comment 30 qding 2016-11-10 01:22:51 UTC
(In reply to Petr Horáček from comment #29)
> 
> Could you try reproducer from Comment#27? I'm sorry it is not pure command
> line, but I can give you image of reproducing VM if it would help.

Hi Petr,

I reproduced the issue with the steps in Comment#27 and openvswitch-2.5.0-17.git20160727.el7fdb.x86_64.
I did not find the issue with openvswitch-2.5.0-19.git20160727.el7fdb.x86_64.
Thank you.

QJ
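
A host can be checked against the fixed build with a version comparison. This is a sketch: the function name is hypothetical, and GNU `sort -V` is used as an approximation of RPM's version ordering. That is adequate for the two builds named in this comment, but it is not identical to RPM's own comparison in general.

```shell
# Build that carries the fix, per the "Fixed In Version" field.
FIXED=2.5.0-19.git20160727.el7fdb

# Hypothetical helper: succeed if the given version-release (default: the
# installed openvswitch package) sorts at or after $FIXED.
has_fix() {
    local ver=${1:-$(rpm -q --qf '%{VERSION}-%{RELEASE}' openvswitch)}
    # sort -V lists the older version first, so the fix is present
    # when $FIXED comes out on the first line.
    [ "$(printf '%s\n' "$FIXED" "$ver" | sort -V | head -n1)" = "$FIXED" ]
}
```

For example, `has_fix 2.5.0-17.git20160727.el7fdb` fails, while `has_fix` with no argument checks the installed package.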

