| Summary: | Deleting a 'used' ovs port leads to duplicate ofport assignment | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Miguel Angel Ajo <majopela> | ||||
| Component: | openvswitch | Assignee: | Eelco Chaudron <echaudro> | ||||
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | ovs-qe | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 7.3 | CC: | aloughla, atragler, cascardo, echaudro, fleitner, kevin, majopela | ||||
| Target Milestone: | rc | Flags: | majopela: needinfo+ | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | | Story Points: | --- | ||||
| Clone Of: | | Environment: | | ||||
| Last Closed: | 2017-05-10 08:53:52 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | | Category: | --- | ||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: | ovs-vswitchd log when reproduced (attachment 1202461) | ||||||
Hi, Ajo. What do you mean by "kill the port"? Do you mean removing it from the switch with ovs-vsctl del-port? Thanks. Cascardo.

It's an internal port, so I'd guess it's both when we do a del-port?

Hey, team. Moving this back to you. I have come across this bug before, as it was reported on the mailing list, and my first investigation pointed out that a race was possible when assigning an ofport number. Take a look at alloc_ofp_port in ofproto/ofproto.c. It may be that multiple threads can never run it concurrently; that was something I still needed to check. But there may be some error path in there that allows a given ofp_port to be reused. Cascardo.

Hi Miguel, I tried to replicate this with your minimal info, but I cannot see the problem on either 2.5 or the latest 2.6.1 test image (http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch/2.6.1/3.git20161206.el7fdb/x86_64/openvswitch-2.6.1-3.git20161206.el7fdb.x86_64.rpm). Can you give me more exact steps on how to replicate this, i.e. the command lines you executed? Also, if you retry, please try with 2.6.1 as well. FYI, I tried things like:

    ovs-vsctl add-br br0
    REPEAT x times {
        ovs-vsctl add-port br0 vlanX -- set interface vlanX type=internal
        killall dnsmasq
        /sbin/dnsmasq --interface vlanX \
            --dhcp-range 192.168.122.2,192.168.122.254
        ovs-vsctl del-port br0 vlanX
    }
    ovs-vsctl add-port br0 enp129s0f0

Could you try leaving a few seconds between dnsmasq and del-port? Could you also move the port into a namespace and run dnsmasq inside that namespace too? I wonder whether we need to exercise a DHCP request at all, but I guess it doesn't change the picture. Thanks for trying; if those changes have no effect, I will revisit the reproducer details against the upstream OpenStack logs where we saw this.

I tried your suggestions but had no luck replicating this. Please provide a simple reproducer so I can continue my investigation. Going over the code, I see no obvious way this could have occurred.
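The loop Eelco tried, combined with Miguel's two suggestions (a pause of a few seconds between starting dnsmasq and del-port, and moving the port into a namespace with dnsmasq running inside it), could be scripted roughly as follows. This is a sketch under assumptions, not a confirmed reproducer: the namespace name `ovsdup-ns`, the iteration count, the pause length, the `192.168.122.1/24` interface address, the final port name `newport0`, the `run` wrapper and the `DRY_RUN` switch are all hypothetical choices made for illustration.

```shell
#!/bin/sh
# Hypothetical reproducer sketch: Eelco's loop plus Miguel's suggestions.
# Assumed (not from the bug): namespace name, iteration count, pause,
# interface address and the name of the port attached at the end.
BR=${BR:-br0}
NS=${NS:-ovsdup-ns}
ITERATIONS=${ITERATIONS:-20}
PAUSE=${PAUSE:-3}              # "a few seconds" between dnsmasq and del-port
NEW_PORT=${NEW_PORT:-newport0}

# Execute a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce() {
    run ovs-vsctl --may-exist add-br "$BR"
    run ip netns add "$NS"
    i=1
    while [ "$i" -le "$ITERATIONS" ]; do
        port="vlan$i"
        run ovs-vsctl add-port "$BR" "$port" -- set interface "$port" type=internal
        # Move the port into the namespace right after creating it, as the
        # Neutron DHCP agent is described as doing, and bring it up there.
        run ip link set "$port" netns "$NS"
        run ip netns exec "$NS" ip link set "$port" up
        run ip netns exec "$NS" ip addr add 192.168.122.1/24 dev "$port"
        # As in the original loop: stop the previous dnsmasq (this kills
        # every dnsmasq on the host), then start one on the new port.
        run killall dnsmasq
        run ip netns exec "$NS" dnsmasq --interface "$port" \
            --dhcp-range 192.168.122.2,192.168.122.254
        sleep "$PAUSE"
        # Delete the port while dnsmasq is still using it.
        run ovs-vsctl del-port "$BR" "$port"
        i=$((i + 1))
    done
    # Step 5 of the description: attach a new port afterwards.
    run ovs-vsctl add-port "$BR" "$NEW_PORT" -- set interface "$NEW_PORT" type=internal
}

# Only start when invoked as "sh repro.sh run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    reproduce
fi
```

`DRY_RUN=1 ITERATIONS=2 sh repro.sh run` only prints the command sequence. Reviewing it first is worthwhile because, like the original loop, the script runs `killall dnsmasq`, which also stops unrelated dnsmasq instances on the host.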
@eelco, do you mind if I "un-private" our comments? I would like to open this thread up to get help from upstream on this matter.

I went over the mailing list and could not find the previous report Cascardo was talking about. I also walked through the code a bit and see no obvious way a duplicate ofport could be assigned. I tried various ways to reproduce this, including talking to Miguel, but without success. To continue my investigation I need a "simple" reproducer.

I'm removing the private flags from our messages to ask for help upstream.

Sorry, re-adding my needinfo. I'm asking the Neutron PTL for the details; I remember he added a workaround in the Neutron agents to avoid reproducing this, but I can't remember the exact details.

Hi, I don't have an easy script to reproduce it. Back when I worked on the bug, I was only able to trigger it using the actual Neutron DHCP agent while running tempest tests, so it may also involve the flow rules the L2 agent sets up for the port. One option would be to do a devstack stable/newton setup, revert commit 2f44402777a662fb68a069443b41c75b68b05287, and restart the agent to put it back in the state before my bug fix. Then a cycle of creating and deleting networks with subnets while regular tempest tests are running might reproduce it. This was heavily impacting our gate jobs running xenial at the time. Sorry I don't have something more concrete.

Two more things that may help:
1. This was with the version of OVS that shipped with Ubuntu xenial back in September.
2. Immediately after creating the tap device, the DHCP agent moves it into a namespace in which it runs dnsmasq. This may be an important component, since we had problems with ports disappearing from vswitchd for a short period of time after being moved and then reappearing (https://bugs.launchpad.net/neutron/+bug/1618987).

As we were not able to get a reproducer for this, I'm closing this BZ for now as INSUFFICIENT_DATA. We can re-open it if we get a reproducer.
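The thread does not establish whether the duplication also shows up in the OVSDB Interface table or only in the vswitchd log. A quick check after each reproduction attempt could compare the ofport column across the interfaces of a bridge. This is a minimal sketch: `ovs-vsctl list-ifaces` and `ovs-vsctl get Interface <name> ofport` are standard ovs-vsctl commands, while the helper names `collect_ofports` and `find_dup_ofports` are hypothetical, and skipping unassigned (`[]`) and failed (`-1`) ofports is an assumption about how those states are printed.

```shell
#!/bin/sh
# Hypothetical helpers; only "ovs-vsctl list-ifaces" and "ovs-vsctl get"
# are real ovs-vsctl commands.

# Print "<interface> <ofport>" for every interface on bridge $1.
collect_ofports() {
    for iface in $(ovs-vsctl list-ifaces "$1"); do
        printf '%s %s\n' "$iface" "$(ovs-vsctl get Interface "$iface" ofport)"
    done
}

# Read "<interface> <ofport>" lines on stdin and print every ofport claimed
# by more than one interface, followed by those interfaces.
# Unassigned ([]) and failed (-1) ofports are ignored (assumption).
find_dup_ofports() {
    awk '$2 != "[]" && $2 != "-1" {
             names[$2] = names[$2] " " $1
             count[$2]++
         }
         END {
             for (p in count)
                 if (count[p] > 1)
                     print p ":" names[p]
         }' | sort -n
}

# Run against a live bridge only when one is given: sh ofport-dups.sh br-int
if [ -n "${1:-}" ]; then
    collect_ofports "$1" | find_dup_ofports
fi
```

Empty output means no two interfaces on that bridge currently share an ofport in the database.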
Created attachment 1202461: ovs-vswitchd log when reproduced.

Description of problem:
In a Neutron deployment, we found that openvswitch sometimes assigns a duplicate ofport. We initially tried to reproduce this behaviour without success. Kevin Benton discovered that it happens when dnsmasq is using the port. If you delete the port while dnsmasq is still using it, ovs-vswitchd eventually goes haywire: 'added interface tap%% on port ##' is logged over and over, with different interfaces reported on the same ofport.

Version-Release number of selected component (if applicable):
2.0.x and 2.5.x series.

How reproducible:
Sometimes.

Steps to Reproduce:
1. Create an internal port (we do it in a namespace, but I'm not sure that's critical to reproduce it).
2. Attach dnsmasq to it.
3. Kill the port.
4. Repeat steps 1-3 several times.
5. Attach new ports.

Actual results:

    2016-09-16T02:26:55.037Z|00583|dpif|WARN|system@ovs-system: port_del failed (No such device)
    2016-09-16T02:26:55.083Z|00584|bridge|INFO|bridge br-int: added interface tap60c6b7ea-54 on port 163
    2016-09-16T02:26:57.347Z|00585|dpif|WARN|system@ovs-system: port_del failed (No such device)
    2016-09-16T02:26:57.413Z|00586|bridge|INFO|bridge br-int: added interface tap8b78341c-10 on port 166
    2016-09-16T02:26:57.469Z|00587|bridge|INFO|bridge br-int: added interface tap10cc4d3a-5d on port 166
    2016-09-16T02:26:57.469Z|00588|netdev_linux|WARN|Dropped 1 log messages in last 20 seconds (most recently, 20 seconds ago) due to excessive rate
    2016-09-16T02:26:57.469Z|00589|netdev_linux|WARN|query tap8b78341c-10 qdisc failed (No such device)
    2016-09-16T02:26:57.517Z|00590|bridge|INFO|bridge br-int: added interface tap8b78341c-10 on port 166
    2016-09-16T02:26:57.565Z|00591|bridge|INFO|bridge br-int: added interface tap10cc4d3a-5d on port 166
    2016-09-16T02:26:57.637Z|00592|bridge|INFO|bridge br-int: added interface tap8b78341c-10 on port 166
    2016-09-16T02:26:57.693Z|00593|bridge|INFO|bridge br-int: added interface tap10cc4d3a-5d on port 166
    2016-09-16T02:26:57.737Z|00594|bridge|INFO|bridge br-int: added interface tap8b78341c-10 on port 166
    2016-09-16T02:26:57.773Z|00595|bridge|INFO|bridge br-int: added interface tap10cc4d3a-5d on port 166
    2016-09-16T02:26:57.825Z|00596|bridge|INFO|bridge br-int: added interface tap8b78341c-10 on port 166
    2016-09-16T02:26:57.869Z|00597|bridge|INFO|bridge br-int: added interface tap10cc4d3a-5d on port 166
    2016-09-16T02:26:57.925Z|00598|bridge|INFO|bridge br-int: added interface tap8b78341c-10 on port 166
    2016-09-16T02:26:57.965Z|00599|bridge|INFO|bridge br-int: added interface tap10cc4d3a-5d on port 166
    2016-09-16T02:26:58.017Z|00600|bridge|INFO|bridge br-int: added interface tap8b78341c-10 on port 166
    2016-09-16T02:26:58.077Z|00601|bridge|INFO|bridge br-int: added interface tap10cc4d3a-5d on port 166
    2016-09-16T02:26:58.121Z|00602|bridge|INFO|bridge br-int: added interface tap8b78341c-10 on port 166
    2016-09-16T02:26:58.165Z|00603|bridge|INFO|bridge br-int: added interface tap10cc4d3a-5d on port 166
    2016-09-16T02:26:58.217Z|00604|bridge|INFO|bridge br-int: added interface tap8b78341c-10 on port 166
    2016-09-16T02:26:58.249Z|00605|bridge|INFO|bridge br-int: added interface tap10cc4d3a-5d on port 166
    2016-09-16T02:26:58.301Z|00606|bridge|INFO|bridge br-int: added interface tap8b78341c-10 on port 166
    2016-09-16T02:26:58.373Z|00607|bridge|INFO|bridge br-int: added interface tap10cc4d3a-5d on port 166
    2016-09-16T02:26:58.421Z|00608|bridge|INFO|bridge br-int: added interface tap8b78341c-10 on port 166
    2016-09-16T02:26:58.484Z|00609|bridge|INFO|bridge br-int: added interface tap10cc4d3a-5d on port 166

Expected results:
No duplicate ofport assignments.

Additional info:
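The symptom in the actual results (two tap interfaces alternately announced on the same port 166 of br-int) can be picked out of a long ovs-vswitchd log mechanically. This is a minimal sketch, assuming the "bridge <name>: added interface <iface> on port <n>" message format seen in the excerpt; the function name `find_ofport_flapping` is hypothetical. Since a port number can legitimately be given to a new interface once the old one is gone, a hit over a long log is a lead to inspect, not proof of the bug.

```shell
#!/bin/sh
# Hypothetical helper: read ovs-vswitchd log lines on stdin and report every
# bridge/port pair that was announced for more than one distinct interface,
# with the total number of "added interface" announcements for that pair.
find_ofport_flapping() {
    awk '{
        # Look for: <bridge>: added interface <iface> on port <n>
        for (i = 2; i + 5 <= NF; i++) {
            if ($i == "added" && $(i+1) == "interface" &&
                $(i+3) == "on" && $(i+4) == "port") {
                bridge = $(i-1); sub(/:$/, "", bridge)
                key = bridge " port " $(i+5)
                adds[key]++
                if (!((key, $(i+2)) in seen)) {
                    seen[key, $(i+2)] = 1
                    distinct[key]++
                    names[key] = names[key] " " $(i+2)
                }
                break
            }
        }
    }
    END {
        for (k in distinct)
            if (distinct[k] > 1)
                printf "%s: %d announcements, interfaces:%s\n", k, adds[k], names[k]
    }'
}

# Scan a log file only when one is given as the first argument.
if [ -n "${1:-}" ]; then
    find_ofport_flapping < "$1"
fi
```

For example, `sh ofport-flap.sh /var/log/openvswitch/ovs-vswitchd.log` (adjust the path to wherever the log lives). Run over the excerpt above, it would flag br-int port 166 with tap8b78341c-10 and tap10cc4d3a-5d, and would not flag port 163.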