Bug 1640045 - ovs-vswitchd crashes when OpenFlow controller is disconnected and a new port is added to bridge
Keywords:
Status: CLOSED DUPLICATE of bug 1637926
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Numan Siddique
QA Contact: Roee Agiman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-17 08:35 UTC by Miguel Angel Ajo
Modified: 2018-10-17 12:20 UTC
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-17 11:16:23 UTC
Target Upstream Version:
Embargoed:


Attachments
Core dump captured during the crash (874.72 KB, application/x-gzip)
2018-10-17 08:35 UTC, Miguel Angel Ajo

Description Miguel Angel Ajo 2018-10-17 08:35:33 UTC
Created attachment 1494767 [details]
Core dump captured during the crash

Description of problem:

When ovs-vswitchd has a controller configured for a bridge and that controller is disconnected, adding a new port to that bridge makes ovs-vswitchd crash while trying to notify the controller of the new port.
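The abort happens while encoding the PORT_STATUS message: the disconnected ofconn never negotiated an OpenFlow version, so its protocol field is still 0 ("(unknown: 0)" in the gdb output below), and the protocol-to-version lookup hits its unreachable branch. A minimal, self-contained sketch of that failure mode (NOT the real OVS source; enum values are illustrative):

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-ins for OVS's ofputil_protocol flags. */
enum ofputil_protocol {
    OFPUTIL_P_NONE     = 0,        /* nothing negotiated yet */
    OFPUTIL_P_OF10_STD = 1 << 0,
    OFPUTIL_P_OF13_OXM = 1 << 1,
};

/* Mirrors the shape of ofputil_protocol_to_ofp_version(): a known protocol
 * flag maps to a wire version; anything else (including 0, i.e. "never
 * negotiated") falls through to abort(), which is the SIGABRT captured in
 * the attached core dump. */
static int
protocol_to_ofp_version(enum ofputil_protocol protocol)
{
    switch (protocol) {
    case OFPUTIL_P_OF10_STD: return 0x01;  /* OpenFlow 1.0 */
    case OFPUTIL_P_OF13_OXM: return 0x04;  /* OpenFlow 1.3 */
    default:
        abort();                   /* OVS_NOT_REACHED()-style crash */
    }
}
```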

Version-Release number of selected component (if applicable):

# rpm -qa | grep openvswitch

openvswitch-selinux-extra-policy-1.0-5.el7fdp.noarch
rhosp-openvswitch-2.10-0.1.el7ost.noarch
openvswitch2.10-debuginfo-2.10.0-4.el7fdp.x86_64
openvswitch2.10-2.10.0-4.el7fdp.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Have a controller configured (in our case, OpenStack configures the neutron_ovs agent as the controller)
2. Stop the controller; in our case:

[root@controller-0 heat-admin]# docker rm -f neutron_ovs_agent

We can see in the logs:
[root@controller-0 heat-admin]# tail -f /var/log/openvswitch/ovs-vswitchd.log
2018-10-17T07:55:01.880Z|00221|rconn|INFO|br-int<->tcp:127.0.0.1:6633: waiting 4 seconds before reconnect
2018-10-17T07:55:01.880Z|00222|rconn|INFO|br-tun<->tcp:127.0.0.1:6633: connecting...
2018-10-17T07:55:01.881Z|00223|rconn|WARN|br-tun<->tcp:127.0.0.1:6633: connection failed (Connection refused)
2018-10-17T07:55:01.881Z|00224|rconn|INFO|br-tun<->tcp:127.0.0.1:6633: waiting 4 seconds before reconnect
2018-10-17T07:55:01.881Z|00225|rconn|INFO|br-ex<->tcp:127.0.0.1:6633: connecting...
2018-10-17T07:55:01.881Z|00226|rconn|WARN|br-ex<->tcp:127.0.0.1:6633: connection failed (Connection refused)
2018-10-17T07:55:01.881Z|00227|rconn|INFO|br-ex<->tcp:127.0.0.1:6633: waiting 4 seconds before reconnect
2018-10-17T07:55:01.881Z|00228|rconn|INFO|br-isolated<->tcp:127.0.0.1:6633: connecting...
2018-10-17T07:55:01.881Z|00229|rconn|WARN|br-isolated<->tcp:127.0.0.1:6633: connection failed (Connection refused)
2018-10-17T07:55:01.881Z|00230|rconn|INFO|br-isolated<->tcp:127.0.0.1:6633: waiting 4 seconds before reconnect

3. Add a port:


[root@controller-0 heat-admin]# ovs-vsctl add-port br-int p3 -- set Interface p3 type=internal
2018-10-17T07:55:25Z|00002|jsonrpc|WARN|unix:/var/run/openvswitch/db.sock: receive error: Connection reset by peer
2018-10-17T07:55:25Z|00003|reconnect|WARN|unix:/var/run/openvswitch/db.sock: connection dropped (Connection reset by peer)



Actual results:

ovs-vswitchd has crashed (it will be restarted, but sometimes the restart process hangs forever).


Expected results:

No crash; ovs-vswitchd should simply not attempt to send the message to the disconnected controller.

Additional info:


(gdb) bt
#0  0x00007fb002f8b207 in raise () from /lib64/libc.so.6
#1  0x00007fb002f8c8f8 in abort () from /lib64/libc.so.6
#2  0x00007fb004953026 in ofputil_protocol_to_ofp_version (protocol=<optimized out>) at lib/ofp-protocol.c:123
#3  0x00007fb00494e38e in ofputil_encode_port_status (ps=ps@entry=0x7ffc66b7f400, protocol=<optimized out>) at lib/ofp-port.c:938
#4  0x00007fb004ef1c5b in connmgr_send_port_status (mgr=0x556d54a46630, source=source@entry=0x0, pp=pp@entry=0x7ffc66b7f590, reason=reason@entry=0 '\000') at ofproto/connmgr.c:1654
#5  0x00007fb004efa9f4 in ofport_install (p=p@entry=0x556d54a460e0, netdev=netdev@entry=0x556d54acc0f0, pp=pp@entry=0x7ffc66b7f590) at ofproto/ofproto.c:2418
#6  0x00007fb004efbfb2 in update_port (ofproto=ofproto@entry=0x556d54a460e0, name=name@entry=0x556d54acf360 "tap3d8cd951-00") at ofproto/ofproto.c:2665
#7  0x00007fb004efc7f9 in ofproto_port_add (ofproto=0x556d54a460e0, netdev=0x556d54acc0f0, ofp_portp=ofp_portp@entry=0x7ffc66b7f6f8) at ofproto/ofproto.c:2012
#8  0x0000556d540a3f95 in iface_do_create (errp=0x7ffc66b7f708, netdevp=0x7ffc66b7f700, ofp_portp=0x7ffc66b7f6f8, iface_cfg=0x556d54acc5e0, br=0x556d549eaa00) at vswitchd/bridge.c:1803
#9  iface_create (port_cfg=0x556d54acde70, iface_cfg=0x556d54acc5e0, br=0x556d549eaa00) at vswitchd/bridge.c:1841
#10 bridge_add_ports__ (br=br@entry=0x556d549eaa00, wanted_ports=wanted_ports@entry=0x556d549eaae0, with_requested_port=with_requested_port@entry=false) at vswitchd/bridge.c:935
#11 0x0000556d540a5a47 in bridge_add_ports (wanted_ports=0x556d549eaae0, br=0x556d549eaa00) at vswitchd/bridge.c:951
#12 bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x556d54a1eea0) at vswitchd/bridge.c:665
#13 0x0000556d540a9199 in bridge_run () at vswitchd/bridge.c:3023
#14 0x0000556d540a02a5 in main (argc=12, argv=0x7ffc66b7fc68) at vswitchd/ovs-vswitchd.c:125

(gdb) frame 3

(gdb) p *ps
$4 = {reason = OFPPR_ADD, desc = {port_no = 7, hw_addr = {{ea = "z\205\033\277\023\273", be16 = {34170, 48923, 47891}}}, hw_addr64 = {{ea64 = "\000\000\000\000\000\000\000", be16 = {0, 0, 0, 0}}},
    name = "tap3d8cd951-00\000\a\r(\000\000\000\000\220a\244TmU\000\000:\a\r(\000\000\000\000$m\231\004\260\177\000\000\220a\244TmU\000\000\240@\230\004\260\177\000\000", <incomplete sequence \363>,
    config = (unknown: 0), state = OFPUTIL_PS_STP_LISTEN, curr = (NETDEV_F_10GB_FD | NETDEV_F_COPPER), advertised = (unknown: 0), supported = (unknown: 0), peer = (unknown: 0), curr_speed = 10000000,
    max_speed = 0}}


(gdb) frame 4
#4  0x00007fb004ef1c5b in connmgr_send_port_status (mgr=0x556d54a46630, source=source@entry=0x0, pp=pp@entry=0x7ffc66b7f590, reason=reason@entry=0 '\000') at ofproto/connmgr.c:1654
1654                msg = ofputil_encode_port_status(&ps, ofconn_get_protocol(ofconn));
(gdb) list
1649                if (ofconn == source
1650                    && rconn_get_version(ofconn->rconn) < OFP15_VERSION) {
1651                    continue;
1652                }
1653
1654                msg = ofputil_encode_port_status(&ps, ofconn_get_protocol(ofconn));
1655                ofconn_send(ofconn, msg, NULL);
1656            }
1657        }
1658    }

(gdb) p *ofconn
$7 = {node = {prev = 0x556d54a46668, next = 0x556d54a46668}, hmap_node = {hash = 1565801656, next = 0x0}, connmgr = 0x556d54a46630, rconn = 0x556d54a931b0, type = OFCONN_PRIMARY, band = OFPROTO_OUT_OF_BAND,
  enable_async_msgs = true, want_packet_in_on_miss = true, role = OFPCR12_ROLE_EQUAL, protocol = (unknown: 0), packet_in_format = OFPUTIL_PACKET_IN_STD, packet_in_counter = 0x556d54a93400, schedulers = {0x0,
    0x0}, miss_send_len = 128, controller_id = 0, reply_counter = 0x556d54a93450, async_cfg = 0x0, n_add = 0, n_delete = 0, n_modify = 0, first_op = -9223372036854775808, last_op = -9223372036854775808,
  next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, monitors = {buckets = 0x556d54a93390, one = 0x0, mask = 0, n = 0}, monitor_paused = 0, monitor_counter = 0x556d54a934a0, updates = {
    prev = 0x556d54a933b8, next = 0x556d54a933b8}, sent_abbrev_update = false, bundles = {buckets = 0x556d54a933d8, one = 0x0, mask = 0, n = 0}, next_bundle_expiry_check = 43499835}

See also attached core dump
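The gdb output above shows the ofconn for the disconnected controller still sitting in the connmgr's list with protocol = (unknown: 0), so the send loop at connmgr.c:1654 encodes against it anyway. A hedged sketch of the kind of defensive check this suggests (this is NOT the merged upstream patch, just an illustration of the guard; types are simplified stand-ins for the structures in the backtrace):

```c
#include <assert.h>
#include <stddef.h>

enum ofputil_protocol { OFPUTIL_P_NONE = 0, OFPUTIL_P_OF13_OXM = 2 };

/* Simplified stand-in for struct ofconn; only the field that matters here. */
struct ofconn {
    enum ofputil_protocol protocol;
};

/* Sketch of the loop in connmgr_send_port_status(): skip any connection
 * that never negotiated a protocol instead of passing 0 into
 * ofputil_encode_port_status().  Returns how many controllers were sent
 * the port-status message (stands in for ofconn_send()). */
static size_t
send_port_status_sketch(const struct ofconn *conns, size_t n)
{
    size_t sent = 0;
    for (size_t i = 0; i < n; i++) {
        if (conns[i].protocol == OFPUTIL_P_NONE) {
            continue;              /* disconnected/unnegotiated: don't encode */
        }
        sent++;                    /* encode + ofconn_send() would go here */
    }
    return sent;
}
```

With one connected and one disconnected controller, only the connected one receives the message and the abort path is never reached.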

Comment 1 Eelco Chaudron 2018-10-17 11:16:23 UTC
This is a duplicate of bug 1637926; I have marked it as such, which will close this BZ. I'm going to replicate it and try some changes based on the other BZ. If you are further down the path, let me know.

*** This bug has been marked as a duplicate of bug 1637926 ***

Comment 2 Numan Siddique 2018-10-17 12:20:26 UTC
Hi Eelco - I didn't notice that it's a duplicate. I submitted the patch for review - https://patchwork.ozlabs.org/patch/985340/. Not sure if it's the right fix though :)

