Created attachment 1492509 [details]
OVS LOG

Description of problem:
As part of our downstream OSP + ODL testing we bring the opendaylight service down and up (it takes ~10 minutes between the down and the up). During those ~10 minutes, while opendaylight is down, ovs sees:

2018-10-10T09:03:37.631Z|01744|rconn|INFO|br-int<->tcp:172.17.1.21:6653: connection closed by peer
2018-10-10T09:03:37.881Z|01745|rconn|INFO|br-int<->tcp:172.17.1.21:6653: connecting...
2018-10-10T09:03:37.881Z|01746|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:03:37.881Z|01747|rconn|INFO|br-int<->tcp:172.17.1.21:6653: waiting 2 seconds before reconnect
2018-10-10T09:03:39.881Z|01748|rconn|INFO|br-int<->tcp:172.17.1.21:6653: connecting...
2018-10-10T09:03:39.882Z|01749|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:03:39.882Z|01750|rconn|INFO|br-int<->tcp:172.17.1.21:6653: waiting 4 seconds before reconnect
2018-10-10T09:03:43.881Z|01751|rconn|INFO|br-int<->tcp:172.17.1.21:6653: connecting...
2018-10-10T09:03:43.882Z|01752|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:03:43.882Z|01753|rconn|INFO|br-int<->tcp:172.17.1.21:6653: continuing to retry connections in the background but suppressing further logging
2018-10-10T09:03:59.881Z|01768|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:04:07.882Z|01770|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:04:15.881Z|01771|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:04:23.882Z|01772|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:04:31.886Z|01773|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:04:39.881Z|01774|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:04:47.884Z|01775|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:04:55.882Z|01776|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:05:03.881Z|01777|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:05:11.881Z|01778|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:05:19.881Z|01779|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:05:27.881Z|01780|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:05:28.201Z|00001|bfd(handler26)|INFO|tun70020c4c630: BFD state change: up->down "No Diagnostic"->"Neighbor Signaled Session Down". Forwarding: true Detect Multiplier: 3 Concatenated Path Down: false TX Interval: Approx 1000ms RX Interval: Approx 1000ms Detect Time: now +3000ms Next TX Time: now +160ms Last TX Time: now -800ms Local Flags: none Local Session State: up Local Diagnostic: No Diagnostic Local Discriminator: 0x3bc62f48 Local Minimum TX Interval: 1000ms Local Minimum RX Interval: 1000ms Remote Flags: none Remote Session State: down Remote Diagnostic: No Diagnostic Remote Discriminator: 0x62f329d8 Remote Minimum TX Interval: 1000ms Remote Minimum RX Interval: 1000ms Remote Detect Multiplier: 3
2018-10-10T09:05:28.201Z|00002|bfd(handler26)|INFO|tun70020c4c630: Remote signaled STATE_DOWN.
vers:1 diag:"No Diagnostic" state:down mult:3 length:24 flags: none my_disc:0x62f329d8 your_disc:0x0 min_tx:1000000us (1000ms) min_rx:1000000us (1000ms) min_rx_echo:0us (0ms) Forwarding: true Detect Multiplier: 3 Concatenated Path Down: false TX Interval: Approx 1000ms RX Interval: Approx 1000ms Detect Time: now +2999ms Next TX Time: now +160ms Last TX Time: now -800ms Local Flags: none Local Session State: down Local Diagnostic: Neighbor Signaled Session Down Local Discriminator: 0x3bc62f48 Local Minimum TX Interval: 1000ms Local Minimum RX Interval: 1000ms Remote Flags: none Remote Session State: down Remote Diagnostic: No Diagnostic Remote Discriminator: 0x0 Remote Minimum TX Interval: 0ms Remote Minimum RX Interval: 1ms Remote Detect Multiplier: 3
2018-10-10T09:05:29.052Z|00003|bfd(handler26)|INFO|tun70020c4c630: New remote min_rx. vers:1 diag:"No Diagnostic" state:init mult:3 length:24 flags: none my_disc:0x62f329d8 your_disc:0x3bc62f48 min_tx:1000000us (1000ms) min_rx:1000000us (1000ms) min_rx_echo:0us (0ms) Forwarding: true Detect Multiplier: 3 Concatenated Path Down: false TX Interval: Approx 1000ms RX Interval: Approx 1000ms Detect Time: now +2148ms Next TX Time: now +69ms Last TX Time: now -691ms Local Flags: none Local Session State: down Local Diagnostic: Neighbor Signaled Session Down Local Discriminator: 0x3bc62f48 Local Minimum TX Interval: 1000ms Local Minimum RX Interval: 1000ms Remote Flags: none Remote Session State: init Remote Diagnostic: No Diagnostic Remote Discriminator: 0x62f329d8 Remote Minimum TX Interval: 0ms Remote Minimum RX Interval: 1000ms Remote Detect Multiplier: 3
2018-10-10T09:05:29.052Z|00004|bfd(handler26)|INFO|tun70020c4c630: BFD state change: down->up "Neighbor Signaled Session Down"->"Neighbor Signaled Session Down". Forwarding: true Detect Multiplier: 3 Concatenated Path Down: false TX Interval: Approx 1000ms RX Interval: Approx 1000ms Detect Time: now +3000ms Next TX Time: now +69ms Last TX Time: now -691ms Local Flags: none Local Session State: down Local Diagnostic: Neighbor Signaled Session Down Local Discriminator: 0x3bc62f48 Local Minimum TX Interval: 1000ms Local Minimum RX Interval: 1000ms Remote Flags: none Remote Session State: init Remote Diagnostic: No Diagnostic Remote Discriminator: 0x62f329d8 Remote Minimum TX Interval: 1000ms Remote Minimum RX Interval: 1000ms Remote Detect Multiplier: 3
2018-10-10T09:05:35.882Z|01781|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:05:42.933Z|01782|connmgr|INFO|br-int<->tcp:172.17.1.11:6653: 1 flow_mods 10 s ago (1 adds)
2018-10-10T09:05:43.881Z|01783|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:05:51.882Z|01784|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:05:59.881Z|01785|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:06:07.882Z|01786|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:06:15.882Z|01787|rconn|WARN|br-int<->tcp:172.17.1.21:6653: connection failed (Connection refused)
2018-10-10T09:06:16.455Z|00002|bfd(monitor34)|INFO|tunf301e6505ab: BFD state change: up->down "No Diagnostic"->"Control Detection Time Expired".
Forwarding: true Detect Multiplier: 3 Concatenated Path Down: false TX Interval: Approx 1000ms RX Interval: Approx 1000ms Detect Time: now +0ms Next TX Time: now +970ms Last TX Time: now +0ms Local Flags: none Local Session State: up Local Diagnostic: No Diagnostic Local Discriminator: 0xa18d8eda Local Minimum TX Interval: 1000ms Local Minimum RX Interval: 1000ms Remote Flags: none Remote Session State: up Remote Diagnostic: No Diagnostic Remote Discriminator: 0x4f0a3226 Remote Minimum TX Interval: 1000ms Remote Minimum RX Interval: 1000ms Remote Detect Multiplier: 3

There are no further logfile entries in ovs after that. systemctl/journal shows:

Oct 10 10:06:17 controller-1 ovs-ctl[911167]: 2018-10-10T09:06:17Z|00001|unixctl|WARN|failed to connect to /var/run/openvswitch/ovs-vswitchd.1020.ctl
Oct 10 10:06:17 controller-1 ovs-appctl[911206]: ovs|00001|unixctl|WARN|failed to connect to /var/run/openvswitch/ovs-vswitchd.1020.ctl
Oct 10 10:06:17 controller-1 ovs-ctl[911167]: ovs-appctl: cannot connect to "/var/run/openvswitch/ovs-vswitchd.1020.ctl" (Connection refused)

[root@controller-1 opendaylight]# systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: inactive (dead) since Wed 2018-10-10 10:06:17 BST; 32min ago
  Process: 911167 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop (code=exited, status=0/SUCCESS)

Oct 10 10:06:17 controller-1 ovs-ctl[911167]: 2018-10-10T09:06:17Z|00001|unixctl|WARN|failed to connect to /var/run/openvswitch/ovs-vswitchd.1020.ctl
Oct 10 10:06:17 controller-1 ovs-ctl[911167]: ovs-appctl: cannot connect to "/var/run/openvswitch/ovs-vswitchd.1020.ctl" (Connection refused)
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

The previous OSP (13) with ovs 2.9 was tested in exactly the same way, but its ovs logfile doesn't show these BFD messages:

2018-10-09T03:59:13.367Z|00415|rconn|INFO|br-int<->tcp:172.17.1.15:6653: connection closed by peer
2018-10-09T03:59:13.846Z|00416|rconn|INFO|br-int<->tcp:172.17.1.15:6653: connecting...
2018-10-09T03:59:13.847Z|00417|rconn|WARN|br-int<->tcp:172.17.1.15:6653: connection dropped (Connection refused)
2018-10-09T03:59:13.847Z|00418|rconn|INFO|br-int<->tcp:172.17.1.15:6653: waiting 2 seconds before reconnect
2018-10-09T03:59:15.845Z|00419|rconn|INFO|br-int<->tcp:172.17.1.15:6653: connecting...
2018-10-09T03:59:15.846Z|00420|rconn|WARN|br-int<->tcp:172.17.1.15:6653: connection dropped (Connection refused)
2018-10-09T03:59:15.846Z|00421|rconn|INFO|br-int<->tcp:172.17.1.15:6653: waiting 4 seconds before reconnect
2018-10-09T03:59:19.846Z|00422|rconn|INFO|br-int<->tcp:172.17.1.15:6653: connecting...
2018-10-09T03:59:19.846Z|00423|rconn|WARN|br-int<->tcp:172.17.1.15:6653: connection dropped (Connection refused)
2018-10-09T03:59:19.846Z|00424|rconn|INFO|br-int<->tcp:172.17.1.15:6653: continuing to retry connections in the background but suppressing further logging
2018-10-09T03:59:27.848Z|00425|rconn|WARN|br-int<->tcp:172.17.1.15:6653: connection dropped (Connection refused)
...
2018-10-09T04:05:59.846Z|00554|rconn|WARN|br-int<->tcp:172.17.1.15:6653: connection dropped (Connection refused)
2018-10-09T04:06:07.846Z|00557|rconn|WARN|br-int<->tcp:172.17.1.15:6653: connection dropped (Connection refused)
2018-10-09T04:06:15.948Z|00559|rconn|INFO|br-int<->tcp:172.17.1.15:6653: connected

Version-Release number of selected component (if applicable):
osp14, openvswitch2.10-2.10.0-4

How reproducible:
100%

Steps to Reproduce:
1. deploy OSP + ovs2.10 + ODL
2. stop opendaylight on any overcloud controller for >6 minutes
3. observe ovs dying

Actual results:
ovs dies after BFD reports the connection to one of the opendaylights as down

Expected results:
ovs to survive and not become a zombie!

Additional info:
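For anyone trying to reproduce this outside our CI, a rough sketch of steps 2-3 on one overcloud controller (the ODL container name is an assumption taken from our downstream setup, not verified here):

  # stop the OpenDaylight container (container name is an assumption)
  sudo docker stop opendaylight_api

  # wait longer than the outage window seen in the logs above (>6 minutes)
  sleep 600

  # ovs-vswitchd should still be running; in this bug it ends up inactive (dead)
  sudo systemctl status ovs-vswitchd
  sudo ovs-appctl bfd/show    # BFD session state on the tunnel ports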
According to the logs, it looks like ovs-vswitchd was stopped? Did I understand the timeline correctly?
Yes, ovs was stopped, but not by hand... it must have died.
@Lucas, @Numan: is there any chance that this will reproduce on / affect OVN?
Numan, see comment 3.
My 2 cents... OVN manages OVS in a different way than ODL does (I learned that via https://bugzilla.redhat.com/show_bug.cgi?id=1626488, which was happening on ODL but not on OVN instances), so I think you may not be able to reproduce this bug (1637926) on OVN, but it's worth a shot. Other than that, I have a d/s machine showing these symptoms that I'm happy to share with developers to troubleshoot/debug the issue.
(In reply to Eran Kuris from comment #4)
> Numan, see comment 3.

Eran - to see if it affects OVN, maybe you can bring down one of the controller nodes, wait for about 10 minutes and check the status of ovs-vswitchd on the other controller nodes. Before that, create a neutron router and attach a gateway interface to it so that OVN configures BFD on the tunnel interfaces (and maybe create a VM as well).
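A minimal sketch of that setup with the openstack CLI, assuming an existing external network called "public" and a tenant network/subnet called "private"/"private-subnet" (all names are placeholders):

  openstack router create bfd-test-router
  openstack router set --external-gateway public bfd-test-router
  openstack router add subnet bfd-test-router private-subnet

  # optionally boot a VM on the tenant network as well
  openstack server create --image cirros --flavor m1.tiny --network private bfd-test-vm

  # once OVN has programmed BFD, the tunnel interfaces should show sessions:
  ovs-appctl bfd/show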
(In reply to Waldemar Znoinski from comment #0)
...
>
> [root@controller-1 opendaylight]# systemctl status ovs-vswitchd
> ● ovs-vswitchd.service - Open vSwitch Forwarding Unit
>    Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static;
> vendor preset: disabled)
>    Active: inactive (dead) since Wed 2018-10-10 10:06:17 BST; 32min ago
>   Process: 911167 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl
> --no-ovsdb-server stop (code=exited, status=0/SUCCESS)
>
> Oct 10 10:06:17 controller-1 ovs-ctl[911167]:
> 2018-10-10T09:06:17Z|00001|unixctl|WARN|failed to connect to
> /var/run/openvswitch/ovs-vswitchd.1020.ctl
> Oct 10 10:06:17 controller-1 ovs-ctl[911167]: ovs-appctl: cannot connect to
> "/var/run/openvswitch/ovs-vswitchd.1020.ctl" (Connection refused)
> Warning: Journal has been rotated since unit was started. Log output is
> incomplete or unavailable.

Usually, when something is killed, the status won't show an ExecStop= run; it will report that it was killed. It looks like something here actually stopped ovs-vswitchd (if I'm reading the output correctly). Can you attach the full systemd journal (or somehow get it)?
(In reply to Aaron Conole from comment #7)
> (In reply to Waldemar Znoinski from comment #0)
> ...
> >
> > [root@controller-1 opendaylight]# systemctl status ovs-vswitchd
> > ● ovs-vswitchd.service - Open vSwitch Forwarding Unit
> >    Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static;
> > vendor preset: disabled)
> >    Active: inactive (dead) since Wed 2018-10-10 10:06:17 BST; 32min ago
> >   Process: 911167 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl
> > --no-ovsdb-server stop (code=exited, status=0/SUCCESS)
> >
> > Oct 10 10:06:17 controller-1 ovs-ctl[911167]:
> > 2018-10-10T09:06:17Z|00001|unixctl|WARN|failed to connect to
> > /var/run/openvswitch/ovs-vswitchd.1020.ctl
> > Oct 10 10:06:17 controller-1 ovs-ctl[911167]: ovs-appctl: cannot connect to
> > "/var/run/openvswitch/ovs-vswitchd.1020.ctl" (Connection refused)
> > Warning: Journal has been rotated since unit was started. Log output is
> > incomplete or unavailable.
>
> Usually, when something is killed, the status won't show an ExecStop= run;
> it will report that it was killed. It looks like something here actually
> stopped ovs-vswitchd (if I'm reading the output correctly). Can you attach
> the full systemd journal (or somehow get it)?

Aaron, I think it works differently than you described, because I:
1. had ovs running
2. killed ovs directly
3. systemctl status still shows ExecStop (not it's for ovsdb not ovs-vswitchd)

[root@controller-1 openvswitch2.10-2.10.0-10]# sudo systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: active (running) since Fri 2018-10-12 09:30:53 BST; 17s ago
  Process: 469088 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop (code=exited, status=0/SUCCESS)
  Process: 25807 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server --no-monitor --system-id=random ${OVSUSER} start $OPTIONS (code=exited, status=0/SUCCESS)
  Process: 25802 ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages (code=exited, status=0/SUCCESS)
  Process: 25800 ExecStartPre=/bin/sh -c /usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages (code=exited, status=0/SUCCESS)
    Tasks: 11
   Memory: 87.5M
   CGroup: /system.slice/ovs-vswitchd.service
           └─25850 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --...

Oct 12 09:30:52 controller-1 systemd[1]: Starting Open vSwitch Forwarding Unit...
Oct 12 09:30:53 controller-1 ovs-ctl[25807]: Starting ovs-vswitchd [ OK ]
Oct 12 09:30:53 controller-1 ovs-vsctl[25985]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=controller-1.localdomain
Oct 12 09:30:53 controller-1 ovs-ctl[25807]: Enabling remote OVSDB managers [ OK ]
Oct 12 09:30:53 controller-1 systemd[1]: Started Open vSwitch Forwarding Unit.

[root@controller-1 openvswitch2.10-2.10.0-10]# ps aux | grep -i ovs-vswitch
openvsw+  25850  2.0  0.3 913216 106728 ?      S<Lsl 09:30   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root      28699  0.0  0.0 112704   1008 pts/1  S+   09:31   0:00 grep --color=auto -i ovs-vswitch

[root@controller-1 openvswitch2.10-2.10.0-10]# kill 25850

[root@controller-1 openvswitch2.10-2.10.0-10]# sudo systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: inactive (dead) since Fri 2018-10-12 09:31:27 BST; 5s ago
  Process: 29139 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop (code=exited, status=0/SUCCESS)
  Process: 25807 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server --no-monitor --system-id=random ${OVSUSER} start $OPTIONS (code=exited, status=0/SUCCESS)
  Process: 25802 ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages (code=exited, status=0/SUCCESS)
  Process: 25800 ExecStartPre=/bin/sh -c /usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages (code=exited, status=0/SUCCESS)

Oct 12 09:30:52 controller-1 systemd[1]: Starting Open vSwitch Forwarding Unit...
Oct 12 09:30:53 controller-1 ovs-ctl[25807]: Starting ovs-vswitchd [ OK ]
Oct 12 09:30:53 controller-1 ovs-vsctl[25985]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=controller-1.localdomain
Oct 12 09:30:53 controller-1 ovs-ctl[25807]: Enabling remote OVSDB managers [ OK ]
Oct 12 09:30:53 controller-1 systemd[1]: Started Open vSwitch Forwarding Unit.
Oct 12 09:31:27 controller-1 ovs-ctl[29139]: ovs-vswitchd is not running.
[root@controller-1 openvswitch2.10-2.10.0-10]#

Also, I've installed the newest fast datapath ovs version (2.10-10) on all overcloud nodes and started ovs. When I brought the opendaylight docker container down, OVS on two overcloud nodes died after a couple of minutes:

[root@controller-2 openvswitch2.10-2.10.0-10]# sudo systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: inactive (dead) since Thu 2018-10-11 14:16:35 BST; 19h ago
  Process: 335143 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop (code=exited, status=0/SUCCESS)
  Process: 305530 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server --no-monitor --system-id=random ${OVSUSER} start $OPTIONS (code=exited, status=0/SUCCESS)
  Process: 305526 ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages (code=exited, status=0/SUCCESS)
  Process: 305524 ExecStartPre=/bin/sh -c /usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages (code=exited, status=0/SUCCESS)
 Main PID: 967023 (code=killed, signal=TERM)
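A quick, generic way to check whether systemd saw a clean stop or a killed/crashed main process (a sketch, nothing here is specific to this setup):

  systemctl show ovs-vswitchd -p ExecMainCode -p ExecMainStatus -p Result
  journalctl -u ovs-vswitchd -o short-precise --since "1 hour ago"

  # if systemd-coredump is enabled, an abort/segfault of the daemon shows up here
  coredumpctl list ovs-vswitchd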
Created attachment 1493244 [details] ovs core dump
Created attachment 1493247 [details] new ovs vswitchd.log
Created attachment 1493248 [details] journal
Aaron, I'm attaching a newer ovs log and the journal as requested. Note that I started the ovs-vswitchd systemd service manually at 11:33.
I've tried setting ovs-appctl bfd/set-forwarding false on all overcloud nodes, then ran integration testing (which creates VMs, brings down the odl containers and checks the VMs are still pingable)... ovs died again.
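For reference, a sketch of that attempt as run on each overcloud node (the tunnel port name below is illustrative; without an interface argument the override applies to all BFD-enabled ports):

  # override whether BFD reports the interfaces as forwarding (all ports)
  ovs-appctl bfd/set-forwarding false

  # or target a single tunnel port
  ovs-appctl bfd/set-forwarding tun70020c4c630 false

  # inspect the resulting BFD session state
  ovs-appctl bfd/show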
So I checked the same scenario on an OVN deployment and I did not hit this issue.
*** Bug 1640045 has been marked as a duplicate of this bug. ***
Adding the backtrace for the crash, so it's easy to see it's a duplicate of bug 1640045.

#0  0x00007f3c6524b207 in raise () from /lib64/libc.so.6
#1  0x00007f3c6524c8f8 in abort () from /lib64/libc.so.6
#2  0x00007f3c66c06cb7 in ofputil_encode_flow_removed (fr=fr@entry=0x7f3c59ff9b80, protocol=<optimized out>) at lib/ofp-monitor.c:293
#3  0x00007f3c671b1db3 in connmgr_send_flow_removed (mgr=mgr@entry=0x56197f5a4800, fr=fr@entry=0x7f3c59ff9b80) at ofproto/connmgr.c:1702
#4  0x00007f3c671b7464 in ofproto_rule_send_removed (rule=0x56197f69db80) at ofproto/ofproto.c:5729
#5  0x00007f3c671bdc3d in rule_destroy_cb (rule=0x56197f69db80) at ofproto/ofproto.c:2839
#6  0x00007f3c66c1e88e in ovsrcu_call_postponed () at lib/ovs-rcu.c:342
#7  0x00007f3c66c1ea94 in ovsrcu_postpone_thread (arg=<optimized out>) at lib/ovs-rcu.c:357
#8  0x00007f3c66c20d2f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:354
#9  0x00007f3c66000dd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f3c65313b3d in clone () from /lib64/libc.so.6
(gdb)

Re-assigning the BZ to the duplicate BZ's owner, as he has a fix posted upstream:
https://patchwork.ozlabs.org/patch/985340/
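To reproduce this backtrace from the attached core dump (attachment 1493244), a rough sketch - the debuginfo package name and core file path are assumptions for this RHEL/FDP install:

  # install matching debug symbols (exact debuginfo package name may vary)
  sudo debuginfo-install openvswitch2.10

  # open the attached core against the ovs-vswitchd binary and dump all threads
  gdb /usr/sbin/ovs-vswitchd /path/to/ovs-core-dump
  (gdb) thread apply all bt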
Tomas, Waldek, can you attach the karaf.log from this bug when it occurs? There are also some u/s jobs with messy things going on with ovs, and I'm wondering if we are seeing the same thing. But the karaf.log will also show a lot of reconnects, I think.
jamo, the setup I have this problem on has been changed plenty (different ovs, many restarts, lots of pulling and tearing down), so I don't think its karaf will be a good example for you to compare against upstream. I'll redeploy and give you the karaf.log then (today).
(In reply to Waldemar Znoinski from comment #21)
> jamo, the setup I have this problem on has been changed plenty (different
> ovs, many restarts, lots of pulling and tearing down), so I don't think its
> karaf will be a good example for you to compare against upstream. I'll
> redeploy and give you the karaf.log then (today).

Oh, this wasn't from a CI job where you can just go pull the karaf.log files?
jamoluhrsen, hey, re https://bugzilla.redhat.com/show_bug.cgi?id=1637926 - you're right, it happens in CI too, e.g.:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-opendaylight-odl-netvirt-14_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ha-csit/28/robot/report/log.html
In that job's 'build artifacts', inside controller-0/1/2.tar.gz, the karaf log is under /var/log/containers/opendaylight.
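A small sketch for pulling the karaf logs out of those artifacts locally (the layout inside the tarballs is assumed to match the paths in the comment above):

  for node in controller-0 controller-1 controller-2; do
      mkdir -p ${node} && tar -xzf ${node}.tar.gz -C ${node}
  done
  less controller-0/var/log/containers/opendaylight/karaf.log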
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045