While working on an OSP15 OVN based setup I tried to clear up (wrongly) the external_mac field of a NAT entry: $ ovn-nbctl set NAT 6b58278e-f4ac-4465-8964-0fd96f905b4a external_mac=\"\" This turned out to cause excessive log rating in ovn-northd and eventually a crash: [...] 2019-11-06T16:08:39.584Z|00207|ovn_northd|WARN|bad mac for dnat in router 6b25f763-93d7-4382-a07f-3226c9fcae1d 2019-11-06T16:09:02.545Z|00208|ovn_northd|WARN|Dropped 3 log messages in last 22 seconds (most recently, 22 seconds ago) due to excessive rate 2019-11-06T16:09:02.545Z|00209|ovn_northd|WARN|bad mac for dnat in router 6b25f763-93d7-4382-a07f-3226c9fcae1d 2019-11-06T16:09:16.558Z|00210|ovn_northd|WARN|Dropped 11 log messages in last 14 seconds (most recently, 14 seconds ago) due to excessive rate 2019-11-06T16:09:16.558Z|00211|ovn_northd|WARN|bad mac for dnat in router 6b25f763-93d7-4382-a07f-3226c9fcae1d 2019-11-06T16:09:39.561Z|00212|ovn_northd|WARN|Dropped 3 log messages in last 23 seconds (most recently, 23 seconds ago) due to excessive rate 2019-11-06T16:09:39.561Z|00213|ovn_northd|WARN|bad mac for dnat in router 6b25f763-93d7-4382-a07f-3226c9fcae1d 2019-11-06T16:09:53.558Z|00214|ovn_northd|WARN|Dropped 11 log messages in last 14 seconds (most recently, 14 seconds ago) due to excessive rate 2019-11-06T16:09:53.558Z|00215|ovn_northd|WARN|bad mac for dnat in router 6b25f763-93d7-4382-a07f-3226c9fcae1d 2019-11-06T16:10:16.555Z|00216|ovn_northd|WARN|Dropped 3 log messages in last 23 seconds (most recently, 23 seconds ago) due to excessive rate 2019-11-06T16:10:16.555Z|00217|ovn_northd|WARN|bad mac for dnat in router 6b25f763-93d7-4382-a07f-3226c9fcae1d 2019-11-06T16:10:30.560Z|00218|ovn_northd|WARN|Dropped 10 log messages in last 14 seconds (most recently, 14 seconds ago) due to excessive rate 2019-11-06T16:10:30.560Z|00219|ovn_northd|WARN|bad mac for dnat in router 6b25f763-93d7-4382-a07f-3226c9fcae1d 2019-11-06T16:10:53.557Z|00220|ovn_northd|WARN|Dropped 3 log messages in last 23 seconds (most recently, 23 seconds ago) due to excessive rate 2019-11-06T16:10:53.557Z|00221|ovn_northd|WARN|bad mac for dnat in router 6b25f763-93d7-4382-a07f-3226c9fcae1d 2019-11-06T16:11:07.562Z|00222|ovn_northd|WARN|Dropped 8 log messages in last 14 seconds (most recently, 14 seconds ago) due to excessive rate 2019-11-06T16:11:07.562Z|00223|ovn_northd|WARN|bad mac for dnat in router 6b25f763-93d7-4382-a07f-3226c9fcae1d 2019-11-06T16:11:14.921Z|00002|daemon_unix(monitor)|ERR|1 crashes: pid 257 died, killed (Segmentation fault), core dumped, restarting 2019-11-06T16:11:14.924Z|00003|reconnect(ovn-northd)|INFO|unix:/var/run/openvswitch/ovnnb_db.sock: connecting... 2019-11-06T16:11:14.924Z|00004|reconnect(ovn-northd)|INFO|unix:/var/run/openvswitch/ovnsb_db.sock: connecting... 2019-11-06T16:11:14.924Z|00005|reconnect(ovn-northd)|INFO|unix:/var/run/openvswitch/ovnnb_db.sock: connected 2019-11-06T16:11:14.924Z|00006|reconnect(ovn-northd)|INFO|unix:/var/run/openvswitch/ovnsb_db.sock: connected 2019-11-06T16:11:14.925Z|00007|ovn_northd(ovn-northd)|INFO|ovn-northd lock acquired. This ovn-northd instance is now active. 2019-11-06T16:11:15.431Z|00003|daemon_unix(monitor)|WARN|2 crashes: pid 340216 died, killed (Segmentation fault), core dumped, waiting until 10 seconds since last restart 2019-11-06T16:11:24.431Z|00004|daemon_unix(monitor)|ERR|2 crashes: pid 340216 died, killed (Segmentation fault), core dumped, restarting 2019-11-06T16:11:24.432Z|00005|reconnect(ovn-northd)|INFO|unix:/var/run/openvswitch/ovnnb_db.sock: connecting... 2019-11-06T16:11:24.432Z|00006|reconnect(ovn-northd)|INFO|unix:/var/run/openvswitch/ovnsb_db.sock: connecting... 2019-11-06T16:11:24.432Z|00007|reconnect(ovn-northd)|INFO|unix:/var/run/openvswitch/ovnnb_db.sock: connected 2019-11-06T16:11:24.432Z|00008|reconnect(ovn-northd)|INFO|unix:/var/run/openvswitch/ovnsb_db.sock: connected 2019-11-06T17:28:51.957Z|00009|unixctl(ovn-northd)|DBG|replying with success, id=0: "" 2019-11-06T17:28:51.957Z|00010|jsonrpc(ovn-northd)|DBG|unix#0: send reply, result="", id=0 2019-11-06T17:28:51.958Z|00011|poll_loop(ovn-northd)|DBG|wakeup due to [POLLIN][POLLHUP] on fd 11 (/var/run/openvswitch/ovn-northd.340513.ctl<->) at lib/stream-fd.c:157 2019-11-06T17:29:12.877Z|00012|poll_loop(ovn-northd)|DBG|wakeup due to [POLLIN] on fd 8 (<->/var/run/openvswitch/ovnnb_db.sock) at lib/stream-fd.c:157 (0% CPU usage) After the crash, the monitor restarts the process but unfortunately, connection to SB database seems frozen leaving ovn-northd in a state where it won't receive or send any data to the southbound database and unnoticed to the monitor. This paragraph describes perhaps a separate bug so feel free to open a new BZ.
After digging with Dumitru, he found out that the crash is due to logical_port not being set when the external_mac was set to "". This is something not expected from CMS so the severity of this bug is not high. Still concerning the fact that SB database was not 'writeable' from northd after the monitor restarted the process. As I said earlier, maybe material for a separate BZ
reproduced on ovn2.11-2.11.1-8.el7fdp.x86_64: [root@dell-per740-12 bz1769709]# rpm -qa | grep -E "openvswitch|ovn" openvswitch2.11-2.11.0-35.el7fdp.x86_64 openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch ovn2.11-2.11.1-8.el7fdp.x86_64 ovn2.11-host-2.11.1-8.el7fdp.x86_64 ovn2.11-central-2.11.1-8.el7fdp.x86_64 [root@dell-per740-12 bz1769709]# bash -x rep1.sh + systemctl start openvswitch + systemctl start ovn-northd + ovn-nbctl set-connection ptcp:6641 + ovn-sbctl set-connection ptcp:6642 + ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.30.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.30.25 + systemctl restart ovn-controller + ovn-nbctl lr-add rtr + ovn-nbctl lrp-add rtr rtr-ls 00:00:00:00:01:00 42.42.42.1/24 + ovn-nbctl lrp-add rtr rtr-ls2 00:00:00:00:02:00 66.66.66.1/24 + ovn-nbctl ls-add ls + ovn-nbctl lsp-add ls ls-rtr + ovn-nbctl lsp-set-addresses ls-rtr 00:00:00:00:01:00 + ovn-nbctl lsp-set-type ls-rtr router + ovn-nbctl lsp-set-options ls-rtr router-port=rtr-ls + ovn-nbctl lsp-add ls vm1 + ovn-nbctl lsp-set-addresses vm1 00:00:00:00:00:01 + ovn-nbctl lsp-add ls vm2 + ovn-nbctl lsp-set-addresses vm2 00:00:00:00:00:02 + ovn-nbctl ls-add ls2 + ovn-nbctl lsp-add ls2 ls2-rtr + ovn-nbctl lsp-set-addresses ls2-rtr 00:00:00:00:02:00 + ovn-nbctl lsp-set-type ls2-rtr router + ovn-nbctl lsp-set-options ls2-rtr router-port=rtr-ls2 + ovn-nbctl lsp-add ls2 vm3 + ovn-nbctl lsp-set-addresses vm3 00:00:00:00:00:03 + ip netns add vm1 + ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal + ip link set vm1 netns vm1 + ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01 + ip netns exec vm1 ip addr add 42.42.42.2/24 dev vm1 + ip netns exec vm1 ip link set vm1 up + ovs-vsctl set Interface vm1 external_ids:iface-id=vm1 + ip netns add vm2 + ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal + ip link set vm2 netns vm2 + ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02 + ip netns exec vm2 ip addr add 42.42.42.3/24 dev vm2 + ip netns exec vm2 ip link set vm2 up + ovs-vsctl set Interface vm2 external_ids:iface-id=vm2 + ip netns add vm3 + ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal + ip link set vm3 netns vm3 + ip netns exec vm3 ip link set vm3 address 00:00:00:00:00:03 + ip netns exec vm3 ip addr add 66.66.66.3/24 dev vm3 + ip netns exec vm3 ip link set vm3 up + ovs-vsctl set Interface vm3 external_ids:iface-id=vm3 + ovn-nbctl lr-nat-add rtr dnat_and_snat 66.66.66.61 42.42.42.2 vm1 00:00:00:00:02:00 + ovn-nbctl lr-nat-add rtr dnat_and_snat 66.66.66.62 42.42.42.3 vm2 00:00:00:00:02:00 + ovn-nbctl lr-nat-add rtr snat 66.66.66.63 42.42.42.0/24 + ip netns exec vm1 ip r a default via 42.42.42.1 + ip netns exec vm2 ip r a default via 42.42.42.1 + ovs-vsctl set open . external_ids:system-id=local + ovn-nbctl set logical_router_port rtr-ls2 options:redirect-chassis=local [root@dell-per740-12 bz1769709]# ovn-nbctl list NAT _uuid : 118519f0-0aeb-4a7a-b320-8550153ae73c external_ids : {} external_ip : "66.66.66.61" external_mac : "00:00:00:00:02:00" logical_ip : "42.42.42.2" logical_port : "vm1" type : dnat_and_snat _uuid : 71495c1c-2288-40ca-8d0a-32eb7e9796e0 external_ids : {} external_ip : "66.66.66.63" external_mac : [] logical_ip : "42.42.42.0/24" logical_port : [] type : snat _uuid : 8b82becf-a7a3-4634-920b-d918e8a5b5a5 external_ids : {} external_ip : "66.66.66.62" external_mac : "00:00:00:00:02:00" logical_ip : "42.42.42.3" logical_port : "vm2" type : dnat_and_snat [root@dell-per740-12 bz1769709]# ovn-nbctl set NAT 118519f0-0aeb-4a7a-b320-8550153ae73c external_mac=\"\" [root@dell-per740-12 bz1769709]# ovn-nbctl clear NAT 118519f0-0aeb-4a7a-b320-8550153ae73c logical_port then ovn-northd is crashed: [root@dell-per740-12 ~]# tail -f /var/log/openvswitch/ovn-northd.log 2019-12-19T08:44:32.647Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-northd.log 2019-12-19T08:44:32.653Z|00002|reconnect|INFO|unix:/run/openvswitch/ovnnb_db.sock: connecting... 2019-12-19T08:44:32.653Z|00003|reconnect|INFO|unix:/run/openvswitch/ovnsb_db.sock: connecting... 2019-12-19T08:44:32.653Z|00004|reconnect|INFO|unix:/run/openvswitch/ovnnb_db.sock: connected 2019-12-19T08:44:32.653Z|00005|reconnect|INFO|unix:/run/openvswitch/ovnsb_db.sock: connected 2019-12-19T08:44:32.653Z|00006|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active. 2019-12-19T08:45:07.615Z|00007|ovn_northd|WARN|bad mac for dnat in router 66181cf2-02a6-4090-9e37-4ac4336584f1 2019-12-19T08:45:30.422Z|00002|daemon_unix(monitor)|ERR|1 crashes: pid 31885 died, killed (Segmentation fault), restarting 2019-12-19T08:45:30.424Z|00003|reconnect(ovn-northd)|INFO|unix:/run/openvswitch/ovnnb_db.sock: connecting... 2019-12-19T08:45:30.424Z|00004|reconnect(ovn-northd)|INFO|unix:/run/openvswitch/ovnsb_db.sock: connecting... 2019-12-19T08:45:30.425Z|00005|reconnect(ovn-northd)|INFO|unix:/run/openvswitch/ovnnb_db.sock: connected 2019-12-19T08:45:30.425Z|00006|reconnect(ovn-northd)|INFO|unix:/run/openvswitch/ovnsb_db.sock: connected 2019-12-19T08:45:30.425Z|00007|ovn_northd(ovn-northd)|INFO|ovn-northd lock acquired. This ovn-northd instance is now active. 2019-12-19T08:45:30.439Z|00008|ovn_northd(ovn-northd)|WARN|Duplicate MAC set: 00:00:00:00:02:00 2019-12-19T08:45:30.452Z|00003|daemon_unix(monitor)|WARN|2 crashes: pid 32065 died, killed (Segmentation fault), waiting until 10 seconds since last restart 2019-12-19T08:45:40.453Z|00004|daemon_unix(monitor)|ERR|2 crashes: pid 32065 died, killed (Segmentation fault), restarting 2019-12-19T08:45:40.455Z|00005|reconnect(ovn-northd)|INFO|unix:/run/openvswitch/ovnnb_db.sock: connecting... 2019-12-19T08:45:40.456Z|00006|reconnect(ovn-northd)|INFO|unix:/run/openvswitch/ovnsb_db.sock: connecting... 2019-12-19T08:45:40.456Z|00007|reconnect(ovn-northd)|INFO|unix:/run/openvswitch/ovnnb_db.sock: connected 2019-12-19T08:45:40.456Z|00008|reconnect(ovn-northd)|INFO|unix:/run/openvswitch/ovnsb_db.sock: connected 2019-12-19T08:45:40.456Z|00009|ovn_northd(ovn-northd)|INFO|ovn-northd lock acquired. This ovn-northd instance is now active. 2019-12-19T08:45:40.469Z|00010|ovn_northd(ovn-northd)|WARN|Duplicate MAC set: 00:00:00:00:02:00 2019-12-19T08:45:40.482Z|00005|daemon_unix(monitor)|WARN|3 crashes: pid 32074 died, killed (Segmentation fault), waiting until 10 seconds since last restart 2019-12-19T08:45:50.488Z|00006|daemon_unix(monitor)|ERR|3 crashes: pid 32074 died, killed (Segmentation fault), restarting 2019-12-19T08:45:50.490Z|00007|reconnect(ovn-northd)|INFO|unix:/run/openvswitch/ovnnb_db.sock: connecting... 2019-12-19T08:45:50.490Z|00008|reconnect(ovn-northd)|INFO|unix:/run/openvswitch/ovnsb_db.sock: connecting... 2019-12-19T08:45:50.490Z|00009|reconnect(ovn-northd)|INFO|unix:/run/openvswitch/ovnnb_db.sock: connected 2019-12-19T08:45:50.490Z|00010|reconnect(ovn-northd)|INFO|unix:/run/openvswitch/ovnsb_db.sock: connected 2019-12-19T08:45:50.491Z|00011|ovn_northd(ovn-northd)|INFO|ovn-northd lock acquired. This ovn-northd instance is now active. 2019-12-19T08:45:50.504Z|00012|ovn_northd(ovn-northd)|WARN|Duplicate MAC set: 00:00:00:00:02:00 2019-12-19T08:45:50.516Z|00007|daemon_unix(monitor)|WARN|4 crashes: pid 32085 died, killed (Segmentation fault), waiting until 10 seconds since last restart Verified on ovn2.11-2.11.1-24.el7fdp.x86_64: [root@dell-per740-12 bz1769709]# rpm -qa | grep -E "openvswitch|ovn" openvswitch2.11-2.11.0-35.el7fdp.x86_64 ovn2.11-2.11.1-24.el7fdp.x86_64 openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch ovn2.11-central-2.11.1-24.el7fdp.x86_64 ovn2.11-host-2.11.1-24.el7fdp.x86_64 [root@dell-per740-12 bz1769709]# ovn-nbctl list NAT _uuid : d571fb3e-00b3-4c08-b7ba-305c24b7c3fb external_ids : {} external_ip : "66.66.66.62" external_mac : "00:00:00:00:02:00" logical_ip : "42.42.42.3" logical_port : "vm2" options : {stateless="false"} type : dnat_and_snat _uuid : 1851726a-7c5e-4ea9-86d0-b0c813cd86c5 external_ids : {} external_ip : "66.66.66.63" external_mac : [] logical_ip : "42.42.42.0/24" logical_port : [] options : {stateless="false"} type : snat _uuid : 92857c37-c6ea-41d8-9cd7-9785a3842cfe external_ids : {} external_ip : "66.66.66.61" external_mac : "00:00:00:00:02:00" logical_ip : "42.42.42.2" logical_port : "vm1" options : {stateless="false"} type : dnat_and_snat [root@dell-per740-12 bz1769709]# ovn-nbctl set NAT d571fb3e-00b3-4c08-b7ba-305c24b7c3fb external_mac=\"\" [root@dell-per740-12 bz1769709]# ovn-nbctl clear NAT d571fb3e-00b3-4c08-b7ba-305c24b7c3fb logical_port [root@dell-per740-12 ~]# tail -f /var/log/openvswitch/ovn-northd.log 2019-12-19T08:50:48.829Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-northd.log 2019-12-19T08:50:48.833Z|00002|reconnect|INFO|unix:/run/openvswitch/ovnnb_db.sock: connecting... 2019-12-19T08:50:48.833Z|00003|reconnect|INFO|unix:/run/openvswitch/ovnsb_db.sock: connecting... 2019-12-19T08:50:48.833Z|00004|reconnect|INFO|unix:/run/openvswitch/ovnnb_db.sock: connected 2019-12-19T08:50:48.833Z|00005|reconnect|INFO|unix:/run/openvswitch/ovnsb_db.sock: connected 2019-12-19T08:50:48.833Z|00006|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active. 2019-12-19T08:51:07.189Z|00007|ovn_northd|WARN|bad mac for dnat in router 76589cc1-47d3-4ea5-a71d-ed50e94c42bd <=== not crash
also fixed on rhel8 version: [root@hp-dl380pg8-12 bz1769709]# ovn-nbctl set NAT 0e81f035-0691-4662-818d-9b2612f3fdd3 external_mac=\ "\" [root@hp-dl380pg8-12 bz1769709]# ovn-nbctl list NAT _uuid : 0e81f035-0691-4662-818d-9b2612f3fdd3 external_ids : {} external_ip : "66.66.66.61" external_mac : "" logical_ip : "42.42.42.2" logical_port : "vm1" options : {stateless="false"} type : dnat_and_snat _uuid : e626170e-64cc-4b27-b322-a648ea188923 external_ids : {} external_ip : "66.66.66.62" external_mac : "00:00:00:00:02:00" logical_ip : "42.42.42.3" logical_port : "vm2" options : {stateless="false"} type : dnat_and_snat _uuid : 1e3201e2-a1ae-490c-87e5-97e7c78b23c7 external_ids : {} external_ip : "66.66.66.63" external_mac : [] logical_ip : "42.42.42.0/24" logical_port : [] options : {stateless="false"} type : snat [root@hp-dl380pg8-12 bz1769709]# tail /var/log/openvswitch/ovn-northd.log 2019-12-19T08:58:08.350Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-northd.log 2019-12-19T08:58:08.352Z|00002|reconnect|INFO|unix:/run/openvswitch/ovnnb_db.sock: connecting... 2019-12-19T08:58:08.353Z|00003|reconnect|INFO|unix:/run/openvswitch/ovnsb_db.sock: connecting... 2019-12-19T08:58:08.353Z|00004|reconnect|INFO|unix:/run/openvswitch/ovnnb_db.sock: connected 2019-12-19T08:58:08.353Z|00005|reconnect|INFO|unix:/run/openvswitch/ovnsb_db.sock: connected 2019-12-19T08:58:08.353Z|00006|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active. 2019-12-19T08:58:39.522Z|00007|ovn_northd|WARN|bad mac for dnat in router 36e2b3b2-e913-452b-a369-c7e874ee81ba [root@hp-dl380pg8-12 bz1769709]# ovn-nbctl clear NAT 0e81f035-0691-4662-818d-9b2612f3fdd3 logical_port [root@hp-dl380pg8-12 bz1769709]# tail /var/log/openvswitch/ovn-northd.log 2019-12-19T08:58:08.350Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-northd.log 2019-12-19T08:58:08.352Z|00002|reconnect|INFO|unix:/run/openvswitch/ovnnb_db.sock: connecting... 2019-12-19T08:58:08.353Z|00003|reconnect|INFO|unix:/run/openvswitch/ovnsb_db.sock: connecting... 2019-12-19T08:58:08.353Z|00004|reconnect|INFO|unix:/run/openvswitch/ovnnb_db.sock: connected 2019-12-19T08:58:08.353Z|00005|reconnect|INFO|unix:/run/openvswitch/ovnsb_db.sock: connected 2019-12-19T08:58:08.353Z|00006|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active. 2019-12-19T08:58:39.522Z|00007|ovn_northd|WARN|bad mac for dnat in router 36e2b3b2-e913-452b-a369-c7e874ee81ba [root@hp-dl380pg8-12 bz1769709]# rpm -qa | grep ovn kernel-kernel-networking-openvswitch-ovn-basic-1.0-14.noarch ovn2.11-host-2.11.1-24.el8fdp.x86_64 ovn2.11-2.11.1-24.el8fdp.x86_64 kernel-kernel-networking-openvswitch-ovn-common-1.0-6.noarch ovn2.11-central-2.11.1-24.el8fdp.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0190