Bug 1740770

Summary: OVN-DVR HA | DNS Security group rule is applied but not working between VMs on different networks with FIPs
Product: Red Hat OpenStack Reporter: Udi Shkalim <ushkalim>
Component: openvswitch Assignee: Dumitru Ceara <dceara>
Status: CLOSED ERRATA QA Contact: Eduardo Olivares <eolivare>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 13.0 (Queens) CC: apevec, astupnik, chrisw, dalvarez, dceara, ekuris, fwissing, jlibosva, lhh, majopela, rgregory, rhos-maint, rsafrono, scohen, shdunne, slinaber
Target Milestone: --- Keywords: Triaged, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openvswitch2.11-2.11.0-26.el8fdp Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1761461 (view as bug list) Environment:
Last Closed: 2020-03-10 11:52:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1761461    

Description Udi Shkalim 2019-08-13 15:29:30 UTC
Description of problem:
* The setup is OVN-DVR HA, with OpenShift 3.11 installed on top as a tenant.
* The OpenShift instances have FIPs attached to them.
* On the same tenant, an instance was created to act as the DNS server for the OpenShift instances.
* The DNS instance is on a different subnet from the OpenShift nodes.
* Each network is connected to its own OpenStack router, and both routers use the same external gateway network.
* Security group rules allow DNS traffic on both instances.

As a result, DNS queries do not reach the DNS server.

The same configuration works on OVS and on OVN-HA (from a Jenkins job); we only see the failure on OVN-DVR HA.

Note: ICMP and SSH are allowed in the SG and work fine.

Version-Release number of selected component (if applicable):
python-networking-ovn-4.0.3-7.el7ost

How reproducible:
100%

Steps to Reproduce:
1. Deploy OVN-DVR HA.
2. Create a tenant and 2 networks+subnets.
3. Create FIPs.
4. Boot 2 instances, one on each network, and attach the FIPs to them.
5. Allow ICMP, SSH and DNS in the security group assigned to the instances.
6. Set one of the VMs as the DNS server of the other and try to resolve a hostname (www.google.com).

Actual results:
DNS traffic leaves one of the VMs but is not seen on the other.

Expected results:
All allowed traffic should be seen.

Additional info:
sosreports attached

Comment 4 Jakub Libosvar 2019-08-14 16:20:31 UTC
I got access to the env and did some troubleshooting. Pinging the DNS VM's FIP from the other VM works. I tried to debug the OpenFlow rules and the output is as follows:

[root@compute-0 ~]# ovs-appctl ofproto/trace br-int in_port=4 fa163e979e54fa163ed18f3608004500003868d6400040118d05c0a863160a2e16eda42400350024fd9005290100000100000000000006676f6f676c6503636f6d0000010001
Flow: udp,in_port=4,vlan_tci=0x0000,dl_src=fa:16:3e:d1:8f:36,dl_dst=fa:16:3e:97:9e:54,nw_src=192.168.99.22,nw_dst=10.46.22.237,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=42020,tp_dst=53

bridge("br-int")
----------------
 0. in_port=4, priority 100
    set_field:0x1->reg13
    set_field:0xc->reg11
    set_field:0x11->reg12
    set_field:0x8->metadata
    set_field:0xd->reg14
    resubmit(,8)
 8. reg14=0xd,metadata=0x8,dl_src=fa:16:3e:d1:8f:36, priority 50, cookie 0x67403e67
    resubmit(,9)
 9. ip,reg14=0xd,metadata=0x8,dl_src=fa:16:3e:d1:8f:36,nw_src=192.168.99.22, priority 90, cookie 0x6a21fbd3
    resubmit(,10)
10. metadata=0x8, priority 0, cookie 0x2cfde4c8
    resubmit(,11)
11. ip,metadata=0x8, priority 100, cookie 0x6e86a67b
    load:0x1->NXM_NX_XXREG0[96]
    resubmit(,12)
12. metadata=0x8, priority 0, cookie 0xd097c17b
    resubmit(,13)
13. ip,reg0=0x1/0x1,metadata=0x8, priority 100, cookie 0x3d5e8cda
    ct(table=14,zone=NXM_NX_REG13[0..15])
    drop
     -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 14.

Final flow: udp,reg0=0x1,reg11=0xc,reg12=0x11,reg13=0x1,reg14=0xd,metadata=0x8,in_port=4,vlan_tci=0x0000,dl_src=fa:16:3e:d1:8f:36,dl_dst=fa:16:3e:97:9e:54,nw_src=192.168.99.22,nw_dst=10.46.22.237,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=42020,tp_dst=53
Megaflow: recirc_id=0,eth,udp,in_port=4,vlan_tci=0x0000/0x1000,dl_src=fa:16:3e:d1:8f:36,nw_src=192.168.99.22,nw_dst=10.46.22.237,nw_frag=no
Datapath actions: ct(zone=1),recirc(0x14fa)

===============================================================================
recirc(0x14fa) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
===============================================================================

Flow: recirc_id=0x14fa,ct_state=new|trk,ct_zone=1,eth,udp,reg0=0x1,reg11=0xc,reg12=0x11,reg13=0x1,reg14=0xd,metadata=0x8,in_port=4,vlan_tci=0x0000,dl_src=fa:16:3e:d1:8f:36,dl_dst=fa:16:3e:97:9e:54,nw_src=192.168.99.22,nw_dst=10.46.22.237,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=42020,tp_dst=53

bridge("br-int")
----------------
    thaw
        Resuming from table 14
14. ct_state=+new-est+trk,ip,reg14=0xd,metadata=0x8, priority 2002, cookie 0xe66b6d51
    load:0x1->NXM_NX_XXREG0[97]
    resubmit(,15)
15. metadata=0x8, priority 0, cookie 0x5d4f6ff6
    resubmit(,16)
16. metadata=0x8, priority 0, cookie 0xce883650
    resubmit(,17)
17. metadata=0x8, priority 0, cookie 0x1636b72d
    resubmit(,18)
18. ip,reg0=0x2/0x2,metadata=0x8, priority 100, cookie 0xfaf2c006
    ct(commit,zone=NXM_NX_REG13[0..15],exec(load:0->NXM_NX_CT_LABEL[0]))
    load:0->NXM_NX_CT_LABEL[0]
    resubmit(,19)
19. metadata=0x8, priority 0, cookie 0x779117cc
    resubmit(,20)
20. metadata=0x8, priority 0, cookie 0xa9f4938c
    resubmit(,21)
21. metadata=0x8, priority 0, cookie 0xc08d5434
    resubmit(,22)
22. udp,metadata=0x8,tp_dst=53, priority 100, cookie 0xf1edfbc1
    controller(userdata=00.00.00.06.00.00.00.00.00.01.de.10.00.00.00.64,pause)

Final flow: recirc_id=0x14fa,eth,udp,reg0=0x3,reg11=0xc,reg12=0x11,reg13=0x1,reg14=0xd,metadata=0x8,in_port=4,vlan_tci=0x0000,dl_src=fa:16:3e:d1:8f:36,dl_dst=fa:16:3e:97:9e:54,nw_src=192.168.99.22,nw_dst=10.46.22.237,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=42020,tp_dst=53
Megaflow: recirc_id=0x14fa,ct_state=+new-est-rel-rpl-inv+trk,ct_label=0/0x1,eth,udp,in_port=4,dl_src=fa:16:3e:d1:8f:36,nw_dst=0.0.0.0/1,nw_frag=no,tp_dst=53
Datapath actions: ct(commit,zone=1,label=0/0x1),userspace(pid=4294929842,controller(reason=1,dont_send=0,continuation=1,recirc_id=5371,rule_cookie=0xe66b6d51,controller_id=0,max_len=65535))
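
The hex string passed to ofproto/trace above is a raw Ethernet frame. As a cross-check of the "Flow:" line, here is a minimal Python sketch that decodes it, assuming an untagged Ethernet/IPv4/UDP frame carrying a single DNS question. The function and variable names are illustrative only and are not part of any OVS tooling.

```python
import socket
import struct

# Frame bytes copied from the ofproto/trace command line, split by layer.
FRAME_HEX = (
    "fa163e979e54" "fa163ed18f36" "0800"            # Ethernet: dst, src, type
    "4500" "0038" "68d6" "4000" "40" "11" "8d05"    # IPv4 header fields
    "c0a86316" "0a2e16ed"                           # IPv4 src, dst
    "a424" "0035" "0024" "fd90"                     # UDP: sport, dport, len, csum
    "0529" "0100" "0001" "0000" "0000" "0000"       # DNS header
    "06676f6f676c65" "03636f6d" "00" "0001" "0001"  # DNS question
)


def mac(raw):
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode_dns_query_frame(hex_frame):
    """Decode an untagged Ethernet/IPv4/UDP frame carrying one DNS question."""
    raw = bytes.fromhex(hex_frame)
    dl_dst, dl_src, eth_type = struct.unpack("!6s6sH", raw[:14])
    ip = raw[14:]
    ihl = (ip[0] & 0x0F) * 4              # IPv4 header length in bytes
    udp = ip[ihl:]
    tp_src, tp_dst = struct.unpack("!HH", udp[:4])
    dns = udp[8:]                         # skip the 8-byte UDP header
    pos, labels = 12, []                  # question follows the 12-byte DNS header
    while dns[pos] != 0:
        length = dns[pos]
        labels.append(dns[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return {
        "dl_src": mac(dl_src),
        "dl_dst": mac(dl_dst),
        "eth_type": eth_type,
        "nw_src": socket.inet_ntoa(ip[12:16]),
        "nw_dst": socket.inet_ntoa(ip[16:20]),
        "nw_ttl": ip[8],
        "nw_proto": ip[9],
        "tp_src": tp_src,
        "tp_dst": tp_dst,
        "qname": ".".join(labels),
    }


frame = decode_dns_query_frame(FRAME_HEX)
for key, value in frame.items():
    print(f"{key}={value}")
```

The decoded fields can be compared one by one with the "Flow:" line printed by ofproto/trace; the query name shows which hostname the traced packet asks for.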

Comment 5 Jakub Libosvar 2019-08-15 09:56:41 UTC
Dumping some more info for troubleshooting. Some information about how packet goes:

VM1 is the origin VM with MAC fa:16:3e:d1:8f:36 and fixed IP 192.168.99.22
VM2 is the destination VM with MAC fa:16:3e:ca:51:a1 , fixed IP 192.168.23.12 and FIP 10.46.22.237

This is the ovn-trace output for a UDP packet going to port 53, i.e. a query like:
# udp,reg14=0xd,vlan_tci=0x0000,dl_src=fa:16:3e:d1:8f:36,dl_dst=fa:16:3e:97:9e:54,nw_src=192.168.99.22,nw_dst=10.46.22.237,nw_tos=0,nw_ecn=0,nw_ttl=32,tp_src=0,tp_dst=53

ingress(dp="openshift-ansible-openshift.example.com-net", inport="openshift.example.com-infra_nodes-uemxzyd64o7k-2-x3fovb4iwk32-port-h7g4unkcw4af")
---------------------------------------------------------------------------------------------------------------------------------------------------
 0. ls_in_port_sec_l2 (ovn-northd.c:3869): inport == "openshift.example.com-infra_nodes-uemxzyd64o7k-2-x3fovb4iwk32-port-h7g4unkcw4af" && eth.src == {fa:16:3e:d1:8f:36}, priority 50, uuid 67403e67
    next;
 1. ls_in_port_sec_ip (ovn-northd.c:2851): inport == "openshift.example.com-infra_nodes-uemxzyd64o7k-2-x3fovb4iwk32-port-h7g4unkcw4af" && eth.src == fa:16:3e:d1:8f:36 && ip4.src == {192.168.99.22}, priority 90, uuid 6a21fbd3
    next;
 3. ls_in_pre_acl (ovn-northd.c:3152): ip, priority 100, uuid 6e86a67b
    reg0[0] = 1;
    next;
 5. ls_in_pre_stateful (ovn-northd.c:3289): reg0[0] == 1, priority 100, uuid 3d5e8cda
    ct_next;

ct_next(ct_state=est|trk /* default (use --ct to customize) */)
---------------------------------------------------------------
 6. ls_in_acl (ovn-northd.c:3497): !ct.new && ct.est && !ct.rpl && ct_label.blocked == 0 && (inport == "openshift.example.com-infra_nodes-uemxzyd64o7k-2-x3fovb4iwk32-port-h7g4unkcw4af" && ip4), priority 2002, uuid c4545831
    next;
14. ls_in_dns_lookup (ovn-northd.c:4129): udp.dst == 53, priority 100, uuid f1edfbc1
    reg0[4] = dns_lookup();
    *** dns_lookup action not implemented
    next;
16. ls_in_l2_lkup (ovn-northd.c:4263): eth.dst == fa:16:3e:97:9e:54, priority 50, uuid 53f27953
    outport = "d652c4";
    output;

egress(dp="openshift-ansible-openshift.example.com-net", inport="openshift.example.com-infra_nodes-uemxzyd64o7k-2-x3fovb4iwk32-port-h7g4unkcw4af", outport="d652c4")
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1. ls_out_pre_acl (ovn-northd.c:3111): ip && outport == "d652c4", priority 110, uuid 58adc831
    next;
 9. ls_out_port_sec_l2 (ovn-northd.c:4346): outport == "d652c4", priority 50, uuid 59f4dc46
    output;
    /* output to "d652c4", type "patch" */

ingress(dp="openshift-ansible-openshift.example.com-router", inport="lrp-d652c4")
---------------------------------------------------------------------------------
 0. lr_in_admission (ovn-northd.c:4892): eth.dst == fa:16:3e:97:9e:54 && inport == "lrp-d652c4", priority 50, uuid 51faa532
    next;
 7. lr_in_ip_routing (ovn-northd.c:4474): ip4.dst == 10.46.22.192/26, priority 53, uuid eb1bc3b6
    ip.ttl--;
    reg0 = ip4.dst;
    reg1 = 10.46.22.227;
    eth.src = fa:16:3e:48:ab:52;
    outport = "lrp-ac16ea";
    flags.loopback = 1;
    next;
 8. lr_in_arp_resolve (ovn-northd.c:6198): ip4, priority 0, uuid 7c3bb779
    get_arp(outport, reg0);
    /* MAC binding to fa:16:3e:1c:7d:58. */
    next;
 9. lr_in_gw_redirect (ovn-northd.c:5655): ip4.src == 192.168.99.22 && outport == "lrp-ac16ea", priority 100, uuid 455088b9
    next;
10. lr_in_arp_request (ovn-northd.c:6305): 1, priority 0, uuid 99698a3c
    output;

egress(dp="openshift-ansible-openshift.example.com-router", inport="lrp-d652c4", outport="lrp-ac16ea")
------------------------------------------------------------------------------------------------------
 0. lr_out_undnat (ovn-northd.c:5577): ip && ip4.src == 192.168.99.22 && outport == "lrp-ac16ea", priority 100, uuid 4803e23a
    eth.src = fa:16:3e:42:7d:33;
    ct_dnat;

ct_dnat /* assuming no un-dnat entry, so no change */
-----------------------------------------------------
 1. lr_out_snat (ovn-northd.c:5624): ip && ip4.src == 192.168.99.22 && outport == "lrp-ac16ea", priority 33, uuid 51ff932f
    eth.src = fa:16:3e:42:7d:33;
    ct_snat(10.46.22.246);

ct_snat(ip4.src=10.46.22.246)
-----------------------------
 3. lr_out_delivery (ovn-northd.c:6333): outport == "lrp-ac16ea", priority 100, uuid 66203690
    output;
    /* output to "lrp-ac16ea", type "patch" */

ingress(dp="nova", inport="ac16ea")
-----------------------------------
 0. ls_in_port_sec_l2 (ovn-northd.c:3869): inport == "ac16ea", priority 50, uuid 187c129c
    next;
16. ls_in_l2_lkup (ovn-northd.c:4287): eth.dst == fa:16:3e:1c:7d:58 && is_chassis_resident("8e829c"), priority 50, uuid b641caf6
    outport = "34960d";
    output;

egress(dp="nova", inport="ac16ea", outport="34960d")
----------------------------------------------------
 9. ls_out_port_sec_l2 (ovn-northd.c:4346): outport == "34960d", priority 50, uuid 5903326c
    output;
    /* output to "34960d", type "patch" */

ingress(dp="openshift_dns", inport="lrp-34960d")
------------------------------------------------
 0. lr_in_admission (ovn-northd.c:5642): eth.dst == fa:16:3e:1c:7d:58 && inport == "lrp-34960d" && is_chassis_resident("8e829c"), priority 50, uuid b3c9fb5d
    next;
 3. lr_in_unsnat (ovn-northd.c:5477): ip && ip4.dst == 10.46.22.237 && inport == "lrp-34960d", priority 100, uuid 041c713b
    ct_snat;

ct_snat /* assuming no un-snat entry, so no change */
-----------------------------------------------------
 4. lr_in_dnat (ovn-northd.c:5535): ip && ip4.dst == 10.46.22.237 && inport == "lrp-34960d", priority 100, uuid c04edae5
    ct_dnat(192.168.23.12);

ct_dnat(ip4.dst=192.168.23.12)
------------------------------
 7. lr_in_ip_routing (ovn-northd.c:4474): ip4.dst == 192.168.23.0/24, priority 49, uuid 26cf3eea
    ip.ttl--;
    reg0 = ip4.dst;
    reg1 = 192.168.23.1;
    eth.src = fa:16:3e:20:18:e4;
    outport = "lrp-1f7585";
    flags.loopback = 1;
    next;
 8. lr_in_arp_resolve (ovn-northd.c:6091): outport == "lrp-1f7585" && reg0 == 192.168.23.12, priority 100, uuid 881813c2
    eth.dst = fa:16:3e:ca:51:a1;
    next;
10. lr_in_arp_request (ovn-northd.c:6305): 1, priority 0, uuid cdff6673
    output;

egress(dp="openshift_dns", inport="lrp-34960d", outport="lrp-1f7585")
---------------------------------------------------------------------
 3. lr_out_delivery (ovn-northd.c:6333): outport == "lrp-1f7585", priority 100, uuid fd6cd974
    output;
    /* output to "lrp-1f7585", type "patch" */

ingress(dp="openshift_dns", inport="1f7585")
--------------------------------------------
 0. ls_in_port_sec_l2 (ovn-northd.c:3869): inport == "1f7585", priority 50, uuid 8c591e40
    next;
 3. ls_in_pre_acl (ovn-northd.c:3109): ip && inport == "1f7585", priority 110, uuid 326c783d
    next;
14. ls_in_dns_lookup (ovn-northd.c:4129): udp.dst == 53, priority 100, uuid 56b5a6e2
    reg0[4] = dns_lookup();
    *** dns_lookup action not implemented
    next;
16. ls_in_l2_lkup (ovn-northd.c:4202): eth.dst == fa:16:3e:ca:51:a1, priority 50, uuid 7e4e7f82
    outport = "8e829c";
    output;

egress(dp="openshift_dns", inport="1f7585", outport="8e829c")
-------------------------------------------------------------
 1. ls_out_pre_acl (ovn-northd.c:3154): ip, priority 100, uuid 2b6362c4
    reg0[0] = 1;
    next;
 2. ls_out_pre_stateful (ovn-northd.c:3291): reg0[0] == 1, priority 100, uuid 65e4c7a6
    ct_next;

ct_next(ct_state=est|trk /* default (use --ct to customize) */)
---------------------------------------------------------------
 4. ls_out_acl (ovn-northd.c:3497): !ct.new && ct.est && !ct.rpl && ct_label.blocked == 0 && (outport == "8e829c" && ip4 && ip4.src == 0.0.0.0/0 && udp && udp.dst == 53), priority 2002, uuid f1798f66
    next;
 8. ls_out_port_sec_ip (ovn-northd.c:2851): outport == "8e829c" && eth.dst == fa:16:3e:ca:51:a1 && ip4.dst == {255.255.255.255, 224.0.0.0/4, 192.168.23.12}, priority 90, uuid ad465931
    next;
 9. ls_out_port_sec_l2 (ovn-northd.c:4346): outport == "8e829c" && eth.dst == {fa:16:3e:ca:51:a1}, priority 50, uuid d6404cee
    output;
    /* output to "8e829c", type "" */
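
To make the header rewrites in the ovn-trace output above easier to follow, here is a toy Python model of the path through the two logical routers. It is only a sketch: the `Packet` type and the helper names are hypothetical, the MAC and IP values are copied from the trace, and the starting TTL of 64 is taken from the ofproto/trace run in comment 4 (the ovn-trace above used nw_ttl=32).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:  # hypothetical helper type, not an OVN structure
    eth_src: str
    eth_dst: str
    ip_src: str
    ip_dst: str
    ttl: int


def route(pkt, eth_src, eth_dst):
    """Model lr_in_ip_routing + lr_in_arp_resolve: rewrite MACs, decrement TTL."""
    return replace(pkt, eth_src=eth_src, eth_dst=eth_dst, ttl=pkt.ttl - 1)


def walk(pkt):
    """Apply the rewrites seen in the trace, in order."""
    # Router "openshift-ansible-openshift.example.com-router": routing step.
    pkt = route(pkt, "fa:16:3e:48:ab:52", "fa:16:3e:1c:7d:58")
    # lr_out_undnat / lr_out_snat: eth.src rewritten, then ct_snat(10.46.22.246).
    pkt = replace(pkt, eth_src="fa:16:3e:42:7d:33", ip_src="10.46.22.246")
    # Router "openshift_dns": lr_in_dnat does ct_dnat(192.168.23.12) ...
    pkt = replace(pkt, ip_dst="192.168.23.12")
    # ... followed by its own routing step towards the DNS VM.
    pkt = route(pkt, "fa:16:3e:20:18:e4", "fa:16:3e:ca:51:a1")
    return pkt


query = Packet("fa:16:3e:d1:8f:36", "fa:16:3e:97:9e:54",
               "192.168.99.22", "10.46.22.237", 64)
delivered = walk(query)
print(delivered)
```

Under these assumptions the packet that should reach the DNS VM has the SNAT address as source, the fixed IP as destination, and a TTL reduced by two.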



Datapath flows on the compute node after the DNS query is sent:
recirc_id(0),in_port(6),eth(src=8e:8b:75:56:3a:4e,dst=f6:d7:83:a7:eb:9e),eth_type(0x8100),vlan(vid=130,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:77400, bytes:5717223, used:0.510s, flags:SFPR., actions:pop_vlan,7
recirc_id(0),in_port(7),eth(src=f6:d7:83:a7:eb:9e,dst=5a:93:92:20:30:3c),eth_type(0x0800),ipv4(frag=no), packets:68952, bytes:27545264, used:0.070s, flags:SFP., actions:push_vlan(vid=130,pcp=0),6
recirc_id(0x4985),in_port(10),ct_state(-new+est-rel+rpl-inv+trk),ct_label(0/0x1),eth(src=fa:16:3e:d1:8f:36,dst=fa:16:3e:97:9e:54),eth_type(0x0800),ipv4(src=192.168.99.22,dst=10.46.22.194,proto=6,ttl=64,frag=no), packets:7, bytes:874, used:2.130s, flags:P., actions:ct_clear,set(eth(src=fa:16:3e:42:7d:33,dst=52:54:00:52:cc:e3)),set(ipv4(src=192.168.99.22,dst=10.46.22.194,ttl=63)),ct(zone=8,nat),recirc(0x4986)
recirc_id(0x49a3),in_port(10),eth_type(0x0800),ipv4(dst=10.46.22.237,frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=2,nat(dst=192.168.23.12)),recirc(0x49a4)
recirc_id(0x49a4),in_port(10),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=fa:16:3e:42:7d:33,dst=fa:16:3e:1c:7d:58),eth_type(0x0800),ipv4(dst=192.168.23.12,proto=17,ttl=63,frag=no),udp(dst=53), packets:0, bytes:0, used:never, actions:ct_clear,set(eth(src=fa:16:3e:20:18:e4,dst=fa:16:3e:ca:51:a1)),set(ipv4(dst=192.168.23.12,ttl=62)),userspace(pid=4294963040,controller(reason=1,dont_send=0,continuation=1,recirc_id=18853,rule_cookie=0,controller_id=0,max_len=65535))
recirc_id(0),in_port(9),eth(src=8e:9f:8a:f2:03:76,dst=76:8e:e8:5c:b4:27),eth_type(0x0806), packets:1, bytes:42, used:8.758s, actions:push_vlan(vid=133,pcp=0),6
recirc_id(0),in_port(6),eth(src=5a:93:92:20:30:3c,dst=f6:d7:83:a7:eb:9e),eth_type(0x8100),vlan(vid=130,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:70527, bytes:5436434, used:0.070s, flags:SPR., actions:pop_vlan,7
recirc_id(0x4982),in_port(4),eth_type(0x0800),ipv4(dst=10.46.22.246,frag=no), packets:14, bytes:1148, used:2.130s, flags:P., actions:ct(commit,zone=8,nat(dst=192.168.99.22)),recirc(0x4983)
recirc_id(0),in_port(10),eth(src=fa:16:3e:d1:8f:36),eth_type(0x0800),ipv4(src=192.168.99.22,dst=10.46.22.192/255.255.255.224,frag=no), packets:7, bytes:874, used:2.130s, flags:P., actions:ct(zone=1),recirc(0x4985)
recirc_id(0),in_port(6),eth(src=0e:e1:fd:1f:30:26,dst=8e:9f:8a:f2:03:76),eth_type(0x8100),vlan(vid=133,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:80331, bytes:9639720, used:0.203s, actions:pop_vlan,9
recirc_id(0),in_port(4),ct_state(-new-est-rel-rpl-inv-trk),ct_label(0/0x1),eth(src=2c:21:31:e3:8f:00,dst=01:00:5e:00:00:0d),eth_type(0x0800),ipv4(src=10.46.22.252/255.255.255.254,dst=224.0.0.0/240.0.0.0,frag=no), packets:0, bytes:0, used:never, actions:3,ct_clear,ct_clear,ct_clear
recirc_id(0),in_port(6),eth(src=76:8e:e8:5c:b4:27,dst=8e:9f:8a:f2:03:76),eth_type(0x8100),vlan(vid=133,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:80286, bytes:9634320, used:0.890s, actions:pop_vlan,9
recirc_id(0),in_port(15),eth(src=fa:16:3e:05:4b:95),eth_type(0x0800),ipv4(src=192.168.99.7,dst=10.46.22.237,proto=17,frag=no), packets:14, bytes:1282, used:2.912s, actions:ct(zone=25),recirc(0x4969)
recirc_id(0),in_port(7),eth(src=f6:d7:83:a7:eb:9e,dst=c2:f0:53:71:f8:52),eth_type(0x0800),ipv4(frag=no), packets:164360, bytes:61925920, used:0.543s, flags:SFP., actions:push_vlan(vid=130,pcp=0),6
recirc_id(0),in_port(9),eth(src=8e:9f:8a:f2:03:76,dst=0e:e1:fd:1f:30:26),eth_type(0x0800),ipv4(frag=no), packets:80362, bytes:9321992, used:0.197s, actions:push_vlan(vid=133,pcp=0),6
recirc_id(0),in_port(4),ct_state(-new-est-rel-rpl-inv-trk),ct_label(0/0x1),eth(src=00:00:5e:00:02:01,dst=33:33:00:00:00:12),eth_type(0x86dd),ipv6(src=fe80:52:0:2e16::fd,dst=ff02::12,proto=112,hlimit=255,frag=no), packets:80175, bytes:7536450, used:0.811s, actions:3,ct_clear,ct_clear,ct_clear
recirc_id(0x4984),in_port(4),ct_state(-new+est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=fa:16:3e:97:9e:54,dst=fa:16:3e:d1:8f:36),eth_type(0x0800),ipv4(src=0.0.0.0/128.0.0.0,dst=192.168.99.22,proto=6,frag=no),tcp(dst=22), packets:14, bytes:1148, used:2.130s, flags:P., actions:10
recirc_id(0),in_port(7),eth(src=f6:d7:83:a7:eb:9e,dst=8e:8b:75:56:3a:4e),eth_type(0x0800),ipv4(frag=no), packets:82147, bytes:29259772, used:0.511s, flags:SFP., actions:push_vlan(vid=130,pcp=0),6
recirc_id(0x496f),in_port(15),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=fa:16:3e:1c:7d:58),eth_type(0x0800),ipv4(src=10.46.22.230/255.255.255.254,dst=10.46.22.237,proto=17,ttl=63,frag=no), packets:14, bytes:1282, used:2.911s, actions:ct_clear,ct_clear,ct(zone=7,nat),recirc(0x4971)
recirc_id(0x4972),in_port(15),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=fa:16:3e:70:24:cf,dst=fa:16:3e:1c:7d:58),eth_type(0x0800),ipv4(dst=192.168.23.12,proto=17,ttl=63,frag=no),udp(dst=53), packets:14, bytes:1282, used:2.911s, actions:ct_clear,set(eth(src=fa:16:3e:20:18:e4,dst=fa:16:3e:ca:51:a1)),set(ipv4(dst=192.168.23.12,ttl=62)),userspace(pid=4294963040,controller(reason=1,dont_send=0,continuation=1,recirc_id=18803,rule_cookie=0,controller_id=0,max_len=65535))
recirc_id(0),in_port(6),eth(src=76:8e:e8:5c:b4:27,dst=8e:9f:8a:f2:03:76),eth_type(0x8100),vlan(vid=133,pcp=0),encap(eth_type(0x0806)), packets:1, bytes:64, used:8.758s, actions:pop_vlan,9
recirc_id(0x4969),in_port(15),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=fa:16:3e:05:4b:95),eth_type(0x0800),ipv4(dst=0.0.0.0/128.0.0.0,proto=17,frag=no),udp(dst=53), packets:14, bytes:1282, used:2.912s, actions:ct(commit,zone=25,label=0/0x1),userspace(pid=4294929648,controller(reason=1,dont_send=0,continuation=1,recirc_id=18796,rule_cookie=0xa81974bb,controller_id=0,max_len=65535))
recirc_id(0),in_port(4),ct_state(-new-est-rel-rpl-inv-trk),ct_label(0/0x1),eth(src=00:00:5e:00:01:01,dst=01:00:5e:00:00:12),eth_type(0x0800),ipv4(src=10.46.22.252/255.255.255.254,dst=224.0.0.0/240.0.0.0,frag=no), packets:80179, bytes:4810740, used:0.304s, actions:3,ct_clear,ct_clear,ct_clear
recirc_id(0),in_port(4),eth(src=4c:16:fc:b0:3c:02,dst=01:80:c2:00:00:00),eth_type(0/0xffff), packets:38845, bytes:2330700, used:1.258s, actions:drop
recirc_id(0x4986),in_port(10),ct_state(-new+est-rel+rpl-inv+trk),ct_label(0/0x1),eth(src=fa:16:3e:42:7d:33,dst=52:54:00:52:cc:e3),eth_type(0x0800),ipv4(src=0.0.0.0/128.0.0.0,dst=10.46.22.192/255.255.255.224,frag=no), packets:7, bytes:874, used:2.130s, flags:P., actions:ct_clear,ct_clear,ct_clear,4
recirc_id(0x4983),in_port(4),ct_state(-new+est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=52:54:00:52:cc:e3,dst=fa:16:3e:42:7d:33),eth_type(0x0800),ipv4(dst=192.168.99.22,proto=6,ttl=64,frag=no), packets:14, bytes:1148, used:2.130s, flags:P., actions:ct_clear,set(eth(src=fa:16:3e:97:9e:54,dst=fa:16:3e:d1:8f:36)),set(ipv4(dst=192.168.99.22,ttl=63)),ct(zone=1),recirc(0x4984)
recirc_id(0),tunnel(tun_id=0x0,src=172.17.2.25,dst=172.17.2.18,flags(-df+csum+key)),in_port(1),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:80280, bytes:5298480, used:0.552s, actions:userspace(pid=4294963040,slow_path(bfd))
recirc_id(0x49a2),in_port(10),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=fa:16:3e:1c:7d:58),eth_type(0x0800),ipv4(src=10.46.22.240/255.255.255.248,dst=10.46.22.237,proto=17,ttl=63,frag=no), packets:0, bytes:0, used:never, actions:ct_clear,ct_clear,ct(zone=7,nat),recirc(0x49a3)
recirc_id(0x4985),in_port(10),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=fa:16:3e:d1:8f:36),eth_type(0x0800),ipv4(dst=0.0.0.0/128.0.0.0,proto=17,frag=no),udp(dst=53), packets:0, bytes:0, used:never, actions:ct(commit,zone=1,label=0/0x1),userspace(pid=4294929842,controller(reason=1,dont_send=0,continuation=1,recirc_id=18848,rule_cookie=0xe66b6d51,controller_id=0,max_len=65535))
recirc_id(0),in_port(6),eth(src=52:54:00:5e:aa:eb,dst=14:02:ec:7c:88:31),eth_type(0x0800),ipv4(frag=no), packets:107441, bytes:968156388, used:0.011s, flags:SFP., actions:8
recirc_id(0),in_port(9),eth(src=8e:9f:8a:f2:03:76,dst=32:e9:86:dd:6f:36),eth_type(0x0800),ipv4(frag=no), packets:80325, bytes:9317700, used:0.595s, actions:push_vlan(vid=133,pcp=0),6
recirc_id(0x4971),in_port(15),eth_type(0x0800),ipv4(dst=10.46.22.237,frag=no), packets:14, bytes:1282, used:2.911s, actions:ct(commit,zone=2,nat(dst=192.168.23.12)),recirc(0x4972)
recirc_id(0),in_port(9),eth(src=8e:9f:8a:f2:03:76,dst=76:8e:e8:5c:b4:27),eth_type(0x0800),ipv4(frag=no), packets:80364, bytes:9322224, used:0.685s, actions:push_vlan(vid=133,pcp=0),6
recirc_id(0),tunnel(tun_id=0x0,src=172.17.2.21,dst=172.17.2.18,flags(-df+csum+key)),in_port(1),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:80286, bytes:5298876, used:0.890s, actions:userspace(pid=4294963040,slow_path(bfd))
recirc_id(0),in_port(10),eth(src=fa:16:3e:d1:8f:36),eth_type(0x0800),ipv4(src=192.168.99.22,dst=10.46.22.237,proto=17,frag=no), packets:0, bytes:0, used:never, actions:ct(zone=1),recirc(0x4985)
recirc_id(0),in_port(8),eth(src=14:02:ec:7c:88:31,dst=52:54:00:5e:aa:eb),eth_type(0x0800),ipv4(frag=no), packets:102046, bytes:8485106, used:0.011s, flags:SFP., actions:6
recirc_id(0),in_port(6),eth(src=c2:f0:53:71:f8:52,dst=f6:d7:83:a7:eb:9e),eth_type(0x8100),vlan(vid=130,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:149344, bytes:182007153, used:0.543s, flags:SFPR., actions:pop_vlan,7
recirc_id(0),in_port(4),ct_state(-new-est-rel-rpl-inv-trk),ct_label(0/0x1),eth(src=52:54:00:52:cc:e3,dst=fa:16:3e:42:7d:33),eth_type(0x0800),ipv4(src=10.46.22.192/255.255.255.224,dst=10.46.22.246,proto=6,ttl=64,frag=no), packets:14, bytes:1148, used:2.130s, flags:P., actions:ct_clear,ct(zone=15,nat),recirc(0x4982)
recirc_id(0),tunnel(tun_id=0x0,src=172.17.2.14,dst=172.17.2.18,flags(-df+csum+key)),in_port(1),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:80331, bytes:5301846, used:0.203s, actions:userspace(pid=4294963040,slow_path(bfd))
recirc_id(0),in_port(6),eth(src=32:e9:86:dd:6f:36,dst=8e:9f:8a:f2:03:76),eth_type(0x8100),vlan(vid=133,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:80280, bytes:9633600, used:0.552s, actions:pop_vlan,9

Comment 6 Jakub Libosvar 2019-08-15 09:59:58 UTC
[root@compute-0 ~]# ovs-dpctl show
system@ovs-system:
        lookups: hit:1860249 missed:66563 lost:0
        flows: 29
        masks: hit:14810227 total:8 hit/pkt:7.69
        port 0: ovs-system (internal)
        port 1: genev_sys_6081 (geneve: packet_type=ptap)
        port 2: br-int (internal)
        port 3: br-ex (internal)
        port 4: ens1f0
        port 5: vlan132 (internal)
        port 6: ens1f1
        port 7: vlan130 (internal)
        port 8: br-isolated (internal)
        port 9: vlan133 (internal)
        port 10: tap5396479b-5a               <------ this is source port
        port 11: tap42a0a518-b9
        port 12: tapb74b4195-be
        port 13: tapf778d143-a0
        port 14: tap77f9b441-24
        port 15: tap82eb692b-b8
        port 16: tapb3377ee0-49
        port 17: tapb87046f7-15
        port 18: tapa5bd6940-80
        port 19: tap10c031e3-5a
        port 20: tap8e829c5c-71               <------ this is destination port
        port 21: tap91f57097-40
        port 22: tapa4de0c67-50
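
The in_port(N) numbers in the datapath flows of comment 5 refer to the port numbers listed above. A small Python sketch for translating them, assuming the output format shown here; the helper names are hypothetical and the sample is an excerpt of the listing above.

```python
import re

# Excerpt of the "ovs-dpctl show" output above (annotations removed).
DPCTL_SHOW = """\
system@ovs-system:
        port 1: genev_sys_6081 (geneve: packet_type=ptap)
        port 4: ens1f0
        port 10: tap5396479b-5a
        port 15: tap82eb692b-b8
        port 20: tap8e829c5c-71
"""

PORT_RE = re.compile(r"^\s*port (\d+): (\S+)", re.MULTILINE)
IN_PORT_RE = re.compile(r"in_port\((\d+)\)")


def port_names(show_output):
    """Map datapath port number -> interface name."""
    return {int(num): name for num, name in PORT_RE.findall(show_output)}


def annotate(flow, names):
    """Rewrite in_port(N) as in_port(N=name) to make a flow easier to read."""
    def label(match):
        number = int(match.group(1))
        return f"in_port({number}={names.get(number, '?')})"
    return IN_PORT_RE.sub(label, flow)


names = port_names(DPCTL_SHOW)
flow = ("recirc_id(0),in_port(10),eth(src=fa:16:3e:d1:8f:36),eth_type(0x0800),"
        "ipv4(src=192.168.99.22,dst=10.46.22.237,proto=17,frag=no), "
        "packets:0, bytes:0, used:never, actions:ct(zone=1),recirc(0x4985)")
print(annotate(flow, names))
```

The same mapping can be applied to every line of a datapath flow dump to see at a glance which flows belong to the source and destination taps.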

Comment 8 Jakub Libosvar 2019-08-20 13:02:53 UTC
The problem seems to be in the controller() action that does dns_lookup() for the second time, in the DNS network. Although the packet is resumed into the pipeline, it gets lost and is actually dropped in the datapath:

recirc_id(0x22e),dp_hash(0),skb_priority(0),in_port(0/0xffff0000),skb_mark(0),ct_state(+new-est-rel-rpl-inv+trk-snat-dnat),ct_zone(0x1e),ct_mark(0),ct_label(0),ct_tuple4(src=10.46.22.246,dst=192.168.23.12,proto=17,tp_src=47500,tp_dst=53),eth(src=fa:16:3e:20:18:e4,dst=fa:16:3e:ca:51:a1),eth_type(0x0800),ipv4(src=10.46.22.246,dst=192.168.23.12,proto=17,tos=0,ttl=62,frag=no),udp(src=47500,dst=53), packets:0, bytes:0, used:never, actions:drop
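
A flow like the one above can be picked out of a larger datapath dump mechanically. The following Python sketch assumes the textual flow format shown in this report ("match, packets:N, ..., actions:...") and flags flows that match UDP port 53 and have a drop action. The helper names are hypothetical; the two sample lines are copied from this report.

```python
import re

FLOWS = [
    # Dropped flow quoted above.
    "recirc_id(0x22e),dp_hash(0),skb_priority(0),in_port(0/0xffff0000),skb_mark(0),"
    "ct_state(+new-est-rel-rpl-inv+trk-snat-dnat),ct_zone(0x1e),ct_mark(0),ct_label(0),"
    "ct_tuple4(src=10.46.22.246,dst=192.168.23.12,proto=17,tp_src=47500,tp_dst=53),"
    "eth(src=fa:16:3e:20:18:e4,dst=fa:16:3e:ca:51:a1),eth_type(0x0800),"
    "ipv4(src=10.46.22.246,dst=192.168.23.12,proto=17,tos=0,ttl=62,frag=no),"
    "udp(src=47500,dst=53), packets:0, bytes:0, used:never, actions:drop",
    # DNAT flow from comment 5, for contrast.
    "recirc_id(0x4971),in_port(15),eth_type(0x0800),ipv4(dst=10.46.22.237,frag=no), "
    "packets:14, bytes:1282, used:2.911s, "
    "actions:ct(commit,zone=2,nat(dst=192.168.23.12)),recirc(0x4972)",
]

DNS_RE = re.compile(r"udp\((?:src=\d+,)?dst=53\)")


def parse_dp_flow(line):
    """Split one datapath flow line into match, packet count and actions."""
    match, _, rest = line.partition(" packets:")
    return {
        "match": match.rstrip(","),
        "packets": int(rest.split(",", 1)[0]),
        "actions": rest.rpartition("actions:")[2].strip(),
    }


def is_dropped_dns(flow):
    """True for flows that match UDP destination port 53 and only drop."""
    return flow["actions"] == "drop" and DNS_RE.search(flow["match"]) is not None


dropped = [f for f in map(parse_dp_flow, FLOWS) if is_dropped_dns(f)]
for f in dropped:
    print(f["match"])
```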

Comment 10 Jakub Libosvar 2019-08-27 15:02:13 UTC
I talked with Udi and he said they don't use Neutron DNS in OCP. If Neutron DNS is turned off (setting dns_domain to openstacklocal) the issue is mitigated.

I also see this is marked as Regression: Udi do you have a version where this used to work? I thought the bug has been there since ever.

Comment 12 Udi Shkalim 2019-08-29 09:43:16 UTC
(In reply to Jakub Libosvar from comment #10)
> I talked with Udi and he said they don't use Neutron DNS in OCP. If Neutron
> DNS is turned off (setting dns_domain to openstacklocal) the issue is
> mitigated.
> 
> I also see this is marked as Regression: Udi do you have a version where
> this used to work? I thought the bug has been there since ever.

Just to emphasize, we don't use Neutron DNS in OpenShift 3.x. 
OpenShift 4.x deployments are using Neutron DNS.

I marked it as a regression since Shelley asked me to add the keyword. You can drop the regression keyword, but this issue can affect customers as a blocker.
We have a workaround (not using the DNS domain), but I am not sure this workaround will be applicable to customer deployments.

Comment 14 Freddy Wissing 2019-10-24 13:49:59 UTC
Hi team,

What z-stream release is this targeted for?

Comment 15 Roman Safronov 2019-10-28 13:33:08 UTC
Cannot be tested on the latest OSP13 with openvswitch 2.11
http://rhos-qe-mirror-tlv.usersys.redhat.com/rcm-guest/puddles/OpenStack/13.0-RHEL-7/2019-10-18.1/

It uses openvswitch2.11-2.11.0-21.el7fdp.x86_64 and not openvswitch2.11-2.11.0-26.el8fdp

Comment 16 Alex Stupnikov 2019-12-23 10:27:55 UTC
Hello.

According to [1] we already released openvswitch2.11-2.11.0-26.el7fdp.x86_64.rpm package. Can we triage this bug?

[1] https://access.redhat.com/downloads/content/rhel---7/x86_64/6671/openvswitch2.11/2.11.0-26.el7fdp/x86_64/fd431d51/package

Regards, Alex.

Comment 18 Eduardo Olivares 2020-02-19 16:23:15 UTC
Verified on OSP13

puddle 2020-02-10.8
openvswitch2.11-2.11.0-35.el7fdp.x86_64

verification procedure
1- create two different tenant networks and one subnet for each
2- create a router that connects these subnets
3- create a security rule (SR) for ingress DNS traffic (udp port 53)
4- create two servers, vm1 and vm2, on different subnets and with the previous SR
5- run tcpdump on both servers: tcpdump -n -i ens3 udp and port 53
6- at vm2, run a script that listens at UDP port 53 and answers: echo -n -e "wrong response!" | sudo nc -u -w1 -l 53
7- at vm1, send a DNS query towards vm2: host foo.com <vm2_ip_address>

Check that queries are received at vm2 and responses are received at vm1 (although responses will not be a valid answer for the host command).
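
For environments where nc or host is not available in the guests, steps 6 and 7 could be approximated with plain UDP sockets. This is only a sketch under that assumption, not the procedure that was actually used for verification: the function names are hypothetical, and the demo runs over loopback on an ephemeral port, whereas the real test uses UDP port 53 on vm2 (which needs root).

```python
import socket
import threading


def serve_once(sock, reply=b"wrong response!"):
    """Wait for one datagram and answer it, like the nc one-liner in step 6."""
    data, peer = sock.recvfrom(512)
    sock.sendto(reply, peer)
    return data


def send_query(server, payload, timeout=2.0):
    """Send one datagram to (host, port) and return the reply, like step 7."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        client.sendto(payload, server)
        data, _ = client.recvfrom(512)
        return data


def loopback_demo():
    """Run responder and client in one process over 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))      # ephemeral port instead of 53
    listener.settimeout(2.0)
    received = []
    worker = threading.Thread(target=lambda: received.append(serve_once(listener)))
    worker.start()
    reply = send_query(listener.getsockname(), b"query")
    worker.join()
    listener.close()
    return received[0], reply


print(loopback_demo())
```

In the two-VM setup, `serve_once` would run on vm2 bound to port 53 and `send_query` on vm1 with vm2's address; the tcpdump from step 5 remains the actual evidence that packets arrive.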

I have also verified (THANKS, JAKUB) that the flows whose n_packets are incremented are these ones in the source compute:
 cookie=0xc8711a64, duration=8642.817s, table=22, n_packets=222, n_bytes=18558, idle_age=306, priority=100,udp,metadata=0x1d8,tp_dst=53 actions=controller(userdata=00.00.00.06.00.00.00.00.00.01.de.10.00.00.00.64,pause),resubmit(,23)
 cookie=0x66352cc2, duration=8630.177s, table=22, n_packets=60, n_bytes=4020, idle_age=306, priority=100,udp,metadata=0x1d9,tp_dst=53 actions=controller(userdata=00.00.00.06.00.00.00.00.00.01.de.10.00.00.00.64,pause),resubmit(,23)

And these ones in the destination compute:
 cookie=0x8076f5ed, duration=8706.927s, table=44, n_packets=52, n_bytes=3484, idle_age=389, priority=2002,ct_state=+new-est+trk,udp,reg15=0x3,metadata=0x1d9,tp_dst=53 actions=load:0x1->NXM_NX_XXREG0[97],resubmit(,45)

They have metadata=0x1d8 and 0x1d9, which correspond to the two tenant networks previously created (0x1d8 and 0x1d9 are tunnel_keys 472 and 473 respectively)

So we see now that after OVN returns no answer for the dns_lookup call, the packet is forwarded to its destination address.
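
Finding the flows whose n_packets are incremented can be done by comparing two flow dumps taken before and after sending a query. A minimal Python sketch, assuming the dump format quoted above; the function names are hypothetical, the second snapshot is copied from this comment, and the first snapshot is a made-up earlier state with one packet fewer on the first flow. The sketch also converts the hexadecimal metadata value to the decimal tunnel key.

```python
import re

FLOW_RE = re.compile(
    r"cookie=(0x[0-9a-f]+),.*?table=(\d+), n_packets=(\d+),.*?metadata=(0x[0-9a-f]+)"
)

# Second snapshot: the two source-compute flows quoted above.
AFTER = """\
 cookie=0xc8711a64, duration=8642.817s, table=22, n_packets=222, n_bytes=18558, idle_age=306, priority=100,udp,metadata=0x1d8,tp_dst=53 actions=controller(userdata=00.00.00.06.00.00.00.00.00.01.de.10.00.00.00.64,pause),resubmit(,23)
 cookie=0x66352cc2, duration=8630.177s, table=22, n_packets=60, n_bytes=4020, idle_age=306, priority=100,udp,metadata=0x1d9,tp_dst=53 actions=controller(userdata=00.00.00.06.00.00.00.00.00.01.de.10.00.00.00.64,pause),resubmit(,23)
"""

# First snapshot: HYPOTHETICAL, derived from AFTER for illustration only.
BEFORE = AFTER.replace("n_packets=222", "n_packets=221")


def parse_flows(dump):
    """Index flows by (cookie, table) with packet count and decimal tunnel key."""
    flows = {}
    for line in dump.splitlines():
        m = FLOW_RE.search(line)
        if m:
            cookie, table, n_packets, metadata = m.groups()
            flows[(cookie, int(table))] = {
                "n_packets": int(n_packets),
                "tunnel_key": int(metadata, 16),
            }
    return flows


def incremented(before, after):
    """Return {(cookie, table): delta} for flows whose n_packets grew."""
    deltas = {}
    for key, flow in after.items():
        old = before.get(key, {"n_packets": 0})["n_packets"]
        if flow["n_packets"] > old:
            deltas[key] = flow["n_packets"] - old
    return deltas


after = parse_flows(AFTER)
deltas = incremented(parse_flows(BEFORE), after)
for key, delta in deltas.items():
    print(key, "+%d" % delta, "tunnel_key", after[key]["tunnel_key"])
```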

Comment 21 errata-xmlrpc 2020-03-10 11:52:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0769