Description of problem:

When creating an ACL with logging enabled, only client-to-server packets are logged. This is important because it is a critical part of the Security Group Logging RFE in OpenStack (BZ1619266).

Version-Release number of selected component (if applicable):

We would need this in ovn-2021, but it affects all releases up to the upstream main branch.

How reproducible:

Steps to Reproduce:

1. Create an ACL with action 'allow-related' and logging enabled:

_uuid               : e1929bcc-26b3-42aa-b13c-38b6766b25b3
action              : allow-related
direction           : from-lport
external_ids        : {"neutron:security_group_rule_id"="bf7a7176-fe7b-4fa8-a492-aa0108c744e0"}
label               : 0
log                 : true
match               : "inport == @pg_7ddfbe4e_86ec_42a6_b8aa_b05cba7ad56a && ip4"
meter               : acl_log_meter
name                : neutron-edea5d09-fd79-49cc-9ae4-2d0d2cbbfa1e
priority            : 1002
severity            : info

2. Ping one of the servers associated with that ACL. Note in ovn-controller.log that only client-to-server packets are logged:

2021-12-10T08:35:58.745Z|00013|acl_log(ovn_pinctrl0)|INFO|name="neutron-edea5d09-fd79-49cc-9ae4-2d0d2cbbfa1e", verdict=allow, severity=info: icmp,vlan_tci=0x0000,dl_src=fa:16:3e:06:51:a4,dl_dst=fa:16:3e:c6:4a:86,nw_src=172.24.4.1,nw_dst=10.100.0.106,nw_tos=0,nw_ecn=0,nw_ttl=63,icmp_type=8,icmp_code=0
2021-12-10T08:35:59.759Z|00014|acl_log(ovn_pinctrl0)|INFO|name="neutron-edea5d09-fd79-49cc-9ae4-2d0d2cbbfa1e", verdict=allow, severity=info: icmp,vlan_tci=0x0000,dl_src=fa:16:3e:06:51:a4,dl_dst=fa:16:3e:c6:4a:86,nw_src=172.24.4.1,nw_dst=10.100.0.106,nw_tos=0,nw_ecn=0,nw_ttl=63,icmp_type=8,icmp_code=0

3. If we check the flows on the switch, there are no OpenFlow rules with a logging action for the returned traffic:

a. ICMP request (client-to-server):

cookie=0x3c29ab85, duration=15.254s, table=44, n_packets=1, n_bytes=98, priority=2002,icmp,reg0=0x80/0x80,reg15=0x3,metadata=0x2 actions=load:0x1->NXM_NX_XXREG0[97],controller(userdata=00.00.00.07.00.00.00.00.00.06.6e.65.75.74.72.6f.6e.2d.38.34.61.61.30.64.65.61.2d.35.33.63.31.2d.34.39.33.62.2d.61.33.33.39.2d.62.38.37.63.66.32.65.62.35.61.34.37,meter_id=1),resubmit(,45)

b. ICMP reply (server-to-client):

cookie=0xc7091af2, duration=15.251s, table=44, n_packets=1, n_bytes=98, priority=65532,ct_state=-new+est-rel+rpl-inv+trk,ct_label=0/0x1,metadata=0x2 actions=resubmit(,45)

Actual results:

We only get half of the connection logged for that ACL.

Expected results:

The full stream of packets logged, both server-to-client and client-to-server.

Additional info:

Not sure if this helps in any way, but I tried manually changing the ACL action to allow-stateless, and the problem is not present in that case:

2021-12-10T09:46:02.766Z|00051|acl_log(ovn_pinctrl0)|INFO|name="neutron-edea5d09-fd79-49cc-9ae4-2d0d2cbbfa1e", verdict=allow, severity=info: icmp,vlan_tci=0x0000,dl_src=fa:16:3e:06:51:a4,dl_dst=fa:16:3e:c6:4a:86,nw_src=172.24.4.1,nw_dst=10.100.0.106,nw_tos=0,nw_ecn=0,nw_ttl=63,icmp_type=8,icmp_code=0
2021-12-10T09:46:02.766Z|00052|acl_log(ovn_pinctrl0)|INFO|name="neutron-edea5d09-fd79-49cc-9ae4-2d0d2cbbfa1e", verdict=allow, severity=info: icmp,vlan_tci=0x0000,dl_src=fa:16:3e:c6:4a:86,dl_dst=fa:16:3e:06:51:a4,nw_src=10.100.0.106,nw_dst=172.24.4.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
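For reference, step 3 can be checked directly on the hypervisor. A minimal sketch, assuming the integration bridge is br-int and the ACL stage sits at OpenFlow table 44 as in the dumps above:

```shell
# Dump the ACL-stage flows: only the request-direction entry carries a
# controller(...) action (the log punt), while the priority-65532 reply
# shortcut just resubmits to the next table without logging.
ovs-ofctl dump-flows br-int table=44 | grep -E 'controller|priority=65532'
```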
What logical ports are in the "pg_7ddfbe4e_86ec_42a6_b8aa_b05cba7ad56a" group? If the "server" port is not in that port group, then the traffic will not get logged, and this is expected. Since the ACL is specified as "from-lport", only traffic received on a port in that port group will log anything. The "from-lport" and "to-lport" designations add the ACLs to the ingress or egress pipelines of the switch, respectively.

If it's desired to log return traffic, then either:
a) You can ensure the server port on the switch is in the specified port group.
b) You can add a similar "to-lport" rule for the port group.
c) You can file an RFE so that, for allow-related traffic, return traffic can (optionally) be logged.
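A minimal sketch of option b), assuming the logical switch is named "ls" (the switch name is not shown in the report) and reusing the port group from the ACL above:

```shell
# Mirror the rule in the egress ("to-lport") pipeline so server-to-client
# packets are evaluated, and therefore logged, in their own right.
ovn-nbctl --log --severity=info acl-add ls to-lport 1002 \
    'outport == @pg_7ddfbe4e_86ec_42a6_b8aa_b05cba7ad56a && ip4' allow-related
```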
Option a. is not applicable to our scenario, since the "server" does not have to be located in the same port group. Option b might be doable, but we still think it should not be necessary, since other actions like "allow-stateless" log all packets allowed by the ACL. That's why we maintain this is not really an RFE but expected behavior. I definitely believe that making the log of related traffic optional for "allow-related", while "allow-stateless" logs everything by default, would be inconsistent behavior.
> I definitely believe having an optional log for related traffic in "allow-related" but then logging everything in "allow-stateless" by default would be inconsistent behavior

This struck me as odd, because the behavior *should* be consistent between the two actions. I set up a small test using ovn-fake-multinode, and I was able to reproduce the behavior. The key is that both the "client" and "server" ports need to be in the port group. I will review this more to determine what is going on here. It's possible this is the intended behavior, and it's possible it's a bug. I'll give more information once I have it.
I've taken a look, and I think I have a grasp on what is happening.

When a stateful ACL is created on a logical switch, special flows at priority 65532 are installed that allow *all* related and reply traffic to pass. These flows are higher priority than any ACL and exist so that reply traffic isn't "accidentally" blocked by an ACL. The problem with these flows is that they are one-size-fits-all: they are not installed per-ACL. When ACL logging is disabled, this is perfectly fine, and traffic flows as expected. When ACL logging is enabled, though, it results in reply traffic not being logged.

I thought about this and came up with three ideas:

1. Remove the high-priority flows. Instead, add flows alongside the ACL flows in table 17 that check the CT state and the ACL match. If the packet is a reply and the ACL matches, allow the traffic and log the match.
2. Add more high-priority flows that also include the ACL match. These would allow the traffic and also log it.
3. Add more high-priority flows that check whether the ct_label matches the ACL label. If it matches, allow the traffic and log it.

Of these, I think 2 is probably the best idea, since it cannot be defeated by other ACLs and does not require setting a label on the ACL. The nice thing is that these extra flows only need to be installed for ACLs with logging enabled, so this won't universally result in a higher flow count. I will work on a patch and post it for review when it's ready.
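The one-size-fits-all flows described above can be observed in the southbound logical flow table. A sketch, assuming a logical switch named "ls" with at least one stateful ACL configured:

```shell
# The priority-65532 entries match established/reply conntrack state and
# bypass the ACL stage entirely, which is why replies are never logged.
ovn-sbctl lflow-list ls | grep 'priority=65532'
```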
I have a patch written that works in my setup. I need to ensure it does not cause test failures before I can post it for review, and I'll also need to write a test case or two. I should have the patch posted sometime tomorrow.
I realized while writing tests that my patch could result in ambiguity when logging the reply traffic. My initial patch attempted to put all reply-traffic logging at the same priority. The problem is that reply traffic can match multiple ACLs, so the request traffic could appear in the logs under one ACL while the reply traffic gets logged under a different one.

Example:

ACL 1: priority 800 "inport == @pg1 && ip4"
ACL 2: priority 400 "inport == @pg1"

In this case, let's imagine that the client and server ports are both part of pg1. The client sends a request to the server, ACL1 is matched, and it gets logged. Then the reply comes from the server. If the reply comparison happens at a single shared priority, then it is unknown whether ACL1 or ACL2 will get logged for the reply traffic.

To disambiguate, the reply-traffic logging needs to be placed at the same relative priorities as the ACLs themselves. However, the reply logging also needs to sit at a higher priority than all ACLs (otherwise, reply traffic could mistakenly be blocked by an unrelated ACL). The problem now is that there aren't enough numbers in the priority space to guarantee that all the reply-logging flows fit.

So that leaves two possible options:

1. Go with option 3 from my previous comment. This lets us match on ACL label + ACL match so that we can uniquely identify which ACL the reply traffic belongs to. However, it requires ACLs to have labels in order to have their replies logged. One option here would be for ovn-northd to implicitly apply labels to ACLs when they aren't explicitly assigned by the CMS.
2. Add a new table to ingress and egress processing, between the pre_acl and acl tables, whose sole purpose is to log ACLs. This table would more or less replicate the ACL table, except that rather than actually taking an action, it would only log the verdict that the following table will take.
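The ambiguity above can be reproduced with two overlapping logged ACLs. A sketch, assuming a switch "ls" and a port group "pg1" containing both the client and server ports:

```shell
# Request traffic is logged against the higher-priority ACL...
ovn-nbctl --log --severity=info acl-add ls from-lport 800 \
    'inport == @pg1 && ip4' allow-related
# ...but the reply also satisfies this broader match, so reply logging
# placed at one flat priority could attribute it to either ACL.
ovn-nbctl --log --severity=info acl-add ls from-lport 400 \
    'inport == @pg1' allow-related
```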
Both of these will add the same number of flows to the logical flow table, so neither has an advantage there. Option 1 has the disadvantage of tracking additional state in ovn-northd (ACL labels); option 2 has the disadvantages of shifting table numbers again and of separating the logging logic from the allow/drop logic, which invites bugs. Right now, I'm leaning toward option 1, but I'm not married to it. I'll think on this some more to see whether there is another way to handle this beyond what I've thought of.
I've been thinking about this a bit more, and I think reply traffic for stateful ACLs is a special case. For instance, assume a simple setup with a logical switch sw that has two ports, swp1 and swp2. The client is on a VM attached to swp1, and the server is on a VM attached to swp2.

Scenario 1: First, we create the following two ACLs:

ACL1: from-lport 'inport == "swp1" && ip4' allow-stateless
ACL2: from-lport 'inport == "swp2" && ip4' allow-stateless

When the client sends traffic to the server, ACL1 matches and allows the traffic. When the server replies, ACL2 matches and allows the traffic. In this case, we should expect to see logs showing ACL1 matching the request and ACL2 matching the reply.

Scenario 2: Now let's change the verdict on the ACLs:

ACL1: from-lport 'inport == "swp1" && ip4' allow-related
ACL2: from-lport 'inport == "swp2" && ip4' allow-related

When the client sends traffic to the server, ACL1 matches and allows the traffic. When the server replies, the traffic is allowed not because it matches ACL2 but because ACL1 is set up to allow related traffic. We definitely log the request as having matched ACL1. But does it make sense to log a match on ACL2 for the reply? I would argue not, since that ACL was never actually considered when the traffic was allowed through.

Scenario 3: To really drive the point home, let's change ACL2 to drop traffic instead of allowing it:

ACL1: from-lport 'inport == "swp1" && ip4' allow-related
ACL2: from-lport 'inport == "swp2" && ip4' drop

When the client sends traffic to the server, ACL1 matches and allows the traffic. When the server replies, even though the reply would match ACL2, it is allowed because ACL1 said to allow related traffic to pass. We definitely log the request as having matched ACL1. ACL2 definitely should not be logged for the reply in this case, because the traffic is being allowed, not dropped.
So now let's go back to the original scenario that was brought up in this issue. The situation was less clear because, instead of two different ACLs for the client and server, you have a single ACL that matches multiple ports in a port group.

When you set the verdict on that ACL to "allow-stateless", you basically had the behavior of scenario 1: the request and reply traffic happened to match the same ACL, so both got logged. When the ACL was set to "allow-related", you basically had the behavior of scenario 2: the reply traffic meets the match criteria of the ACL, but it wasn't this match that allowed the traffic; it was allowed because the initial match on the request automatically allowed the reply through.

The current behavior isn't inconsistent, but it's not exactly user-friendly either. From your point of view, the reply traffic appears to match the ACL, so why isn't it logged? It's because the reply traffic is allowed as a special case of stateful ACLs. As such, if we are going to log this sort of reply traffic, it also needs to be special-cased. We don't want it to appear that the reply traffic was allowed because it happened to match an ACL; we want the log to reflect that it was logged because it is related to traffic allowed by a stateful ACL.

Because this sort of logging is not currently implemented, and because it results in more flows having to be added, I think it should be made optional. Therefore, I plan to add an option to ACLs that states whether to log reply traffic. For this to work properly, the ACL will be required to have logging enabled and a label set. I will work up a patch today or tomorrow.
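As a sketch of how the planned option would be used (the option name, options:log-related, matches the verification script later in this report; switch "ls" and port group "pg1" are assumed):

```shell
# Reply-traffic logging requires log=true and a label on the ACL.
ovn-nbctl --log --severity=info --label=100 --name=acl_test \
    acl-add ls from-lport 1002 'inport == @pg1 && ip' allow-related
acl_uuid=$(ovn-nbctl --bare --columns=_uuid find acl label=100)
ovn-nbctl set acl "$acl_uuid" options:log-related=true
```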
Patch posted here: https://patchwork.ozlabs.org/project/ovn/patch/20220201141118.1846390-1-mmichels@redhat.com/
Should this be in MODIFIED status? I think the patch was already merged, right? @mmichels
ovn-2021 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2116619
The issue can be reproduced on any build with the following script. However, once the log-related option is set on the allow-related ACL, ovn-controller starts logging reply packets as well.

############ VM2 ################
systemctl start ovn-northd
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv0
ifconfig ens1f0 42.42.42.2 netmask 255.255.255.0
ovs-vsctl set open . external_ids:ovn-remote=tcp:42.42.42.1:6642
ovs-vsctl set open . external_ids:ovn-encap-type=geneve
ovs-vsctl set open . external_ids:ovn-encap-ip=42.42.42.2
systemctl start ovn-controller

ip netns add vm2
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
ip netns exec vm2 ip addr add 42.42.42.2/24 dev vm2
ip netns exec vm2 ip addr add 2001::2/64 dev vm2
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip link set lo up
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ip netns add vm3
ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
ip link set vm3 netns vm3
ip netns exec vm3 ip link set vm3 address 00:00:00:00:00:03
ip netns exec vm3 ip addr add 42.42.42.3/24 dev vm3
ip netns exec vm3 ip addr add 2001::3/64 dev vm3
ip netns exec vm3 ip link set vm3 up
ip netns exec vm3 ip link set lo up
ovs-vsctl set Interface vm3 external_ids:iface-id=vm3

######## VM1 ##############
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv1
ifconfig ens1f0 42.42.42.1 netmask 255.255.255.0
ovs-vsctl set open . external_ids:ovn-remote=tcp:42.42.42.1:6642
ovs-vsctl set open . external_ids:ovn-encap-type=geneve
ovs-vsctl set open . external_ids:ovn-encap-ip=42.42.42.1
systemctl start ovn-controller

ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl lsp-set-addresses vm1 00:00:00:00:00:01
ovn-nbctl lsp-add ls vm2
ovn-nbctl lsp-set-addresses vm2 00:00:00:00:00:02
ovn-nbctl lsp-add ls vm3
ovn-nbctl lsp-set-addresses vm3 00:00:00:00:00:03

ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 42.42.42.1/24 dev vm1
ip netns exec vm1 ip addr add 2001::1/64 dev vm1
ip netns exec vm1 ip link set vm1 up
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1

ovn-nbctl pg-add pg1 vm1 vm2
ovn-nbctl --log --severity=info --label=100 --name=acl_test acl-add ls from-lport 1002 "inport==@pg1 && ip" allow-related
ovn-nbctl --wait=sb sync

ip netns exec vm1 ping 42.42.42.2 -c 1
ip netns exec vm1 ping 42.42.42.3 -c 1
ip netns exec vm1 ping6 2001::2 -c 1
ip netns exec vm1 ping6 2001::3 -c 1

cat /var/log/ovn/ovn-controller.log |grep nw_src=42.42.42.2   # no result
cat /var/log/ovn/ovn-controller.log |grep nw_src=42.42.42.3   # no result
cat /var/log/ovn/ovn-controller.log |grep ipv6_src=2001::2    # no result
cat /var/log/ovn/ovn-controller.log |grep ipv6_src=2001::3    # no result

acl_uuid=$(ovn-nbctl --bare --column _uuid,label find acl |grep -B1 100 |head -1)
ovn-nbctl set acl $acl_uuid options:log-related=true
ovn-nbctl --wait=sb sync

ip netns exec vm1 ping 42.42.42.2 -c 1
ip netns exec vm1 ping 42.42.42.3 -c 1
ip netns exec vm1 ping6 2001::2 -c 1
ip netns exec vm1 ping6 2001::3 -c 1

[root@dell-per740-81 ~]# cat /var/log/ovn/ovn-controller.log |grep nw_src=42.42.42.2
2022-09-26T09:15:17.445Z|00009|acl_log(ovn_pinctrl0)|INFO|name="acl_test", verdict=allow, severity=info, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.2,nw_dst=42.42.42.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
[root@dell-per740-81 ~]# cat /var/log/ovn/ovn-controller.log |grep nw_src=42.42.42.3
2022-09-26T09:15:17.466Z|00011|acl_log(ovn_pinctrl0)|INFO|name="acl_test", verdict=allow, severity=info, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=00:00:00:00:00:03,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
[root@dell-per740-81 ~]# cat /var/log/ovn/ovn-controller.log |grep ipv6_src=2001::2
2022-09-26T09:15:17.491Z|00013|acl_log(ovn_pinctrl0)|INFO|name="acl_test", verdict=allow, severity=info, direction=to-lport: icmp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,ipv6_src=2001::2,ipv6_dst=2001::1,ipv6_label=0xe6bd4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=129,icmp_code=0
[root@dell-per740-81 ~]# cat /var/log/ovn/ovn-controller.log |grep ipv6_src=2001::3
2022-09-26T09:15:17.517Z|00015|acl_log(ovn_pinctrl0)|INFO|name="acl_test", verdict=allow, severity=info, direction=to-lport: icmp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:03,dl_dst=00:00:00:00:00:01,ipv6_src=2001::3,ipv6_dst=2001::1,ipv6_label=0xa3e83,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=129,icmp_code=0

Verified on:

[root@dell-per740-81 ~]# rpm -qa |grep -E "ovn|openvswitch"
ovn-2021-host-21.12.0-93.el8fdp.x86_64
ovn-2021-21.12.0-93.el8fdp.x86_64
ovn-2021-central-21.12.0-93.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.17-2.17.0-50.el8fdp.x86_64

Also verified on:

[root@dell-per740-81 ~]# rpm -qa |grep -E "ovn|openvswitch"
ovn22.03-22.03.0-95.el8fdp.x86_64
ovn22.03-host-22.03.0-95.el8fdp.x86_64
ovn22.03-central-22.03.0-95.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-29.el8fdp.noarch
openvswitch2.17-2.17.0-50.el8fdp.x86_64

[root@dell-per740-81 ~]# rpm -qa |grep -E "ovn|openvswitch"
ovn22.06-22.06.0-57.el8fdp.x86_64
ovn22.06-central-22.06.0-57.el8fdp.x86_64
ovn22.06-host-22.06.0-57.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-29.el8fdp.noarch
openvswitch2.17-2.17.0-37.el8fdp.x86_64

[root@wsfd-advnetlab155 ~]# rpm -qa | grep -E 'ovn|openvswitch'
openvswitch-selinux-extra-policy-1.0-31.el9fdp.noarch
openvswitch2.17-2.17.0-30.el9fdp.x86_64
ovn-2021-21.12.0-93.el9fdp.x86_64
ovn-2021-central-21.12.0-93.el9fdp.x86_64
ovn-2021-host-21.12.0-93.el9fdp.x86_64

[root@wsfd-advnetlab155 ~]# rpm -qa | grep -E 'ovn|openvswitch'
openvswitch-selinux-extra-policy-1.0-31.el9fdp.noarch
openvswitch2.17-2.17.0-43.el9fdp.x86_64
ovn22.03-22.03.0-95.el9fdp.x86_64
ovn22.03-central-22.03.0-95.el9fdp.x86_64
ovn22.03-host-22.03.0-95.el9fdp.x86_64

[root@wsfd-advnetlab155 ~]# rpm -qa | grep -E 'ovn|openvswitch'
openvswitch-selinux-extra-policy-1.0-31.el9fdp.noarch
openvswitch2.17-2.17.0-43.el9fdp.x86_64
ovn22.06-22.06.0-57.el9fdp.x86_64
ovn22.06-central-22.06.0-57.el9fdp.x86_64
ovn22.06-host-22.06.0-57.el9fdp.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn-2021 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:6869