Bug 2031150 - Only client to server packets are logged when ACL action is "allow-related"
Summary: Only client to server packets are logged when ACL action is "allow-related"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn-2021
Version: FDP 21.I
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Mark Michelson
QA Contact: Ehsan Elahi
URL:
Whiteboard:
Depends On:
Blocks: 1619266 1990279 2152877 2181805
 
Reported: 2021-12-10 15:56 UTC by Elvira
Modified: 2023-03-26 06:40 UTC
CC: 4 users

Fixed In Version: ovn-2021-21.12.0-88.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-11 12:24:10 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Issue Tracker FD-1692 (last updated 2021-12-10 16:01:16 UTC)
Red Hat Product Errata RHBA-2022:6869 (last updated 2022-10-11 12:24:20 UTC)

Description Elvira 2021-12-10 15:56:47 UTC
Description of problem:
When creating an ACL with logging enabled, only client-to-server packets are logged. This matters because ACL logging is a critical part of the Security Group Logging RFE in OpenStack (BZ1619266).

Version-Release number of selected component (if applicable):
We need this fixed in ovn-2021, but the problem affects all releases up to and including the upstream main branch.

How reproducible:


Steps to Reproduce:
1. Create an ACL with action 'allow-related' and logging enabled, for example:

_uuid               : e1929bcc-26b3-42aa-b13c-38b6766b25b3
action              : allow-related
direction           : from-lport
external_ids        : {"neutron:security_group_rule_id"="bf7a7176-fe7b-4fa8-a492-aa0108c744e0"}
label               : 0
log                 : true
match               : "inport == @pg_7ddfbe4e_86ec_42a6_b8aa_b05cba7ad56a && ip4"
meter               : acl_log_meter
name                : neutron-edea5d09-fd79-49cc-9ae4-2d0d2cbbfa1e
priority            : 1002
severity            : info
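
For reference, an ACL like the one above can be created from the CLI roughly as follows (a sketch; <ls> is a placeholder for the logical switch name, with the other values taken from the record above):

ovn-nbctl --log --severity=info --meter=acl_log_meter \
    --name=neutron-edea5d09-fd79-49cc-9ae4-2d0d2cbbfa1e \
    acl-add <ls> from-lport 1002 \
    'inport == @pg_7ddfbe4e_86ec_42a6_b8aa_b05cba7ad56a && ip4' allow-related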

2. Ping one of the servers covered by that ACL. Observe in ovn-controller.log that only client-to-server packets are logged.

2021-12-10T08:35:58.745Z|00013|acl_log(ovn_pinctrl0)|INFO|name="neutron-edea5d09-fd79-49cc-9ae4-2d0d2cbbfa1e", verdict=allow, severity=info: icmp,vlan_tci=0x0000,dl_src=fa:16:3e:06:51:a4,dl_dst=fa:16:3e:c6:4a:86,nw_src=172.24.4.1,nw_dst=10.100.0.106,nw_tos=0,nw_ecn=0,nw_ttl=63,icmp_type=8,icmp_code=0                                                                                                                          

2021-12-10T08:35:59.759Z|00014|acl_log(ovn_pinctrl0)|INFO|name="neutron-edea5d09-fd79-49cc-9ae4-2d0d2cbbfa1e", verdict=allow, severity=info: icmp,vlan_tci=0x0000,dl_src=fa:16:3e:06:51:a4,dl_dst=fa:16:3e:c6:4a:86,nw_src=172.24.4.1,nw_dst=10.100.0.106,nw_tos=0,nw_ecn=0,nw_ttl=63,icmp_type=8,icmp_code=0                                       

3. Checking the flows on the switch shows that no OpenFlow rule carries a logging action for the return traffic.

a. ICMP request (client-to-server)

cookie=0x3c29ab85, duration=15.254s, table=44, n_packets=1, n_bytes=98, priority=2002,icmp,reg0=0x80/0x80,reg15=0x3,metadata=0x2 actions=load:0x1->NXM_NX_XXREG0[97],controller(userdata=00.00.00.07.00.00.00.00.00.06.6e.65.75.74.72.6f.6e.2d.38.34.61.61.30.64.65.61.2d.35.33.63.31.2d.34.39.33.62.2d.61.33.33.39.2d.62.38.37.63.66.32.65.62.35.61.34.37,meter_id=1),resubmit(,45)

b. ICMP reply (server-to-client)

cookie=0xc7091af2, duration=15.251s, table=44, n_packets=1, n_bytes=98, priority=65532,ct_state=-new+est-rel+rpl-inv+trk,ct_label=0/0x1,metadata=0x2 actions=resubmit(,45)


Actual results:
We only get half of the connection (the client-to-server direction) logged for that ACL.

Expected results:
Full stream of packets logged, both server-to-client and client-to-server.

Additional info:
Not sure if this helps, but when I manually change the ACL action to allow-stateless, the problem does not occur:

2021-12-10T09:46:02.766Z|00051|acl_log(ovn_pinctrl0)|INFO|name="neutron-edea5d09-fd79-49cc-9ae4-2d0d2cbbfa1e", verdict=allow, severity=info: icmp,vlan_tci=0x0000,dl_src=fa:16:3e:06:51:a4,dl_dst=fa:16:3e:c6:4a:86,nw_src=172.24.4.1,nw_dst=10.100.0.106,nw_tos=0,nw_ecn=0,nw_ttl=63,icmp_type=8,icmp_code=0                                       

2021-12-10T09:46:02.766Z|00052|acl_log(ovn_pinctrl0)|INFO|name="neutron-edea5d09-fd79-49cc-9ae4-2d0d2cbbfa1e", verdict=allow, severity=info: icmp,vlan_tci=0x0000,dl_src=fa:16:3e:c6:4a:86,dl_dst=fa:16:3e:06:51:a4,nw_src=10.100.0.106,nw_dst=172.24.4.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0

Comment 1 Mark Michelson 2022-01-07 14:18:32 UTC
What logical ports are in the "pg_7ddfbe4e_86ec_42a6_b8aa_b05cba7ad56a" group? If the "server" port is not in that port group, then the traffic will not get logged, and this is expected. Since the ACL is specified as "from-lport", only traffic received on a port in that port group will log anything. The "from-lport" and "to-lport" designations add the ACL to the switch's ingress or egress pipeline, respectively. If you want to log return traffic, there are a few options:

a) You can ensure the server port on the switch is in the specified port group.
b) You can add a similar "to-lport" rule for the port group (see the sketch after this list).
c) You can file an RFE so that for allow-related traffic, return traffic can (optionally) be logged.
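
As a sketch of option (b), assuming the logical switch is named ls and reusing the port group from the description, the mirror rule might look like:

ovn-nbctl --log --severity=info acl-add ls to-lport 1002 \
    'outport == @pg_7ddfbe4e_86ec_42a6_b8aa_b05cba7ad56a && ip4' allow-related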

Comment 2 Elvira 2022-01-10 17:05:27 UTC
Option a is not applicable to our scenario, since the "server" does not have to be located in the same port group. Option b might be doable, but we still think it should not be necessary, since other actions such as "allow-stateless" log all packets allowed by the ACL. That's why we maintain this is not really an RFE but rather the expected behavior. Having an optional log for related traffic in "allow-related" while logging everything by default in "allow-stateless" would, I believe, be inconsistent behavior.

Comment 3 Mark Michelson 2022-01-24 21:19:35 UTC
> I definitely believe having an optional log for related traffic in "allow-related" but then logging everything in "allow-stateless" by default would be inconsistent behavior

This struck me as odd, because the behavior *should* be consistent between the two actions. I set up a small test using ovn-fake-multinode, and I was able to reproduce the behavior. The key is that both the "client" and "server" ports need to be in the port group. I will review this more to determine what is going on here. It's possible this is the intended behavior and it's possible that it's a bug. I'll give more information once I have it.

Comment 4 Mark Michelson 2022-01-25 21:12:12 UTC
I've taken a look, and I think I have a grasp on what is happening. When a stateful ACL is created on a logical switch, special flows at priority 65532 are installed that allow for *all* related and reply traffic to pass. These flows are higher priority than any ACL and are installed so that reply traffic isn't "accidentally" blocked by an ACL.

The problem with these flows is that they are one-size-fits-all; they are not installed per-ACL. When ACL logging is disabled, this is perfectly fine: traffic flows as expected. When ACL logging is enabled, though, reply traffic is not logged.

I thought about this, and I came up with three ideas:

1. Remove the high-priority flows. Instead, add flows alongside the ACL flows in table 17 that check the CT state and the ACL match. If it's a reply, and the ACL matches, then allow the traffic and log the match.
2. Add more high-priority flows that also include the ACL match. These would allow the traffic and also log it.
3. Add more high-priority flows that check the ct_label to see if they match the ACL label. If they match, then allow the traffic and log it.


Of these, I think option 2 is probably the best, since it cannot be defeated by other ACLs and it doesn't require setting a label on the ACL. The nice thing is that these extra flows only need to be installed for ACLs that have logging enabled, so it won't universally result in a higher flow count. I will work on a patch and post it for review when it is ready.

Comment 5 Mark Michelson 2022-01-25 21:36:15 UTC
I have a patch written that works in my setup. I need to ensure it does not cause test failures before I can post it for review, and I also need to create a test case or two before posting. I expect to post the patch sometime tomorrow.

Comment 6 Mark Michelson 2022-01-26 21:59:09 UTC
I realized while writing tests that my patch could result in ambiguity when logging reply traffic. My initial patch attempted to put all reply-traffic logging at the same priority. The problem is that reply traffic can match multiple ACLs, so the request traffic could appear in the logs as matching one ACL while the reply gets logged against a different one.

Example:

ACL 1: priority 800 "inport == @pg1 && ip4"
ACL 2: priority 400 "inport == @pg1"

In this case, let's imagine that the client and server ports are part of pg1. The client sends a request to the server, and ACL1 is matched, so it gets logged. Then the reply comes from the server. If the reply comparison happens at the same priority, it is unknown whether ACL1 or ACL2 will get logged for the reply traffic.

To disambiguate, the reply traffic logging needs to be placed at the same relative priorities as the ACLs themselves. However, the reply traffic needs to be at a higher priority than all ACLs (since otherwise, reply traffic could mistakenly be blocked by an unrelated ACL). The problem now is that there aren't enough numbers in the priority space to guarantee that we can fit all the reply logging flows.

So that leaves two possible options:

1. Go with option 3 from my previous comment. This allows us to match on ACL label + ACL match so that we can uniquely identify which ACL the reply traffic belongs to. However, it requires ACLs to have labels in order to have their replies logged. One option here would be to have ovn-northd implicitly apply labels to ACLs if they aren't explicitly assigned by the CMS.

2. Add a new table to ingress and egress processing between the pre_acl and acl tables whose sole purpose is to log ACLs. This table would more-or-less replicate the ACL table, except that rather than actually taking an action, it will only log the verdict that the following table will take.

Both of these will add the same number of flows to the logical flow table, so neither is at an advantage there. Option 1 has the disadvantage of tracking additional state in ovn-northd (ACL labels). Option 2 shifts table numbers again and separates the logging logic from the allow/drop logic, which invites bugs.

Right now, I'm leaning toward using option 1, but I'm not married to it. I'll think on this some more to see if there is another possible way to handle this beyond what I've thought of.

Comment 7 Mark Michelson 2022-01-27 14:28:49 UTC
I've been thinking about this a bit more, and I think reply traffic for stateful ACLs is a special case.

For instance, let's assume a simple setup where we have a logical switch sw, and it has two ports, swp1 and swp2. We'll assume the client is on a VM attached to swp1 and the server is on a VM attached to swp2.

Scenario 1:
First, we create the following two ACLs:

ACL1: from-lport 'inport == "swp1" && ip4' allow-stateless
ACL2: from-lport 'inport == "swp2" && ip4' allow-stateless

When the client sends traffic to the server, ACL1 matches and allows the traffic. When the server replies, ACL2 matches and allows the traffic. In this case, we should expect to see logs showing ACL1 matching the request and ACL2 matching the reply.
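
For concreteness, scenario 1 can be set up with something like the following (priority 1000 is an arbitrary choice; scenarios 2 and 3 below differ only in the verdicts):

ovn-nbctl --log --severity=info acl-add sw from-lport 1000 'inport == "swp1" && ip4' allow-stateless
ovn-nbctl --log --severity=info acl-add sw from-lport 1000 'inport == "swp2" && ip4' allow-stateless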


Scenario 2:
Now let's change the verdict on the ACLs

ACL1: from-lport 'inport == "swp1" && ip4' allow-related
ACL2: from-lport 'inport == "swp2" && ip4' allow-related

When the client sends traffic to the server, ACL1 matches and allows the traffic. When the server replies, the traffic is allowed not because the reply matches ACL2 but because ACL1 is set up to allow related traffic. We definitely log the request as having matched ACL1. But does it make sense to log a match on ACL2 for the reply traffic? I would argue not, since that ACL was never actually considered when the traffic was allowed through.


Scenario 3:
To really drive the point home, let's change ACL2 to drop traffic instead of allowing it.

ACL1: from-lport 'inport == "swp1" && ip4' allow-related
ACL2: from-lport 'inport == "swp2" && ip4' drop

When the client sends traffic to the server, ACL1 matches and allows the traffic. When the server replies, even though the reply traffic would match ACL2, the reply traffic is allowed since ACL1 said to allow related traffic to pass. We definitely log the request as having matched ACL1. ACL2 definitely should not be logged for the reply in this case because the traffic is being allowed, not dropped.

So now let's go back to the original scenario that was brought up in this issue. The situation was less clear since instead of having two different ACLs for the client and server, you have a single ACL that matches on multiple ports in a port group. When you set the verdict on that ACL to "allow-stateless" you basically had the behavior of scenario 1. The request and reply traffic happened to match on the same ACL, so both the request and reply traffic got logged. When the ACL was set to "allow-related", you basically had the behavior of scenario 2. The reply traffic meets the match criteria of the ACL, but it wasn't because of this match that the traffic was allowed; it was allowed because the initial match on the request automatically allowed the reply traffic to pass.

The current behavior isn't inconsistent, but it's not exactly user-friendly either. From your point of view, the reply traffic appears to match the ACL, so why isn't it logged? It's because the reply traffic is allowed as a special case of stateful ACLs. As such, if we are going to log this sort of reply traffic, it also needs to be special-cased. We don't want it to appear that the reply traffic was allowed because it happened to match an ACL, we want the log to reflect that the reply traffic was logged because it is related to a stateful ACL.

Because this sort of logging is not currently implemented, and because it requires adding more flows, I think it should be optional. Therefore, I plan to add an ACL option that states whether to log reply traffic. For this to work properly, the ACL must have logging enabled and a label set. I will work up a patch today or tomorrow.
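
A minimal sketch of the intended usage, using the log-related option name and values that comment 14 below exercises:

ovn-nbctl --log --severity=info --label=100 --name=acl_test \
    acl-add ls from-lport 1002 'inport == @pg1 && ip' allow-related
ovn-nbctl set acl <acl-uuid> options:log-related=true    # <acl-uuid>: UUID of the ACL above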

Comment 9 Elvira 2022-05-19 13:40:44 UTC
Should this be moved to MODIFIED? I think the patch was already merged, right? @mmichels

Comment 10 OVN Bot 2022-08-09 04:06:57 UTC
ovn-2021 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2116619

Comment 14 Ehsan Elahi 2022-09-26 09:57:43 UTC
The issue can be reproduced on any affected build using the following script. However, after setting the log-related option on the allow-related ACL, ovn-controller starts logging reply packets as well.

############ VM2 ################
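# Hypervisor VM2 (chassis hv0): hosts the vm2 and vm3 namespaces; the OVN central node is VM1 (42.42.42.1)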

systemctl start ovn-northd
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv0

ifconfig ens1f0 42.42.42.2 netmask 255.255.255.0

ovs-vsctl set open . external_ids:ovn-remote=tcp:42.42.42.1:6642
ovs-vsctl set open . external_ids:ovn-encap-type=geneve
ovs-vsctl set open . external_ids:ovn-encap-ip=42.42.42.2
systemctl start ovn-controller

ip netns add vm2
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
ip netns exec vm2 ip addr add 42.42.42.2/24 dev vm2
ip netns exec vm2 ip addr add 2001::2/64 dev vm2
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip link set lo up
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ip netns add vm3
ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
ip link set vm3 netns vm3
ip netns exec vm3 ip link set vm3 address 00:00:00:00:00:03
ip netns exec vm3 ip addr add 42.42.42.3/24 dev vm3
ip netns exec vm3 ip addr add 2001::3/64 dev vm3
ip netns exec vm3 ip link set vm3 up
ip netns exec vm3 ip link set lo up
ovs-vsctl set Interface vm3 external_ids:iface-id=vm3

######## VM1 ##############
systemctl start ovn-northd
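# Expose the OVN NB/SB databases over TCP so the hv0 chassis can connect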
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv1

ifconfig ens1f0 42.42.42.1 netmask 255.255.255.0

ovs-vsctl set open . external_ids:ovn-remote=tcp:42.42.42.1:6642
ovs-vsctl set open . external_ids:ovn-encap-type=geneve
ovs-vsctl set open . external_ids:ovn-encap-ip=42.42.42.1
systemctl start ovn-controller

ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl lsp-set-addresses vm1 00:00:00:00:00:01
ovn-nbctl lsp-add ls vm2
ovn-nbctl lsp-set-addresses vm2 00:00:00:00:00:02
ovn-nbctl lsp-add ls vm3
ovn-nbctl lsp-set-addresses vm3 00:00:00:00:00:03

ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 42.42.42.1/24 dev vm1
ip netns exec vm1 ip addr add 2001::1/64 dev vm1
ip netns exec vm1 ip link set vm1 up
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1

ovn-nbctl pg-add pg1 vm1 vm2
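# Stateful ACL with logging enabled; the label is needed later for reply-traffic (log-related) logging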
ovn-nbctl --log --severity=info --label=100 --name=acl_test acl-add ls from-lport 1002 "inport==@pg1 && ip" allow-related

ovn-nbctl --wait=sb sync
ip netns exec vm1 ping 42.42.42.2 -c 1
ip netns exec vm1 ping 42.42.42.3 -c 1
ip netns exec vm1 ping6 2001::2 -c 1
ip netns exec vm1 ping6 2001::3 -c 1

cat /var/log/ovn/ovn-controller.log |grep nw_src=42.42.42.2
# no result
cat /var/log/ovn/ovn-controller.log |grep nw_src=42.42.42.3
# no result
cat /var/log/ovn/ovn-controller.log |grep ipv6_src=2001::2
# no result
cat /var/log/ovn/ovn-controller.log |grep ipv6_src=2001::3
# no result

acl_uuid=$(ovn-nbctl --bare --column _uuid,label find acl |grep -B1 100 |head -1)
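# Enable the log-related option on the ACL (looked up above by its label, 100) so reply traffic is logged as well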
ovn-nbctl set acl $acl_uuid options:log-related=true

ovn-nbctl --wait=sb sync
ip netns exec vm1 ping 42.42.42.2 -c 1
ip netns exec vm1 ping 42.42.42.3 -c 1
ip netns exec vm1 ping6 2001::2 -c 1
ip netns exec vm1 ping6 2001::3 -c 1


[root@dell-per740-81 ~]# cat /var/log/ovn/ovn-controller.log |grep nw_src=42.42.42.2
2022-09-26T09:15:17.445Z|00009|acl_log(ovn_pinctrl0)|INFO|name="acl_test", verdict=allow, severity=info, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.2,nw_dst=42.42.42.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
[root@dell-per740-81 ~]# cat /var/log/ovn/ovn-controller.log |grep nw_src=42.42.42.3
2022-09-26T09:15:17.466Z|00011|acl_log(ovn_pinctrl0)|INFO|name="acl_test", verdict=allow, severity=info, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=00:00:00:00:00:03,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
[root@dell-per740-81 ~]# cat /var/log/ovn/ovn-controller.log |grep ipv6_src=2001::2
2022-09-26T09:15:17.491Z|00013|acl_log(ovn_pinctrl0)|INFO|name="acl_test", verdict=allow, severity=info, direction=to-lport: icmp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,ipv6_src=2001::2,ipv6_dst=2001::1,ipv6_label=0xe6bd4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=129,icmp_code=0
[root@dell-per740-81 ~]# cat /var/log/ovn/ovn-controller.log |grep ipv6_src=2001::3
2022-09-26T09:15:17.517Z|00015|acl_log(ovn_pinctrl0)|INFO|name="acl_test", verdict=allow, severity=info, direction=to-lport: icmp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:03,dl_dst=00:00:00:00:00:01,ipv6_src=2001::3,ipv6_dst=2001::1,ipv6_label=0xa3e83,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=129,icmp_code=0


Verified on:
[root@dell-per740-81 ~]# rpm -qa |grep -E "ovn|openvswitch"
ovn-2021-host-21.12.0-93.el8fdp.x86_64
ovn-2021-21.12.0-93.el8fdp.x86_64
ovn-2021-central-21.12.0-93.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.17-2.17.0-50.el8fdp.x86_64

Also verified on:
[root@dell-per740-81 ~]# rpm -qa |grep -E "ovn|openvswitch"
ovn22.03-22.03.0-95.el8fdp.x86_64
ovn22.03-host-22.03.0-95.el8fdp.x86_64
ovn22.03-central-22.03.0-95.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-29.el8fdp.noarch
openvswitch2.17-2.17.0-50.el8fdp.x86_64

[root@dell-per740-81 ~]# rpm -qa |grep -E "ovn|openvswitch"
ovn22.06-22.06.0-57.el8fdp.x86_64
ovn22.06-central-22.06.0-57.el8fdp.x86_64
ovn22.06-host-22.06.0-57.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-29.el8fdp.noarch
openvswitch2.17-2.17.0-37.el8fdp.x86_64

[root@wsfd-advnetlab155 ~]# rpm -qa | grep -E 'ovn|openvswitch'
openvswitch-selinux-extra-policy-1.0-31.el9fdp.noarch
openvswitch2.17-2.17.0-30.el9fdp.x86_64
ovn-2021-21.12.0-93.el9fdp.x86_64
ovn-2021-central-21.12.0-93.el9fdp.x86_64
ovn-2021-host-21.12.0-93.el9fdp.x86_64

[root@wsfd-advnetlab155 ~]# rpm -qa | grep -E 'ovn|openvswitch'
openvswitch-selinux-extra-policy-1.0-31.el9fdp.noarch
openvswitch2.17-2.17.0-43.el9fdp.x86_64
ovn22.03-22.03.0-95.el9fdp.x86_64
ovn22.03-central-22.03.0-95.el9fdp.x86_64
ovn22.03-host-22.03.0-95.el9fdp.x86_64

[root@wsfd-advnetlab155 ~]# rpm -qa | grep -E 'ovn|openvswitch'
openvswitch-selinux-extra-policy-1.0-31.el9fdp.noarch
openvswitch2.17-2.17.0-43.el9fdp.x86_64
ovn22.06-22.06.0-57.el9fdp.x86_64
ovn22.06-central-22.06.0-57.el9fdp.x86_64
ovn22.06-host-22.06.0-57.el9fdp.x86_64

Comment 16 errata-xmlrpc 2022-10-11 12:24:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn-2021 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6869

