Bug 1993841 - OVN-Kubernetes EgressFirewall blocks API server
Summary: OVN-Kubernetes EgressFirewall blocks API server
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.10.0
Assignee: Tim Rozet
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-16 09:01 UTC by huirwang
Modified: 2022-05-25 19:39 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-25 19:39:44 UTC
Target Upstream Version:
Embargoed:
trozet: needinfo-




Links
System ID  Status  Summary  Last Updated
Github ovn-org/ovn-kubernetes pull 2506  closed  Enable connectivity to the host network for egress firewall matching pods  2022-05-23 02:33:52 UTC
Github ovn-org/ovn-kubernetes pull 2786  open  Egress firewall don't block host network  2022-05-23 02:33:55 UTC

Description huirwang 2021-08-16 09:01:00 UTC
Description of problem:
OVN-Kubernetes EgressFirewall blocks access to the API server; every egress firewall should implicitly allow essential destinations such as the API endpoints.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-14-065522

How reproducible:
Always

Steps to Reproduce:
1. Create a namespace "test" and a pod in it.
Before creating the egressfirewall, the API server can be accessed:
oc rsh -n test hello-pod
/ #  curl -k https://172.30.0.1 -I
HTTP/2 403 
audit-id: 3ad34d91-7053-49f6-b078-23070401be10
cache-control: no-cache, private
content-type: application/json
x-content-type-options: nosniff
x-kubernetes-pf-flowschema-uid: 193f6e51-617f-49a8-b0da-ead5794486bf
x-kubernetes-pf-prioritylevel-uid: 2b134631-418d-424b-991f-e9756abdb9ef
content-length: 234
date: Tue, 03 Aug 2021 06:16:16 GMT

/ # 

2. Create an egressfirewall with a Deny rule for 0.0.0.0/0:
oc get egressfirewall -n test -o yaml
.....
  spec:
    egress:
    - to:
        cidrSelector: 0.0.0.0/0
      type: Deny
  status:
    status: EgressFirewall Rules applied
..........
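For reference, a complete manifest matching the fragment above (OVN-Kubernetes expects a single EgressFirewall object per namespace, named "default"; the namespace here is the test namespace from step 1):

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
  namespace: test
spec:
  egress:
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0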
oc rsh -n test hello-pod

/ # curl  -k https://172.30.0.1 -I --connect-timeout 5
curl: (28) Connection timed out after 5001 milliseconds


Actual results:
The egress firewall blocks access to the API server.

Expected results:
The egressfirewall should not block access to the API server.

Additional info:
Workaround: add Allow rules for the API service endpoints.

oc get ep -n default
NAME         ENDPOINTS                                          AGE
kubernetes   10.0.50.67:6443,10.0.53.46:6443,10.0.77.215:6443   4h21m

Add the endpoints' IP subnet as an Allow rule ahead of the Deny rule:
........
  spec:
    egress:
    - to:
        cidrSelector: 10.0.0.0/16
      type: Allow
    - to:
        cidrSelector: 0.0.0.0/0
      type: Deny
  status:
    status: EgressFirewall Rules applied
........
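Note that egress rules are evaluated in order, so the Allow rule must precede the catch-all Deny. As a complete manifest, the workaround looks roughly like this (same assumptions as the sketch above):

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
  namespace: test
spec:
  egress:
  - type: Allow
    to:
      cidrSelector: 10.0.0.0/16
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0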

3. The API service can be accessed again:
$ oc rsh -n test hello-pod
/ # 
/ # curl  -k https://172.30.0.1 -I --connect-timeout 5
HTTP/2 403 
audit-id: a5db328c-6762-4d01-95d8-7e65f4f011a9
cache-control: no-cache, private
content-type: application/json
x-content-type-options: nosniff
x-kubernetes-pf-flowschema-uid: 193f6e51-617f-49a8-b0da-ead5794486bf
x-kubernetes-pf-prioritylevel-uid: 2b134631-418d-424b-991f-e9756abdb9ef
content-length: 234
date: Tue, 03 Aug 2021 06:51:01 GMT

Comment 2 Immanuvel 2021-08-27 08:45:51 UTC
Hello Team,

Any update on this, please?

It has been more than a month since the customer reported this issue, and we have not seen any progress.

Thank you in advance for your input.

Regards
IMMANUVEL

Comment 11 Mridul Markandey 2022-01-20 08:11:49 UTC
Hello Team,

Can anyone please update me on the status of this Bugzilla? The customer is asking for an update. Is there any ETA for the fix?

Regards,
Mridul Markandey

Comment 17 Tim Rozet 2022-02-22 14:43:22 UTC
The openshift-sdn implementation of egress firewall does not implicitly allow access to node IPs. OVN-Kube will have functional parity with SDN features. Access to any external IP (including node IPs) must be enabled explicitly in the egress firewall rules.

Comment 18 Pablo Alonso Rodriguez 2022-02-22 15:17:21 UTC
If the reason for closing this is feature parity, please note that the openshift-sdn egress firewall IS NOT applied to service cluster IPs.

A sample ofproto trace for openshift-sdn while trying to access 172.30.0.1 when there is an egressnetworkpolicy that blocks it:

$ ovs-appctl ofproto/trace br0 ip,nw_dst=172.30.0.1,nw_src=10.129.2.5,in_port=6
Flow: ip,in_port=6,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.2.5,nw_dst=172.30.0.1,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ct_state=-trk,ip, priority 1000
    ct(table=0)
    drop
     -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 0.
     -> Sets the packet to an untracked state, and clears all the conntrack fields.

Final flow: unchanged
Megaflow: recirc_id=0,ct_state=-trk,eth,ip,in_port=6,nw_frag=no
Datapath actions: ct,recirc(0x96b99)

===============================================================================
recirc(0x96b99) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
===============================================================================

Flow: recirc_id=0x96b99,ct_state=new|trk,eth,ip,in_port=6,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.2.5,nw_dst=172.30.0.1,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
    thaw
        Resuming from table 0
 0. ip, priority 100
    goto_table:20
20. ip,in_port=6,nw_src=10.129.2.5, priority 100
    load:0x29f9e9->NXM_NX_REG0[]
    goto_table:21
21. priority 0
    goto_table:30
30. ip,nw_dst=172.30.0.0/16, priority 100
    goto_table:60
60. priority 200
    output:2

Final flow: recirc_id=0x96b99,ct_state=new|trk,eth,ip,reg0=0x29f9e9,in_port=6,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.2.5,nw_dst=172.30.0.1,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
Megaflow: recirc_id=0x96b99,ct_state=-rpl+trk,eth,ip,in_port=6,nw_src=10.129.2.5,nw_dst=172.30.0.0/16,nw_frag=no


This is because the service IP path in the OVS tables does not traverse table 101, where the egress network policy is applied:

ovs-ofctl -O OpenFlow13 dump-flows br0 table=101
 cookie=0x0, duration=3.060s, table=101, n_packets=0, n_bytes=0, priority=2,ip,reg0=0x29f9e9,nw_dst=1.2.3.0/24 actions=output:tun0
 cookie=0x0, duration=3.060s, table=101, n_packets=0, n_bytes=0, priority=1,ip,reg0=0x29f9e9 actions=drop
 cookie=0x0, duration=1289745.457s, table=101, n_packets=78758, n_bytes=63245038, priority=0 actions=output:tun0

So if the reason to reject this is feature parity, service IPs must not be blocked (either always or optionally). Otherwise, feature parity cannot be used as an argument to reject this.

Thanks and regards.

Comment 19 Pablo Alonso Rodriguez 2022-02-22 15:21:50 UTC
Or maybe that's a bug to solve on the openshift-sdn side, though

Comment 20 Tim Rozet 2022-02-22 16:25:42 UTC
Thanks, Pablo, for clarifying. After talking with Dan Winship, this behavior is expected on openshift-sdn, so we need to make OVN behave the same way. I'll re-open the upstream PR and suggest we fix pod -> service traffic backed by node IPs, not direct pod access to arbitrary node IPs.

Comment 21 Eswar Vadla 2022-03-14 18:31:17 UTC
Hello Team,

Is there any update? The case was reopened quite a while ago.

BR
ESWAR

Comment 22 Tim Rozet 2022-03-17 19:17:27 UTC
I've been looking at potential solutions for this. Some background on egress firewall:
1. In shared gateway mode, egress firewall is implemented as ACLs applied to the join switch.
2. In local gateway mode, egress firewall is implemented as ACLs applied to the worker switch.

A typical OVNK topology looks like this:
pod-----worker switch----ovn_cluster_router---join switch---gateway router---<external network>

With OVNK, service DNAT happens on the worker switch.

Bypassing the egress firewall for packets destined for a service is possible in local gateway mode: we can either evaluate the ACLs before the packet is load balanced, or evaluate the CT state of the packet on the worker switch post-DNAT (if ct.dnat == true, then bypass the egress firewall). However, in shared gateway mode, by the time the packet hits the join switch we have no idea whether it has been DNAT'ed; the CT states are cleared.

Additionally, there are other egress firewall bugs where we need to read CT state to decide what to do with the packet. I think the best path forward is to consolidate egress firewall in both gateway modes onto the worker switch. That is more effort than a simple bug fix, so it will take more time. As a workaround for now, I would suggest poking holes in the egress firewall for the specific k8s node backends that need connectivity (a sketch follows below). I'll provide an ETA or status update once I confirm with the rest of the OVNK community that the above approach is acceptable.
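A sketch of that hole-poking, reusing the kubernetes endpoint IPs from the reproduction steps above (substitute the node IPs that actually back the service in your cluster):

  egress:
  - type: Allow
    to:
      cidrSelector: 10.0.50.67/32
  - type: Allow
    to:
      cidrSelector: 10.0.53.46/32
  - type: Allow
    to:
      cidrSelector: 10.0.77.215/32
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0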

Comment 25 Tim Rozet 2022-05-23 18:45:58 UTC
After some further discussion with the OVNK upstream community, as well as internal discussions about SDN behavior, we have decided that the current OVN behavior is in fact the originally desired function of egress firewall. The fact that OpenShift SDN allows services to bypass the egress firewall and access endpoints that should be blocked is a side effect of the SDN implementation and not the desired behavior.

The current method to allow access to host IPs is to manually create allow rules in egress firewalls for those specific hosts, or a generic CIDR rule that encompasses the host networks. However, we understand that configuring this manually isn't ideal, as nodes can come and go, and allowing an entire CIDR may not be plausible because a user may not want pods to reach other hosts on the external network.

Therefore we will add a node selector to the egress firewall destination spec in OVN. That way a user can specify a label applied to one or more nodes, and the Kubernetes node IPs will be included in that rule. The existing OpenShift SDN behavior will not change.
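Based on that description, a rule with the new selector might look roughly like the sketch below. This is illustrative only: the nodeSelector field name follows the upstream proposal, and the control-plane label is just an example; any label applied to the desired nodes would work.

  egress:
  - type: Allow
    to:
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/master: ""
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0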

Comment 26 Tim Rozet 2022-05-25 19:39:44 UTC
I pushed a patch adding this behavior upstream:
https://github.com/ovn-org/ovn-kubernetes/pull/3002

We will target 4.12 for adding this into OCP and track it in JIRA:
https://issues.redhat.com/browse/SDN-3098

For now as previously mentioned the workaround is to manually add rules to allow access to the required node IPs.

