Bug 2017650
| Summary: | [OVN] EgressFirewall cannot be applied correctly if cluster has Windows nodes | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | huirwang |
| Component: | Networking | Assignee: | Surya Seetharaman <surya> |
| Networking sub component: | ovn-kubernetes | QA Contact: | huirwang |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | medium | CC: | surya |
| Version: | 4.7 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-10 16:22:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Apologies for the delay in getting to this bug. Is this still an issue on the 4.7 cluster? If so, can I please have access to the Windows cluster where this is being observed? I am unable to reproduce it right now, so it would be good to get my hands on a reproducer. Cheers, Surya.

Took a look at the cluster. This is a bug, and it happens only in hybrid-overlay mode. We try to create ACLs with the following code to attach them to the node switch in local gateway mode (OCP < 4.8):
```go
for _, logicalSwitch := range logicalSwitches {
	if uuids == "" {
		_, stderr, err := util.RunOVNNbctl("--id=@acl", "create", "acl",
			fmt.Sprintf("priority=%d", priority),
			fmt.Sprintf("direction=%s", toLport), match, "action="+action,
			fmt.Sprintf("external-ids:egressFirewall=%s", externalID),
			"--", "add", "logical_switch", logicalSwitch,
			"acls", "@acl")
		if err != nil {
			return fmt.Errorf("error executing create ACL command, stderr: %q, %+v", stderr, err)
		}
	} else {
		for _, uuid := range strings.Fields(uuids) {
			_, stderr, err := util.RunOVNNbctl("add", "logical_switch", logicalSwitch, "acls", uuid)
			if err != nil {
				return fmt.Errorf("error adding ACL to joinsSwitch %s failed, stderr: %q, %+v",
					logicalSwitch, stderr, err)
			}
		}
	}
}
```
and `logicalSwitches` is constructed from:

```go
if config.Gateway.Mode == config.GatewayModeLocal {
	nodes, err := oc.watchFactory.GetNodes()
	if err != nil {
		return fmt.Errorf("unable to setup egress firewall ACLs on cluster nodes, err: %v", err)
	}
	for _, node := range nodes {
		logicalSwitches = append(logicalSwitches, node.Name)
	}
} else {
	logicalSwitches = append(logicalSwitches, types.OVNJoinSwitch)
}
```
This appends the whole list of nodes in the cluster to `logicalSwitches`. We need to skip hybrid-overlay nodes, because hybrid-overlay nodes don't have the ovn-kubernetes topology configured and therefore have no node switch:
```
sh-4.4# ovn-nbctl ls-list
1e1e489b-eea2-49f2-97c0-6ca8522c73a1 (ext_huirwang-011347-7vxst-master-0)
9d332752-3bd7-485d-996b-2eedd19b02ee (ext_huirwang-011347-7vxst-master-1)
78dce192-e897-4701-9108-83fe47afbd07 (ext_huirwang-011347-7vxst-master-2)
41db2b20-4a2c-4a3b-9e92-17b12469932e (ext_huirwang-011347-7vxst-worker-g6dzw)
b58453d0-299f-48cd-98ee-5fa7f754d5c6 (ext_huirwang-011347-7vxst-worker-gdmzq)
12cc994f-a881-4d85-b876-e57993bac112 (huirwang-011347-7vxst-master-0)
d7a4e30d-5e35-4ec9-8c26-e1cedab6f13e (huirwang-011347-7vxst-master-1)
a4051baa-19e8-4de7-ab87-32b07156c099 (huirwang-011347-7vxst-master-2)
524a94bf-cc95-4202-9789-f2761e422eee (huirwang-011347-7vxst-worker-g6dzw)
fca70df5-a7b9-4b47-a8d8-ba1632ca70d5 (huirwang-011347-7vxst-worker-gdmzq)
ce7416ad-5200-4c2d-9cef-d43f5f845ad1 (join)
290a2ea3-8df0-44af-87bd-984a219a2eab (node_local_switch)
```
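The fix, then, is to filter hybrid-overlay nodes out when building `logicalSwitches`. Below is a minimal sketch of that filtering, assuming hybrid-overlay (Windows) nodes can be identified by the standard `kubernetes.io/os` node label; the `Node` type and `filterOVNNodes` helper are hypothetical stand-ins, and the actual upstream fix may key off hybrid-overlay annotations instead.

```go
package main

import "fmt"

// Node is a minimal stand-in for a Kubernetes corev1.Node: just the
// fields this sketch needs (both the type and the label check are
// illustrative, not the actual ovn-kubernetes code).
type Node struct {
	Name   string
	Labels map[string]string
}

// filterOVNNodes returns the names of nodes that participate in the
// ovn-kubernetes topology, i.e. nodes that actually have a node switch.
// Hybrid-overlay (Windows) nodes are skipped, identified here via the
// standard kubernetes.io/os label.
func filterOVNNodes(nodes []Node) []string {
	var logicalSwitches []string
	for _, node := range nodes {
		if os, ok := node.Labels["kubernetes.io/os"]; ok && os != "linux" {
			continue // no ovn-k node switch exists for this node
		}
		logicalSwitches = append(logicalSwitches, node.Name)
	}
	return logicalSwitches
}

func main() {
	nodes := []Node{
		{Name: "linux-worker", Labels: map[string]string{"kubernetes.io/os": "linux"}},
		{Name: "winworker-mff7b", Labels: map[string]string{"kubernetes.io/os": "windows"}},
	}
	fmt.Println(filterOVNNodes(nodes)) // prints [linux-worker]
}
```

With this filtering in place, the `create acl ... -- add logical_switch <node> acls @acl` command is never issued for a switch that does not exist, so the EgressFirewall is applied cleanly on the Linux node switches.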
Setting severity and priority to medium.
Posted the upstream fix: https://github.com/ovn-org/ovn-kubernetes/pull/2749. Once it lands, we need to backport it downstream and do the nbctl equivalent of this in the < 4.10 releases.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056
Description of problem:
[OVN] EgressFirewall cannot be applied correctly if the cluster has Windows nodes.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-10-21-232712

Steps to Reproduce:
1. Set up a vSphere cluster with a flexy job using profile "73_IPI on vSphere 7.0 & OVN & WindowsContainer".
2. Create a test project and an EgressFirewall:

```yaml
kind: EgressFirewall
apiVersion: k8s.ovn.org/v1
metadata:
  name: default
spec:
  egress:
  - type: Allow
    to:
      dnsName: www.badiu.com
  - type: Allow
    to:
      dnsName: yahoo.com
    ports:
    - protocol: TCP
      port: 80
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
```

Actual results:

```
$ oc get egressfirewall -n test
NAME      EGRESSFIREWALL STATUS
default   EgressFirewall Rules not correctly added
```

The ovnkube-master log shows the ACL creation failing on the Windows node's nonexistent logical switch:

```
I1026 10:38:38.047876       1 egressfirewall.go:210] Adding egressFirewall default in namespace test
2021-10-26T10:38:38.061Z|02429|nbctl|INFO|Running command run -- create address_set name=a5773229689678011375 external-ids:name=www.badiu.com_v4
2021-10-26T10:38:38.072Z|02430|nbctl|INFO|Running command run --id=@acl -- create acl priority=9999 direction=to-lport "match=\"(ip4.dst == $a5773229689678011375) && ip4.src == $a5811396932658691220 && ip4.dst != 10.128.0.0/14\"" action=allow external-ids:egressFirewall=test -- add logical_switch huirwang1026a-46x5j-master-0 acls @acl
2021-10-26T10:38:38.083Z|02431|nbctl|INFO|Running command run --id=@acl -- create acl priority=9999 direction=to-lport "match=\"(ip4.dst == $a5773229689678011375) && ip4.src == $a5811396932658691220 && ip4.dst != 10.128.0.0/14\"" action=allow external-ids:egressFirewall=test -- add logical_switch huirwang1026a-46x5j-worker-rlj72 acls @acl
2021-10-26T10:38:38.095Z|02432|nbctl|INFO|Running command run --id=@acl -- create acl priority=9999 direction=to-lport "match=\"(ip4.dst == $a5773229689678011375) && ip4.src == $a5811396932658691220 && ip4.dst != 10.128.0.0/14\"" action=allow external-ids:egressFirewall=test -- add logical_switch huirwang1026a-46x5j-worker-8pd4l acls @acl
2021-10-26T10:38:38.106Z|02433|nbctl|INFO|Running command run --id=@acl -- create acl priority=9999 direction=to-lport "match=\"(ip4.dst == $a5773229689678011375) && ip4.src == $a5811396932658691220 && ip4.dst != 10.128.0.0/14\"" action=allow external-ids:egressFirewall=test -- add logical_switch winworker-mff7b acls @acl
E1026 10:38:38.106660       1 ovn.go:893] error executing create ACL command, stderr: "ovn-nbctl: no row \"winworker-mff7b\" in table Logical_Switch\n", OVN command '/usr/bin/ovn-nbctl --timeout=15 --id=@acl create acl priority=9999 direction=to-lport match="(ip4.dst == $a5773229689678011375) && ip4.src == $a5811396932658691220 && ip4.dst != 10.128.0.0/14" action=allow external-ids:egressFirewall=test -- add logical_switch winworker-mff7b acls @acl' failed: exit status 1
I1026 10:38:38.106699       1 kube.go:131] Updating status on EgressFirewall default in namespace test
```

hello-pod is located on a Linux node:

```
# oc get pod -n test -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP             NODE                               NOMINATED NODE   READINESS GATES
hello-pod   1/1     Running   0          20h   10.128.2.151   huirwang1026a-46x5j-worker-8pd4l   <none>           <none>
```

Yet the Deny rule is not enforced; egress to an unlisted destination still succeeds:

```
$ oc rsh -n test hello-pod
/ # curl -I www.google.com
HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Date: Wed, 27 Oct 2021 06:11:51 GMT
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked
Expires: Wed, 27 Oct 2021 06:11:51 GMT
Cache-Control: private
Set-Cookie: 1P_JAR=2021-10-27-06; expires=Fri, 26-Nov-2021 06:11:51 GMT; path=/; domain=.google.com; Secure
Set-Cookie: NID=511=LCkmCuPBsCzQ4rBD-NJw4t9TW1YslnqffNuY4mFS5xTg5hTBtVT53rlKOeKlTE1anRSM6Pa3-jUt6ML52lBpl_dtql3O8S2kb06U8NKCOKgtOUXKgKDMyL4T--WK7p8aqtz2-JLrJU7kazn6_THsMT2lJM4tceHdZFAuXlaTUK4; expires=Thu, 28-Apr-2022 06:11:51 GMT; path=/; domain=.google.com; HttpOnly
```

This cluster has mixed Linux and Windows nodes:

```
$ oc get nodes -o wide
NAME                               STATUS   ROLES    AGE   VERSION                       INTERNAL-IP      EXTERNAL-IP      OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
huirwang1026a-46x5j-master-0       Ready    master   22h   v1.20.0+bbbc079               172.31.249.90    172.31.249.90    Red Hat Enterprise Linux CoreOS 47.84.202110212231-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.20.5-7.rhaos4.7.gite80c8db.el8
huirwang1026a-46x5j-master-1       Ready    master   22h   v1.20.0+bbbc079               172.31.249.59    172.31.249.59    Red Hat Enterprise Linux CoreOS 47.84.202110212231-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.20.5-7.rhaos4.7.gite80c8db.el8
huirwang1026a-46x5j-master-2       Ready    master   22h   v1.20.0+bbbc079               172.31.249.92    172.31.249.92    Red Hat Enterprise Linux CoreOS 47.84.202110212231-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.20.5-7.rhaos4.7.gite80c8db.el8
huirwang1026a-46x5j-worker-8pd4l   Ready    worker   22h   v1.20.0+bbbc079               172.31.249.16    172.31.249.16    Red Hat Enterprise Linux CoreOS 47.84.202110212231-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.20.5-7.rhaos4.7.gite80c8db.el8
huirwang1026a-46x5j-worker-rlj72   Ready    worker   22h   v1.20.0+bbbc079               172.31.249.22    172.31.249.22    Red Hat Enterprise Linux CoreOS 47.84.202110212231-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.20.5-7.rhaos4.7.gite80c8db.el8
winworker-mff7b                    Ready    worker   22h   v1.20.0-1081+d0b1ad449a08b3   172.31.249.219   172.31.249.219   Windows Server Standard                                        10.0.19041.508                 docker://20.10.7
winworker-wz8f5                    Ready    worker   22h   v1.20.0-1081+d0b1ad449a08b3   172.31.249.140   172.31.249.140   Windows Server Standard                                        10.0.19041.508                 docker://20.10.7
```

Expected results:
EgressFirewall can be added successfully in this kind of cluster and works for pods located on Linux nodes.

Additional info:
BTW, I didn't reproduce this issue on a 4.9 build (4.9.0-0.nightly-2021-10-26-041726) with the same flexy profile cluster.