Bug 2091167

Summary: IPsec runtime enabling not work in hypershift
Product: OpenShift Container Platform Reporter: Weibin Liang <weliang>
Component: Networking Assignee: Mohamed Mahmoud <mmahmoud>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aaleman, adistefa, cewong, lwan, mifiedle, mmahmoud, sjenning
Version: 4.11 Keywords: TestBlocker
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2022-08-10 11:14:45 UTC Type: Bug

Description Weibin Liang 2022-05-27 17:27:10 UTC
Description of problem:
Enabling IPsec at runtime does not work in Hypershift.

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-05-20-213928

How reproducible:
Every time

Steps to Reproduce:
#### Hypershift
[weliang@weliang bin]$ oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{ }}}}}'
network.operator.openshift.io/cluster patched
[weliang@weliang bin]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-20-213928   True        False         122m    Cluster version is 4.11.0-0.nightly-2022-05-20-213928
[weliang@weliang ~]$ oc debug node/ip-10-0-138-246.us-east-2.compute.internal
[root@ip-10-0-138-246 /]# tcpdump  -i br-ex | grep ESP    
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-ex, link-type EN10MB (Ethernet), capture size 262144 bytes


#### OCP dual-stack cluster
[weliang@weliang bin]$ oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{ }}}}}'
network.operator.openshift.io/cluster patched
[weliang@weliang bin]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-25-193227   True        False         142m    Cluster version is 4.11.0-0.nightly-2022-05-25-193227
[weliang@weliang bin]$ oc debug node/worker-00.weliang-5272.qe.devcluster.openshift.com
[root@worker-00 /]# tcpdump  -i br-ex | grep ESP
17:22:49.425809 IP worker-00.weliang-5272.qe.devcluster.openshift.com > master-00.weliang-5272.qe.devcluster.openshift.com: ESP(spi=0x8966e58c,seq=0x3e4), length 188
17:22:49.425873 IP worker-00.weliang-5272.qe.devcluster.openshift.com > master-00.weliang-5272.qe.devcluster.openshift.com: ESP(spi=0x8966e58c,seq=0x3e5), length 448
17:22:49.425938 IP master-00.weliang-5272.qe.devcluster.openshift.com > worker-00.weliang-5272.qe.devcluster.openshift.com: ESP(spi=0xdd8582b6,seq=0x447), length 160
17:22:49.426070 IP master-00.weliang-5272.qe.devcluster.openshift.com > worker-00.weliang-5272.qe.devcluster.openshift.com: ESP(spi=0xdd8582b6,seq=0x448), length 160
17:22:49.426108 IP worker-00.weliang-5272.qe.devcluster.openshift.com > master-00.weliang-5272.qe.devcluster.openshift.com: ESP(spi=0x8966e58c,seq=0x3e6), length 124


Actual results:
"tcpdump  -i br-ex | grep ESP" returns no packets in Hypershift

Expected results:
"tcpdump  -i br-ex | grep ESP" should return packets in Hypershift

Additional info:
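The `tcpdump | grep ESP` check above runs until interrupted, so "no packets" has to be judged by eye. A minimal sketch of a bounded variant follows. The helper name `count_esp` and the 30-second window are illustrative assumptions, not part of the original report, and the sketch assumes `timeout` is available in the debug shell.

```shell
# Count ESP packets in tcpdump text output read from stdin.
# Prints 0 (instead of failing) when no ESP line is seen.
count_esp() {
  grep -c 'ESP(spi=' || true
}

# Hypothetical usage on a node debug shell (30 s capture window is arbitrary):
#   timeout 30 tcpdump -l -nn -i br-ex 2>/dev/null | count_esp
```

A result of 0 after the window matches the failing Hypershift case; any non-zero count matches the dual-stack output shown in the steps to reproduce.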

Comment 1 aaleman 2022-05-27 17:51:19 UTC
This is because the `ovn-keys` init container in the `ovn-ipsec` DaemonSet fails due to incorrect RBAC:

```
+ kubectl delete --ignore-not-found=true csr/ip-10-0-133-131
Error from server (Forbidden): certificatesigningrequests.certificates.k8s.io "ip-10-0-133-131" is forbidden: User "system:serviceaccount:openshift-ovn-kubernetes:ovn-kubernetes-node" cannot delete resource "certificatesigningrequests" in API group "certificates.k8s.io" at the cluster scope
```

This seems to be caused by https://github.com/openshift/cluster-network-operator/pull/1450, which moved the CSR management permissions from a ClusterRole to a Role. I can see in the above output that the OCP version you used for Hypershift is newer than the one for the dual-stack cluster, which explains why the Hypershift cluster has this issue, despite it not being caused by Hypershift itself.

Reassigning this to the networking team.
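CertificateSigningRequests are cluster-scoped, so a namespaced Role cannot grant access to them, which is consistent with the "at the cluster scope" wording in the error. One way to confirm the missing permission without waiting for the init container to fail is `oc auth can-i` with impersonation. This is a sketch only: the helper name `csr_allowed` is made up for illustration, and the caller is assumed to have rights to impersonate service accounts (for example, cluster-admin).

```shell
# Service account taken from the Forbidden error above.
SA="system:serviceaccount:openshift-ovn-kubernetes:ovn-kubernetes-node"

# Ask the API server whether that service account may perform a verb
# on CertificateSigningRequests. Prints "yes" or "no"; the exit status
# is non-zero when the action is not allowed.
csr_allowed() {
  # $1 = verb, e.g. get, create, delete
  oc auth can-i "$1" certificatesigningrequests.certificates.k8s.io --as="$SA"
}

# Hypothetical usage:
#   for verb in get create delete; do
#     printf '%s: ' "$verb"
#     csr_allowed "$verb" || true
#   done
```

On an affected build, `delete` would be expected to report `no`, matching the error message.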

Comment 3 wang lin 2022-06-01 10:21:22 UTC
Yes, this is not Hypershift-specific. I hit the same issue on an ARM cluster with IPsec enabled.

ocp version: 4.11.0-0.nightly-arm64-2022-05-31-155531


          ++ hostname
          + kubectl delete --ignore-not-found=true csr/master-02.lwan-38983.qeclusters.arm.eng.rdu2.redhat.com
          Error from server (Forbidden): certificatesigningrequests.certificates.k8s.io "master-02.lwan-38983.qeclusters.arm.eng.rdu2.redhat.com" is forbidden: User "system:serviceaccount:openshift-ovn-kubernetes:ovn-kubernetes-node" cannot delete resource "certificatesigningrequests" in API group "certificates.k8s.io" at the cluster scope
        reason: Error
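To check another cluster for the same failure, the init container log can be read directly. The sketch below assumes the DaemonSet and container names given in Comment 1 (`ovn-ipsec` and `ovn-keys`); the helper name `ovn_keys_forbidden` is illustrative only.

```shell
NS="openshift-ovn-kubernetes"

# Succeeds (exit 0) if the ovn-keys init container log of one ovn-ipsec
# pod contains an RBAC denial from the API server.
ovn_keys_forbidden() {
  oc logs -n "$NS" ds/ovn-ipsec -c ovn-keys 2>&1 \
    | grep -q 'Error from server (Forbidden)'
}

# Hypothetical usage:
#   if ovn_keys_forbidden; then
#     echo "ovn-keys was denied by RBAC"
#   fi
```

Note that `oc logs ds/...` reads from a single pod of the DaemonSet, so a clean result from one pod does not rule out failures on other nodes.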

Comment 6 Weibin Liang 2022-06-06 15:21:02 UTC
Tested and verified in 4.11.0-0.nightly-2022-06-04-014713

[root@weliang-662-9cw5x-worker-a-49wrg /]# tcpdump  -i br-ex | grep ESP
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-ex, link-type EN10MB (Ethernet), capture size 262144 bytes
15:19:35.772883 IP weliang-662-9cw5x-worker-c-vtqbv.c.openshift-qe.internal > weliang-662-9cw5x-worker-a-49wrg.c.openshift-qe.internal: ESP(spi=0xc8b587e4,seq=0x120), length 164
15:19:35.773608 IP weliang-662-9cw5x-worker-a-49wrg.c.openshift-qe.internal > weliang-662-9cw5x-worker-c-vtqbv.c.openshift-qe.internal: ESP(spi=0x95fbb5c3,seq=0x132), length 124
15:19:35.776499 IP weliang-662-9cw5x-worker-a-49wrg.c.openshift-qe.internal > weliang-662-9cw5x-worker-c-vtqbv.c.openshift-qe.internal: ESP(spi=0x95fbb5c3,seq=0x133), length 184
15:19:35.776585 IP weliang-662-9cw5x-worker-a-49wrg.c.openshift-qe.internal > weliang-662-9cw5x-worker-c-vtqbv.c.openshift-qe.internal: ESP(spi=0x95fbb5c3,seq=0x134), length 1432

Comment 7 Mike Fiedler 2022-06-07 13:01:04 UTC
*** Bug 2093393 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2022-08-10 11:14:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069