Bug 2028812 - [ipv6dualstack] SVC conversion from single stack only to RequireDualStack does not work on nodes in OVN dualstack environments
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.z
Assignee: Andreas Karis
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On: 2031926
Blocks: 2031910
 
Reported: 2021-12-03 13:01 UTC by zhaozhanqi
Modified: 2022-02-23 20:03 UTC
2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2031910 (view as bug list)
Environment:
Last Closed: 2022-02-23 20:02:50 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift ovn-kubernetes pull 872 0 None open Bug 2028812: Modification of ClusterIPs shall trigger svc update 2021-12-13 19:23:50 UTC
Red Hat Product Errata RHSA-2022:0561 0 None None None 2022-02-23 20:03:43 UTC

Description zhaozhanqi 2021-12-03 13:01:52 UTC
Description of problem:
When creating a NodePort service on an IPv6 dual-stack cluster, the service cannot be accessed via the node's IPv6 address and the NodePort.


Version-Release number of selected component (if applicable):
4.10

How reproducible:
always

Steps to Reproduce:
1. Deploy ipv6 dual stack cluster
2. Create test pod and nodePort service (an example manifest is sketched after step 4 below)

# oc describe svc -n w8re5
Name:                     hello-pod
Namespace:                w8re5
Labels:                   name=hello-pod
Annotations:              <none>
Selector:                 name=hello-pod
Type:                     NodePort
IP Family Policy:         RequireDualStack
IP Families:              IPv4,IPv6
IP:                       172.30.30.65
IPs:                      172.30.30.65,fd02::a052
Port:                     http  27017/TCP
TargetPort:               8080/TCP
NodePort:                 http  30000/TCP
Endpoints:                10.128.2.72:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>


3. access the service

####service ipv6 address   ---> works

# curl [fd02::a052]:27017
Hello OpenShift!

####access node ipv4 address --> works

# curl 10.73.116.62:30000
Hello OpenShift!

####access node ipv6 address --> does NOT work

# curl --connect-timeout 5 [2620:52:0:4974:fc6c:8b36:e62e:7e06]:30000

^C

4. Check that no ip6tables NODEPORT DNAT rule exists

# ip6tables-save | grep NODEPORT
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT


#####I can see iptables rules for v4

# iptables-save | grep OVN-KUBE-NODEPORT
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30000 -j DNAT --to-destination 172.30.30.65:27017
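
For reference, a manifest along these lines can be used for the NodePort service from step 2. This is only a sketch: the filename is hypothetical, the fields are taken from the describe output above, and the matching test pod/deployment manifest is not shown in this report:
~~~
cat <<'EOF' > hello-pod-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: hello-pod
  labels:
    name: hello-pod
spec:
  type: NodePort
  ipFamilyPolicy: RequireDualStack
  ipFamilies:
    - IPv4
    - IPv6
  selector:
    name: hello-pod
  ports:
    - name: http
      port: 27017
      targetPort: 8080
      nodePort: 30000
EOF
oc -n w8re5 apply -f hello-pod-svc.yaml
~~~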

Actual results:
The service cannot be reached via the node's IPv6 address and the NodePort; no ip6tables OVN-KUBE-NODEPORT DNAT rule is created.


Expected results:
The service is reachable via both the node's IPv4 and IPv6 addresses on the NodePort, with matching iptables and ip6tables DNAT rules.


Additional info:

Comment 4 Nadia Pinaeva 2021-12-07 15:32:25 UTC
I wasn't able to reproduce it on either 4.10 or 4.9.8, so can you please give me more info about how many times and how exactly you reproduced it?

Comment 7 zhaozhanqi 2021-12-08 10:23:14 UTC
Thanks Andreas Karis and Nadia Pinaeva for help checking this issue. 

So when migrating from an IPv4-only cluster to dual stack, we need to reboot the worker to make the feature work.

Moving this bug to closed.

Comment 9 Andreas Karis 2021-12-08 15:44:30 UTC
i) The next time that this occurs, it would be great to be able to strace the ovnkube process:
~~~
# connect to affected node, then:
toolbox
strace -f -tt -s1024 -o ovnkube-strace.txt -p $(pidof ovnkube)
~~~

Then, delete and recreate the service.
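A minimal sketch of that, reusing the names from the reproducer above (the manifest filename is hypothetical):
~~~
oc -n w8re5 delete svc hello-pod
oc -n w8re5 apply -f hello-pod-svc.yaml
~~~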

That will help figure out whether the issue is with ovnkube or with something underlying on the OS.

The iptables rules to be configured should show up in the strace; for example, from my lab:
~~~
[root@openshift-worker-1 /]# grep NODEPORT ovnkube-strace.txt  | tail -1
3105960 14:14:36.066761 execve("/sbin/ip6tables", ["/sbin/ip6tables", "-t", "nat", "-I", "OVN-KUBE-NODEPORT", "1", "-p", "TCP", "-m", "addrtype", "--dst-type", "LOCAL", "--dport", "30000", "-j", "DNAT", "--to-destination", "[fd02::27b9]:27017", "--wait"], 0x7ffe1cc7c558 /* 35 vars */ <unfinished ...>
~~~

ii) Also, if you can consistently (with every redeployment) recreate the issue, it might be good to try the following **before** migrating from IPv4-only to dualstack:
https://github.com/openshift/cluster-network-operator/wiki/How-to-increase-the-log-level-on-a-single-SDN-or-OVN-node

Something like this; ideally, add all of your worker nodes to the ConfigMap and delete all of the worker nodes' ovnkube-node pods. The following example is just for a single pod:
~~~
[root@openshift-jumpserver-0 ~]# cat overrides.yaml 
kind: ConfigMap
apiVersion: v1
metadata:
  name: env-overrides
  namespace: openshift-ovn-kubernetes
data:
  openshift-worker-1: |
    # This sets the log level for the ovn-kubernetes node process:
    OVN_KUBE_LOG_LEVEL=5
[root@openshift-jumpserver-0 ~]# oc apply -f overrides.yaml 
configmap/env-overrides created
[root@openshift-jumpserver-0 ~]# oc delete pod -n openshift-ovn-kubernetes --field-selector spec.nodeName=openshift-worker-1 -l app=ovnkube-node
pod "ovnkube-node-8lkd7" deleted
~~~

That will increase the log level a notch and should help us better understand where this is failing; for example, from my lab:
~~~
$ oc logs -f -n openshift-ovn-kubernetes ovnkube-node-vg55p --tail=0 -c ovnkube-node
(...)
I1208 15:37:38.996246 3343454 port_claim.go:40] Opening socket for service: test/netshoot-service, port: 30000 and protocol TCP
I1208 15:37:38.996255 3343454 port_claim.go:63] Opening socket for LocalPort "nodePort for test/netshoot-service:http" (:30000/tcp)
I1208 15:37:38.996350 3343454 healthcheck.go:253] Gateway OpenFlow sync requested
I1208 15:37:38.996370 3343454 gateway_iptables.go:45] Adding rule in table: nat, chain: OVN-KUBE-NODEPORT with args: "-p TCP -m addrtype --dst-type LOCAL --dport 30000 -j DNAT --to-destination 172.30.239.167:27017" for protocol: 0 
I1208 15:37:38.996476 3343454 ovs.go:209] exec(164): /usr/bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ex -
I1208 15:37:39.001098 3343454 gateway_iptables.go:48] Chain: "nat" in table: "OVN-KUBE-NODEPORT" already exists, skipping creation
I1208 15:37:39.010805 3343454 gateway_iptables.go:45] Adding rule in table: nat, chain: OVN-KUBE-NODEPORT with args: "-p TCP -m addrtype --dst-type LOCAL --dport 30000 -j DNAT --to-destination [fd02::82b5]:27017" for protocol: 1 
I1208 15:37:39.015130 3343454 gateway_iptables.go:48] Chain: "nat" in table: "OVN-KUBE-NODEPORT" already exists, skipping creation
I1208 15:37:39.023286 3343454 ovs.go:212] exec(164): stdout: ""
I1208 15:37:39.023310 3343454 ovs.go:213] exec(164): stderr: ""
(...)
~~~

Then, you can migrate the environment from IPv4 to dualstack, and if the issue then reproduces with a pod, we will be able to see a whole lot more in the logs.
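For reference, the IPv4-to-dualstack migration itself is done by patching the cluster network configuration; a rough sketch along the lines of the conversion documentation (the IPv6 CIDRs below are the documented examples and may need adjusting for your environment):
~~~
oc patch network.config.openshift.io cluster --type='json' --patch '[
  {"op": "add", "path": "/spec/clusterNetwork/-",
   "value": {"cidr": "fd01::/48", "hostPrefix": 64}},
  {"op": "add", "path": "/spec/serviceNetwork/-", "value": "fd02::/112"}
]'
~~~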

Alternatively, you might try ii) after you have reproduced the issue, but I'm not sure whether deleting the pod would resolve the issue and thus affect the test case.

Comment 12 Andreas Karis 2021-12-09 10:24:09 UTC
I ran a whole series of tests (see attached file): first installing 4.9.8, then deploying a single-stack svc + pod, then upgrading the cluster to dual-stack, then deploying a single-stack svc + dual-stack pod on a non-rebooted host, then rebooting the other host and deploying a single-stack svc + dual-stack pod on it, then editing that latter svc's definition to RequireDualStack, and finally rebooting the worker node that this last pod was running on. I think it all just boils down to this issue:

In a dual-stack environment, a service is deployed as single-stack (without ipFamilyPolicy: RequireDualStack) in front of a dual-stack pod. The service definition is then changed to RequireDualStack.
However, the service effectively remains single-stack: OVN-Kubernetes does not update the service's NAT rule (and potentially further required OVN rules). A node reboot fixes this; actually, a simple deletion of the ovn-kube pod fixes this, too.
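A sketch of that workaround (the node name below is just an example; it is the same command as in comment 9):
~~~
oc delete pod -n openshift-ovn-kubernetes --field-selector spec.nodeName=openshift-worker-2 -l app=ovnkube-node
~~~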

Here is the relevant snippet:
~~~
[root@openshift-jumpserver-0 ~]# oc adm cordon openshift-worker-2
node/openshift-worker-2 cordoned
[root@openshift-jumpserver-0 ~]# oc get pods -o wide
NAME                                      READY   STATUS        RESTARTS   AGE     IP            NODE                 NOMINATED NODE   READINESS GATES
netshoot-deployment-5b5567c8bc-mdxzx      1/1     Running       0          26m     172.24.2.13   openshift-worker-2   <none>           <none>
netshoot1-deployment-7fb4556ffc-hzwq6     1/1     Running       0          8m54s   172.24.2.11   openshift-worker-2   <none>           <none>
netshoot2-deployment-84944dd847-6xq7n     1/1     Terminating   0          2m25s   172.24.2.16   openshift-worker-2   <none>           <none>
nfs-client-provisioner-5bfc67c85d-qwpqp   1/1     Running       0          16h     172.24.2.10   openshift-worker-2   <none>           <none>
[root@openshift-jumpserver-0 ~]# oc apply -f netshoot2.yaml 
service/netshoot2-service created
deployment.apps/netshoot2-deployment created
[root@openshift-jumpserver-0 ~]# oc get pods
NAME                                      READY   STATUS              RESTARTS   AGE
netshoot-deployment-5b5567c8bc-mdxzx      1/1     Running             0          26m
netshoot1-deployment-7fb4556ffc-hzwq6     1/1     Running             0          9m23s
netshoot2-deployment-84944dd847-lht7s     0/1     ContainerCreating   0          3s
nfs-client-provisioner-5bfc67c85d-qwpqp   1/1     Running             0          16h
[root@openshift-jumpserver-0 ~]# oc get pods -o wide
NAME                                      READY   STATUS              RESTARTS   AGE     IP            NODE                 NOMINATED NODE   READINESS GATES
netshoot-deployment-5b5567c8bc-mdxzx      1/1     Running             0          26m     172.24.2.13   openshift-worker-2   <none>           <none>
netshoot1-deployment-7fb4556ffc-hzwq6     1/1     Running             0          9m25s   172.24.2.11   openshift-worker-2   <none>           <none>
netshoot2-deployment-84944dd847-lht7s     0/1     ContainerCreating   0          5s      <none>        openshift-worker-1   <none>           <none>
nfs-client-provisioner-5bfc67c85d-qwpqp   1/1     Running             0          16h     172.24.2.10   openshift-worker-2   <none>           <none>
[root@openshift-jumpserver-0 ~]# oc get svc netshoot2
Error from server (NotFound): services "netshoot2" not found
[root@openshift-jumpserver-0 ~]# oc get svc
NAME                TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
netshoot-service    NodePort   172.30.104.231   <none>        27017:30000/TCP   26m
netshoot1-service   NodePort   172.30.141.198   <none>        27018:30001/TCP   9m16s
netshoot2-service   NodePort   172.30.236.163   <none>        27019:30002/TCP   18s
[root@openshift-jumpserver-0 ~]# oc get -o yaml svc netshoot2-service
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"netshoot2-service","namespace":"openshift-nfs-storage"},"spec":{"ports":[{"name":"http","nodePort":30002,"port":27019,"protocol":"TCP","targetPort":8080}],"selector":{"app":"netshoot2-pod"},"type":"NodePort"}}
  creationTimestamp: "2021-12-09T10:06:22Z"
  name: netshoot2-service
  namespace: openshift-nfs-storage
  resourceVersion: "324974"
  uid: adba8f90-d8a7-4676-8233-ff51a56b1f2f
spec:
  clusterIP: 172.30.236.163
  clusterIPs:
  - 172.30.236.163
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    nodePort: 30002
    port: 27019
    protocol: TCP
    targetPort: 8080
  selector:
    app: netshoot2-pod
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
[root@openshift-jumpserver-0 ~]# ssh core@openshift-worker-1
Red Hat Enterprise Linux CoreOS 49.84.202111111343-0
  Part of OpenShift 4.9, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.9/architecture/architecture-rhcos.html

---
Last login: Thu Dec  9 09:59:58 2021 from 192.168.123.1
sudo -i
[systemd]
Failed Units: 1
  NetworkManager-wait-online.service
[core@openshift-worker-1 ~]$ sudo -i
[systemd]
Failed Units: 1
  NetworkManager-wait-online.service
[root@openshift-worker-1 ~]# iptables-save | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30002 -j DNAT --to-destination 172.30.236.163:27019
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30000 -j DNAT --to-destination 172.30.104.231:27017
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30001 -j DNAT --to-destination 172.30.141.198:27018
[root@openshift-worker-1 ~]# ip6tables-save | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
[root@openshift-worker-1 ~]# exit
logout
ex[core@openshift-worker-1 ~]$ exit
logout
Connection to openshift-worker-1 closed.
[root@openshift-jumpserver-0 ~]# oc ^C
(reverse-i-search)`get s': oc ^Ct svc
[root@openshift-jumpserver-0 ~]# oc edit svc netshoot2-service
service/netshoot2-service edited
[root@openshift-jumpserver-0 ~]# oc get -o yaml svc netshoot2-service
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"netshoot2-service","namespace":"openshift-nfs-storage"},"spec":{"ports":[{"name":"http","nodePort":30002,"port":27019,"protocol":"TCP","targetPort":8080}],"selector":{"app":"netshoot2-pod"},"type":"NodePort"}}
  creationTimestamp: "2021-12-09T10:06:22Z"
  name: netshoot2-service
  namespace: openshift-nfs-storage
  resourceVersion: "325357"
  uid: adba8f90-d8a7-4676-8233-ff51a56b1f2f
spec:
  clusterIP: 172.30.236.163
  clusterIPs:
  - 172.30.236.163
  - fd02::7e32
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: RequireDualStack
  ports:
  - name: http
    nodePort: 30002
    port: 27019
    protocol: TCP
    targetPort: 8080
  selector:
    app: netshoot2-pod
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
[root@openshift-jumpserver-0 ~]# ssh core@openshift-worker-1
Red Hat Enterprise Linux CoreOS 49.84.202111111343-0
  Part of OpenShift 4.9, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.9/architecture/architecture-rhcos.html

---
Last login: Thu Dec  9 10:06:55 2021 from 192.168.123.1
s[systemd]
Failed Units: 1
  NetworkManager-wait-online.service
[core@openshift-worker-1 ~]$ sudo -i
[systemd]
Failed Units: 1
  NetworkManager-wait-online.service
[root@openshift-worker-1 ~]# ip6tables-save | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
[root@openshift-worker-1 ~]# iptables-save | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30002 -j DNAT --to-destination 172.30.236.163:27019
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30000 -j DNAT --to-destination 172.30.104.231:27017
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30001 -j DNAT --to-destination 172.30.141.198:27018
[root@openshift-worker-1 ~]# ip6tables-save | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
(...)
[root@openshift-worker-1 ~]# reboot
Connection to openshift-worker-1 closed by remote host.
Connection to openshift-worker-1 closed.
[root@openshift-jumpserver-0 ~]# 
[root@openshift-jumpserver-0 ~]# 
[root@openshift-jumpserver-0 ~]# ssh core@openshift-worker-1
Red Hat Enterprise Linux CoreOS 49.84.202111111343-0
  Part of OpenShift 4.9, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.9/architecture/architecture-rhcos.html

---
Last login: Thu Dec  9 10:10:40 2021 from 192.168.123.1
sud [systemd]
Failed Units: 1
  NetworkManager-wait-online.service
[core@openshift-worker-1 ~]$ sudo -i
[systemd]
Failed Units: 1
  NetworkManager-wait-online.service
[root@openshift-worker-1 ~]# ip6tables-save | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30002 -j DNAT --to-destination [fd02::7e32]:27019
[root@openshift-worker-1 ~]# 
~~~


Here's the deletion of the ovn kube pod:
~~~
[root@openshift-jumpserver-0 ~]# oc edit svc netshoot1-service
service/netshoot1-service edited
[root@openshift-jumpserver-0 ~]# oc get pods -n openshift-ovn-kubernetes -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP                NODE                 NOMINATED NODE   READINESS GATES
ovnkube-master-ftd8t   6/6     Running   0          36m   192.168.123.201   openshift-master-1   <none>           <none>
ovnkube-master-h9dk9   6/6     Running   0          38m   192.168.123.200   openshift-master-0   <none>           <none>
ovnkube-master-pmlbp   6/6     Running   0          34m   192.168.123.202   openshift-master-2   <none>           <none>
ovnkube-node-cx8kf     4/4     Running   0          31m   192.168.123.222   openshift-worker-2   <none>           <none>
ovnkube-node-jdpdq     4/4     Running   10         31m   192.168.123.221   openshift-worker-1   <none>           <none>
ovnkube-node-lmv7l     4/4     Running   0          31m   192.168.123.202   openshift-master-2   <none>           <none>
ovnkube-node-nrcfs     4/4     Running   0          32m   192.168.123.201   openshift-master-1   <none>           <none>
ovnkube-node-rvkzh     4/4     Running   0          32m   192.168.123.200   openshift-master-0   <none>           <none>
[root@openshift-jumpserver-0 ~]# oc delete pod -n openshift-ovn-kubernetes ovnkube-node-cx8kf
pod "ovnkube-node-cx8kf" deleted
[root@openshift-jumpserver-0 ~]# oc get pods -n openshift-ovn-kubernetes -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP                NODE                 NOMINATED NODE   READINESS GATES
ovnkube-master-ftd8t   6/6     Running   0          36m   192.168.123.201   openshift-master-1   <none>           <none>
ovnkube-master-h9dk9   6/6     Running   0          39m   192.168.123.200   openshift-master-0   <none>           <none>
ovnkube-master-pmlbp   6/6     Running   0          34m   192.168.123.202   openshift-master-2   <none>           <none>
ovnkube-node-jdpdq     4/4     Running   10         32m   192.168.123.221   openshift-worker-1   <none>           <none>
ovnkube-node-lmv7l     4/4     Running   0          32m   192.168.123.202   openshift-master-2   <none>           <none>
ovnkube-node-mq6zd     3/4     Running   0          3s    192.168.123.222   openshift-worker-2   <none>           <none>
ovnkube-node-nrcfs     4/4     Running   0          33m   192.168.123.201   openshift-master-1   <none>           <none>
ovnkube-node-rvkzh     4/4     Running   0          32m   192.168.123.200   openshift-master-0   <none>           <none>
[root@openshift-jumpserver-0 ~]# oc get -o yaml svc netshoot2-service
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"netshoot2-service","namespace":"openshift-nfs-storage"},"spec":{"ports":[{"name":"http","nodePort":30002,"port":27019,"protocol":"TCP","targetPort":8080}],"selector":{"app":"netshoot2-pod"},"type":"NodePort"}}
  creationTimestamp: "2021-12-09T10:06:22Z"
  name: netshoot2-service
  namespace: openshift-nfs-storage
  resourceVersion: "325357"
  uid: adba8f90-d8a7-4676-8233-ff51a56b1f2f
spec:
  clusterIP: 172.30.236.163
  clusterIPs:
  - 172.30.236.163
  - fd02::7e32
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: RequireDualStack
  ports:
  - name: http
    nodePort: 30002
    port: 27019
    protocol: TCP
    targetPort: 8080
  selector:
    app: netshoot2-pod
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
~~~

And here is the before / after on that worker node:
~~~
[root@openshift-worker-2 ~]# ip6tables-save | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
[root@openshift-worker-2 ~]# ip6tables-save | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30002 -j DNAT --to-destination [fd02::7e32]:27019
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30001 -j DNAT --to-destination [fd02::e5e6]:27018
[root@openshift-worker-2 ~]# 
~~~

Our documentation states that pods must be redeployed after conversion: https://docs.openshift.com/container-platform/4.9/networking/ovn_kubernetes_network_provider/converting-to-dual-stack.html

It says nothing about the conversion of services from single-stack to dual-stack. Either the admission controller should reject such a conversion, or the conversion should work without requiring a pod restart.
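
For reference, redeploying a workload pod after the conversion (as the documentation above requires) can be done with a rollout restart; a sketch using the deployment from the test above:
~~~
oc -n openshift-nfs-storage rollout restart deployment netshoot2-deployment
~~~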

Next, I will have a look at the current upstream behavior for the conversion of a service from single-stack to RequireDualStack.

Comment 15 Andreas Karis 2022-01-24 20:57:20 UTC
Verification:
=========================

Create an ovn-kubernetes dualstack cluster (baremetal platforms only) with gateway mode shared (the default for OCP 4.9 and 4.10).

Create a singlestack nginx service and deployment, e.g.:
~~~
cat <<'EOF' > nginx.yaml 
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  ipFamilyPolicy: SingleStack
  type: NodePort
  selector:
    app: nginx-pod
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ports:
      # By default and for convenience, the `targetPort` is set to the same value as the `port` field.
    - port: 27017
      targetPort: 80
      # Optional field
      # By default and for convenience, the Kubernetes control plane will allocate a port from a range (default: 30000-32767)
      nodePort: 30000
      name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-pod
  template:
    metadata:
      labels:
        app: nginx-pod
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: Always
EOF
oc apply -f nginx.yaml
~~~

Wait until the pod and service are deployed.
~~~
oc get pod,svc
~~~

Next, edit the service:
~~~
oc edit svc nginx-service
~~~

And change the spec.ipFamilyPolicy:
~~~
spec:
(...)
     ipFamilyPolicy: RequireDualStack
(...)
~~~
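
The same change can also be applied non-interactively; a sketch using a merge patch (service name as in the example above):
~~~
oc patch svc nginx-service --type=merge -p '{"spec":{"ipFamilyPolicy":"RequireDualStack","ipFamilies":["IPv4","IPv6"]}}'
~~~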

The service should now update to dual-stack.
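A quick way to confirm that the API object picked up the change:
~~~
oc get svc nginx-service -o jsonpath='{.spec.ipFamilyPolicy}{"\n"}{.spec.clusterIPs}{"\n"}'
~~~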

Get the pod's node:
~~~
oc get pods -l app=nginx-pod -o wide
~~~

Verification part 1):

Connect to the node:
~~~
oc debug node/<name>
chroot /host
~~~

And verify for IPv4:
~~~
iptables-save  | grep NODE
~~~

And verify for IPv6:
~~~
ip6tables-save  | grep NODE
~~~

Failure:
Only for the IPv4 protocol is there an iptables rule for port 30000; the corresponding ip6tables rule is missing.
~~~
root@ovn-worker2:/# iptables-save | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30000 -j DNAT --to-destination 10.96.153.179:27017
root@ovn-worker2:/# iptables-save | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30000 -j DNAT --to-destination 10.96.153.179:27017
root@ovn-worker2:/# ip6tables-save | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
~~~

Success:
For both protocols, there is an iptables rule for port 30000 (an iptables rule for IPv4 and an ip6tables rule for IPv6):
~~~
root@ovn-worker:/# iptables-save  | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30000 -j DNAT --to-destination 10.96.237.240:27017
root@ovn-worker:/# ip6tables-save  | grep NODE
:OVN-KUBE-NODEPORT - [0:0]
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-NODEPORT
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30000 -j DNAT --to-destination [fd00:10:96::1c77]:27017
~~~

Verification part 2):

Connect to the node:
~~~
oc debug node/<name>
chroot /host
~~~

Get the node's IPv4 and IPv6 addresses:
~~~
ip a ls dev br-ex       # e.g., this might yield:  172.18.0.2
ip -6 a ls dev br-ex    # e.g., this might yield:  fc00:f853:ccd:e793::2
~~~

Query port 30000 on the IPv4 and IPv6 addresses obtained earlier, from the node itself and also from another node on the machine network:
~~~
curl --max-time 10  172.18.0.2:30000
curl --max-time 10 [fc00:f853:ccd:e793::2]:30000
~~~

Failure: 
curl works only for IPv4, but times out for IPv6:
~~~
root@ovn-worker2:/# curl --connect-timeout 10  172.18.0.2:30000
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@ovn-worker2:/# curl --max-time 10 [fc00:f853:ccd:e793::2]:30000
curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
~~~

Success:
curl works for both protocols:
~~~
root@ovn-worker:/# curl --max-time 10  172.18.0.4:30000
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@ovn-worker:/# curl --max-time 10 [fc00:f853:ccd:e793::4]:30000
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
~~~

Comment 21 errata-xmlrpc 2022-02-23 20:02:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.22 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0561

