Bug 2060956

Summary: service domain can't be resolved when networkpolicy is used in OCP 4.10-rc
Product: OpenShift Container Platform
Reporter: Ben Bennett <bbennett>
Component: Networking
Assignee: Ben Bennett <bbennett>
Networking sub component: openshift-sdn
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: urgent
CC: anbhat, aos-bugs, cblecker, danw, dofinn, Jiaming.Hu, jtanenba, mifiedle, mmasters, piotr.godowski, rszumski, travi, wking, zzhao
Version: 4.10-rc3
Keywords: FastFix
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2060553
Environment:
Last Closed: 2022-03-10 16:44:40 UTC
Type: ---
Bug Depends On: 2060553    
Bug Blocks:    

Comment 1 Mike Fiedler 2022-03-04 21:07:17 UTC
Verified on a cluster-bot cluster built from this PR, using the reproducer in https://bugzilla.redhat.com/show_bug.cgi?id=2060553#c13. The service domain is resolvable with the patch.

Comment 5 W. Trevor King 2022-03-07 21:20:57 UTC
Moving back to ASSIGNED as we wait for a 4.10 backport of the follow-up fixes [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2060553#c29

Comment 6 W. Trevor King 2022-03-07 22:51:57 UTC
With a 'launch 4.10.2,openshift/sdn#407' cluster [1]:

  $ oc get -o jsonpath='{.status.desired.version}{"\n"}' clusterversion version
  0.0.1-0.test-2022-03-07-220936-ci-ln-dikvdft-latest

Create two matchLabels policies:

  $ cat <<EOF >policies.yaml 
  > kind: NetworkPolicy
  > apiVersion: networking.k8s.io/v1
  > metadata:
  >   name: irrelevant-egress
  > spec:
  >   podSelector:
  >     matchLabels:
  >       kind: nonexistent
  >   egress:
  >     - {}
  >   policyTypes:
  >     - Egress
  > ---
  > kind: NetworkPolicy
  > apiVersion: networking.k8s.io/v1
  > metadata:
  >   name: allow-client-to-server
  > spec:
  >   podSelector:
  >     matchLabels:
  >       kind: server
  >   ingress:
  >     - from:
  >       - podSelector:
  >           matchLabels:
  >             kind: client
  >   policyTypes:
  >     - Ingress
  > EOF

Create a pod that matches the ingress policy:

  $ cat <<EOF >pod.yaml
  > apiVersion: v1
  > kind: Pod
  > metadata:
  >   name: server
  >   labels:
  >     kind: server
  > spec:
  >   containers:
  >   - name: main
  >     args:
  >     - sleep
  >     - "600"
  >     image: quay.io/openshift/origin-tools
  > EOF
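For readers checking this reproducer against upstream NetworkPolicy semantics: `matchLabels` is a subset match, so `irrelevant-egress` (kind=nonexistent) selects no pod at all, and the `server` pod is selected only by the Ingress-only `allow-client-to-server` policy. Its egress, DNS included, is therefore not meant to be restricted. The Python sketch below models just that selection step; `selects` and `POLICIES` are illustrative names, not part of any OpenShift tooling, and `matchExpressions` is deliberately not modelled.

```python
# Sketch of NetworkPolicy podSelector matching, matchLabels only.
# Assumption: matchExpressions is not modelled.


def selects(policy: dict, pod_labels: dict) -> bool:
    """True if the policy's podSelector selects a pod carrying pod_labels.

    matchLabels is a subset test: every key/value pair in the selector must
    be present on the pod. An empty podSelector ({}) selects every pod in
    the policy's namespace.
    """
    match_labels = policy["spec"]["podSelector"].get("matchLabels", {})
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# The two policies from policies.yaml, reduced to the fields used here.
POLICIES = [
    {
        "metadata": {"name": "irrelevant-egress"},
        "spec": {
            "podSelector": {"matchLabels": {"kind": "nonexistent"}},
            "policyTypes": ["Egress"],
        },
    },
    {
        "metadata": {"name": "allow-client-to-server"},
        "spec": {
            "podSelector": {"matchLabels": {"kind": "server"}},
            "policyTypes": ["Ingress"],
        },
    },
]

if __name__ == "__main__":
    server_labels = {"kind": "server"}  # labels of pod/server from pod.yaml
    for policy in POLICIES:
        print(policy["metadata"]["name"], policy["spec"]["policyTypes"],
              "selects server:", selects(policy, server_labels))
```

Only `allow-client-to-server` should report `True`, and it lists `Ingress` alone.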

Check the pod:

  $ oc debug --as-root pod/server
  Starting pod/server-debug ...
  Pod IP: 10.131.0.17
  If you don't see a command prompt, try pressing enter.
  sh-4.4# dig kubernetes.default

  ; <<>> DiG 9.11.26-RedHat-9.11.26-4.el8_4 <<>> kubernetes.default
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 36293
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

  ;; OPT PSEUDOSECTION:
  ; EDNS: version: 0, flags:; udp: 512
  ; COOKIE: e837d6036d355787 (echoed)
  ;; QUESTION SECTION:
  ;kubernetes.default.            IN      A

  ;; AUTHORITY SECTION:
  .                       30      IN      SOA     a.root-servers.net. nstld.verisign-grs.com. 2022030701 1800 900 604800 86400

  ;; Query time: 65 msec
  ;; SERVER: 172.30.0.10#53(172.30.0.10)
  ;; WHEN: Mon Mar 07 22:48:34 UTC 2022
  ;; MSG SIZE  rcvd: 134

Not sure whether that's what we expect...
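Part of the answer lies in how `dig` treats short names. `dig` does not apply the resolv.conf search list unless `+search` is given, so `dig kubernetes.default` queries the literal name `kubernetes.default.`, which does not exist, hence NXDOMAIN. What matters for this bug is that an answer came back from 172.30.0.10 at all; a blocked path typically shows up as a timeout instead. The sketch below shows the glibc-style search-list expansion an ordinary resolver would apply. The `demo` namespace and the three-entry search list are assumptions modelled on the usual Kubernetes pod resolv.conf, so check `/etc/resolv.conf` in the pod for the real values; `candidate_names` is an illustrative helper. With `dig +search kubernetes.default`, or the full `kubernetes.default.svc.cluster.local` (assuming the default cluster domain), the query should return the API service's A record.

```python
def candidate_names(name: str, search: list, ndots: int) -> list:
    """Names a glibc-style stub resolver would try, in order (sketch)."""
    if name.endswith("."):
        # Already fully qualified: no search-list expansion.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + expanded
    # Fewer dots than ndots: search list first, bare name last.
    return expanded + [absolute]


if __name__ == "__main__":
    # Assumed resolv.conf: "search demo.svc.cluster.local svc.cluster.local
    # cluster.local" and "options ndots:5".
    search = ["demo.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for fqdn in candidate_names("kubernetes.default", search, ndots=5):
        print(fqdn)
```

Plain `dig` skips all of this and sends only the last candidate, `kubernetes.default.`.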

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1500954685442363392

Comment 7 zhaozhanqi 2022-03-08 01:59:59 UTC
Reproduced this issue on 4.10.2

1. Create test pods with different labels:

$ oc get pod --show-labels
NAME            READY   STATUS    RESTARTS   AGE   LABELS
test-rc-2kvws   1/1     Running   0          25m   name=hellosdn
test-rc-8j8xq   1/1     Running   0          24m   name=test-pods
test-rc-9w8x8   1/1     Running   0          25m   name=hellosdn
test-rc-bdxpx   1/1     Running   0          24m   name=test-pods

2. Create a NetworkPolicy with podSelector name=test-pods for Egress and Ingress:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: egress-all-otherpod
spec:
  podSelector:
    matchLabels:
      name: test-pods
  egress:
    - {}
  ingress:
    - {}
  policyTypes:
    - Egress
    - Ingress

3. Create a NetworkPolicy with podSelector name=hellosdn for Ingress:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-all-ingress
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  ingress:
    - {}
  policyTypes:
    - Ingress

4. Check the NetworkPolicies:

$ oc get networkpolicy
NAME                  POD-SELECTOR     AGE
allow-all-ingress     name=hellosdn    8s
egress-all-otherpod   name=test-pods   32m
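Under upstream Kubernetes semantics, a pod is isolated for a direction only when some policy that selects it lists that direction in policyTypes. Applied to the pods and policies in steps 1 to 4, the hellosdn pods are isolated for ingress only, so their egress (including DNS) should stay open, while the test-pods pods are isolated in both directions but allowed everything by their `egress: [{}]` and `ingress: [{}]` rules. Both groups should therefore resolve names, and a DNS timeout for either is the bug. A minimal Python model of that isolation rule follows; `isolation`, `PODS` and `POLICIES` are illustrative names, and rule contents (peers, ports) are not modelled.

```python
# Pod labels and policies from steps 1-4 above, reduced to what decides isolation.
PODS = {
    "test-rc-2kvws": {"name": "hellosdn"},
    "test-rc-8j8xq": {"name": "test-pods"},
    "test-rc-9w8x8": {"name": "hellosdn"},
    "test-rc-bdxpx": {"name": "test-pods"},
}

POLICIES = [
    {"name": "egress-all-otherpod", "match_labels": {"name": "test-pods"},
     "policy_types": ["Egress", "Ingress"]},
    {"name": "allow-all-ingress", "match_labels": {"name": "hellosdn"},
     "policy_types": ["Ingress"]},
]


def isolation(pod_labels: dict, policies: list) -> dict:
    """Return the directions in which a pod is isolated.

    A pod is isolated for a direction only if at least one policy that
    selects it lists that direction in policyTypes. A pod that is not
    isolated for egress may send traffic anywhere, DNS included.
    """
    result = {"Ingress": False, "Egress": False}
    for policy in policies:
        selected = all(pod_labels.get(k) == v for k, v in policy["match_labels"].items())
        if selected:
            for direction in policy["policy_types"]:
                result[direction] = True
    return result


if __name__ == "__main__":
    for pod, labels in PODS.items():
        print(pod, isolation(labels, POLICIES))
```

In this model the hellosdn pods come out as `{'Ingress': True, 'Egress': False}`, which is why their DNS timeout in step 5 is unexpected.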

5. Now the hellosdn pods cannot reach the DNS service:

####hellosdn pods cannot resolve DNS########
$ oc exec test-rc-2kvws -- dig kubernetes.default

; <<>> DiG 9.16.20 <<>> kubernetes.default
;; global options: +cmd
;; connection timed out; no servers could be reached

command terminated with exit code 9


#####test-pods pods work well #######
$ oc exec test-rc-8j8xq -- dig kubernetes.default

; <<>> DiG 9.16.20 <<>> kubernetes.default
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 46936
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; COOKIE: 0db9c31cc3ffa551 (echoed)
;; QUESTION SECTION:
;kubernetes.default.		IN	A

;; Query time: 1 msec
;; SERVER: 172.30.0.10#53(172.30.0.10)
;; WHEN: Tue Mar 08 01:55:24 UTC 2022
;; MSG SIZE  rcvd: 59
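The two runs above differ in the way that matters for this bug: the hellosdn pod got no reply at all (dig exit code 9), while the test-pods pod got an NXDOMAIN reply, which means its query did reach 172.30.0.10. A minimal Python sketch of that distinction follows. The function name `classify_dig` and its returned labels are made up for illustration; only the exit-status meanings (0 = a response was received, NXDOMAIN included; 9 = no reply from server) come from the dig documentation.

```python
import re

# dig exit status 9 means "no reply from server"; 0 means a DNS response
# arrived, and that includes NXDOMAIN answers.
NO_REPLY = 9


def classify_dig(exit_code: int, output: str) -> str:
    """Classify one `dig` run as reachable, unreachable, or error."""
    if exit_code == NO_REPLY:
        return "unreachable"
    if exit_code != 0:
        return "error"
    # Header line, e.g. ";; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 46936"
    match = re.search(r"status: ([A-Z]+)", output)
    if match is None:
        return "error"
    return f"reachable ({match.group(1)})"


if __name__ == "__main__":
    # Abbreviated outputs copied from the two runs above.
    hellosdn = ";; connection timed out; no servers could be reached"
    test_pods = ";; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 46936"
    print("test-rc-2kvws:", classify_dig(9, hellosdn))
    print("test-rc-8j8xq:", classify_dig(0, test_pods))
```

Treating NXDOMAIN as "reachable" is deliberate here: the question in this bug is whether DNS traffic gets through, not whether the short name exists.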

Comment 9 zhaozhanqi 2022-03-08 05:56:48 UTC
Verified this bug on 4.10.3


# oc get pod --show-labels
NAME            READY   STATUS    RESTARTS   AGE     LABELS
test-rc-7fsr4   1/1     Running   0          6m14s   name=hellosdn
test-rc-m2bs8   1/1     Running   0          6m14s   name=test-pods
test-rc-nlfwb   1/1     Running   0          5m14s   name=test-pods
test-rc-xpbgj   1/1     Running   0          5m46s   name=hellosdn2


# oc get networkpolicies.networking.k8s.io -o yaml
apiVersion: v1
items:
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    creationTimestamp: "2022-03-08T05:48:58Z"
    generation: 1
    managedFields:
    - apiVersion: networking.k8s.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:ingress: {}
          f:podSelector: {}
          f:policyTypes: {}
      manager: oc
      operation: Update
      time: "2022-03-08T05:48:58Z"
    name: allow-all-ingress
    namespace: z1
    resourceVersion: "31337"
    uid: a3f12ea5-32ee-4d41-8dfb-35eeea2a6542
  spec:
    ingress:
    - {}
    podSelector:
      matchLabels:
        name: hellosdn
    policyTypes:
    - Ingress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    creationTimestamp: "2022-03-08T05:48:39Z"
    generation: 1
    managedFields:
    - apiVersion: networking.k8s.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:egress: {}
          f:ingress: {}
          f:podSelector: {}
          f:policyTypes: {}
      manager: oc
      operation: Update
      time: "2022-03-08T05:48:39Z"
    name: egress-all-otherpod
    namespace: z1
    resourceVersion: "31239"
    uid: a744a57a-22dd-4cd8-8469-b559f6fdfaaa
  spec:
    egress:
    - {}
    ingress:
    - {}
    podSelector:
      matchLabels:
        name: test-pods
    policyTypes:
    - Egress
    - Ingress
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
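`oc get networkpolicy` (step 4 of the reproduction) prints each policy's pod selector but not its policyTypes, and the `-o yaml` dump above buries both under managedFields. A hypothetical helper like the one below condenses the same List to name, selector and policyTypes. It is only a sketch: `SAMPLE` is a hand-trimmed JSON copy of the output above, and against a live cluster you would load the output of `oc get networkpolicies -o json` instead.

```python
import json

# Hand-trimmed JSON equivalent of the List above (managedFields, uid,
# timestamps and other metadata removed).
SAMPLE = """
{"apiVersion": "v1", "kind": "List", "items": [
  {"metadata": {"name": "allow-all-ingress", "namespace": "z1"},
   "spec": {"ingress": [{}],
            "podSelector": {"matchLabels": {"name": "hellosdn"}},
            "policyTypes": ["Ingress"]}},
  {"metadata": {"name": "egress-all-otherpod", "namespace": "z1"},
   "spec": {"egress": [{}], "ingress": [{}],
            "podSelector": {"matchLabels": {"name": "test-pods"}},
            "policyTypes": ["Egress", "Ingress"]}}
]}
"""


def summarize(doc: dict) -> list:
    """Return (name, selector, policyTypes) tuples for each NetworkPolicy in a List."""
    rows = []
    for item in doc.get("items", []):
        spec = item["spec"]
        labels = spec.get("podSelector", {}).get("matchLabels", {})
        # An empty selector selects every pod in the namespace.
        selector = ",".join(f"{k}={v}" for k, v in sorted(labels.items())) or "<all pods>"
        types = "+".join(spec.get("policyTypes", []))
        rows.append((item["metadata"]["name"], selector, types))
    return rows


if __name__ == "__main__":
    for name, selector, types in summarize(json.loads(SAMPLE)):
        print(f"{name:<22} {selector:<16} {types}")
```

Read this way, the fourth pod, test-rc-xpbgj (name=hellosdn2), matches neither selector, so it is not isolated in either direction.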


#########

# oc exec test-rc-m2bs8 -- dig kubernetes.default

; <<>> DiG 9.16.20 <<>> kubernetes.default
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 36492
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; COOKIE: 9c2fceed1fc94c5c (echoed)
;; QUESTION SECTION:
;kubernetes.default.		IN	A

;; AUTHORITY SECTION:
.			30	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2022030800 1800 900 604800 86400

;; Query time: 1 msec
;; SERVER: 172.30.0.10#53(172.30.0.10)
;; WHEN: Tue Mar 08 05:49:58 UTC 2022
;; MSG SIZE  rcvd: 134

#########

# oc exec test-rc-7fsr4 -- dig kubernetes.default

; <<>> DiG 9.16.20 <<>> kubernetes.default
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51943
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; COOKIE: 75f73f3df786462a (echoed)
;; QUESTION SECTION:
;kubernetes.default.		IN	A

;; AUTHORITY SECTION:
.			30	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2022030800 1800 900 604800 86400

;; Query time: 1 msec
;; SERVER: 172.30.0.10#53(172.30.0.10)
;; WHEN: Tue Mar 08 05:49:35 UTC 2022
;; MSG SIZE  rcvd: 134


######

# oc exec test-rc-xpbgj -- dig kubernetes.default

; <<>> DiG 9.16.20 <<>> kubernetes.default
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 25641
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; COOKIE: 8140d212fa65889e (echoed)
;; QUESTION SECTION:
;kubernetes.default.		IN	A

;; Query time: 0 msec
;; SERVER: 172.30.0.10#53(172.30.0.10)
;; WHEN: Tue Mar 08 05:51:10 UTC 2022
;; MSG SIZE  rcvd: 59

Comment 11 errata-xmlrpc 2022-03-10 16:44:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056