Version: 4.8.0-0.nightly-2021-04-01-213116, 4.8.0-0.nightly-2021-04-01-072432

The issue started a few days ago and always reproduces: an attempted SNO deployment never completes.

[kni@r640-u09 ~]$ oc get co | grep -v "True.*False.*False"
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.8.0-0.nightly-2021-04-01-213116   False       False        True       14h
console          4.8.0-0.nightly-2021-04-01-213116   False       True         True       14h
ingress          4.8.0-0.nightly-2021-04-01-213116   True        False        True       14h

[kni@r640-u09 ~]$ oc get pod -n openshift-console
NAME                         READY   STATUS    RESTARTS   AGE
console-5787485c6d-4srlh     0/1     Running   46         4h16m
console-5f6c5d669b-4pl6w     0/1     Running   46         4h15m
downloads-7f8d988d97-pg7db   1/1     Running   0          14h

The console pods are not becoming ready because they cannot reach the OAuth endpoint:

[kni@r640-u09 ~]$ oc logs -n openshift-console console-5787485c6d-4srlh
W0402 20:40:20.414454       1 main.go:203] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
I0402 20:40:20.414539       1 main.go:272] cookies are secure!
E0402 20:40:25.444163       1 auth.go:231] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.qe3.kni.lab.eng.bos.redhat.com/oauth/token failed: Head "https://oauth-openshift.apps.qe3.kni.lab.eng.bos.redhat.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
(the same error then repeats every 15 seconds)

A curl from inside the pod times out as well:

[kni@r640-u09 ~]$ oc exec -n openshift-console console-5787485c6d-4srlh -- timeout 10 curl https://oauth-openshift.apps.qe3.kni.lab.eng.bos.redhat.com/oauth/token -kI
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:08 --:--:--     0
command terminated with exit code 124

In this setup, the API, ingress, and the node itself have different IPs:

api.qe3.kni.lab.eng.bos.redhat.com has IPv6 address 2620:52:0:1386::97
openshift-master-0.qe3.kni.lab.eng.bos.redhat.com has IPv6 address 2620:52:0:1386::91
wildcard.apps.qe3.kni.lab.eng.bos.redhat.com has IPv6 address 2620:52:0:1386::96

This worked fine until a few days ago.
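For anyone reproducing: the split-IP DNS layout can be confirmed from the provisioning host. A minimal check (hostnames taken from the report; dig assumed available):

# Each name should resolve to a distinct IPv6 address (::97, ::91, ::96):
dig +short AAAA api.qe3.kni.lab.eng.bos.redhat.com
dig +short AAAA openshift-master-0.qe3.kni.lab.eng.bos.redhat.com
dig +short AAAA oauth-openshift.apps.qe3.kni.lab.eng.bos.redhat.com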
[kni@r640-u09 ~]$ oc exec -n openshift-ovn-kubernetes ovs-node-qkv7z -- ip -6 address show dev br-ex
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    inet6 2620:52:0:1386::97/121 scope global
       valid_lft forever preferred_lft forever
    inet6 2620:52:0:1386::96/121 scope global
       valid_lft forever preferred_lft forever
    inet6 2620:52:0:1386::91/121 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::9a03:9bff:fe61:7179/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[kni@r640-u09 ~]$
I have a setup where api/ingress/node all resolve to the same IP, and everything works there. The issue also doesn't seem to happen on an HA (non-SNO) cluster.
The issue doesn't reproduce with IPv4.
Actually, the IPv4 setup used OpenShiftSDN, while the IPv6 setup used OVNKubernetes.
So the problem is that pods cannot reach the newly added IPs on the node:

[kni@r640-u09 ~]$ oc exec -n openshift-ovn-kubernetes ovs-node-qkv7z -- ip -6 address show dev br-ex
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    inet6 2620:52:0:1386::97/121 scope global
       valid_lft forever preferred_lft forever
    inet6 2620:52:0:1386::96/121 scope global
       valid_lft forever preferred_lft forever
    inet6 2620:52:0:1386::91/121 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::9a03:9bff:fe61:7179/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

The pod fails with:

E0406 16:13:53.912561       1 auth.go:231] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.qe3.kni.lab.eng.bos.redhat.com/oauth/token failed: Head "https://oauth-openshift.apps.qe3.kni.lab.eng.bos.redhat.com": dial tcp [2620:52:0:1386::96]:443: i/o timeout (Client.Timeout exceeded while awaiting headers)

However, those IPs are reachable from outside the cluster:

[kni@r640-u09 ~]$ curl -k -v https://[2620:52:0:1386::96]:443
* Rebuilt URL to: https://[2620:52:0:1386::96]:443/
*   Trying 2620:52:0:1386::96...
* TCP_NODELAY set
* Connected to 2620:52:0:1386::96 (2620:52:0:1386::96) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
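To see where the pod's traffic dies, one can capture on the node while repeating the curl from the pod (a diagnostic sketch, not part of the original triage; with OVNKubernetes, hairpinned pod-to-host traffic is expected to leave the cluster network via the management port ovn-k8s-mp0):

# On the node: watch for the pod's connection attempts on the management port
tcpdump -ni ovn-k8s-mp0 'ip6 and tcp port 443'
# In parallel, from the console pod:
curl -k --max-time 10 https://[2620:52:0:1386::96]:443

If nothing arrives on ovn-k8s-mp0, the traffic is being routed (and dropped) elsewhere inside OVN.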
The pod tries to reach the router-default pod, which runs on the host network, but fails to reach it on the new IP:

openshift-ingress   router-default-7c5ff5965d-mfbb4   1/1   Running   0   4d   2620:52:0:1386::91   openshift-master-0.qe3.kni.lab.eng.bos.redhat.com   <none>   <none>

However, it can reach that pod on the original NodeIP:

[root@openshift-master-0 ~]# crictl ps | grep console
2feb662e438a5   a0a41f9beddd6c92945501b55f31abe2bf301c7faa7178178066f3e80ee79dde   5 hours ago   Running   console-operator   54   7cb0c0b903a62
[root@openshift-master-0 ~]# crictl exec -it 2feb662e438a5 bash
bash-4.4$ curl -k https://2620:52:0:1386::90
bash-4.4$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if77: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
    link/ether 0a:58:68:27:b6:f3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fd01:0:0:1::22/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::858:68ff:fe27:b6f3/64 scope link
       valid_lft forever preferred_lft forever
bash-4.4$ curl -k https://[2620:52:0:1386::91]:443
<html>
  <head>
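The hairpin back to the host is implemented with OVN logical router policies on ovn_cluster_router, so a quick way to confirm the diagnosis is to list them (a hedged sketch; the ovnkube-master pod name is deployment-specific and <hash> is a placeholder):

oc exec -n openshift-ovn-kubernetes ovnkube-master-<hash> -c nbdb -- \
  ovn-nbctl lr-policy-list ovn_cluster_router

In the broken state, one would expect a reroute policy only for the original node IP (2620:52:0:1386::91) and nothing for the later-added ::96/::97 addresses.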
To fix this, we need to watch for new IPs added to the host and then update the policy routes so that traffic to those IPs is redirected into the management port (ovn-k8s-mp0).
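For illustration only (my sketch, not the actual patch): per host address, the fix has to keep a reroute policy in sync that sends pod traffic destined for that address to the management-port IP. In ovn-nbctl terms it would look roughly like this (the priority, the rtos-... port name, and the fd01:0:0:1::2 management-port address are assumptions based on this setup's fd01:0:0:1::/64 node subnet):

# Hypothetical policy for one of the new host IPs; repeat per address:
ovn-nbctl lr-policy-add ovn_cluster_router 1004 \
  'inport == "rtos-openshift-master-0.qe3.kni.lab.eng.bos.redhat.com" && ip6.dst == 2620:52:0:1386::96' \
  reroute fd01:0:0:1::2

The same policy would be added for ::97, and removed again if the address disappears from the host.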
@sasha, could you help verify this bug?
Version: 4.8.0-0.nightly-2021-04-15-074503

The reported issue doesn't reproduce.

oc get pod -n openshift-console
NAME                         READY   STATUS    RESTARTS   AGE
console-5bc4546fd8-q4dvr     1/1     Running   1          30m
downloads-7bc5989474-qnscs   1/1     Running   0          34m

oc rsh -n openshift-console console-5bc4546fd8-q4dvr
sh-4.4$ curl [2620:52:0:1386::91]:443 -kv
* Rebuilt URL to: [2620:52:0:1386::91]:443/
*   Trying 2620:52:0:1386::91...
* TCP_NODELAY set
* Connected to 2620:52:0:1386::91 (2620:52:0:1386::91) port 443 (#0)
> GET / HTTP/1.1
> Host: [2620:52:0:1386::91]:443
> User-Agent: curl/7.61.1
> Accept: */*
>
* Empty reply from server
* Connection #0 to host 2620:52:0:1386::91 left intact
curl: (52) Empty reply from server

The same test against [2620:52:0:1386::96]:443 and [2620:52:0:1386::97]:443 also connects and ends with the same "curl: (52) Empty reply from server". All three addresses now accept TCP connections from the pod; the empty reply is expected for a plain-HTTP request against the TLS port.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438