Description of problem:

The 4.9.22 and 4.9.23 installer never completes because the ingress cluster operator ends up in a degraded state due to a canary timeout. I can install 4.8.4 and any prior version in this environment with no issues, but upgrading to 4.8.32 shows the same issue and the upgrade fails as well.

Version-Release number of selected component (if applicable):

Observed succeeding on 4.8.4
Observed failing on 4.8.32, 4.9.22, and 4.9.23

How reproducible:

Every install

Steps to Reproduce:
1. Install or upgrade to 4.8.32 or above

Actual results:

Install eventually fails with all nodes up and running but some cluster operators degraded due to ingress canary timeouts:

```
[ddreggors@provisioner ~]$ oc get co|awk '/NAME/||$3~/False/||$4~/True/||$5~/True/'
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console   4.9.23    False       True          False      17h     DeploymentAvailable: 0 replicas available for console deployment...
ingress   4.9.23    True        False         True       16h     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
```

Expected results:

Install completes and ingress is healthy with no canary timeouts.

Additional info:

I have rsh'ed into the ingress controller pod and run curl tests from it against the URL that is failing. The curl test responds with `200` (OK) but seems to take ~12s negotiating TLS.

```
[ddreggors@provisioner ~]$ oc logs ingress-operator-bbffddb96-dxl6q ingress-operator 2>&1|tail -n 5
2022-03-09T14:18:39.632Z  ERROR  operator.ingress_controller  controller/controller.go:298  got retryable error; requeueing  {"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
2022-03-09T14:19:02.997Z  ERROR  operator.canary_controller  wait/wait.go:155  error performing canary route check  {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
2022-03-09T14:19:39.633Z  INFO   operator.ingress_controller  controller/controller.go:298  reconciling  {"request": "openshift-ingress-operator/default"}
2022-03-09T14:19:39.965Z  ERROR  operator.ingress_controller  controller/controller.go:298  got retryable error; requeueing  {"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
2022-03-09T14:20:13.058Z  ERROR  operator.canary_controller  wait/wait.go:155  error performing canary route check  {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
```

```
sh-4.4$ time curl -k -v https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net/
*   Trying 192.168.4.121...
* TCP_NODELAY set
* Connected to canary-openshift-ingress-canary.apps.ocp4.teklocal.net (192.168.4.121) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
    CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=*.apps.ocp4.teklocal.net
*  start date: Mar  8 21:10:52 2022 GMT
*  expire date: Mar  7 21:10:53 2024 GMT
*  issuer: CN=ingress-operator@1646773672
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET / HTTP/1.1
> Host: canary-openshift-ingress-canary.apps.ocp4.teklocal.net
> User-Agent: curl/7.61.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 200 OK
< x-request-port: 8080
< date: Wed, 09 Mar 2022 14:19:26 GMT
< content-length: 22
< content-type: text/plain; charset=utf-8
< set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=d140d4c745044807153297df093c6a57; path=/; HttpOnly; Secure; SameSite=None
< cache-control: private
<
Healthcheck requested
* Connection #0 to host canary-openshift-ingress-canary.apps.ocp4.teklocal.net left intact

real	0m13.216s
user	0m0.023s
sys	0m0.025s
```
I have checked the following and all seems OK:

1. DNS entries for the API and ingress are present (see the lookup sketch below)
2. There are no duplicate IPs
3. All worker nodes are up and ready
4. All nodes are on the same network/VLAN in vSphere
5. Curl and ping tests from pods show no routing/connectivity issues that I can see
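A minimal sketch of the lookups behind item 1, assuming the standard `api.<cluster-domain>` naming for this cluster (the hostnames are taken from the outputs in this report):

```
$ dig +short api.ocp4.teklocal.net
$ dig +short canary-openshift-ingress-canary.apps.ocp4.teklocal.net
$ dig +short console-openshift-console.apps.ocp4.teklocal.net
```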
Possibly related to https://access.redhat.com/solutions/5891131, but that was fixed in later versions, so maybe this is a regression?
Moving to the Routing component since this affects ingress. Please feel free to reassign if it turns out to be on the openshift-sdn or CNO side.
Setting blocker- as this is most likely a configuration issue and shouldn't block the next z-stream release.

Looks like you're using vSphere. Are you using OVN or openshift-sdn? Are you using FIPS or any other non-default configuration option?

Does the curl output show where the delay is? Timestamps might help; for example, if you have the moreutils package installed, you can use `curl -k -v https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net/ |& ts` to add timestamps to curl's output.

Otherwise, a packet capture might be needed to diagnose the issue.
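If ts isn't available in the pod image, curl's own `--trace-time` option is a standard alternative; a minimal sketch of the same check, trimmed to the first lines of output:

```
curl -k -sv --trace-time https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net/ -o /dev/null 2>&1 | head -n 40
```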
(In reply to Miciah Dashiel Butler Masters from comment #4)
> Setting blocker- as this is most likely a configuration issue and shouldn't
> block the next z-stream release.
>
> Looks like you're using vSphere.

Yes, vSphere.

> Are you using OVN or openshift-sdn?

I have tried both on different install attempts, but the latest install attempt is back on the default (openshift-sdn).

> Are you using FIPS or any other non-default configuration option?

This is a vanilla install-config created by `openshift-install create install-config`, so no non-default configuration.

> Does the curl output show where the delay is? Timestamps might help; for
> example, if you have the moreutils package installed, you can use `curl -k
> -v https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net/ |& ts` to
> add timestamps to curl's output.
>
> Otherwise, a packet capture might be needed to diagnose the issue.

I cannot test the curl command as you gave it because the ts command is not installed in the controller pod. The curl command I can run from this pod is posted in the description.
(In reply to Miciah Dashiel Butler Masters from comment #4)

The best I can offer for timestamps, given the lack of installed tools in the pod, is the following: I start with a `date` command, then follow with a timed curl using `--trace-time` to prepend timestamps.

```
sh-4.4$ date; time curl -k -v https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net/ --trace-time
Wed Mar  9 20:43:17 UTC 2022
20:43:30.093187 *   Trying 192.168.4.121...
20:43:30.093445 * TCP_NODELAY set
20:43:30.094541 * Connected to canary-openshift-ingress-canary.apps.ocp4.teklocal.net (192.168.4.121) port 443 (#0)
20:43:30.096582 * ALPN, offering h2
20:43:30.098084 * ALPN, offering http/1.1
20:43:30.116902 * successfully set certificate verify locations:
20:43:30.118931 *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
20:43:30.120037 * TLSv1.3 (OUT), TLS handshake, Client hello (1):
20:43:30.131469 * TLSv1.3 (IN), TLS handshake, Server hello (2):
20:43:30.136428 * TLSv1.3 (IN), TLS handshake, [no content] (0):
20:43:30.136808 * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
20:43:30.136955 * TLSv1.3 (IN), TLS handshake, [no content] (0):
20:43:30.137159 * TLSv1.3 (IN), TLS handshake, Certificate (11):
20:43:30.138263 * TLSv1.3 (IN), TLS handshake, [no content] (0):
20:43:30.138784 * TLSv1.3 (IN), TLS handshake, CERT verify (15):
20:43:30.139270 * TLSv1.3 (IN), TLS handshake, [no content] (0):
20:43:30.139808 * TLSv1.3 (IN), TLS handshake, Finished (20):
20:43:30.139963 * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
20:43:30.140168 * TLSv1.3 (OUT), TLS handshake, [no content] (0):
20:43:30.140341 * TLSv1.3 (OUT), TLS handshake, Finished (20):
20:43:30.140574 * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
20:43:30.140658 * ALPN, server did not agree to a protocol
20:43:30.140773 * Server certificate:
20:43:30.140879 *  subject: CN=*.apps.ocp4.teklocal.net
20:43:30.141391 *  start date: Mar  8 21:10:52 2022 GMT
20:43:30.141500 *  expire date: Mar  7 21:10:53 2024 GMT
20:43:30.141615 *  issuer: CN=ingress-operator@1646773672
20:43:30.141790 *  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
20:43:30.142042 * TLSv1.3 (OUT), TLS app data, [no content] (0):
20:43:30.142230 > GET / HTTP/1.1
20:43:30.142230 > Host: canary-openshift-ingress-canary.apps.ocp4.teklocal.net
20:43:30.142230 > User-Agent: curl/7.61.1
20:43:30.142230 > Accept: */*
20:43:30.142230 >
20:43:30.142664 * TLSv1.3 (IN), TLS handshake, [no content] (0):
20:43:30.148289 * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
20:43:30.150040 * TLSv1.3 (IN), TLS handshake, [no content] (0):
20:43:30.150164 * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
20:43:30.150927 * TLSv1.3 (IN), TLS app data, [no content] (0):
20:43:30.151284 < HTTP/1.1 200 OK
20:43:30.151443 < x-request-port: 8080
20:43:30.151511 < date: Wed, 09 Mar 2022 20:43:30 GMT
20:43:30.151597 < content-length: 22
20:43:30.151681 < content-type: text/plain; charset=utf-8
20:43:30.152110 < set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=1391f1a9dbcafe076e97d09351a46977; path=/; HttpOnly; Secure; SameSite=None
20:43:30.152435 < cache-control: private
20:43:30.152605 <
Healthcheck requested
20:43:30.152736 * Connection #0 to host canary-openshift-ingress-canary.apps.ocp4.teklocal.net left intact

real	0m12.437s
user	0m0.025s
sys	0m0.027s
```

Note that `date` printed 20:43:17 while curl's first output line is stamped 20:43:30, so roughly 13 seconds elapse before curl even starts connecting.
It seems like there is no significant delay from the point where curl prints the IP address to the completion of the request. The delay could be caused by DNS resolution; a slow upstream resolver can cause significant delays, especially with the long search paths that are typical in Kubernetes. DNS caching should mitigate this, though (on vSphere, entries should be cached for up to 30 seconds).

Could you try the following?

```
curl -k -v https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net/ -w 'dnslookup: %{time_namelookup} | connect: %{time_connect} | appconnect: %{time_appconnect} | pretransfer: %{time_pretransfer} | starttransfer: %{time_starttransfer} | total: %{time_total} | size: %{size_download}\n'
```

It might also be useful to get the strace output:

```
strace -ffto /tmp/strace.out curl -k -v https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net/
```

Then gather the /tmp/strace.out file, and we can try to determine the cause of the delay there.
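For reference, curl's `-w` timers are cumulative from the start of the request, so the DNS share is `time_namelookup` itself, the TCP share is `time_connect - time_namelookup`, and the TLS share is `time_appconnect - time_connect`; if `time_namelookup` dominates `time_total`, the delay is in name resolution. A condensed variant of the same probe:

```
curl -ksS -o /dev/null https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net/ \
  -w 'dns=%{time_namelookup}s tcp=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n'
```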
(In reply to Miciah Dashiel Butler Masters from comment #7)
> It seems like there is no significant delay from the point where curl prints
> the IP address to the completion of the request. The delay could be caused
> by DNS resolution; a slow upstream resolver can cause significant delays,
> especially with the long search paths that are typical in Kubernetes. DNS
> caching should mitigate this, though (on vSphere, entries should be cached
> for up to 30 seconds).
>
> Could you try the following?
>
>     curl -k -v https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net/ -w 'dnslookup: %{time_namelookup} | connect: %{time_connect} | appconnect: %{time_appconnect} | pretransfer: %{time_pretransfer} | starttransfer: %{time_starttransfer} | total: %{time_total} | size: %{size_download}\n'
>
> It might also be useful to get the strace output:
>
>     strace -ffto /tmp/strace.out curl -k -v https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net/
>
> Then gather the /tmp/strace.out file, and we can try to determine the cause
> of the delay there.

OK, so DNS is the holdup, but it is not using my local DNS to resolve; it appears to be using the internal resolvers, and they are NOT resolving the hostname:

```
sh-4.4$ curl -k -v https://canary-openshift-ingress-canary.apps.ocp4.teklocal.net/ -w 'dnslookup: %{time_namelookup} | connect: %{time_connect} | appconnect: %{time_appconnect} | pretransfer: %{time_pretransfer} | starttransfer: %{time_starttransfer} | total: %{time_total} | size: %{size_download}\n'
*   Trying 192.168.4.121...
* TCP_NODELAY set
* Connected to canary-openshift-ingress-canary.apps.ocp4.teklocal.net (192.168.4.121) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
    CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=*.apps.ocp4.teklocal.net
*  start date: Mar  8 21:10:52 2022 GMT
*  expire date: Mar  7 21:10:53 2024 GMT
*  issuer: CN=ingress-operator@1646773672
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET / HTTP/1.1
> Host: canary-openshift-ingress-canary.apps.ocp4.teklocal.net
> User-Agent: curl/7.61.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 200 OK
< x-request-port: 8080
< date: Thu, 10 Mar 2022 00:18:08 GMT
< content-length: 22
< content-type: text/plain; charset=utf-8
< set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=1391f1a9dbcafe076e97d09351a46977; path=/; HttpOnly; Secure; SameSite=None
< cache-control: private
<
Healthcheck requested
* Connection #0 to host canary-openshift-ingress-canary.apps.ocp4.teklocal.net left intact
dnslookup: 12.128738 | connect: 12.130423 | appconnect: 12.219248 | pretransfer: 12.221369 | starttransfer: 12.229251 | total: 12.231538 | size: 22
```

USING CLUSTER DNS

```
sh-4.4$ nslookup canary-openshift-ingress-canary.apps.ocp4.teklocal.net
;; Truncated, retrying in TCP mode.
Server:		172.30.0.10
Address:	172.30.0.10#53

** server can't find canary-openshift-ingress-canary.apps.ocp4.teklocal.net.teklocal.net: SERVFAIL

sh-4.4$
```

USING MY LOCAL DNSMASQ TO RESOLVE

```
sh-4.4$ nslookup canary-openshift-ingress-canary.apps.ocp4.teklocal.net 192.168.4.3
Server:		192.168.4.3
Address:	192.168.4.3#53

Name:	canary-openshift-ingress-canary.apps.ocp4.teklocal.net
Address: 192.168.4.121

sh-4.4$
```
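Worth noting: the SERVFAIL above is for `canary-openshift-ingress-canary.apps.ocp4.teklocal.net.teklocal.net`, i.e. the name with a search domain appended. Querying with a trailing dot makes the name fully qualified and bypasses the resolv.conf search list, which separates search-path expansion problems from forwarding problems; a quick check using this cluster's names:

```
sh-4.4$ nslookup canary-openshift-ingress-canary.apps.ocp4.teklocal.net. 172.30.0.10
```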
DNS lookups with timing:

Non-cluster dnsmasq for the local network:

```
sh-4.4$ time nslookup canary-openshift-ingress-canary.apps.ocp4.teklocal.net 192.168.4.3
Server:		192.168.4.3
Address:	192.168.4.3#53

Name:	canary-openshift-ingress-canary.apps.ocp4.teklocal.net
Address: 192.168.4.121

real	0m0.229s
user	0m0.009s
sys	0m0.015s
```

Cluster DNS:

```
sh-4.4$ time nslookup canary-openshift-ingress-canary.apps.ocp4.teklocal.net
;; Truncated, retrying in TCP mode.
Server:		172.30.0.10
Address:	172.30.0.10#53

** server can't find canary-openshift-ingress-canary.apps.ocp4.teklocal.net.teklocal.net: SERVFAIL

real	0m6.311s
user	0m0.006s
sys	0m0.022s
```

When using the default cluster DNS resolver, we see a ~6s timeout and a failure to resolve the host.
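The `;; Truncated, retrying in TCP mode.` line above is also a hint: the UDP answer was truncated and the client fell back to TCP. Comparing the two transports against the cluster resolver directly should isolate the failing one; a hedged check, assuming `dig` is available in the pod image:

```
sh-4.4$ dig @172.30.0.10 canary-openshift-ingress-canary.apps.ocp4.teklocal.net +notcp +short
sh-4.4$ dig @172.30.0.10 canary-openshift-ingress-canary.apps.ocp4.teklocal.net +tcp +short
```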
Also, if this helps:

```
[ddreggors@provisioner ~]$ oc get svc -n openshift-dns dns-default -o wide
NAME          TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
dns-default   ClusterIP   172.30.0.10   <none>        53/UDP,53/TCP,9154/TCP   27h   dns.operator.openshift.io/daemonset-dns=default
```

```
[ddreggors@provisioner ~]$ oc get pod -n openshift-dns -o wide -l dns.operator.openshift.io/daemonset-dns=default
NAME                READY   STATUS    RESTARTS   AGE   IP            NODE                      NOMINATED NODE   READINESS GATES
dns-default-cgpc9   2/2     Running   0          27h   10.129.0.5    ocp4-p4n85-master-0       <none>           <none>
dns-default-fxtg9   2/2     Running   0          26h   10.128.2.6    ocp4-p4n85-worker-96ntk   <none>           <none>
dns-default-l6frg   2/2     Running   0          27h   10.128.0.41   ocp4-p4n85-master-1       <none>           <none>
dns-default-msmsx   2/2     Running   0          27h   10.131.0.8    ocp4-p4n85-worker-56jvx   <none>           <none>
dns-default-nq5k5   2/2     Running   0          27h   10.130.0.6    ocp4-p4n85-master-2       <none>           <none>
```
Could you provide the output of `oc -n openshift-vsphere-infra get pod`? Thanks.
(In reply to Hongan Li from comment #11)
> Could you provide the output of `oc -n openshift-vsphere-infra get pod`? Thanks.

```
[ddreggors@provisioner ~]$ oc -n openshift-vsphere-infra get pod
NAME                                 READY   STATUS    RESTARTS      AGE
coredns-ocp4-p4n85-master-0          2/2     Running   0             40h
coredns-ocp4-p4n85-master-1          2/2     Running   0             40h
coredns-ocp4-p4n85-master-2          2/2     Running   0             40h
coredns-ocp4-p4n85-worker-56jvx      2/2     Running   0             39h
coredns-ocp4-p4n85-worker-96ntk      2/2     Running   0             39h
haproxy-ocp4-p4n85-master-0          2/2     Running   0             40h
haproxy-ocp4-p4n85-master-1          2/2     Running   0             40h
haproxy-ocp4-p4n85-master-2          2/2     Running   0             40h
keepalived-ocp4-p4n85-master-0       2/2     Running   0             40h
keepalived-ocp4-p4n85-master-1       2/2     Running   0             40h
keepalived-ocp4-p4n85-master-2       2/2     Running   1 (40h ago)   40h
keepalived-ocp4-p4n85-worker-56jvx   2/2     Running   0             39h
keepalived-ocp4-p4n85-worker-96ntk   2/2     Running   0             39h
```
(In reply to Hongan Li from comment #11)
> Could you provide the output of `oc -n openshift-vsphere-infra get pod`? Thanks.

However, the logs for the coredns pods in openshift-vsphere-infra are not healthy; they are full of messages like this:

```
[ddreggors@provisioner ~]$ oc logs -n openshift-vsphere-infra coredns-ocp4-p4n85-worker-56jvx coredns|tail -n 10
[ERROR] plugin/errors: 2 oauth-openshift.apps.ocp4.teklocal.net.teklocal.net. A: dial tcp 192.168.4.3:53: connect: connection refused
[ERROR] plugin/errors: 2 vcsa.teklocal.net.teklocal.net. AAAA: dial tcp 192.168.4.3:53: connect: connection refused
[ERROR] plugin/errors: 2 vcsa.teklocal.net.teklocal.net. AAAA: dial tcp 192.168.4.3:53: connect: connection refused
[ERROR] plugin/errors: 2 vcsa.teklocal.net.ocp4.teklocal.net. AAAA: dial tcp 192.168.4.3:53: connect: connection refused
[ERROR] plugin/errors: 2 vcsa.teklocal.net.ocp4.teklocal.net. AAAA: dial tcp 192.168.4.3:53: connect: connection refused
[ERROR] plugin/errors: 2 vcsa.teklocal.net.ocp4.teklocal.net. AAAA: dial tcp 192.168.4.3:53: connect: connection refused
[ERROR] plugin/errors: 2 console-openshift-console.apps.ocp4.teklocal.net.teklocal.net. AAAA: dial tcp 192.168.4.3:53: connect: connection refused
[ERROR] plugin/errors: 2 oauth-openshift.apps.ocp4.teklocal.net.teklocal.net. AAAA: dial tcp 192.168.4.3:53: connect: connection refused
[ERROR] plugin/errors: 2 oauth-openshift.apps.ocp4.teklocal.net.teklocal.net. A: dial tcp 192.168.4.3:53: connect: connection refused
[ERROR] plugin/errors: 2 console-openshift-console.apps.ocp4.teklocal.net.teklocal.net. AAAA: dial tcp 192.168.4.3:53: connect: connection refused
```
I think I have resolved the issue.

Seeing the "connection refused" errors from coredns above, I looked back at the "Troubleshooting OpenShift Container Platform 4: DNS" solution: https://access.redhat.com/solutions/3804501

There is a test in that solution that really brought my issue to light:

"7. Verify that both TCP and UDP requests from the coredns container to the upstream DNS server are possible. Both TCP and UDP connections to the upstream DNS server are required for CoreDNS to function correctly:

    # dig @<UPSTREAM-DNS-IP> redhat.com -p 5353 +tcp +short
    # dig @<UPSTREAM-DNS-IP> redhat.com -p 5353 +notcp +short
"

When I tested, UDP was fine, but TCP was not:

```
sh-4.4# dig @192.168.4.3 redhat.com -p 53 +notcp +short
209.132.183.105
sh-4.4# dig @192.168.4.3 redhat.com -p 53 +tcp +short
;; Connection to 192.168.4.3#53(192.168.4.3) for redhat.com failed: connection refused.
```

I then looked at my DNS server and, even though it was configured correctly, dnsmasq had stopped listening on TCP port 53 on the LAN address (only 127.0.0.1 still had a TCP listener):

```
[root@dev-mini ~]# ss -anpl |grep :53|grep dnsmasq
udp   UNCONN   0   0    192.168.4.3:53   0.0.0.0:*   users:(("dnsmasq",pid=63918,fd=8))
udp   UNCONN   0   0    127.0.0.1:53     0.0.0.0:*   users:(("dnsmasq",pid=63918,fd=10))
tcp   LISTEN   0   32   127.0.0.1:53     0.0.0.0:*   users:(("dnsmasq",pid=63918,fd=11))
```

After a restart, dnsmasq is listening on TCP 53 again and all errors in the cluster are resolved:

```
[ddreggors@provisioner ~]$ oc get co ingress console authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress          4.9.23    True        False         False      59m
console          4.9.23    True        False         False      9m4s
authentication   4.9.23    True        False         False      9m2s
```
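For anyone hitting the same symptom: below is a minimal watchdog sketch, assuming dnsmasq runs under systemd and using the resolver address from this thread (the service name, test domain, and timeouts are assumptions, not part of the original report), that restarts dnsmasq when it stops answering over TCP:

```
#!/bin/sh
# Hypothetical cron/systemd-timer job: if the upstream resolver stops
# answering DNS over TCP (the failure mode in this bug), restart dnsmasq.
if ! dig @192.168.4.3 redhat.com +tcp +short +time=2 +tries=1 >/dev/null 2>&1; then
    logger "dnsmasq stopped answering on TCP/53; restarting"
    systemctl restart dnsmasq
fi
```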
Closing as NOTABUG