Setting blocker- because this is not a regression and because it has a workaround (deleting the router pod to force it to restart). However, this is a high-priority issue as it can cause the router to stop reporting metrics until the pod is restarted.
We have not had capacity to work on this issue at this time.
Able to reproduce the issue in 4.11.0-0.nightly-2022-04-01-172551 using the https://bugzilla.redhat.com/show_bug.cgi?id=2025624#c3. melvinjoseph@mjoseph-mac Downloads % oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.11.0-0.nightly-2022-04-01-172551 True False 45m Cluster version is 4.11.0-0.nightly-2022-04-01-172551 melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress rsh router-default-9689484c8-j8fxs sh-4.4$ curl -k -vvv https://localhost:1936/metrics * Trying ::1... * TCP_NODELAY set * Connected to localhost (::1) port 1936 (#0) * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none * TLSv1.3 (OUT), TLS handshake, Client hello (1): * TLSv1.3 (IN), TLS handshake, Server hello (2): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Request CERT (13): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Certificate (11): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, CERT verify (15): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Finished (20): * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): * TLSv1.3 (OUT), TLS handshake, [no content] (0): * TLSv1.3 (OUT), TLS handshake, Certificate (11): * TLSv1.3 (OUT), TLS handshake, [no content] (0): * TLSv1.3 (OUT), TLS handshake, Finished (20): * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server did not agree to a protocol * Server certificate: * subject: CN=router-internal-default.openshift-ingress.svc * start date: Apr 9 03:14:26 2022 GMT * expire date: Apr 8 03:14:27 2024 GMT * issuer: CN=openshift-service-serving-signer@1649473994 * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. * TLSv1.3 (OUT), TLS app data, [no content] (0): > GET /metrics HTTP/1.1 > Host: localhost:1936 > User-Agent: curl/7.61.1 > Accept: */* > * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): * TLSv1.3 (IN), TLS app data, [no content] (0): < HTTP/1.1 403 Forbidden < Content-Type: text/plain; charset=utf-8 < X-Content-Type-Options: nosniff < Date: Sat, 09 Apr 2022 04:21:07 GMT < Content-Length: 12 melvinjoseph@mjoseph-mac Downloads % oc delete secret/signing-key -n openshift-service-ca secret "signing-key" deleted Waiting for some time to reload the pods melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress logs router-default-9689484c8-j8fxs <---snip---> E0409 04:23:59.526940 1 reflector.go:138] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.Route: failed to list *v1.Route: the server is currently unable to handle the request (get routes.route.openshift.io) I0409 04:24:00.469789 1 template.go:801] router "msg"="reloaded metrics certificate" "cert"="/etc/pki/tls/metrics-certs/tls.crt" "key"="/etc/pki/tls/metrics-certs/tls.key" 2022/04/09 04:24:03 http: TLS handshake error from 10.129.2.8:38784: remote error: tls: bad certificate I0409 04:24:15.157284 1 router.go:618] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 using PROXY protocol ...\n - Health check ok : 0 retry attempt(s).\n" <---snip---> melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress rsh router-default-9689484c8-j8fxs sh-4.4$ curl -k -vvv https://localhost:1936/metrics * Trying ::1... * TCP_NODELAY set * Connected to localhost (::1) port 1936 (#0) * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none * TLSv1.3 (OUT), TLS handshake, Client hello (1): * TLSv1.3 (IN), TLS handshake, Server hello (2): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Request CERT (13): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Certificate (11): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, CERT verify (15): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Finished (20): * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): * TLSv1.3 (OUT), TLS handshake, [no content] (0): * TLSv1.3 (OUT), TLS handshake, Certificate (11): * TLSv1.3 (OUT), TLS handshake, [no content] (0): * TLSv1.3 (OUT), TLS handshake, Finished (20): * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server did not agree to a protocol * Server certificate: * subject: CN=router-internal-default.openshift-ingress.svc * start date: Apr 9 03:14:26 2022 GMT * expire date: Apr 8 03:14:27 2024 GMT * issuer: CN=openshift-service-serving-signer@1649473994 * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. * TLSv1.3 (OUT), TLS app data, [no content] (0): > GET /metrics HTTP/1.1 > Host: localhost:1936 > User-Agent: curl/7.61.1 > Accept: */* > * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): * TLSv1.3 (IN), TLS app data, [no content] (0): < HTTP/1.1 403 Forbidden < Content-Type: text/plain; charset=utf-8 < X-Content-Type-Options: nosniff < Date: Sat, 09 Apr 2022 05:22:31 GMT < Content-Length: 12 < The certificate is not reloaded in the metrics even though it is shows as reloaded in the router logs.
Able to verify the fix in 4.11.0-0.nightly-2022-04-06-213816 using the https://bugzilla.redhat.com/show_bug.cgi?id=2025624#c3. melvinjoseph@mjoseph-mac Downloads % oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.11.0-0.nightly-2022-04-06-213816 True False 77m Cluster version is 4.11.0-0.nightly-2022-04-06-213816 melvinjoseph@mjoseph-mac Downloads % melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress rsh router-default-5bf7988f7f-8jc4z sh-4.4$ curl -k -vvv https://localhost:1936/metrics * Trying ::1... * TCP_NODELAY set * Connected to localhost (::1) port 1936 (#0) * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none * TLSv1.3 (OUT), TLS handshake, Client hello (1): * TLSv1.3 (IN), TLS handshake, Server hello (2): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Request CERT (13): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Certificate (11): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, CERT verify (15): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Finished (20): * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): * TLSv1.3 (OUT), TLS handshake, [no content] (0): * TLSv1.3 (OUT), TLS handshake, Certificate (11): * TLSv1.3 (OUT), TLS handshake, [no content] (0): * TLSv1.3 (OUT), TLS handshake, Finished (20): * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server did not agree to a protocol * Server certificate: * subject: CN=router-internal-default.openshift-ingress.svc * start date: Apr 9 05:39:12 2022 GMT * expire date: Apr 8 05:39:13 2024 GMT * issuer: CN=openshift-service-serving-signer@1649482677 * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. * TLSv1.3 (OUT), TLS app data, [no content] (0): > GET /metrics HTTP/1.1 > Host: localhost:1936 > User-Agent: curl/7.61.1 > Accept: */* > * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): * TLSv1.3 (IN), TLS app data, [no content] (0): < HTTP/1.1 403 Forbidden < Content-Type: text/plain; charset=utf-8 < X-Content-Type-Options: nosniff < Date: Sat, 09 Apr 2022 07:11:31 GMT < Content-Length: 12 melvinjoseph@mjoseph-mac Downloads % oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication 4.11.0-0.nightly-2022-04-06-213816 True False False 79m melvinjoseph@mjoseph-mac Downloads % oc delete secret/signing-key -n openshift-service-ca secret "signing-key" deleted melvinjoseph@mjoseph-mac Downloads % oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication 4.11.0-0.nightly-2022-04-06-213816 True False False 19m melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress logs router-default-5bf7988f7f-8jc4z <---snip---> I0409 05:56:06.774641 1 router.go:618] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 using PROXY protocol ...\n - Health check ok : 0 retry attempt(s).\n" I0409 07:12:27.778665 1 template.go:801] router "msg"="reloaded metrics certificate" "cert"="/etc/pki/tls/metrics-certs/tls.crt" "key"="/etc/pki/tls/metrics-certs/tls.key" melvinjoseph@mjoseph-mac Downloads % melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress rsh router-default-5bf7988f7f-8jc4z sh-4.4$ curl -k -vvv https://localhost:1936/metrics * Trying ::1... * TCP_NODELAY set * Connected to localhost (::1) port 1936 (#0) * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none * TLSv1.3 (OUT), TLS handshake, Client hello (1): * TLSv1.3 (IN), TLS handshake, Server hello (2): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Request CERT (13): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Certificate (11): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, CERT verify (15): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Finished (20): * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): * TLSv1.3 (OUT), TLS handshake, [no content] (0): * TLSv1.3 (OUT), TLS handshake, Certificate (11): * TLSv1.3 (OUT), TLS handshake, [no content] (0): * TLSv1.3 (OUT), TLS handshake, Finished (20): * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server did not agree to a protocol * Server certificate: * subject: CN=router-internal-default.openshift-ingress.svc * start date: Apr 9 07:12:08 2022 GMT * expire date: Apr 8 07:12:09 2024 GMT * issuer: CN=openshift-service-serving-signer@1649488323 * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. * TLSv1.3 (OUT), TLS app data, [no content] (0): > GET /metrics HTTP/1.1 > Host: localhost:1936 > User-Agent: curl/7.61.1 > Accept: */* > * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): * TLSv1.3 (IN), TLS app data, [no content] (0): < HTTP/1.1 403 Forbidden < Content-Type: text/plain; charset=utf-8 < X-Content-Type-Options: nosniff < Date: Sat, 09 Apr 2022 07:33:30 GMT < Content-Length: 12 We can see the certificate is updated in the metrics and hence marking as verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069