Bug 2025624 - Ingress router metrics endpoint serving old certificates after certificate rotation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Suleyman Akbas
QA Contact: Melvin Joseph
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-22 15:59 UTC by Andreas Nowak
Modified: 2022-08-10 10:40 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 10:39:48 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift router pull 379 0 None open Bug 2025624: Fix certificate reloader 2022-04-04 22:18:02 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:40:25 UTC

Comment 2 Miciah Dashiel Butler Masters 2021-11-23 17:23:25 UTC
Setting blocker- because this is not a regression and because it has a workaround (deleting the router pod to force it to restart).  However, this is a high-priority issue as it can cause the router to stop reporting metrics until the pod is restarted.

Comment 5 Miciah Dashiel Butler Masters 2022-02-15 00:23:59 UTC
We have not had capacity to work on this issue at this time.

Comment 8 Melvin Joseph 2022-04-09 07:33:49 UTC
Able to reproduce the issue in 4.11.0-0.nightly-2022-04-01-172551 using the steps in https://bugzilla.redhat.com/show_bug.cgi?id=2025624#c3.

melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-01-172551   True        False         45m     Cluster version is 4.11.0-0.nightly-2022-04-01-172551

melvinjoseph@mjoseph-mac Downloads %  oc -n openshift-ingress rsh router-default-9689484c8-j8fxs 
sh-4.4$  curl -k -vvv https://localhost:1936/metrics
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 1936 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=router-internal-default.openshift-ingress.svc
*  start date: Apr  9 03:14:26 2022 GMT
*  expire date: Apr  8 03:14:27 2024 GMT
*  issuer: CN=openshift-service-serving-signer@1649473994
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET /metrics HTTP/1.1
> Host: localhost:1936
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 403 Forbidden
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sat, 09 Apr 2022 04:21:07 GMT
< Content-Length: 12

melvinjoseph@mjoseph-mac Downloads % oc delete secret/signing-key -n openshift-service-ca 
secret "signing-key" deleted

Waited for some time for the pods to reload:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress logs router-default-9689484c8-j8fxs   
<---snip--->
E0409 04:23:59.526940       1 reflector.go:138] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.Route: failed to list *v1.Route: the server is currently unable to handle the request (get routes.route.openshift.io)
I0409 04:24:00.469789       1 template.go:801] router "msg"="reloaded metrics certificate"  "cert"="/etc/pki/tls/metrics-certs/tls.crt" "key"="/etc/pki/tls/metrics-certs/tls.key"
2022/04/09 04:24:03 http: TLS handshake error from 10.129.2.8:38784: remote error: tls: bad certificate
I0409 04:24:15.157284       1 router.go:618] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 using PROXY protocol ...\n - Health check ok : 0 retry attempt(s).\n"
<---snip--->

melvinjoseph@mjoseph-mac Downloads %  oc -n openshift-ingress rsh router-default-9689484c8-j8fxs   
sh-4.4$ curl -k -vvv https://localhost:1936/metrics
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 1936 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=router-internal-default.openshift-ingress.svc
*  start date: Apr  9 03:14:26 2022 GMT
*  expire date: Apr  8 03:14:27 2024 GMT
*  issuer: CN=openshift-service-serving-signer@1649473994
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET /metrics HTTP/1.1
> Host: localhost:1936
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 403 Forbidden
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sat, 09 Apr 2022 05:22:31 GMT
< Content-Length: 12
< 


The metrics endpoint is still serving the old certificate (same start date and issuer as before the rotation), even though the router logs show it as reloaded.
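
The before/after comparison of the two curl outputs can also be done mechanically instead of by eye. The following is a rough sketch, assuming each `curl -k -vvv` run was saved to a file (curl writes its verbose output to stderr, so it has to be captured with `2>`); the helper names cert_identity and cert_rotated are made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: reduce verbose curl output (read from stdin) to the
# server certificate's start date and issuer, one value per line.
cert_identity() {
    sed -n \
        -e 's/^\*  *start date: //p' \
        -e 's/^\*  *issuer: //p'
}

# Hypothetical helper: succeed only if both saved outputs contain
# certificate details and those details differ.
cert_rotated() {
    before=$(cert_identity < "$1")
    after=$(cert_identity < "$2")
    [ -n "$before" ] && [ -n "$after" ] && [ "$before" != "$after" ]
}
```

Inside the router pod this could be used as `curl -k -vvv https://localhost:1936/metrics 2> before.txt` before deleting the signing key, the same command with `after.txt` once the router logs the reload, and then `cert_rotated before.txt after.txt`. With the two outputs from the affected build, the check fails because the start date and issuer are identical.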

Comment 9 Melvin Joseph 2022-04-09 07:41:46 UTC
Able to verify the fix in 4.11.0-0.nightly-2022-04-06-213816 using the steps in https://bugzilla.redhat.com/show_bug.cgi?id=2025624#c3.

melvinjoseph@mjoseph-mac Downloads % oc get clusterversion

NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-06-213816   True        False         77m     Cluster version is 4.11.0-0.nightly-2022-04-06-213816
melvinjoseph@mjoseph-mac Downloads % 
melvinjoseph@mjoseph-mac Downloads %  oc -n openshift-ingress rsh router-default-5bf7988f7f-8jc4z
sh-4.4$ curl -k -vvv https://localhost:1936/metrics
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 1936 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=router-internal-default.openshift-ingress.svc
*  start date: Apr  9 05:39:12 2022 GMT
*  expire date: Apr  8 05:39:13 2024 GMT
*  issuer: CN=openshift-service-serving-signer@1649482677
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET /metrics HTTP/1.1
> Host: localhost:1936
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 403 Forbidden
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sat, 09 Apr 2022 07:11:31 GMT
< Content-Length: 12

melvinjoseph@mjoseph-mac Downloads % oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.11.0-0.nightly-2022-04-06-213816   True        False         False      79m     
melvinjoseph@mjoseph-mac Downloads % oc delete secret/signing-key -n openshift-service-ca
secret "signing-key" deleted

melvinjoseph@mjoseph-mac Downloads % oc get co                                                    
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.11.0-0.nightly-2022-04-06-213816   True        False         False      19m     

melvinjoseph@mjoseph-mac Downloads %  oc -n openshift-ingress logs router-default-5bf7988f7f-8jc4z
<---snip--->
I0409 05:56:06.774641       1 router.go:618] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 using PROXY protocol ...\n - Health check ok : 0 retry attempt(s).\n"
I0409 07:12:27.778665       1 template.go:801] router "msg"="reloaded metrics certificate"  "cert"="/etc/pki/tls/metrics-certs/tls.crt" "key"="/etc/pki/tls/metrics-certs/tls.key"
melvinjoseph@mjoseph-mac Downloads % 

melvinjoseph@mjoseph-mac Downloads %  oc -n openshift-ingress rsh router-default-5bf7988f7f-8jc4z 
sh-4.4$ curl -k -vvv https://localhost:1936/metrics
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 1936 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=router-internal-default.openshift-ingress.svc
*  start date: Apr  9 07:12:08 2022 GMT
*  expire date: Apr  8 07:12:09 2024 GMT
*  issuer: CN=openshift-service-serving-signer@1649488323
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET /metrics HTTP/1.1
> Host: localhost:1936
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 403 Forbidden
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sat, 09 Apr 2022 07:33:30 GMT
< Content-Length: 12

The metrics endpoint now serves the new certificate (the start date and issuer changed after the rotation), so marking this bug as verified.

Comment 11 errata-xmlrpc 2022-08-10 10:39:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

