+++ This bug was initially created as a clone of Bug #1771811 +++

A serving cert supplied by the service CA operator appears to be used to secure the /metrics endpoints of catalog-operator and olm-operator. Neither operator appears to reload the key material if it changes. When the serving cert is regenerated (e.g. when the service CA is rotated), the endpoints may stop working until the operators are restarted.

The 'Refresh Strategies' section of the linked compatibility doc catalogs potential strategies for responding to changes in key material supplied by the service CA operator.

Note that CA rotation can be triggered manually in any 4.x release by removing the signing secret. Automated rotation is likely to be introduced in a future z-stream release.

References:
- Enhancement for automated service CA rotation: https://github.com/openshift/enhancements/blob/master/enhancements/automated-service-ca-rotation.md
- Operator compatibility with service CA rotation: https://docs.google.com/document/d/1NB2wUf9e8XScfVM6jFBl8VuLYG6-3uV63eUpqmYE8Ts/edit
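The manual rotation mentioned above, plus the restart workaround for builds without a fix, can be sketched in shell. This assumes the service CA signing secret is `signing-key` in the `openshift-service-ca` namespace (taken from the OpenShift 4.x service serving certificate documentation, not from this bug) and that the OLM deployments are named `catalog-operator` and `olm-operator` (inferred from the `catalog-operator-...` pod names later in this thread). Verify both on your cluster. `rotate_service_ca` and `restart_olm_operators` are hypothetical helper names.

```shell
# Sketch only. Assumed names (check on your cluster):
#   signing secret  -> openshift-service-ca/signing-key (OpenShift 4.x docs)
#   OLM deployments -> catalog-operator, olm-operator (inferred from pod names)
# Requires a logged-in `oc`.
SERVICE_CA_NS="${SERVICE_CA_NS:-openshift-service-ca}"
SIGNING_SECRET="${SIGNING_SECRET:-signing-key}"
OLM_NS="${OLM_NS:-openshift-operator-lifecycle-manager}"

# Manually trigger service CA rotation by deleting the signing secret.
# The service CA operator is expected to generate a new CA and re-issue
# the serving certs it manages.
rotate_service_ca() {
  oc -n "$SERVICE_CA_NS" delete secret "$SIGNING_SECRET"
}

# Workaround for builds that do not reload key material: restart both
# operators so they load the re-issued serving certs at startup.
restart_olm_operators() {
  oc -n "$OLM_NS" rollout restart deployment/catalog-operator || return 1
  oc -n "$OLM_NS" rollout restart deployment/olm-operator
}
```

The restart helper only matters for builds without the fix; once the operators reload their serving certs themselves, rotating the CA should be enough.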
Hi Jeff,

I tested this on a cluster without the fix PR, but I couldn't reproduce the issue. Details:

Cluster version: 4.3.0-0.nightly-2019-12-03-032607

The OLM build does not include the fix PR:
mac:~ jianzhang$ oc exec catalog-operator-6cfdcd86fd-xwpsh -- olm --version
OLM version: 0.13.0
git commit: ba10413e72cfe23724edc588ff25f36dfdbeb37e

1. Delete olm-operator-serving-cert and catalog-operator-serving-cert.
mac:~ jianzhang$ oc get secret
NAME                                          TYPE                                  DATA   AGE
builder-dockercfg-lv7jr                       kubernetes.io/dockercfg               1      23h
builder-token-2k476                           kubernetes.io/service-account-token   4      23h
builder-token-zj659                           kubernetes.io/service-account-token   4      23h
catalog-operator-serving-cert                 kubernetes.io/tls                     2      6m19s
default-dockercfg-kzn65                       kubernetes.io/dockercfg               1      23h
default-token-lbgz5                           kubernetes.io/service-account-token   4      23h
default-token-x554m                           kubernetes.io/service-account-token   4      23h
deployer-dockercfg-pdrmt                      kubernetes.io/dockercfg               1      23h
deployer-token-mclc7                          kubernetes.io/service-account-token   4      23h
deployer-token-q9jtd                          kubernetes.io/service-account-token   4      23h
olm-operator-serviceaccount-dockercfg-zqfnc   kubernetes.io/dockercfg               1      23h
olm-operator-serviceaccount-token-4vtnf       kubernetes.io/service-account-token   4      23h
olm-operator-serviceaccount-token-vgfxq       kubernetes.io/service-account-token   4      23h
olm-operator-serving-cert                     kubernetes.io/tls                     2      6m19s
v1.packages.operators.coreos.com-cert         kubernetes.io/tls                     2      23h

2. Forward the port to localhost.
mac:~ jianzhang$ oc port-forward catalog-operator-6cfdcd86fd-xwpsh 8081:8081
Forwarding from 127.0.0.1:8081 -> 8081
Forwarding from [::1]:8081 -> 8081
Handling connection for 8081

3. In another terminal, run `openssl s_client -connect`; it works well.
mac:~ jianzhang$ openssl s_client -connect localhost:8081
CONNECTED(00000005)
depth=1 CN = openshift-service-serving-signer@1575455354
verify error:num=19:self signed certificate in certificate chain
verify return:0
---
Certificate chain
 0 s:/CN=catalog-operator-metrics.openshift-operator-lifecycle-manager.svc
   i:/CN=openshift-service-serving-signer@1575455354
 1 s:/CN=openshift-service-serving-signer@1575455354
   i:/CN=openshift-service-serving-signer@1575455354
---
Server certificate
-----BEGIN CERTIFICATE-----
...
Start Time: 1575538998
Timeout : 7200 (sec)
Verify return code: 19 (self signed certificate in certificate chain)

4. Check the metrics in Prometheus; they work well. See the screenshot: https://user-images.githubusercontent.com/15416633/70224969-1cd68280-1789-11ea-8aa9-669a4c9c9f0d.png

So, what are the steps to reproduce this issue?
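Step 1 of the reproduction above (deleting the two serving-cert secrets) can be scripted so that it also waits for the service CA operator to recreate them before the certificate is checked again. A minimal sketch: the namespace and secret names come from this thread, while `regenerate_serving_certs`, `WAIT_TRIES` and `WAIT_INTERVAL` are hypothetical names chosen for illustration.

```shell
# Sketch: delete the OLM serving-cert secrets, then poll until the service CA
# operator has recreated each one. Namespace/secret names are from this bug;
# WAIT_TRIES and WAIT_INTERVAL are hypothetical knobs. Needs a logged-in `oc`.
OLM_NS="${OLM_NS:-openshift-operator-lifecycle-manager}"
SERVING_SECRETS="catalog-operator-serving-cert olm-operator-serving-cert"

regenerate_serving_certs() {
  max_tries="${WAIT_TRIES:-30}"
  interval="${WAIT_INTERVAL:-2}"
  # Unquoted on purpose: $SERVING_SECRETS expands to two secret names.
  oc -n "$OLM_NS" delete secret $SERVING_SECRETS || return 1
  for secret in $SERVING_SECRETS; do
    tries=0
    # `oc get` exits non-zero while the secret does not exist yet.
    until oc -n "$OLM_NS" get secret "$secret" >/dev/null 2>&1; do
      tries=$((tries + 1))
      if [ "$tries" -ge "$max_tries" ]; then
        echo "timed out waiting for $secret" >&2
        return 1
      fi
      sleep "$interval"
    done
    echo "$secret recreated"
  done
}
```

Polling with `oc get` keeps the sketch independent of any particular `oc wait` condition; a freshly recreated secret shows up with a small AGE, as in the listings in this thread.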
My reference to using openssl s_client was a pointer to get started, not the entire test. Without the PR, the original certificate stays in use until the container is restarted. I don't see much value in testing without the PR, but if you really want to, you can verify that the served certificate is still the same after you delete the serving-cert secrets in the OLM namespace.

With the PR, do something like this after you've set up the port forwarding as before:

$ echo | openssl s_client -connect localhost:8081 2>&1 | sed -n '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > olm.crt
$ openssl x509 -in olm.crt -purpose -noout -text

Do the above before and after deleting the certificate in the OLM namespace. The certificate should be different, and I assume the validity period (not before / not after) will be slightly different too.
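The before/after check described above can be wrapped in a few shell functions. A minimal sketch, assuming a port-forward to the operator's metrics port is already running; `extract_leaf_pem`, `capture_cert` and `cert_rotated` are hypothetical names. It uses `sed -n`, the POSIX spelling of GNU `sed --quiet`, so it also works with the BSD sed shipped on macOS.

```shell
# Print the PEM block(s) from `openssl s_client` output read on stdin.
# Without -showcerts, s_client prints only the server (leaf) certificate.
extract_leaf_pem() {
  sed -n '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
}

# Save the certificate currently served on localhost:PORT to FILE.
# Assumes something like `oc port-forward <pod> 8081:8081` is running.
capture_cert() {
  port="$1"; out="$2"
  # `echo |` closes stdin so s_client exits after the handshake.
  echo | openssl s_client -connect "localhost:$port" 2>&1 | extract_leaf_pem > "$out"
}

# Exit 0 if both captures are non-empty and differ (a new cert is served),
# 1 if they are identical, 2 if either capture is missing or empty.
cert_rotated() {
  [ -s "$1" ] && [ -s "$2" ] || return 2
  ! cmp -s "$1" "$2"
}
```

Typical use: `capture_cert 8081 before.crt`, delete the serving-cert secret, wait for it to be recreated, `capture_cert 8081 after.crt`, then `cert_rotated before.crt after.crt && echo "new certificate served"`. Treating an empty capture as an error avoids reporting a rotation when the port-forward has simply dropped.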
Hi Jeff,

Many thanks for the information! I tested it on a cluster with the fix PR. Details:

Cluster version: 4.3.0-0.nightly-2019-12-05-213858

mac:~ jianzhang$ oc exec catalog-operator-8fcc9bc76-bjzz6 -- olm --version
OLM version: 0.13.0
git commit: 7dfd4517e5368fa19c48dab9b9e126798f3c3f40

mac:~ jianzhang$ oc port-forward catalog-operator-8fcc9bc76-kvctw 8081:8081
Forwarding from 127.0.0.1:8081 -> 8081
Forwarding from [::1]:8081 -> 8081
Handling connection for 8081
...
mac:~ jianzhang$ echo | openssl s_client -connect localhost:8081 2>&1 | gsed --quiet '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > olm4.crt

Delete only the catalog-operator-serving-cert and olm-operator-serving-cert secrets:
mac:~ jianzhang$ oc get secret
NAME                                          TYPE                                  DATA   AGE
builder-dockercfg-mpcgh                       kubernetes.io/dockercfg               1      42m
builder-token-4dpzn                           kubernetes.io/service-account-token   4      43m
builder-token-p8vm9                           kubernetes.io/service-account-token   4      43m
catalog-operator-serving-cert                 kubernetes.io/tls                     2      64s
default-dockercfg-4zkbl                       kubernetes.io/dockercfg               1      42m
default-token-dhdst                           kubernetes.io/service-account-token   4      51m
default-token-j2vdg                           kubernetes.io/service-account-token   4      43m
deployer-dockercfg-v979w                      kubernetes.io/dockercfg               1      42m
deployer-token-54pmq                          kubernetes.io/service-account-token   4      43m
deployer-token-tr248                          kubernetes.io/service-account-token   4      43m
olm-operator-serviceaccount-dockercfg-ldx5g   kubernetes.io/dockercfg               1      43m
olm-operator-serviceaccount-token-kbshw       kubernetes.io/service-account-token   4      43m
olm-operator-serviceaccount-token-knwvr       kubernetes.io/service-account-token   4      51m
olm-operator-serving-cert                     kubernetes.io/tls                     2      64s
v1.packages.operators.coreos.com-cert         kubernetes.io/tls                     2      47m

mac:~ jianzhang$ echo | openssl s_client -connect localhost:8081 2>&1 | gsed --quiet '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > olm5.crt

Check whether olm4.crt and olm5.crt are the same:
mac:~ jianzhang$ diff olm4.crt olm5.crt
2c2
< MIIEVjCCAz6gAwIBAgIIKAf5qP8BYvcwDQYJKoZIhvcNAQELBQAwNjE0MDIGA1UE
---
> MIIEVjCCAz6gAwIBAgIIBZzzc3WJ7kYwDQYJKoZIhvcNAQELBQAwNjE0MDIGA1UE
4c4
< Fw0xOTEyMDYwNTM0NDRaFw0yMTEyMDUwNTM0NDVaMEwxSjBIBgNVBAMTQWNhdGFs
---
> Fw0xOTEyMDYwNTQzMDlaFw0yMTEyMDUwNTQzMTBaMEwxSjBIBgNVBAMTQWNhdGFs
6,14c6,14
< LW1hbmFnZXIuc3ZjMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxxpz
< w7/wDb5aJGsu9cVGzF08wVpXMWYW6VfFa0oiipLO/RttLOgm8UUsjqgH+w/bwaCl
< X1zxdVBbpqvHX3NDxvb72GM24qhTKoWXuQX0Vt6pzn8vhzzvnzFcy4sjXx7fOmC2
< tc4b4dGiwmYh9hqy/Jtv19QTU7LI+/Prk+2oYe/fRK5PDH1UEFLWx3nfzmjstZGE
< 9aRnh5wTba2iCnmP8i/BYa9yVdt58Mb7touBA+/Nj3iTL0KgNBkJQLEoiIcmuE7C
< jgUQRMxRfRVVdXR7XMHrQerr96tajZwnSjbcM4SYEcigoRJVa+o/g019mRfktajH
< o8d+6fuf6AHt8uNA0QIDAQABo4IBUDCCAUwwDgYDVR0PAQH/BAQDAgWgMBMGA1Ud
< JQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwHQYDVR0OBBYEFO5v1gBY0qlr
< f59f05f4V6IEMAP/MB8GA1UdIwQYMBaAFNxpLrspblVMh04UbIaYlAYneWO2MIGf
---
> LW1hbmFnZXIuc3ZjMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAsagD
> npKrLfRay3Re7RMBpRl4MtCoZrqR9I5Aps575G8k0uBGwXf2F4YURHjpXvD0zfly
> mbTy3U/oeStX+HDQ54mfLjDhGqkizpFmYHASwtqXdDxsrRbeGRKzWCYsYWaBZTAq
> KrniFtPCiAOCEAbJBvUmcv2ahR6CVajXNiUSz9j+ptPoGCyfpQ4CO1kSF6X0Y5Gy
> R8kTExhXua6bs30jpdhE9vcENpc8YjGrh/81HtMZRohwWyZNeAz3dwbIxuX1YfVB
> dz1AT9O5ebciy3cs4EaU5wr5bj6/63I4DF5rQa7NZJPLlCurBFLYpR5F4Mk0a1TD
> LQ3c4DQRM+6wgLSqmwIDAQABo4IBUDCCAUwwDgYDVR0PAQH/BAQDAgWgMBMGA1Ud
> JQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwHQYDVR0OBBYEFHAevST4r1DZ
> WJAsxGXaL/OxucL/MB8GA1UdIwQYMBaAFNxpLrspblVMh04UbIaYlAYneWO2MIGf
19,25c19,25
< NDllOS05ZDQ5LWEyMzI4ODc5NDM1YTANBgkqhkiG9w0BAQsFAAOCAQEAW3QGBOxR
< 7dzGifds6qnei4JjFx85Jgq6eLUKZSvz3RLfToKtWs96LCQIp0cxPdnJtFAzfzEO
< 3vk04ZXfgG2FnomlQ0h7SOZQH03+khwErVjIwfoHyHvVIzLXEI9p6yyHCWArkS3L
< YrIqbCMN+hP6BNi9+iFXRuF80H0POMwXIz96Sk6hOxZOqg6lb8NBiJusf2Av6Np0
< DduWZJC/Xef9paiDkLKzXJkginNNQ0MZCWnTgl5+weXJJYeQauk8zUyGunDu4Os6
< hSYi16xPKHryIlsWEPMnMdKlye8pn3UDT4E+5xKjBf26ML5kiPSYbCav/pt7olkF
< DbxbG5OYu9KKWQ==
---
> NDllOS05ZDQ5LWEyMzI4ODc5NDM1YTANBgkqhkiG9w0BAQsFAAOCAQEADUdpNgTW
> HjwfQorMzRKVMYdvSGC/Ku/SaSBJd65mbQFexNeYiloX+UcogM5IawFqDw6haK6m
> DJlG5hR+uBgdSIgSYlRvUPkLU/iRgtUXnMydb8OTOs3cxTFTEloaaA4BzJNz7qn8
> M0TggdR5jKDHa29h1IyO30jvQnz52mMpLfXt+QrRoWQ+Gs+Pv1mLjomMUkPcgxOS
> s5JKJ0AVcrEQmQbZPuLTmispVtZ3v1YD4mvI4Fc5HsMRXSQwIYVOioimC9ownK0n
> 6ldi9gDEPE/JjaDOj53McVP2TSnaEaGdDksVPei5Y45Y+MmrHqWlTIcKfnax53R+
> Ec7NcsKMB0QCPg==

They are different. LGTM, marking as verified.
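The changed line 4 in the diff above falls inside the DER-encoded validity field. Decoded, it reads notBefore 191206053444Z / notAfter 211205053445Z for olm4.crt and 191206054309Z / 211205054310Z for olm5.crt, which matches the expectation that the validity window shifts slightly. Printing those fields directly with `openssl x509` is easier to read than diffing base64. A minimal sketch; `cert_summary` and `compare_certs` are hypothetical helper names, and only standard `openssl x509` print options are used.

```shell
# Print the serial number and validity window of a PEM certificate file.
cert_summary() {
  openssl x509 -in "$1" -noout -serial -startdate -enddate
}

# Print the summaries of two captures one after the other,
# e.g. compare_certs olm4.crt olm5.crt
compare_certs() {
  for f in "$1" "$2"; do
    echo "== $f"
    cert_summary "$f" || return 1
  done
}
```

A different `serial=` line is the clearest sign that a new certificate was issued; the `notBefore`/`notAfter` lines show when.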
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062