A serving cert supplied by the service CA operator appears to be used to secure the /metrics endpoints of catalog-operator and olm-operator. Neither operator appears to reload the key material if it changes. When the serving cert is regenerated (e.g. when the service CA is rotated), the endpoints may cease to work until the operators are restarted. The 'Refresh Strategies' section of the linked compatibility doc catalogs potential strategies for responding to changes in key material supplied by the service CA operator.

Note that CA rotation can be manually triggered in any 4.x release by removing the signing secret. Automated rotation is likely to be introduced in a future z-stream release.

References:
Enhancement for automated service CA rotation: https://github.com/openshift/enhancements/blob/master/enhancements/automated-service-ca-rotation.md
Operator compatibility with service CA rotation: https://docs.google.com/document/d/1NB2wUf9e8XScfVM6jFBl8VuLYG6-3uV63eUpqmYE8Ts/edit
In order to test this functionality, one should delete the olm-operator-serving-cert and the catalog-operator-serving-cert in the openshift-operator-lifecycle-manager namespace, wait for the CA operator to regenerate the certificates, and then ensure that the metrics are being served using the newly generated certificates. Metrics on both operators are served on port 8081. You might consider looking at just the certificates using openssl s_connect...
Sorry, that's "openssl s_client -connect"...
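The same before/after check can be done programmatically, for instance from a periodic job. A minimal Go sketch, assuming the metrics port has been forwarded to localhost:8081 as above; `servingCertFingerprint` and `fingerprint` are hypothetical helper names built on the standard `crypto/tls` and `crypto/sha256` packages:

```go
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"encoding/hex"
	"fmt"
)

// fingerprint returns the hex SHA-256 digest of a DER-encoded certificate.
func fingerprint(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// servingCertFingerprint connects to addr (for example "localhost:8081"
// behind `oc port-forward`) and returns the fingerprint of the leaf
// certificate the server presents during the handshake.
func servingCertFingerprint(addr string) (string, error) {
	// Verification is skipped on purpose: like `openssl s_client`, the goal
	// is only to read the certificate, not to trust it.
	conn, err := tls.Dial("tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return "", err
	}
	defer conn.Close()

	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return "", fmt.Errorf("%s presented no certificate", addr)
	}
	return fingerprint(certs[0].Raw), nil
}
```

Recording the fingerprint before deleting the serving-cert secrets and again after the service CA operator regenerates them should yield two different values if the operator is serving the new cert.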
Change the version to 4.4, since we already have bug 1777593 for 4.3.
Jeff's instructions are valid, though I'm not sure it's worth testing this functionality manually. My intention is to add a periodic rotation job that checks that metrics from all operators are collected after a combination of CA expiry and manual rotation: https://docs.google.com/document/d/1NB2wUf9e8XScfVM6jFBl8VuLYG6-3uV63eUpqmYE8Ts/edit#heading=h.8rgsj08xt5tp
Worked as expected: the certs were rotated. Marking as VERIFIED.

Steps used to reproduce:

OLM version: 0.13.0
git commit: 30838b7abce35c2d0d24bcf91596fc31db50755b
Cluster Version: 4.4.0-0.nightly-2020-01-24-113037

$ oc port-forward catalog-operator-6795f76457-6tn6g -n openshift-operator-lifecycle-manager 8081:8081
Forwarding from 127.0.0.1:8081 -> 8081
Forwarding from [::1]:8081 -> 8081
Handling connection for 8081

$ echo | openssl s_client -connect localhost:8081 2>&1 | sed --quiet '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > olm4.crt

Delete only the serving-cert secrets (catalog-operator-serving-cert and olm-operator-serving-cert):

$ oc delete secret olm-operator-serving-cert catalog-operator-serving-cert -n openshift-operator-lifecycle-manager
secret "olm-operator-serving-cert" deleted
secret "catalog-operator-serving-cert" deleted

$ echo | openssl s_client -connect localhost:8081 2>&1 | sed --quiet '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > olm5.crt

$ diff olm4.crt olm5.crt
2c2
< MIIEVjCCAz6gAwIBAgIICKVjNn35ZYwwDQYJKoZIhvcNAQELBQAwNjE0MDIGA1UE
---
> MIIEVjCCAz6gAwIBAgIIRQkuUfQn1GIwDQYJKoZIhvcNAQELBQAwNjE0MDIGA1UE
4c4
< Fw0yMDAxMjQxNTM2MzVaFw0yMjAxMjMxNTM2MzZaMEwxSjBIBgNVBAMTQWNhdGFs
---
> Fw0yMDAxMjQxODE3NTRaFw0yMjAxMjMxODE3NTVaMEwxSjBIBgNVBAMTQWNhdGFs
6,14c6,14
< LW1hbmFnZXIuc3ZjMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAqa5k
< DSe3F5IBivhC/+H4ooz/TDZRX0eCoik0cU492w+bQ7dnjdlvsj0k2SmFltTgA2gP
< eMhz4YMoO1/T5qQ2HfxBtcJ3sronSgll+k/RPGHCb8JCqDd9hE82ITlLd7WqZuI8
< U1WrRG7JQNknk/+OgIoAMHNVmSlp2hJNbMx5pzcPMv5BvAiNa8FJ7/39yxdYaCGa
< EzEeWrezzn6H2xuX1oRUKoT67GZPM+ZkQZrl7PcSd+gTwtTbCaofsER5CG+4ydJ4
< nIMRKvALBlnbPNA9pJCjVc4nJrGgNVlcNskxluJ5XWtuYGUcTzLQ0M+lZ5Ztxf2S
< JcpXGzhvi6LIQvRnjQIDAQABo4IBUDCCAUwwDgYDVR0PAQH/BAQDAgWgMBMGA1Ud
< JQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwHQYDVR0OBBYEFI8OEErqpnCa
< QutvGni4AeWUPzd1MB8GA1UdIwQYMBaAFN6CJi6bNxMFTWw2s+FJjpJFjNnKMIGf
---
> LW1hbmFnZXIuc3ZjMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA30TC
> aKHd/nd+pSkmJuCXSh6FUbW5tssmgGVj+ICPxCbJLDaHAf3xmG8nx5p8sWTAj1bJ
> LvW4BZ394lV4YCYvr97WpE5AVJZ0R9/xcFLb3XOkClOy2GxfuFW3SS4zXjHZHaNl
> 03YTZoYT7JhacNhcUgKxDdmTvfkIbOlTvHicx5dhyN9ObnAzEZ72ZDnVPWAL/5YR
> 8Aii4CWhT94gIfs8GAcuxmxnfA7PBTrMYOmNh/6GHa2cjG8fHLOBOkGOPwwm2KsT
> O+WzxV0T9/tvc6im/w0RZ0GhBzXIrF9AGYQ+fPt7zrKiVYLEc0GkwwwAGcyfstQa
> mZYsTVueEGI3f8hoOQIDAQABo4IBUDCCAUwwDgYDVR0PAQH/BAQDAgWgMBMGA1Ud
> JQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwHQYDVR0OBBYEFJ/mZpc3IR8n
> r6HQxNBV3NUT+/+SMB8GA1UdIwQYMBaAFN6CJi6bNxMFTWw2s+FJjpJFjNnKMIGf
19,25c19,25
< NGZhOS04MDYxLTNhNmYzNmMwZDQxOTANBgkqhkiG9w0BAQsFAAOCAQEAdZkcwOU0
< QPfNk6gGvmQtdXxSfHAvfex1SE2bM+KLyDcpkHDFOHiTOxGD/jseEtyznzRcr/fF
< yCSJQxQnAaUi7Wq5JP/5f65AkcLdQDv/HLqm2awQsghKSS7Zb9CN2/vEWyeKyNVX
< KYCkxR4obomrVhYF43l85VU/FI2cqBDXcxo3mFaPrJtFvUdtwwwGXVVicKYy97Xo
< 6IDsT1pc8hT7xRFs81uVSo4zkcQ4VRwMmbyCkIaZYZfnbX4dzW50ydYLJk4vjg6D
< GhVPQVxN88koTAWszvlAvXDXn+rn1WYKuxFNnjYm+63+0VTqt6M8C0GWT4GjuiZd
< jrQnzAFU1DTEDQ==
---
> NGZhOS04MDYxLTNhNmYzNmMwZDQxOTANBgkqhkiG9w0BAQsFAAOCAQEAiW5qRVsP
> quIt7eD3xI6dNi3Nb1lcRInI+c0y0RSlNwuNDdIRk5BOFF3p6eXD9rf3TnmoyqcT
> Q1dMCDUjIKQwPb2L4Z1Ok3cV5H8QwK3YIxmcqyBT5qM3kJKRWbsxGXTd5pxT/ZTK
> qbvj7l6BHQYSiZsjbm0pWH8fOoIeQo6YvpvJkycSm9lZVTwnTS5tMq19dcalCw6t
> O5CnDdLjVgb5Zo+yc9nczdtKjK930PiA2+/wi2dxS6JcCFHMuNKJUkz13D7l/UmN
> LcOS/Mtan+az3pa565tfhP1AM/N3G4LTGhDPLI540h5Aaoy1p+RQKg3Ok5DHyIwT
> NwH66/C4SrDrqg==
I'm in the process of backporting CA rotation to 4.2 and 4.3. Would it make sense for the fix for this BZ to be backported similarly, to prevent metrics collection from failing if the operator is not restarted between CA rotation and the expiry of the pre-rotation CA?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days