Bug 1835483

Summary: No TLS certs available for HTTPS metrics
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Cluster Version Operator
Assignee: Jack Ottofaro <jack.ottofaro>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: high
Docs Contact:
Priority: high
Version: 4.4
CC: aos-bugs, jokerman, wking
Target Milestone: ---   
Target Release: 4.4.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The cluster-version operator should serve metrics over HTTPS. To do that, it needs a TLS key and certificate.
Consequence: Without a TLS key and certificate, cluster-version operators which expect them to be in place will crash loop.
Fix: Add a service annotation in 4.4.z (this bug), so the 4.4 monitoring operator will create the TLS key and certificate.
Result: When an update from future 4.4.z to 4.5 is initiated, the incoming 4.5 cluster version operator will have the TLS key and certificate that it needs to start serving metrics over HTTPS.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-06-02 11:18:33 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1834568    
Bug Blocks:    

Description OpenShift BugZilla Robot 2020-05-13 22:05:54 UTC
+++ This bug was initially created as a clone of Bug #1834568 +++

Description of problem:

The fix for https://bugzilla.redhat.com/show_bug.cgi?id=1809195 is to add an HTTPS metrics target. The PR that fixes that bug is failing the CI upgrade test because the 0001_00_cluster-version-operator_03_service manifest, which carries the service.beta.openshift.io/serving-cert-secret-name annotation, is applied too late. As a result, the monitoring tools do not create the secret in time, and the CVO container fails to start.

The solution, which this bug addresses, is to first ship a z-stream release, e.g. 4.y.z, that contains only the service annotation. A subsequent z-stream release, e.g. 4.y.(z+n), will contain the remaining changes to address https://bugzilla.redhat.com/show_bug.cgi?id=1809195.

When 4.y.(z+n) enters Cincinnati, we will ensure that it and all subsequent 4.y.(z+n) releases are never an update target of a 4.y.(z-m) release. So you have to go from an early z-stream (which has nothing about HTTPS) to a middle-ground z-stream (which has only the annotation) to a late z-stream (which has both the annotation and the remaining code changes). We also need to backport the annotation to 4.(y-1), and block all 4.(y-1) -> 4.y updates from 4.(y-1) releases that predate the annotation, so you cannot go straight to 4.y.(z+n) or later.
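
The early / middle-ground / late rule above can be sketched as a small edge check. This is only an illustration, not how Cincinnati is actually configured: the function name edge_allowed and the two z-stream numbers are hypothetical, since this bug does not say which z-streams carry the annotation or the HTTPS code.

```shell
# Hypothetical z-stream numbers within one 4.y minor; not taken from this bug.
ANNOTATION_Z=6   # assumed first z-stream that carries the service annotation
HTTPS_Z=9        # assumed first z-stream that carries the HTTPS metrics code

# edge_allowed FROM_Z TO_Z
# Succeeds if an update from 4.y.FROM_Z to 4.y.TO_Z fits the rule above.
edge_allowed() {
  # Updates must move forward.
  [ "$2" -gt "$1" ] || return 1
  # A release with the HTTPS code expects the secret to exist already,
  # so the source release must already carry the annotation.
  if [ "$2" -ge "$HTTPS_Z" ] && [ "$1" -lt "$ANNOTATION_Z" ]; then
    return 1
  fi
  return 0
}
```

With these assumed numbers, edge_allowed 5 6 and edge_allowed 6 9 succeed, while edge_allowed 5 9 fails because it would skip the middle-ground release.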

--- Additional comment from wking on 2020-05-12 23:51:44 UTC ---

Test plan is:

1. Launch the cluster.
2. Ensure that a secret named cluster-version-operator-serving-cert exists in the openshift-cluster-version namespace.
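
Step 2 could be scripted roughly as follows. This is a minimal sketch that assumes oc is on the PATH and already logged in to the cluster; the function name check_serving_cert and the two variable names are made up for illustration.

```shell
# Secret name and namespace from the test plan above.
SECRET_NAME=cluster-version-operator-serving-cert
SECRET_NAMESPACE=openshift-cluster-version

# check_serving_cert
# Succeeds if the serving-cert secret exists; fails otherwise
# (including when oc cannot reach the cluster).
check_serving_cert() {
  oc get secret "$SECRET_NAME" -n "$SECRET_NAMESPACE" >/dev/null 2>&1
}
```

For example: check_serving_cert && echo "secret present" || echo "secret missing".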

Comment 3 liujia 2020-05-25 06:28:19 UTC
Version:
4.4.0-0.nightly-2020-05-24-193742

Fresh installation:
# ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-05-24-193742   True        False         2m24s   Cluster version is 4.4.0-0.nightly-2020-05-24-193742
# ./oc get secrets cluster-version-operator-serving-cert -n openshift-cluster-version
NAME                                    TYPE                DATA   AGE
cluster-version-operator-serving-cert   kubernetes.io/tls   2      15m
# ./oc get service/cluster-version-operator -o json -n openshift-cluster-version|jq .metadata.annotations
{
  "exclude.release.openshift.io/internal-openshift-hosted": "true",
  "service.alpha.openshift.io/serving-cert-signed-by": "openshift-service-serving-signer@1590377080",
  "service.beta.openshift.io/serving-cert-secret-name": "cluster-version-operator-serving-cert",
  "service.beta.openshift.io/serving-cert-signed-by": "openshift-service-serving-signer@1590377080"
}

Upgrade from old v4.4 to latest v4.4:
Before upgrade:
# ./oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.5     True        False         5m24s   Cluster version is 4.4.5
# ./oc get secrets cluster-version-operator-serving-cert -n openshift-cluster-version
Error from server (NotFound): secrets "cluster-version-operator-serving-cert" not found

After upgrade:
# ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-05-24-193742   True        False         79m     Cluster version is 4.4.0-0.nightly-2020-05-24-193742
# ./oc get secrets cluster-version-operator-serving-cert -n openshift-cluster-version
NAME                                    TYPE                DATA   AGE
cluster-version-operator-serving-cert   kubernetes.io/tls   2      120m

Comment 5 errata-xmlrpc 2020-06-02 11:18:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2310