Bug 1834568

Summary: No TLS certs available for HTTPS metrics
Product: OpenShift Container Platform
Reporter: Jack Ottofaro <jack.ottofaro>
Component: Cluster Version Operator
Assignee: Jack Ottofaro <jack.ottofaro>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: high
Docs Contact:
Priority: high
Version: 4.4
CC: aos-bugs, jokerman, wking
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-13 17:37:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1809195, 1835483

Description Jack Ottofaro 2020-05-12 01:22:15 UTC
Description of problem:

The fix for https://bugzilla.redhat.com/show_bug.cgi?id=1809195 is to add an HTTPS metrics target. The PR for that bug is failing the CI upgrade test because the 0001_00_cluster-version-operator_03_service manifest, which carries the service.beta.openshift.io/serving-cert-secret-name annotation, is applied too late. As a result, the monitoring tools aren't creating the secret, which causes the CVO container to fail to start.

The solution, which this bug addresses, is to first ship a z-stream release, e.g. 4.y.z, that contains only the service annotation. A subsequent z-stream release, e.g. 4.y.(z+n), will contain the remaining changes to address https://bugzilla.redhat.com/show_bug.cgi?id=1809195.

When 4.y.(z+n) enters Cincinnati, we will ensure that it and all subsequent 4.y.(z+n) releases are never the update target of a 4.y.(z-m) release. So you have to go from the early z-stream (which has nothing about HTTPS) to a middle-ground z-stream (which only has the annotation) to a late z-stream (which has both the annotation and the remaining code changes). We also need to backport the annotation to 4.(y-1) and block all 4.(y-1) -> 4.y updates from releases that predate the annotation landing in 4.(y-1), so that you cannot go straight to 4.y.(z+n) or later.
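
To make the intended update-graph rule concrete, here is a minimal Python sketch. It is not Cincinnati code; the names Tier, classify, and edge_allowed are hypothetical, and so are the patch numbers used as thresholds. It models a single minor version only and leaves out the 4.(y-1) -> 4.y blocking.

```python
from enum import IntEnum


class Tier(IntEnum):
    EARLY = 0   # nothing about HTTPS
    MIDDLE = 1  # service annotation only
    LATE = 2    # annotation plus the remaining HTTPS metrics changes


def classify(z: int, annotation_z: int, https_z: int) -> Tier:
    """Map a patch number z to a tier.

    annotation_z and https_z are hypothetical patch numbers at which the
    annotation and the remaining changes first shipped.
    """
    if z < annotation_z:
        return Tier.EARLY
    if z < https_z:
        return Tier.MIDDLE
    return Tier.LATE


def edge_allowed(src: Tier, dst: Tier) -> bool:
    """Allow an update edge unless it skips the annotation-only tier."""
    if dst < src:
        return False  # downgrades are not modeled in this sketch
    return not (src == Tier.EARLY and dst == Tier.LATE)
```

With assumed thresholds annotation_z=5 and https_z=8, an update from 4.y.3 to 4.y.9 is rejected, while 4.y.3 to 4.y.5 followed by 4.y.5 to 4.y.9 is accepted.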

Comment 3 W. Trevor King 2020-05-12 23:51:44 UTC
Test plan is:

1. Launch the cluster.
2. Ensure that a secret named cluster-version-operator-serving-cert exists in the openshift-cluster-version namespace.
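
Step 2 could be scripted roughly as follows. This is a sketch that assumes `oc` is on the PATH and logged in to the cluster; the helper name secret_exists and its injectable run parameter are hypothetical conveniences, not part of any OpenShift tooling.

```python
import subprocess

SECRET = "cluster-version-operator-serving-cert"
NAMESPACE = "openshift-cluster-version"


def secret_exists(name=SECRET, namespace=NAMESPACE, run=subprocess.run) -> bool:
    """Return True if `oc get secret` exits with status 0 for the secret.

    The run parameter exists only so the command runner can be swapped
    out, e.g. for a stub when no cluster is available.
    """
    result = run(
        ["oc", "get", "secret", name, "-n", namespace],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Calling secret_exists() against a freshly launched cluster should return True once the serving-cert secret has been created.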

Comment 4 liujia 2020-05-14 04:12:55 UTC
# ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-13-221558   True        False         90m     Cluster version is 4.5.0-0.nightly-2020-05-13-221558

# ./oc get secrets cluster-version-operator-serving-cert -n openshift-cluster-version
NAME                                    TYPE                DATA   AGE
cluster-version-operator-serving-cert   kubernetes.io/tls   2      105m

# ./oc get service/cluster-version-operator -o json -n openshift-cluster-version|jq .metadata.annotations 
{
  "exclude.release.openshift.io/internal-openshift-hosted": "true",
  "service.alpha.openshift.io/serving-cert-signed-by": "openshift-service-serving-signer@1589423028",
  "service.beta.openshift.io/serving-cert-secret-name": "cluster-version-operator-serving-cert",
  "service.beta.openshift.io/serving-cert-signed-by": "openshift-service-serving-signer@1589423028"
}
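
The annotation check above could also be automated. The sketch below takes the JSON text printed by the jq command and reports any mismatch; the function name annotation_problems and the EXPECTED table are hypothetical, and only the serving-cert-secret-name annotation is checked because the signer annotations carry a cluster-specific suffix.

```python
import json

# Annotation that the service manifest is expected to carry.
EXPECTED = {
    "service.beta.openshift.io/serving-cert-secret-name":
        "cluster-version-operator-serving-cert",
}


def annotation_problems(annotations_json: str) -> list:
    """Return a list of mismatch messages; an empty list means the check passed."""
    # jq prints "null" when the service has no annotations at all.
    annotations = json.loads(annotations_json) or {}
    problems = []
    for key, want in EXPECTED.items():
        got = annotations.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, got {got!r}")
    return problems
```

Feeding it the output shown above yields an empty list; a service without the annotation yields one message.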

Comment 5 errata-xmlrpc 2020-07-13 17:37:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409