Bug 1949795 - The defaulting mechanism on the HCO CR is not working if the user completely omits the spec stanza
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 4.8.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Simone Tiraboschi
QA Contact: ibesso
URL:
Whiteboard:
Depends On: 1943217
Blocks:
 
Reported: 2021-04-15 06:32 UTC by Denis Ollier
Modified: 2021-07-27 14:30 UTC
CC: 6 users

Fixed In Version: hco-bundle-registry-container-v4.8.0-286
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 14:29:42 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
CDI Operator logs (6.46 KB, application/gzip)
2021-04-15 17:09 UTC, Denis Ollier


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt hyperconverged-cluster-operator pull 1257 0 None open WIP: hardening the defaulting mechanism 2021-04-16 16:03:02 UTC
Red Hat Product Errata RHSA-2021:2920 0 None None None 2021-07-27 14:30:51 UTC

Description Denis Ollier 2021-04-15 06:32:52 UTC
Description of problem
----------------------

The CDI apiserver, deployment, and uploadproxy Pods are stuck in the ContainerCreating state.

Version-Release number of selected component
--------------------------------------------

CDI: v4.8.0-13
hco-bundle-registry: v4.8.0-259 (2021-04-14 21:40:32)
IIB image: registry-proxy.engineering.redhat.com/rh-osbs/iib:66881
OCP: v4.8.0-0.nightly-2021-04-09-101800

How reproducible
----------------

100%

Steps to Reproduce
------------------

Install CNV from IIB image registry-proxy.engineering.redhat.com/rh-osbs/iib:66881.

Actual results
--------------

The CDI apiserver, deployment, and uploadproxy Pods are stuck in the ContainerCreating state:

> cdi-apiserver-65f668699b-hzw87                    0/1     ContainerCreating   0          3h9m
> cdi-deployment-645d9969b-wpvts                    0/1     ContainerCreating   0          3h9m
> cdi-operator-5849995fcc-b29wz                     1/1     Running             0          3h9m
> cdi-uploadproxy-7fcf95945-lmjtm                   0/1     ContainerCreating   0          3h9m

Expected results
----------------

The CDI Pods should start properly.

Additional info
---------------

Events from those Pods show issues with TLS related Secrets.

* CDI apiserver:

> Unable to attach or mount volumes: unmounted volumes=[server-cert], unattached volumes=[cdi-apiserver-token-jvldj ca-bundle server-cert]: timed out waiting for the condition
> MountVolume.SetUp failed for volume "server-cert" : references non-existent secret key: tls.crt

* CDI deployment:

> Unable to attach or mount volumes: unmounted volumes=[cdi-api-signing-key uploadserver-ca-cert uploadserver-client-ca-cert uploadserver-ca-bundle uploadserver-client-ca-bundle], unattached volumes=[cdi-sa-token-m74bw cdi-api-signing-key uploadserver-ca-cert uploadserver-client-ca-cert uploadserver-ca-bundle uploadserver-client-ca-bundle]: timed out waiting for the condition
> MountVolume.SetUp failed for volume "cdi-api-signing-key" : secret "cdi-api-signing-key" not found
> MountVolume.SetUp failed for volume "uploadserver-client-ca-cert" : references non-existent secret key: tls.crt
> MountVolume.SetUp failed for volume "uploadserver-ca-bundle" : configmap references non-existent config key: ca-bundle.crt
> MountVolume.SetUp failed for volume "uploadserver-ca-cert" : references non-existent secret key: tls.crt
> MountVolume.SetUp failed for volume "uploadserver-client-ca-bundle" : configmap references non-existent config key: ca-bundle.crt

* CDI uploadproxy:

> Unable to attach or mount volumes: unmounted volumes=[server-cert client-cert], unattached volumes=[cdi-uploadproxy-token-p24xw server-cert client-cert]: timed out waiting for the condition
> MountVolume.SetUp failed for volume "server-cert" : references non-existent secret key: tls.crt
> MountVolume.SetUp failed for volume "client-cert" : references non-existent secret key: tls.crt

Comment 1 Michael Henriksen 2021-04-15 15:17:28 UTC
Can you attach the contents of cdi-operator log?

I've seen similar errors occur when a previous install of CDI was not uninstalled correctly. Make sure the install namespace does not exist or is completely empty.

Comment 2 Denis Ollier 2021-04-15 17:09:47 UTC
Created attachment 1772228 [details]
CDI Operator logs

CDI Operator logs attached.

Namespace openshift-cnv is removed entirely before (re)installing CNV.

Comment 4 Michael Henriksen 2021-04-15 19:22:51 UTC
On further inspection of the logs, it appears that "cdi-apiserver-signer-bundle" exists and the configmap "data" field is not nil, but the ca-bundle.crt key does not exist. This is a strange state to be in. Can you please try deleting cdi-apiserver-signer-bundle? It *should* get recreated correctly. It may take a minute.

Comment 6 Denis Ollier 2021-04-15 20:05:16 UTC
ConfigMap cdi-apiserver-signer-bundle before deletion:

> ---
> kind: ConfigMap
> apiVersion: v1
> metadata:
>   annotations:
>     operator.cdi.kubevirt.io/lastAppliedConfiguration: '{"metadata":{"name":"cdi-apiserver-signer-bundle","namespace":"openshift-cnv","creationTimestamp":null,"labels":{"cdi.kubevirt.io":""}}}'
>   creationTimestamp: "2021-04-15T19:35:42Z"
>   labels:
>     auth.openshift.io/managed-certificate-type: ca-bundle
>     cdi.kubevirt.io: ""
>     operator.cdi.kubevirt.io/createVersion: v4.8.0
>   name: cdi-apiserver-signer-bundle
>   namespace: openshift-cnv
>   ownerReferences:
>   - apiVersion: cdi.kubevirt.io/v1beta1
>     blockOwnerDeletion: true
>     controller: true
>     kind: CDI
>     name: cdi-kubevirt-hyperconverged
>     uid: 853cd265-29c4-4d5f-ae48-2a46abbbba71
>   resourceVersion: "515630"
>   uid: 93235c20-a5ed-4f08-a348-2d4516f38ed9
> data:
>   ca-bundle.crt: ""

ConfigMap after getting deleted and recreated by CDI:

> ---
> kind: ConfigMap
> apiVersion: v1
> metadata:
>   annotations:
>     operator.cdi.kubevirt.io/lastAppliedConfiguration: '{"metadata":{"name":"cdi-apiserver-signer-bundle","namespace":"openshift-cnv","creationTimestamp":null,"labels":{"cdi.kubevirt.io":""}}}'
>   creationTimestamp: "2021-04-15T20:01:28Z"
>   labels:
>     auth.openshift.io/managed-certificate-type: ca-bundle
>     cdi.kubevirt.io: ""
>     operator.cdi.kubevirt.io/createVersion: v4.8.0
>   name: cdi-apiserver-signer-bundle
>   namespace: openshift-cnv
>   ownerReferences:
>   - apiVersion: cdi.kubevirt.io/v1beta1
>     blockOwnerDeletion: true
>     controller: true
>     kind: CDI
>     name: cdi-kubevirt-hyperconverged
>     uid: 853cd265-29c4-4d5f-ae48-2a46abbbba71
>   resourceVersion: "537762"
>   uid: 6a61a137-e680-4d81-a9d1-bed0fb5844b6
> data:
>   ca-bundle.crt: ""

Comment 7 Michael Henriksen 2021-04-15 23:31:33 UTC
Ah, this will do it, from HCO CR:

spec:
  certConfig:
    ca:
      duration: 0s
      renewBefore: 0s
    server:
      duration: 0s
      renewBefore: 0s

Changing it to the following fixed the issue:

spec:
  certConfig:
    ca:
      duration: 24h
      renewBefore: 12h
    server:
      duration: 12h
      renewBefore: 6h

Comment 9 Simone Tiraboschi 2021-04-16 13:26:32 UTC
(In reply to Michael Henriksen from comment #7)
> Ah, this will do it, from HCO CR:
> 
> spec:
>   certConfig:
>     ca:
>       duration: 0s
>       renewBefore: 0s
>     server:
>       duration: 0s
>       renewBefore: 0s

This looks exactly like a symptom of https://bugzilla.redhat.com/1943217, which is now supposed to be fixed.


Denis, how are you creating the CR for HCO?
What happens if you completely remove the certConfig stanza? Does the defaulting mechanism work?

Comment 10 Denis Ollier 2021-04-16 13:47:52 UTC
We create the HCO resource from this YAML:

> ---
> kind: HyperConverged
> apiVersion: hco.kubevirt.io/v1beta1
> metadata:
>   name: kubevirt-hyperconverged
>   namespace: openshift-cnv

We don't define certConfig.

Do you think I need to add an empty spec (i.e., `spec: {}`)?

Comment 11 Simone Tiraboschi 2021-04-16 13:52:40 UTC
(In reply to Denis Ollier from comment #10)
> Do you think I need to add an empty spec (i.e: `spec: {}`) ?

I'm trying to reproduce on that cluster, and in my opinion the defaulting mechanism works as expected: if you simply omit the spec.certConfig stanza, you get the default configuration, i.e.:
{
  "ca": {
    "duration": "48h",
    "renewBefore": "24h"
  },
  "server": {
    "duration": "24h",
    "renewBefore": "12h"
  }
}

Try with:
 [cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc patch hco -n openshift-cnv   kubevirt-hyperconverged --type json -p '[{ "op": "replace", "path": "/spec/certConfig", "value": {"ca": {"duration": "20h", "renewBefore": "10h"}, "server": {"duration": "20h", "renewBefore": "10h"}}}]'
 hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
 [cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc get hco -n openshift-cnv   kubevirt-hyperconverged -o json | jq .spec.certConfig
 {
   "ca": {
     "duration": "20h",
     "renewBefore": "10h"
   },
   "server": {
     "duration": "20h",
     "renewBefore": "10h"
   }
 }
 [cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc patch hco -n openshift-cnv   kubevirt-hyperconverged --type json -p '[{ "op": "remove", "path": "/spec/certConfig" }]'
 hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
 [cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc get hco -n openshift-cnv   kubevirt-hyperconverged -o json | jq .spec.certConfig
 {
   "ca": {
     "duration": "48h",
     "renewBefore": "24h"
   },
   "server": {
     "duration": "24h",
     "renewBefore": "12h"
   }
 }
 [cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc patch hco -n openshift-cnv   kubevirt-hyperconverged --type json -p '[{ "op": "replace", "path": "/spec/certConfig", "value": {"ca": {"duration": "40h", "renewBefore": "20h"}, "server": {"duration": "40h", "renewBefore": "20h"}}}]'
 hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
 [cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc get hco -n openshift-cnv   kubevirt-hyperconverged -o json | jq .spec.certConfig
 {
   "ca": {
     "duration": "40h",
     "renewBefore": "20h"
   },
   "server": {
     "duration": "40h",
     "renewBefore": "20h"
   }
 }
 [cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc patch hco -n openshift-cnv   kubevirt-hyperconverged --type json -p '[{ "op": "remove", "path": "/spec/certConfig" }]'
 hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
 [cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc get hco -n openshift-cnv   kubevirt-hyperconverged -o json | jq .spec.certConfig
 {
   "ca": {
     "duration": "48h",
     "renewBefore": "24h"
   },
   "server": {
     "duration": "24h",
     "renewBefore": "12h"
   }
 }


The only way to really get a set of zeros is to explicitly write them (and maybe this deserves a bug of its own, since we should refuse 0s there):
 [cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc patch hco -n openshift-cnv   kubevirt-hyperconverged --type json -p '[{ "op": "replace", "path": "/spec/certConfig", "value": {"ca": {"duration": "0h", "renewBefore": "0h"}, "server": {"duration": "0h", "renewBefore": "0h"}}}]'
 hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
 [cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc get hco -n openshift-cnv   kubevirt-hyperconverged -o json | jq .spec.certConfig
 {
   "ca": {
     "duration": "0h",
     "renewBefore": "0h"
   },
   "server": {
     "duration": "0h",
     "renewBefore": "0h"
   }
 }
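A validating check of the kind suggested above (refusing zero durations) could be sketched like this. This is a hypothetical illustration in Python, not the operator's actual admission code; `parse_hours` and `validate_cert_config` are made-up names:

```python
import re

def parse_hours(d: str) -> float:
    """Parse the simple Go-style duration forms used in this report ('20h', '0s', ...)."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([hms])", d)
    if not m:
        raise ValueError(f"unsupported duration: {d!r}")
    value, unit = float(m.group(1)), m.group(2)
    return value * {"h": 1.0, "m": 1 / 60, "s": 1 / 3600}[unit]

def validate_cert_config(cert_config: dict) -> list:
    """Collect an error for every zero (or negative) duration, the way a
    validating webhook might reject the CR instead of admitting it."""
    errors = []
    for component, cfg in sorted(cert_config.items()):
        for field in ("duration", "renewBefore"):
            if parse_hours(cfg[field]) <= 0:
                errors.append(f"{component}.{field} must be greater than 0")
    return errors

zeros = {"ca": {"duration": "0h", "renewBefore": "0h"},
         "server": {"duration": "0h", "renewBefore": "0h"}}
print(validate_cert_config(zeros))  # one error per zeroed field
```

With the all-zero config from the patch above, this returns four errors; with the defaults (48h/24h, 24h/12h) it returns an empty list.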

Comment 12 Simone Tiraboschi 2021-04-16 14:00:01 UTC
OK, reproduced.

If the user completely omits the whole spec stanza, we get a set of zeros:
 [cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc patch hco -n openshift-cnv   kubevirt-hyperconverged --type json -p '[{ "op": "remove", "path": "/spec" }]'
 hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
 [cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc get hco -n openshift-cnv   kubevirt-hyperconverged -o json | jq .spec.certConfig
 {
   "ca": {
     "duration": "0s",
     "renewBefore": "0s"
   },
   "server": {
     "duration": "0s",
     "renewBefore": "0s"
   }
 }

Taking the bug.
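The asymmetry above (defaults applied when only certConfig is removed, zeros when the whole spec is removed) is consistent with how Kubernetes structural-schema defaulting behaves: defaults are only applied to missing fields of objects that are actually present, so nested certConfig defaults never fire when spec itself is absent. A minimal simulation of that behavior, not the operator's or the API server's real code:

```python
import copy

CERT_CONFIG_DEFAULT = {
    "ca": {"duration": "48h", "renewBefore": "24h"},
    "server": {"duration": "24h", "renewBefore": "12h"},
}

# Toy schema: certConfig has a default, but spec itself does not
# (mirroring a CRD without a `default: {}` on spec).
SCHEMA = {
    "properties": {
        "spec": {
            "properties": {
                "certConfig": {"default": CERT_CONFIG_DEFAULT},
            },
        },
    },
}

def apply_defaults(obj: dict, schema: dict) -> dict:
    """Fill in defaults for missing fields, recursing only into objects
    that are present -- the key property of structural-schema defaulting."""
    for name, prop in schema.get("properties", {}).items():
        if name not in obj and "default" in prop:
            obj[name] = copy.deepcopy(prop["default"])
        if isinstance(obj.get(name), dict):
            apply_defaults(obj[name], prop)
    return obj

print(apply_defaults({}, SCHEMA))            # spec absent: certConfig default never applies
print(apply_defaults({"spec": {}}, SCHEMA))  # spec present: certConfig gets its default
```

One way to close the gap is to give spec itself a default (an empty object) in the schema, or to default in the operator code regardless of whether spec was supplied.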

Comment 15 Simone Tiraboschi 2021-04-16 16:04:02 UTC
Removing TestBlocker and AutomationBlocker labels since we have a valid workaround.

Comment 16 ibesso 2021-05-03 12:33:38 UTC
Verified the bug with hco-bundle-registry-container-v4.8.0-290 (IIB == registry-proxy.engineering.redhat.com/rh-osbs/iib:70707).

Verified that after each of the following modifications, the change was accepted but then reconciled, and the stanza structure returned to the default with the default values for each field:
- deleted the whole spec stanza in the HCO CR.
- deleted the whole certConfig stanza in the HCO CR.
- deleted the whole certConfig.ca stanza in the HCO CR.
- deleted the whole certConfig.server stanza in the HCO CR.
- deleted all sub-fields of certConfig, leaving it bare, in the HCO CR.
- deleted all sub-fields of certConfig.ca, leaving it bare, in the HCO CR.
- deleted all sub-fields of certConfig.server, leaving it bare, in the HCO CR.

Here is the default structure in HCO CR:

spec:
  certConfig:
    ca:
      duration: 48h0m0s
      renewBefore: 24h0m0s
    server:
      duration: 24h0m0s
      renewBefore: 12h0m0s

+ verified that the KubeVirt CR got the default values for all fields in certConfig (in KubeVirt: certificateRotateStrategy) - OK.
Here is the default structure in KubeVirt CR:

spec:
  certificateRotateStrategy:
    selfSigned:
      ca:
        duration: 48h0m0s
        renewBefore: 24h0m0s
      server:
        duration: 24h0m0s
        renewBefore: 12h0m0s



Moving to VERIFIED.

Comment 19 errata-xmlrpc 2021-07-27 14:29:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920

