Description of problem
----------------------
CDI Pods apiserver, deployment and uploadproxy are stuck in ContainerCreating state.

Version-Release number of selected component
--------------------------------------------
CDI: v4.8.0-13
hco-bundle-registry: v4.8.0-259 (2021-04-14 21:40:32)
IIB image: registry-proxy.engineering.redhat.com/rh-osbs/iib:66881
OCP: v4.8.0-0.nightly-2021-04-09-101800

How reproducible
----------------
100%

Steps to Reproduce
------------------
Install CNV from IIB image registry-proxy.engineering.redhat.com/rh-osbs/iib:66881.

Actual results
--------------
CDI Pods apiserver, deployment and uploadproxy are stuck in ContainerCreating state:

> cdi-apiserver-65f668699b-hzw87    0/1   ContainerCreating   0   3h9m
> cdi-deployment-645d9969b-wpvts    0/1   ContainerCreating   0   3h9m
> cdi-operator-5849995fcc-b29wz     1/1   Running             0   3h9m
> cdi-uploadproxy-7fcf95945-lmjtm   0/1   ContainerCreating   0   3h9m

Expected results
----------------
CDI Pods should start properly.

Additional info
---------------
Events from those Pods show issues with TLS related Secrets.

* CDI apiserver:

> Unable to attach or mount volumes: unmounted volumes=[server-cert], unattached volumes=[cdi-apiserver-token-jvldj ca-bundle server-cert]: timed out waiting for the condition
> MountVolume.SetUp failed for volume "server-cert" : references non-existent secret key: tls.crt

* CDI deployment:

> Unable to attach or mount volumes: unmounted volumes=[cdi-api-signing-key uploadserver-ca-cert uploadserver-client-ca-cert uploadserver-ca-bundle uploadserver-client-ca-bundle], unattached volumes=[cdi-sa-token-m74bw cdi-api-signing-key uploadserver-ca-cert uploadserver-client-ca-cert uploadserver-ca-bundle uploadserver-client-ca-bundle]: timed out waiting for the condition
> MountVolume.SetUp failed for volume "cdi-api-signing-key" : secret "cdi-api-signing-key" not found
> MountVolume.SetUp failed for volume "uploadserver-client-ca-cert" : references non-existent secret key: tls.crt
> MountVolume.SetUp failed for volume "uploadserver-ca-bundle" : configmap references non-existent config key: ca-bundle.crt
> MountVolume.SetUp failed for volume "uploadserver-ca-cert" : references non-existent secret key: tls.crt
> MountVolume.SetUp failed for volume "uploadserver-client-ca-bundle" : configmap references non-existent config key: ca-bundle.crt

* CDI uploadproxy:

> Unable to attach or mount volumes: unmounted volumes=[server-cert client-cert], unattached volumes=[cdi-uploadproxy-token-p24xw server-cert client-cert]: timed out waiting for the condition
> MountVolume.SetUp failed for volume "server-cert" : references non-existent secret key: tls.crt
> MountVolume.SetUp failed for volume "client-cert" : references non-existent secret key: tls.crt
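For reference, the pod states and mount events above were gathered with standard oc commands; a minimal sketch (the pod name is an example from this cluster and will differ elsewhere):

$ oc get pods -n openshift-cnv | grep ^cdi
$ oc describe pod -n openshift-cnv cdi-apiserver-65f668699b-hzw87   # Events section shows the MountVolume.SetUp failures
$ oc get events -n openshift-cnv --field-selector reason=FailedMount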
Can you attach the contents of the cdi-operator log? I've seen similar errors occur when a previous install of CDI was not uninstalled correctly. Make sure the install namespace does not exist or is completely empty.
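A minimal way to check, assuming the default openshift-cnv install namespace (adjust if CDI was installed elsewhere):

$ oc get namespace openshift-cnv
$ oc get all,secrets,configmaps -n openshift-cnv
$ oc delete namespace openshift-cnv   # only if leftovers from a previous install are still present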
Created attachment 1772228 [details]
CDI Operator logs

CDI Operator logs attached.

Namespace openshift-cnv is removed entirely before (re)installing CNV.
On further inspection of the logs, it appears that the "cdi-apiserver-signer-bundle" ConfigMap exists and its "data" field is not nil, but the ca-bundle.crt key does not exist. This is a strange state to be in. But can you please try deleting cdi-apiserver-signer-bundle? It *should* get recreated correctly. It may take a minute.
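Something along these lines should do it (assuming the openshift-cnv install namespace; the exact invocation is a suggestion, not taken from this cluster):

$ oc delete configmap cdi-apiserver-signer-bundle -n openshift-cnv
$ oc get configmap cdi-apiserver-signer-bundle -n openshift-cnv -o yaml   # after a minute, check whether ca-bundle.crt is populated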
ConfigMap cdi-apiserver-signer-bundle before deletion:

> ---
> kind: ConfigMap
> apiVersion: v1
> metadata:
>   annotations:
>     operator.cdi.kubevirt.io/lastAppliedConfiguration: '{"metadata":{"name":"cdi-apiserver-signer-bundle","namespace":"openshift-cnv","creationTimestamp":null,"labels":{"cdi.kubevirt.io":""}}}'
>   creationTimestamp: "2021-04-15T19:35:42Z"
>   labels:
>     auth.openshift.io/managed-certificate-type: ca-bundle
>     cdi.kubevirt.io: ""
>     operator.cdi.kubevirt.io/createVersion: v4.8.0
>   name: cdi-apiserver-signer-bundle
>   namespace: openshift-cnv
>   ownerReferences:
>   - apiVersion: cdi.kubevirt.io/v1beta1
>     blockOwnerDeletion: true
>     controller: true
>     kind: CDI
>     name: cdi-kubevirt-hyperconverged
>     uid: 853cd265-29c4-4d5f-ae48-2a46abbbba71
>   resourceVersion: "515630"
>   uid: 93235c20-a5ed-4f08-a348-2d4516f38ed9
> data:
>   ca-bundle.crt: ""

ConfigMap after getting deleted and recreated by CDI:

> ---
> kind: ConfigMap
> apiVersion: v1
> metadata:
>   annotations:
>     operator.cdi.kubevirt.io/lastAppliedConfiguration: '{"metadata":{"name":"cdi-apiserver-signer-bundle","namespace":"openshift-cnv","creationTimestamp":null,"labels":{"cdi.kubevirt.io":""}}}'
>   creationTimestamp: "2021-04-15T20:01:28Z"
>   labels:
>     auth.openshift.io/managed-certificate-type: ca-bundle
>     cdi.kubevirt.io: ""
>     operator.cdi.kubevirt.io/createVersion: v4.8.0
>   name: cdi-apiserver-signer-bundle
>   namespace: openshift-cnv
>   ownerReferences:
>   - apiVersion: cdi.kubevirt.io/v1beta1
>     blockOwnerDeletion: true
>     controller: true
>     kind: CDI
>     name: cdi-kubevirt-hyperconverged
>     uid: 853cd265-29c4-4d5f-ae48-2a46abbbba71
>   resourceVersion: "537762"
>   uid: 6a61a137-e680-4d81-a9d1-bed0fb5844b6
> data:
>   ca-bundle.crt: ""
Ah, this will do it, from the HCO CR:

spec:
  certConfig:
    ca:
      duration: 0s
      renewBefore: 0s
    server:
      duration: 0s
      renewBefore: 0s

Changing it to the following fixed the issue:

spec:
  certConfig:
    ca:
      duration: 24h
      renewBefore: 12h
    server:
      duration: 12h
      renewBefore: 6h
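For anyone hitting the same state, a one-liner along these lines should apply the workaround (the values mirror the working configuration above; the exact oc invocation is a suggestion, not taken from this thread):

$ oc patch hco -n openshift-cnv kubevirt-hyperconverged --type merge -p '{"spec":{"certConfig":{"ca":{"duration":"24h","renewBefore":"12h"},"server":{"duration":"12h","renewBefore":"6h"}}}}'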
Probably related:

- https://bugzilla.redhat.com/show_bug.cgi?id=1943217
- https://github.com/kubevirt/hyperconverged-cluster-operator/pull/1207
(In reply to Michael Henriksen from comment #7)
> Ah, this will do it, from the HCO CR:
>
> spec:
>   certConfig:
>     ca:
>       duration: 0s
>       renewBefore: 0s
>     server:
>       duration: 0s
>       renewBefore: 0s

This looks exactly like a symptom of https://bugzilla.redhat.com/1943217 that is now supposed to be fixed.

Denis, how are you creating the CR for HCO? What is going to happen if you completely remove the certConfig stanza? Is the defaulting mechanism working?
We create the HCO resource from this YAML:

> ---
> kind: HyperConverged
> apiVersion: hco.kubevirt.io/v1beta1
> metadata:
>   name: kubevirt-hyperconverged
>   namespace: openshift-cnv

We don't define the certConfig. Do you think I need to add an empty spec (i.e. `spec: {}`)?
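If pinning explicit values turns out to be needed as a workaround, a variant of this manifest might look like the following (durations mirror the defaults quoted in the next comment; this is a sketch, not a tested manifest):

---
kind: HyperConverged
apiVersion: hco.kubevirt.io/v1beta1
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  certConfig:
    ca:
      duration: 48h
      renewBefore: 24h
    server:
      duration: 24h
      renewBefore: 12h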
(In reply to Denis Ollier from comment #10)
> Do you think I need to add an empty spec (i.e. `spec: {}`)?

I'm trying to reproduce on that cluster and, in my opinion, the defaulting mechanism works as expected: if you simply omit the spec.certConfig stanza you will get the default configuration, so:

{
  "ca": {
    "duration": "48h",
    "renewBefore": "24h"
  },
  "server": {
    "duration": "24h",
    "renewBefore": "12h"
  }
}

Try with:

[cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc patch hco -n openshift-cnv kubevirt-hyperconverged --type json -p '[{ "op": "replace", "path": "/spec/certConfig", "value": {"ca": {"duration": "20h", "renewBefore": "10h"}, "server": {"duration": "20h", "renewBefore": "10h"}}}]'
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched

[cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc get hco -n openshift-cnv kubevirt-hyperconverged -o json | jq .spec.certConfig
{
  "ca": {
    "duration": "20h",
    "renewBefore": "10h"
  },
  "server": {
    "duration": "20h",
    "renewBefore": "10h"
  }
}

[cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc patch hco -n openshift-cnv kubevirt-hyperconverged --type json -p '[{ "op": "remove", "path": "/spec/certConfig" }]'
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched

[cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc get hco -n openshift-cnv kubevirt-hyperconverged -o json | jq .spec.certConfig
{
  "ca": {
    "duration": "48h",
    "renewBefore": "24h"
  },
  "server": {
    "duration": "24h",
    "renewBefore": "12h"
  }
}

[cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc patch hco -n openshift-cnv kubevirt-hyperconverged --type json -p '[{ "op": "replace", "path": "/spec/certConfig", "value": {"ca": {"duration": "40h", "renewBefore": "20h"}, "server": {"duration": "40h", "renewBefore": "20h"}}}]'
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched

[cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc get hco -n openshift-cnv kubevirt-hyperconverged -o json | jq .spec.certConfig
{
  "ca": {
    "duration": "40h",
    "renewBefore": "20h"
  },
  "server": {
    "duration": "40h",
    "renewBefore": "20h"
  }
}

[cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc patch hco -n openshift-cnv kubevirt-hyperconverged --type json -p '[{ "op": "remove", "path": "/spec/certConfig" }]'
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched

[cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc get hco -n openshift-cnv kubevirt-hyperconverged -o json | jq .spec.certConfig
{
  "ca": {
    "duration": "48h",
    "renewBefore": "24h"
  },
  "server": {
    "duration": "24h",
    "renewBefore": "12h"
  }
}

The only way to really get a set of zeros is to explicitly write them (and maybe this deserves a bug of its own, since we should refuse 0s there):

[cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc patch hco -n openshift-cnv kubevirt-hyperconverged --type json -p '[{ "op": "replace", "path": "/spec/certConfig", "value": {"ca": {"duration": "0h", "renewBefore": "0h"}, "server": {"duration": "0h", "renewBefore": "0h"}}}]'
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched

[cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc get hco -n openshift-cnv kubevirt-hyperconverged -o json | jq .spec.certConfig
{
  "ca": {
    "duration": "0h",
    "renewBefore": "0h"
  },
  "server": {
    "duration": "0h",
    "renewBefore": "0h"
  }
}
OK, reproduced. If the user completely omits the whole spec stanza, we are going to get a set of zeros:

[cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc patch hco -n openshift-cnv kubevirt-hyperconverged --type json -p '[{ "op": "remove", "path": "/spec" }]'
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched

[cnv-qe-jenkins@dollierp-cnv48-wr9px-executor ~]$ oc get hco -n openshift-cnv kubevirt-hyperconverged -o json | jq .spec.certConfig
{
  "ca": {
    "duration": "0s",
    "renewBefore": "0s"
  },
  "server": {
    "duration": "0s",
    "renewBefore": "0s"
  }
}

Taking the bug.
Removing TestBlocker and AutomationBlocker labels since we have a valid workaround.
Verified the bug with hco-bundle-registry-container-v4.8.0-290 (IIB == registry-proxy.engineering.redhat.com/rh-osbs/iib:70707).

Verified that after each of the following modifications, the change was accepted but then reconciled, and the stanza structure returned to the default values for each field:

- deleted the whole spec stanza in the HCO CR.
- deleted the whole certConfig stanza in the HCO CR.
- deleted the whole certConfig.ca stanza in the HCO CR.
- deleted the whole certConfig.server stanza in the HCO CR.
- deleted all sub-fields of certConfig, leaving it bare, in the HCO CR.
- deleted all sub-fields of certConfig.ca, leaving it bare, in the HCO CR.
- deleted all sub-fields of certConfig.server, leaving it bare, in the HCO CR.

Here is the default structure in the HCO CR:

spec:
  certConfig:
    ca:
      duration: 48h0m0s
      renewBefore: 24h0m0s
    server:
      duration: 24h0m0s
      renewBefore: 12h0m0s

Also verified that the KubeVirt CR got the default values for all fields in the certConfig (in KubeVirt: certificateRotateStrategy) - OK.

Here is the default structure in the KubeVirt CR:

spec:
  certificateRotateStrategy:
    selfSigned:
      ca:
        duration: 48h0m0s
        renewBefore: 24h0m0s
      server:
        duration: 24h0m0s
        renewBefore: 12h0m0s

Moving to VERIFIED.
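For reference, the structures above can be dumped with something like the following (namespace assumed to be openshift-cnv; the KubeVirt CR is the one managed by HCO in that namespace):

$ oc get hco -n openshift-cnv kubevirt-hyperconverged -o json | jq .spec.certConfig
$ oc get kubevirt -n openshift-cnv -o json | jq '.items[0].spec.certificateRotateStrategy'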
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2920