Description of problem:
The hco-operator is not staying in Running state in a consistent manner; it keeps going into Terminating state.

Version-Release number of selected component (if applicable):
2.5
IIB_IMAGE="registry-proxy.engineering.redhat.com/rh-osbs/iib:1785"

How reproducible:
always

Steps to Reproduce:
1. deploy hco

Additional info:

oc logs hco-operator-5b46c9c99b-5dbnz -n openshift-cnv

{"level":"info","ts":1597207467.414534,"logger":"cmd","msg":"Go Version: go1.13.4"}
{"level":"info","ts":1597207467.414588,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1597207467.4145932,"logger":"cmd","msg":"Version of operator-sdk: v0.17.0"}
{"level":"info","ts":1597207467.4152842,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1597207470.3352704,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1597207470.335304,"logger":"leader","msg":"Continuing as the leader."}
{"level":"info","ts":1597207473.2450678,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1597207473.2678356,"logger":"cmd","msg":"Cluster type = openshift"}
{"level":"info","ts":1597207473.2740977,"logger":"cmd","msg":"Found Pod","Pod.Namespace":"openshift-cnv","Pod.Name":"hco-operator-5b46c9c99b-5dbnz"}
{"level":"error","ts":1597207473.2922847,"logger":"cmd","msg":"Failed to get HCO CSV","error":"no kind is registered for the type v1alpha1.ClusterServiceVersion in scheme \"k8s.io/client-go/kubernetes/scheme/register.go:67\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubevirt/hyperconverged-cluster-operator/pkg/util.GetCSVfromPod\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/pkg/util/util.go:122\ngithub.com/kubevirt/hyperconverged-cluster-operator/pkg/util.(*eventEmitter).UpdateClient\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/pkg/util/event_emmiter.go:69\ngithub.com/kubevirt/hyperconverged-cluster-operator/pkg/util.(*eventEmitter).Init\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/pkg/util/event_emmiter.go:39\nmain.main\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/cmd/hyperconverged-cluster-operator/main.go:169\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:203"}
{"level":"error","ts":1597207473.2924006,"logger":"cmd","msg":"Can't get CSV","error":"no kind is registered for the type v1alpha1.ClusterServiceVersion in scheme \"k8s.io/client-go/kubernetes/scheme/register.go:67\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubevirt/hyperconverged-cluster-operator/pkg/util.(*eventEmitter).UpdateClient\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/pkg/util/event_emmiter.go:71\ngithub.com/kubevirt/hyperconverged-cluster-operator/pkg/util.(*eventEmitter).Init\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/pkg/util/event_emmiter.go:39\nmain.main\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/cmd/hyperconverged-cluster-operator/main.go:169\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:203"}
{"level":"info","ts":1597207473.2924278,"logger":"cmd","msg":"Registering Components."}
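The "Failed to get HCO CSV" error means the OLM ClusterServiceVersion kind was never registered in the runtime scheme the client was built from. A minimal sketch of the missing registration, assuming the operator-framework API module (the exact import path depends on HCO's vendoring at the time, so treat it as an assumption):

package main

import (
	"log"

	csvv1alpha1 "github.com/operator-framework/api/pkg/operators/v1alpha1"
	"k8s.io/client-go/kubernetes/scheme"
)

func main() {
	// Register the OLM v1alpha1 types (including ClusterServiceVersion)
	// into the client-go scheme before building the client; without this,
	// a Get() on a CSV fails with exactly the "no kind is registered for
	// the type v1alpha1.ClusterServiceVersion" error above.
	if err := csvv1alpha1.AddToScheme(scheme.Scheme); err != nil {
		log.Fatalf("failed to register OLM types: %v", err)
	}
	log.Println("ClusterServiceVersion registered in scheme")
}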
I also see the following log for HCO, with the same symptoms (hco is failing and looping between creating and terminating states):

{"level":"error","ts":1597212472.774559,"logger":"controller-runtime.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":"VMImportConfig.v2v.kubevirt.io","error":"no matches for kind \"VMImportConfig\" in version \"v2v.kubevirt.io/v1alpha1\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/source/source.go:104\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:165\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198\nsigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startLeaderElectionRunnables.func1\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/manager/internal.go:473"}
{"level":"error","ts":1597212472.7748973,"logger":"cmd","msg":"Manager exited non-zero","error":"no matches for kind \"VMImportConfig\" in version \"v2v.kubevirt.io/v1alpha1\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/kubevirt/hyperconverged-cluster-operator/cmd/hyperconverged-cluster-operator/main.go:246\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:203"}
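This failure is controller-runtime refusing to start the manager because a watched kind's CRD is missing or no longer serves the requested version. A minimal pre-flight sketch of that check (a hypothetical helper for illustration, not HCO's actual fix, which was to move to the new API version):

package preflight

import (
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

// GVKServed reports whether the API server currently serves the given
// kind at the given group/version. Watches started through
// controller-runtime fail with "no matches for kind ..." (as above)
// when this is false.
func GVKServed(cfg *rest.Config, groupVersion, kind string) (bool, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	list, err := dc.ServerResourcesForGroupVersion(groupVersion)
	if err != nil {
		// The group/version is not served at all, e.g. the CRD only
		// serves v1beta1 after the API bump.
		return false, nil
	}
	for _, r := range list.APIResources {
		if r.Kind == kind {
			return true, nil
		}
	}
	return false, nil
}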
It's a side effect of bug 1867493: the vm-import-operator CRD should also keep the v1alpha1 version for backward compatibility. In the HCO code we are still using v1alpha1 of vm-import-operator (https://github.com/kubevirt/hyperconverged-cluster-operator/tree/master/vendor/github.com/kubevirt/vm-import-operator) because we are pinned to v0.1.0, which is the latest version available there. Piotr, can you please create an upstream pre/rc release so that we can move forward?
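For context, the pin lives in HCO's go.mod; once the requested upstream release exists, the bump is a one-line change plus re-vendoring (v0.2.0 below anticipates that release and is an assumption at this point in the thread):

// go.mod (sketch): bump the pinned vm-import-operator release,
// then run `go mod vendor` to refresh the vendored API package.
require github.com/kubevirt/vm-import-operator v0.2.0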
@Simone, here is the PR: https://github.com/kubevirt/vm-import-operator/pull/370
We are about to merge it.
Upstream v0.2.0 still seems to be broken.
I see "VMimport is not 'Available'","Request.Namespace" message. Is this related to api version bump?
(In reply to Piotr Kliczewski from comment #5)
> I see a "VMimport is not 'Available'","Request.Namespace" message. Is this
> related to the API version bump?

In the HCO logs I see:

{"level":"info","ts":1597655572.3726707,"logger":"controller_hyperconverged","msg":"VM import exists","Request.Namespace":"kubevirt-hyperconverged","Request.Name":"kubevirt-hyperconverged","vmImport.Namespace":"","vmImport.Name":"vmimport-kubevirt-hyperconverged"}
{"level":"info","ts":1597655572.37268,"logger":"controller_hyperconverged","msg":"VMimport's resource is not reporting Conditions on it's Status","Request.Namespace":"kubevirt-hyperconverged","Request.Name":"kubevirt-hyperconverged"}

I'll try to reproduce it locally to better understand the root cause.
The vm-import-operator CR v1beta1 is completely missing .status:

+ oc get -n kubevirt-hyperconverged VMImportConfig vmimport-kubevirt-hyperconverged -o yaml

apiVersion: v2v.kubevirt.io/v1beta1
kind: VMImportConfig
metadata:
  creationTimestamp: "2020-08-17T17:36:45Z"
  generation: 1
  labels:
    app: kubevirt-hyperconverged
  managedFields:
  - apiVersion: v2v.kubevirt.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:app: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"d4d4640a-ac09-40cc-8215-6b444cb76e8e"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec: {}
      f:status: {}
    manager: hyperconverged-cluster-operator
    operation: Update
    time: "2020-08-17T17:36:45Z"
  name: vmimport-kubevirt-hyperconverged
  ownerReferences:
  - apiVersion: hco.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: HyperConverged
    name: kubevirt-hyperconverged
    uid: d4d4640a-ac09-40cc-8215-6b444cb76e8e
  resourceVersion: "35427"
  selfLink: /apis/v2v.kubevirt.io/v1beta1/vmimportconfigs/vmimport-kubevirt-hyperconverged
  uid: faf13a16-8c84-4a32-8393-540f74b71cb0
spec: {}
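That empty status is what produces the "VMimport's resource is not reporting Conditions on it's Status" message above: the reconciler finds the CR but has no conditions to inspect. A rough sketch of that branch, assuming the openshift/custom-resource-status conditions helpers (illustrative only, not the exact HCO code):

package reconcile

import (
	conditionsv1 "github.com/openshift/custom-resource-status/conditions/v1"
)

// componentConditions mirrors the branch hit above: with spec: {} and no
// status at all, the conditions slice is empty, so the reconciler can only
// log that nothing is reported yet and requeue until the component
// operator fills in its status.
func componentConditions(conds []conditionsv1.Condition) (ready, requeue bool) {
	if len(conds) == 0 {
		return false, true // no status reported yet: wait and retry
	}
	ready = conditionsv1.IsStatusConditionTrue(conds, conditionsv1.ConditionAvailable)
	return ready, !ready
}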
Yes, I see the issue as well. Investigating...
*** Bug 1867493 has been marked as a duplicate of this bug. ***
Found another issue in vm-import-operator; we also need https://github.com/kubevirt/vm-import-operator/pull/383
Trying with 2.5.0-124 (rh-osbs/iib:5741).

Failed: no matches for kind "OperatorSource" in version "operators.coreos.com/v1"

when applying:

apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  name: kubevirt-hyperconverged
spec:
  registryNamespace: rh-verified-operators
  publisher: Red Hat
@Simone please take a look
OperatorSource was deprecated in OCP 4.5 and has probably already been removed in OCP 4.6: we should directly use a CatalogSource that points to the index image (see the sketch below). Oren is going to fix it in the kustomize template.

Also note that the index images built by CVP are currently not consumable on OCP 4.6; please see https://bugzilla.redhat.com/show_bug.cgi?id=1871234#c18 for a temporary workaround.
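For reference, a rough sketch of the replacement object, written here with the operator-framework Go types rather than YAML. The catalog name, publisher, and index image are taken from the comments above; the openshift-marketplace namespace is the usual target and the display name is illustrative, both assumptions here:

package main

import (
	"fmt"

	operatorsv1alpha1 "github.com/operator-framework/api/pkg/operators/v1alpha1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildCatalogSource returns a grpc CatalogSource pointing directly at the
// index image, replacing the removed OperatorSource/AppRegistry flow.
func buildCatalogSource(indexImage string) *operatorsv1alpha1.CatalogSource {
	return &operatorsv1alpha1.CatalogSource{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "kubevirt-hyperconverged",
			Namespace: "openshift-marketplace",
		},
		Spec: operatorsv1alpha1.CatalogSourceSpec{
			SourceType:  operatorsv1alpha1.SourceTypeGrpc,
			Image:       indexImage,
			DisplayName: "KubeVirt HyperConverged", // illustrative
			Publisher:   "Red Hat",
		},
	}
}

func main() {
	cs := buildCatalogSource("registry-proxy.engineering.redhat.com/rh-osbs/iib:5741")
	fmt.Printf("%s/%s -> %s\n", cs.Namespace, cs.Name, cs.Spec.Image)
}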
When deploying from a catalog image, in addition to the CatalogSource, QE's deploy_kustomize.sh was unexpectedly creating an OperatorSource from an AppRegistry, although it shouldn't have. Since the OperatorSource API was removed with OCP 4.6.0-fc.1, I removed support for deploying from an AppRegistry in QE's deploy_kustomize.sh.
For the record, I managed to deploy CNV 2.5 on an OCP *4.5* cluster using:

- CNV 2.5 from registry-proxy.engineering.redhat.com/rh-osbs/iib:5809
- NMO 4.6 from registry-proxy.engineering.redhat.com/rh-osbs/iib:4255

The die/restart loop of the hco-operator is still present. Will retry on an OCP 4.6 cluster with the opm workaround.
(In reply to Denis Ollier from comment #15)
> For the record, I managed to deploy CNV 2.5 on an OCP *4.5* cluster using:
>
> - CNV 2.5 from registry-proxy.engineering.redhat.com/rh-osbs/iib:5809
> - NMO 4.6 from registry-proxy.engineering.redhat.com/rh-osbs/iib:4255
>
> The die/restart loop of the hco-operator is still present.

There is also a bug in POST on OLM about a similar issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1868712

As for this bug, HCO should now (sooner or later) reach the Ready status, but in the meantime OLM will still kill and restart it more than once.
(In reply to Simone Tiraboschi from comment #16)
> As for this bug, HCO should now (sooner or later) reach the Ready status,
> but in the meantime OLM will still kill and restart it more than once.

Even after a whole night, HCO is still dying/restarting in a loop.
With the opm workaround, I finally managed to deploy CNV 2.5 on an OCP 4.6 cluster using:

- CNV 2.5 from registry-proxy.engineering.redhat.com/rh-osbs/iib:6136 (hco-bundle-registry:v2.5.0-135)
- NMO 4.6 from registry-proxy.engineering.redhat.com/rh-osbs/iib:4255

I don't see the die/restart loop of the hco-operator anymore.
With registry-proxy.engineering.redhat.com/rh-osbs/iib:6196 the issue is happening again.

Note that the CSV seems older with iib:6196 than with iib:6136:

- OK: registry-proxy.engineering.redhat.com/rh-osbs/iib:6136 (hco-bundle-registry:v2.5.0-135, CSV createdAt: 2020-09-01 07:44:46)
- Not OK: registry-proxy.engineering.redhat.com/rh-osbs/iib:6196 (hco-bundle-registry:v2.5.0-???, CSV createdAt: 2020-08-28 14:53:33)
I deployed CNV 2.5.0 on an OCP 4.6.0-fc.4 cluster using registry-proxy.engineering.redhat.com/rh-osbs/iib:7848 (hco-bundle-registry:v2.5.0-160) and the issue is still present.
Status of the KubevirtMetricsAggregation CR:

> kubectl get kubevirtmetricsaggregations.ssp.kubevirt.io metrics-aggregation-kubevirt-hyperconverged -o yaml
>
> apiVersion: ssp.kubevirt.io/v1
> kind: KubevirtMetricsAggregation
> metadata:
>   creationTimestamp: "2020-09-08T10:33:16Z"
>   generation: 1
>   labels:
>     app: kubevirt-hyperconverged
>   name: metrics-aggregation-kubevirt-hyperconverged
>   namespace: openshift-cnv
>   ownerReferences:
>   - apiVersion: hco.kubevirt.io/v1beta1
>     blockOwnerDeletion: true
>     controller: true
>     kind: HyperConverged
>     name: kubevirt-hyperconverged
>     uid: 10153ff9-90a5-454d-9d33-9791357156e6
>   resourceVersion: "94052"
>   selfLink: /apis/ssp.kubevirt.io/v1/namespaces/openshift-cnv/kubevirtmetricsaggregations/metrics-aggregation-kubevirt-hyperconverged
>   uid: 6fd008c3-c6dd-4550-be24-d3d88f7fc8d2
> spec: {}
> status:
>   conditions:
>   - ansibleResult:
>       changed: 2
>       completion: 2020-09-08T10:33:25.964362
>       failures: 0
>       ok: 4
>       skipped: 0
>     lastTransitionTime: "2020-09-08T10:33:16Z"
>     message: Awaiting next reconciliation
>     reason: Successful
>     status: "True"
>     type: Running
>   operatorVersion: v2.5.0
>   targetVersion: v2.5.0
The issue is that KubevirtMetricsAggregation is reporting Running=True but never gets to Available=True, so something is probably stuck on the SSP operator now. Can you please also attach the logs of the SSP operator?
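For context, a minimal sketch of the distinction, assuming the same conditions helpers as before (illustrative, not the exact HCO code):

package reconcile

import (
	conditionsv1 "github.com/openshift/custom-resource-status/conditions/v1"
)

// metricsAggregationReady: Running=True with reason Successful (as in the
// CR above) only means the last ansible reconcile completed; readiness
// requires an explicit Available=True, which never arrives here.
func metricsAggregationReady(conds []conditionsv1.Condition) bool {
	return conditionsv1.IsStatusConditionTrue(conds, conditionsv1.ConditionAvailable)
}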
Relevant logs:

> TASK [Inject owner references for KubevirtNodeLabellerBundle] ********************************
> fatal: [localhost]: FAILED! => {"msg": "template error while templating string: no filter named 'k8s_inject_ownership'. String: {{ objects | k8s_inject_ownership(cr_info) }}"} => template error while templating string: no filter named 'k8s_inject_ownership'
Note that the node-maintenance-operator is also looping.
Created attachment 1714098 [details]
SSP operator logs
(In reply to Denis Ollier from comment #24)
> Note that the node-maintenance-operator is also looping.

Yes, this is now also expected as a side effect of https://bugzilla.redhat.com/1868712, because now (NMO >= 0.7.0) NMO also includes an OLM-based admission webhook.
https://bugzilla.redhat.com/1868712 is now on MODIFIED and its bits can be consumed from OCP 4.6 nightly builds; moving this to ON_QA for further verification.
2.5 is deployable.

Verified on HCO v2.5.0-209.
HCO image: registry.redhat.io/container-native-virtualization/hyperconverged-cluster-operator@sha256:bec6349f6f98faae85fa7ee91c49c20522d2ce955e70e2d04e75e14822f2562d
CSV creation time: 2020-09-21 07:30:25
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 2.5.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:5127