Description of problem:
KubeVirt CR seems to be stuck in DeploymentInProgress state and not recovering.

$ oc get hco -n openshift-cnv kubevirt-hyperconverged -o=jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
ReconcileComplete	True	Reconcile completed successfully
Available	False	KubeVirt is not available: Deploying version sha256:48b123381f4aec379a24cd6bb2d641721919a7c4d95a6d42c7934a41177a0f37 with registry registry.redhat.io/container-native-virtualization
Progressing	True	KubeVirt is progressing: Deploying version sha256:48b123381f4aec379a24cd6bb2d641721919a7c4d95a6d42c7934a41177a0f37 with registry registry.redhat.io/container-native-virtualization
Degraded	False	Reconcile completed successfully
Upgradeable	False	KubeVirt is progressing: Deploying version sha256:48b123381f4aec379a24cd6bb2d641721919a7c4d95a6d42c7934a41177a0f37 with registry registry.redhat.io/container-native-virtualization

[kbidarka@localhost auth]$ oc get kubevirt kubevirt-kubevirt-hyperconverged -n openshift-cnv -o yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1alpha3
  creationTimestamp: "2022-04-27T14:49:03Z"
  ...
  name: kubevirt-kubevirt-hyperconverged
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: hco.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: HyperConverged
    name: kubevirt-hyperconverged
    uid: f19055d3-f566-4214-b596-9ae2be777f79
  resourceVersion: "1538042"
  uid: ab64cffc-c144-46a9-b083-1dae1e2eeddc
spec:
  certificateRotateStrategy:
    selfSigned:
      ca:
        duration: 48h0m0s
        renewBefore: 24h0m0s
      server:
        duration: 24h0m0s
        renewBefore: 12h0m0s
  configuration:
    developerConfiguration:
      diskVerification:
        memoryLimit: 2G
      featureGates:
      - DataVolumes
      - SRIOV
      - CPUManager
      - CPUNodeDiscovery
      - Snapshot
      - HotplugVolumes
      - ExpandDisks
      - GPU
      - HostDevices
      - DownwardMetrics
      - NUMA
      - WithHostModelCPU
      - HypervStrictCheck
      - SRIOVLiveMigration
      - LiveMigration
    machineType: pc-q35-rhel8.4.0
    migrations:
      completionTimeoutPerGiB: 800
      network: migration-nad
      parallelMigrationsPerCluster: 5
      parallelOutboundMigrationsPerNode: 2
      progressTimeout: 150
    network:
      defaultNetworkInterface: masquerade
    obsoleteCPUModels:
      "486": true
      Conroe: true
      athlon: true
      core2duo: true
      coreduo: true
      kvm32: true
      kvm64: true
      n270: true
      pentium: true
      pentium2: true
      pentium3: true
      pentiumpro: true
      phenom: true
      qemu32: true
      qemu64: true
    selinuxLauncherType: virt_launcher.process
    smbios:
      family: Red Hat
      manufacturer: Red Hat
      product: Container-native virtualization
      sku: 4.10.1
      version: 4.10.1
  customizeComponents: {}
  productComponent: compute
  productName: hyperconverged-cluster
  productVersion: 4.10.1
  uninstallStrategy: BlockUninstallIfWorkloadsExist
  workloadUpdateStrategy:
    batchEvictionInterval: 1m0s
    batchEvictionSize: 10
    workloadUpdateMethods:
    - LiveMigrate
status:
  conditions:
  - lastProbeTime: "2022-04-28T11:36:08Z"
    lastTransitionTime: "2022-04-28T11:36:08Z"
    message: Deploying version sha256:48b123381f4aec379a24cd6bb2d641721919a7c4d95a6d42c7934a41177a0f37 with registry registry.redhat.io/container-native-virtualization
    reason: DeploymentInProgress
    status: "False"
    type: Available
  - lastProbeTime: "2022-04-28T11:36:08Z"
    lastTransitionTime: "2022-04-28T11:36:08Z"
    message: Deploying version sha256:48b123381f4aec379a24cd6bb2d641721919a7c4d95a6d42c7934a41177a0f37 with registry registry.redhat.io/container-native-virtualization
    reason: DeploymentInProgress
    status: "True"
    type: Progressing
  - lastProbeTime: "2022-04-28T11:36:08Z"
    lastTransitionTime: "2022-04-28T11:36:08Z"
    message: Deploying version sha256:48b123381f4aec379a24cd6bb2d641721919a7c4d95a6d42c7934a41177a0f37 with registry registry.redhat.io/container-native-virtualization
    reason: DeploymentInProgress
    status: "False"
    type: Degraded
  - lastProbeTime: "2022-04-28T08:23:53Z"
    lastTransitionTime: null
    message: All resources were created.
    reason: AllResourcesCreated
    status: "True"
    type: Created

Version-Release number of selected component (if applicable):
4.10.1

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
DeploymentInProgress for KubeVirt CR is True. KubeVirt is not Ready.

Expected results:
KubeVirt is in Ready state.

Additional info:
Created attachment 1875704 [details] virt_operator1
Created attachment 1875705 [details] virt_operator2
Is it possible to reproduce this?

Deferring to the next release due to capacity.
A must-gather is being attached. Since this is impacting the smoke tests on 4.12, adding the testblocker label.
When a node maintenance CR is created, the kubevirt .status.conditions stay in DeploymentInProgress until the CR is deleted:
============================
[cnv-qe-jenkins@c01-dbn-4012-8k679-executor ~]$ kubectl get kubevirt kubevirt-kubevirt-hyperconverged -n openshift-cnv -o json | jq ".status.conditions"
[
  {
    "lastProbeTime": "2022-06-29T21:32:36Z",
    "lastTransitionTime": "2022-06-29T21:32:36Z",
    "message": "Deploying version sha256:f9904655a1c579b7db4f55f621852795397c2a01c72d0c420c916ec8f0466024 with registry registry.redhat.io/container-native-virtualization",
    "reason": "DeploymentInProgress",
    "status": "False",
    "type": "Available"
  },
  {
    "lastProbeTime": "2022-06-29T21:32:36Z",
    "lastTransitionTime": "2022-06-29T21:32:36Z",
    "message": "Deploying version sha256:f9904655a1c579b7db4f55f621852795397c2a01c72d0c420c916ec8f0466024 with registry registry.redhat.io/container-native-virtualization",
    "reason": "DeploymentInProgress",
    "status": "True",
    "type": "Progressing"
  },
  {
    "lastProbeTime": "2022-06-29T21:32:36Z",
    "lastTransitionTime": "2022-06-29T21:32:36Z",
    "message": "Deploying version sha256:f9904655a1c579b7db4f55f621852795397c2a01c72d0c420c916ec8f0466024 with registry registry.redhat.io/container-native-virtualization",
    "reason": "DeploymentInProgress",
    "status": "False",
    "type": "Degraded"
  },
  {
    "lastProbeTime": "2022-06-24T03:59:37Z",
    "lastTransitionTime": null,
    "message": "All resources were created.",
    "reason": "AllResourcesCreated",
    "status": "True",
    "type": "Created"
  }
]
[cnv-qe-jenkins@c01-dbn-4012-8k679-executor ~]$
============================
(In reply to Debarati Basu-Nag from comment #15)
> When a node maintenance CR is created, I see kubevirt.status.conditions
> continuing to stay, till the cr is deleted:
> [...]

This is expected behavior, as one of our handlers will be missing. We can look at whether we can improve here, but it is not related to the issue described in this bug.
*** Bug 2099635 has been marked as a duplicate of this bug. ***
Based on my observations, upon a change to the KubeVirt CR, virt-operator unconditionally updates its operands with the `kubevirt.io/generation` annotation, even when an operand does not require reconciliation. For example, adding the `spec.configuration.cpuModel` field to the KubeVirt CR changes the `kubevirt.io/generation` annotation of every operand. This is indeed an issue: it can trigger a status flip-flop such as the one described in this bug. However, the status recovers, and very quickly; I can see this in the reproductions. IMO this bug is not a blocker. It would be a blocker if the status in the KubeVirt CR were really stuck at Deploying, but that is not the case.
Per comment #19, the impact of this issue on clusters with OpenShift CNV deployed is that unnecessary reconciliations can occur. This stems from a conscious design decision to keep virt-operator's operands on a common generation annotation; there is no obvious reason why that is necessary, and it is still being investigated. Regardless of whether reconciling resources that did not otherwise change has any value, the transient passes extremely rapidly and should not even be noticed in typical deployments. Reconciliation is also only triggered in the first place when the HCO CR is modified, so it is expected to be infrequent. Because there is no danger of data loss, and the cluster's ability to upgrade is not impaired, the risk of disruption is minimal. Consequently, we have removed the blocker flag and deferred this BZ to the next major release.
To clarify comment #22: "extremely rapidly" isn't precise enough. The reconciliation transient usually lasts only a few seconds, almost always fewer than 5.
Verified: the issue was not seen during automation runs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:0408