Bug 1880950
Summary: [CNV-2.5.0] Cannot import VM image: no kind "VMImportConfig" is registered
Product: Container Native Virtualization (CNV)
Component: Storage
Version: 2.5.0
Target Release: 2.5.0
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Lukas Bednar <lbednar>
Assignee: Alexander Wels <awels>
QA Contact: Alex Kalenyuk <akalenyu>
CC: akalenyu, awels, cnv-qe-bugs, dagur, fdeutsch, lbednar, mguetta, ncredi, ngavrilo, pkliczew, rnetser, yadu, ycui
Keywords: AutomationBlocker, Regression, TestBlocker, TestBlockerForLayeredProduct
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 2.5.0
Type: Bug
Last Closed: 2020-11-17 13:24:24 UTC
Description
Lukas Bednar 2020-09-21 08:40:12 UTC
Please provide the steps to reproduce. How was the env deployed? HCO creates the VMImportConfig; without it, vm-import-controller won't get deployed.

I talked to Ruth about the test case that failed. It seems like the test is creating:

{'apiVersion': 'cdi.kubevirt.io/v1beta1',
 'kind': 'DataVolume',
 'metadata': {'name': 'fedora-31',
              'annotations': {'kubevirt.io/provisionOnNode': 'verify-25-2kq2l-worker-0-7tn6j'}},
 'spec': {'source': {'http': {'url': 'http://cnv-qe-server.rhevdev.lab.eng.rdu2.redhat.com/files/cnv-tests/fedora-images/Fedora-Cloud-Base-31-1.9.x86_64.qcow2'}},
          'pvc': {'accessModes': ['ReadWriteOnce'],
                  'resources': {'requests': {'storage': '25Gi'}},
                  'storageClassName': 'hostpath-provisioner',
                  'volumeMode': 'Filesystem'},
          'contentType': 'kubevirt'}}

and the DV fails to bind. It seems like it is a CDI issue. The test does not touch vmimport at all. I only wonder why we see this: no kind "VMImportConfig" is registered.

Let's figure out why the PVC is not binding. Can I get the output for two things?

1. oc get sc hostpath-provisioner -o yaml. I want to see what the volumeBindingMode is (it should be Immediate since we are using provisionOnNode; if it is WaitForFirstConsumer, we need to change that).
2. The logs of the hostpath provisioner node pod that is running on node verify-25-2kq2l-worker-0-7tn6j; that should have a clue on why it is not creating the PV. You should be able to see which pod it is in the openshift-cnv namespace by looking at the node the pod is running on in the status of the pod.

$ oc get sc hostpath-provisioner -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2020-09-21T07:52:50Z"
  managedFields:
  - apiVersion: storage.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:provisioner: {}
      f:reclaimPolicy: {}
      f:volumeBindingMode: {}
    manager: kubectl-create
    operation: Update
    time: "2020-09-21T07:52:50Z"
  name: hostpath-provisioner
  resourceVersion: "8807362"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/hostpath-provisioner
  uid: ed47f94f-6b7a-4b97-ad5b-9bec5ed6952e
provisioner: kubevirt.io/hostpath-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

============================================================================
hostpath provisioner pod log:

I0921 16:42:49.929287       1 hostpath-provisioner.go:95] isCorrectNodeByBindingMode mode: WaitForFirstConsumer
I0921 16:42:49.929363       1 hostpath-provisioner.go:110] claim included kubevirt.io/provisionOnNode annotation: verify-25-2kq2l-worker-0-7tn6j
I0921 16:42:49.929370       1 hostpath-provisioner.go:112] matched kubevirt.io/provisionOnNode: verify-25-2kq2l-worker-0-7tn6j with this node: verify-25-2kq2l-worker-0-7tn6j
I0921 16:42:49.929469       1 controller.go:1198] provision "supported-os-common-templates-fedora-test-fedora-os-support/fedora-31" class "hostpath-provisioner": started
I0921 16:42:49.948428       1 hostpath-provisioner.go:147] creating backing directory: /var/hpvolumes/pvc-82daedf1-1104-477c-ba94-2dc35446c6bb
I0921 16:42:49.949255       1 event.go:278] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"supported-os-common-templates-fedora-test-fedora-os-support", Name:"fedora-31", UID:"82daedf1-1104-477c-ba94-2dc35446c6bb", APIVersion:"v1", ResourceVersion:"9942683", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "supported-os-common-templates-fedora-test-fedora-os-support/fedora-31"
I0921 16:42:49.950922       1 controller.go:1290] provision "supported-os-common-templates-fedora-test-fedora-os-support/fedora-31" class "hostpath-provisioner": volume "pvc-82daedf1-1104-477c-ba94-2dc35446c6bb" provisioned
I0921 16:42:49.950963       1 controller.go:1307] provision "supported-os-common-templates-fedora-test-fedora-os-support/fedora-31" class "hostpath-provisioner": succeeded
I0921 16:42:49.950984       1 volume_store.go:188] Trying to save persistentvolume "pvc-82daedf1-1104-477c-ba94-2dc35446c6bb"
I0921 16:42:49.976750       1 volume_store.go:195] persistentvolume "pvc-82daedf1-1104-477c-ba94-2dc35446c6bb" saved
I0921 16:42:49.976861       1 hostpath-provisioner.go:95] isCorrectNodeByBindingMode mode: WaitForFirstConsumer
I0921 16:42:49.976870       1 hostpath-provisioner.go:110] claim included kubevirt.io/provisionOnNode annotation: verify-25-2kq2l-worker-0-7tn6j
I0921 16:42:49.976875       1 hostpath-provisioner.go:112] matched kubevirt.io/provisionOnNode: verify-25-2kq2l-worker-0-7tn6j with this node: verify-25-2kq2l-worker-0-7tn6j
I0921 16:42:49.976923       1 controller.go:1198] provision "supported-os-common-templates-fedora-test-fedora-os-support/fedora-31" class "hostpath-provisioner": started
I0921 16:42:49.977442       1 event.go:278] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"supported-os-common-templates-fedora-test-fedora-os-support", Name:"fedora-31", UID:"82daedf1-1104-477c-ba94-2dc35446c6bb", APIVersion:"v1", ResourceVersion:"9942683", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-82daedf1-1104-477c-ba94-2dc35446c6bb
I0921 16:42:49.980696       1 controller.go:1207] provision "supported-os-common-templates-fedora-test-fedora-os-support/fedora-31" class "hostpath-provisioner": persistentvolume "pvc-82daedf1-1104-477c-ba94-2dc35446c6bb" already exists, skipping

============================================================================

$ oc get pvc -A
NAMESPACE                                                      NAME               STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
default                                                        dv-source-fedora   Pending                                                                        hostpath-provisioner   5h32m
default                                                        fedora-dv          Pending                                                                        hostpath-provisioner   5h57m
supported-os-common-templates-fedora-test-fedora-os-support   fedora-31          Bound     pvc-82daedf1-1104-477c-ba94-2dc35446c6bb   48Gi       RWO            hostpath-provisioner   21s

$ oc get dv -A
NAMESPACE                                                      NAME               PHASE             PROGRESS   RESTARTS   AGE
default                                                        dv-source-fedora   Pending           N/A                   5h32m
default                                                        fedora-dv          Pending           N/A                   5h57m
supported-os-common-templates-fedora-test-fedora-os-support   fedora-31          ImportScheduled   N/A                   26s

Okay, so that is problem number 1: we have a WFFC storage class and we are using provisionOnNode. This is not a supported configuration, as there is a subtle race condition that can cause intermittent failures (that didn't happen in this case, because the PVC is bound, but still). Can you show me the contents of the cdi-deployment log?
I suspect it looks something like this:

{"level":"error","ts":1600708760.1006212,"logger":"controller","msg":"Reconciler error","controller":"import-controller","name":"hpp-test-dv","namespace":"default","error":"CDI.cdi.kubevirt.io \"cdi\" not found","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:237\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90"}

If that is the case, then I think I know what the problem is.

Created attachment 1715580 [details]
cdi-deployment logs
looks like this is the case here:
"error":"CDI.cdi.kubevirt.io \"cdi\" not found"
The CDI part of this is fixed; a new release is pending.

@Alex, is there a way to manually add the fix to our CNV-2.5 environments?

Unfortunately no. The problem is that CDI was looking for a particular named object that is named differently downstream, and the name is managed by HCO, so even if you did change it, HCO would revert whatever you changed. I just released an upstream build that fixes the problem, and we are working on a downstream build now.

Verified on CNV 2.5.0:
HCO: [v2.5.0-270]
HCO image: registry.redhat.io/container-native-virtualization/hyperconverged-cluster-operator@sha256:1d2ee6515d5b669d59dce162b9ab9ac7e523bcd052c16ef0be2bcd2614e399b2
CSV creation time: 2020-10-02 10:28:06
KubeVirt v0.34.0-rc.0-6-gad89f92
CDI v1.23.5

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 2.5.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:5127