Bug 1886349 - [v2v][VMware to CNV VM import API] VM import remain at 75%, with no clear reason.
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: V2V
Version: 2.5.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Fabien Dupont
QA Contact: Ilanit Stein
Depends On:
Reported: 2020-10-08 09:20 UTC by Ilanit Stein
Modified: 2020-10-08 15:51 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-10-08 15:51:10 UTC
Target Upstream Version:

Attachments
vm-import-controller.log (3.15 MB, text/plain)
2020-10-08 09:21 UTC, Ilanit Stein
vm import CR yaml (2.60 KB, text/plain)
2020-10-08 09:22 UTC, Ilanit Stein
cdi-deployment.log (549.46 KB, text/plain)
2020-10-08 09:25 UTC, Ilanit Stein
vm import controller oc describe (2.79 KB, text/plain)
2020-10-08 09:45 UTC, Ilanit Stein

Description Ilanit Stein 2020-10-08 09:20:17 UTC
Description of problem:
VM (RHEL-8) import remains at 75% progress indefinitely (in the UI).
This happens with NFS, Ceph-RBD, and Block target storage.

The only error found is in the vm-import-controller log:
{"level":"info","ts":1602146844.2947583,"logger":"controller_virtualmachineimport","msg":"Reconciling VirtualMachineImport","Request.Namespace":"default","Request.Name":"vmware-import-1"}
{"level":"info","ts":1602146844.2958064,"logger":"controller_virtualmachineimport","msg":"No need to fetch virtual machine - skipping","Request.Namespace":"default","Request.Name":"vmware-import-1"}
{"level":"info","ts":1602146844.2960317,"logger":"controller_virtualmachineimport","msg":"VirtualMachineImport has already been validated positively. Skipping re-validation","Request.Namespace":"default","Request.Name":"vmware-import-1"}
{"level":"info","ts":1602146844.5489664,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"virtualmachineimport-controller","source":"kind source: /, Kind="}
{"level":"error","ts":1602146844.5543096,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"virtualmachineimport-controller","name":"vmware-import-1","namespace":"default","error":"Job.batch \"vmimport.v2v.kubevirt.ioqjfjl\" is invalid: spec.template.spec.containers[0].image: Required value","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:248\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:201\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
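
The controller log holds one JSON object per line, so the failing reconcile can be isolated by filtering on the level field. A minimal sketch, assuming the attached log has been saved locally as vm-import-controller.log; the helper names are hypothetical and not part of any tool:

```shell
# Print only error-level entries from a JSON-lines controller log.
# Reads the files given as arguments, or stdin when none are given.
controller_errors() {
  grep -F '"level":"error"' "$@"
}

# Reduce each error entry to the value of its "error" field
# (everything between "error":" and ","stacktrace").
controller_error_messages() {
  controller_errors "$@" | sed -n 's/.*"error":"\(.*\)","stacktrace".*/\1/p'
}
```

For example, `controller_error_messages vm-import-controller.log | sort | uniq -c` would show how often each distinct error occurs.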

Version-Release number of selected component (if applicable):
$ oc get   csv -n openshift-cnv kubevirt-hyperconverged-operator.v2.5.0 -oyaml | grep createdAt
    createdAt: "2020-10-06 07:31:33"

How reproducible:

Steps to Reproduce:
1. Add vddk-init-image to the v2v-vmware ConfigMap.

2. Add a Secret:

cat <<EOF | oc create -f -
apiVersion: v1
kind: Secret
metadata:
  name: vmw-secret
type: Opaque
stringData:
  vmware: |-
    # API URL of the vCenter or ESXi host
    apiUrl: "https://<VMware IP address>/sdk"
    # Username in the format username@domain
    username: administrator@vsphere.local
    password: Heslo123!
    # The certificate thumbprint of the vCenter or ESXi host, in colon-separated hexadecimal octets
    thumbprint: 31:...:30
EOF

3. Add External mapping:

cat <<EOF | oc create -f -
apiVersion: v2v.kubevirt.io/v1beta1
kind: ResourceMapping
metadata:
  name: example-vmware-resourcemappings
  namespace: default
spec:
  vmware:
    networkMappings:
    - source:
        name: VM Network # map network name to network attachment definition
      target:
        name: pod
      type: pod
    storageMappings:
    - source:
        id: datastore-12
      target:
        name: nfs # or Ceph-RBD/Block
EOF

4. VM import create:

cat <<EOF | oc create -f -
apiVersion: v2v.kubevirt.io/v1beta1
kind: VirtualMachineImport
metadata:
  name: vmware-import-1
  namespace: default
spec:
  providerCredentialsSecret:
    name: vmw-secret
    namespace: default # optional, if not specified, use CR's namespace
  resourceMapping:
    name: example-vmware-resourcemappings
    namespace: default
  targetVmName: vmw-import
  startVm: false
  source:
    vmware:
      vm:
        id: 42037aff-4d6f-cf89-7979-e98cbc406c0e # UUID of the RHEL-8 VM
EOF

Expected results:
1. The VM import should succeed.
2. If the import cannot progress, a message should be displayed explaining why.

Additional info:
$ oc get events
35m         Normal    Provisioning             persistentvolumeclaim/vmw-import-harddisk1        External provisioner is provisioning volume for claim "default/vmw-import-harddisk1"
35m         Normal    ExternalProvisioning     persistentvolumeclaim/vmw-import-harddisk1        waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
35m         Normal    ProvisioningSucceeded    persistentvolumeclaim/vmw-import-harddisk1        Successfully provisioned volume pvc-e3f025b5-3ce2-4e62-aa88-f721c1a4c119
35m         Normal    Bound                    datavolume/vmw-import-harddisk1                   PVC vmw-import-harddisk1 Bound
35m         Normal    ImportInProgress         datavolume/vmw-import-harddisk1                   Import into vmw-import-harddisk1 in progress
33m         Normal    ImportSucceeded          persistentvolumeclaim/vmw-import-harddisk1        Import Successful
33m         Normal    ImportSucceeded          datavolume/vmw-import-harddisk1                   Successfully imported into PVC vmw-import-harddisk1
55m         Normal    ImportScheduled          virtualmachineimport/vmware-import-1              Import of Virtual Machine default/vmw-import started
55m         Normal    ImportInProgress         virtualmachineimport/vmware-import-1              Import of Virtual Machine default/vmw-import disk vmw-import-harddisk1 in progress

Comment 1 Ilanit Stein 2020-10-08 09:21:20 UTC
Created attachment 1719911 [details]
vm-import-controller.log

Comment 2 Ilanit Stein 2020-10-08 09:22:46 UTC
Created attachment 1719912 [details]
vm import CR yaml

Comment 3 Ilanit Stein 2020-10-08 09:25:52 UTC
Created attachment 1719913 [details]
cdi-deployment.log

Comment 4 Ilanit Stein 2020-10-08 09:45:53 UTC
Created attachment 1719916 [details]
vm import controller oc describe

Comment 5 Fabien Dupont 2020-10-08 09:57:31 UTC
So, you're probably using an old build, because the VIRTV2V_IMAGE environment variable is empty on the vm-import-controller pod.
This has been fixed by:

- http://pkgs.devel.redhat.com/cgit/containers/hco-bundle-registry/commit/?h=cnv-2.5-rhel-8&id=18b89c1b8607a969db5678a5befb4342bc0aeb72
- http://pkgs.devel.redhat.com/cgit/containers/hco-bundle-registry/commit/?h=cnv-2.5-rhel-8&id=db2af60816f9b759cd8b057fd89a0db1d06c83cc
- https://github.com/kubevirt/hyperconverged-cluster-operator/pull/863

For my tests, I use the v2.5.0-299 build, which maps to the 18273 image index base.
The cnv-qe-automation repository was updated 15 hours ago to use v2.5.0-307 / iib:18812, so redeploying the test environment should fix this issue.
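
Whether an environment is affected can be checked by inspecting the controller's environment. A minimal sketch, assuming the controller runs as a deployment named vm-import-controller in openshift-cnv (an assumption; adjust to the cluster) and that its environment is listed as NAME=value lines, for example with `oc set env deployment/vm-import-controller -n openshift-cnv --list`. The helper name is hypothetical:

```shell
# Reads NAME=value lines on stdin and reports whether VIRTV2V_IMAGE
# is set to a non-empty value. Returns non-zero when it is empty or missing.
check_virtv2v_image() {
  value=$(sed -n 's/^VIRTV2V_IMAGE=//p' | head -n 1)
  if [ -n "$value" ]; then
    echo "VIRTV2V_IMAGE is set: $value"
  else
    echo "VIRTV2V_IMAGE is empty or missing"
    return 1
  fi
}
```

Piping the `--list` output into `check_virtv2v_image` gives a quick yes/no answer before starting an import.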

Comment 6 Ilanit Stein 2020-10-08 10:05:16 UTC
Thanks Fabien.

I now have iib:18173:

oc get catalogsource hco-catalogsource -n openshift-marketplace -oyaml | grep ersion
apiVersion: operators.coreos.com/v1alpha1
      {"apiVersion":"operators.coreos.com/v1alpha1","kind":"CatalogSource","metadata":{"annotations":{},"name":"hco-catalogsource","namespace":"openshift-marketplace"},"spec":{"displayName":"OpenShift Virtualization Index Image","image":"registry-proxy.engineering.redhat.com/rh-osbs/iib:18173","publisher":"Red Hat","sourceType":"grpc"}}
  - apiVersion: operators.coreos.com/v1alpha1
  - apiVersion: operators.coreos.com/v1alpha1
  resourceVersion: "2150420"

I'll deploy a new cluster.
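
Grepping the YAML works, but the index image can also be read directly from the CatalogSource spec, for example with `oc get catalogsource hco-catalogsource -n openshift-marketplace -o jsonpath='{.spec.image}'`. A small sketch, with hypothetical helper names, that extracts the IIB number from such an image reference and compares it with the one a given build is expected to use:

```shell
# Print the tag of an image reference such as .../rh-osbs/iib:18173.
iib_tag() {
  echo "${1##*:}"
}

# Succeeds when the deployed image carries the expected IIB tag.
# Usage: iib_matches IMAGE_REFERENCE EXPECTED_TAG
iib_matches() {
  [ "$(iib_tag "$1")" = "$2" ]
}
```

For instance, `iib_matches "$image" 18812 || echo "catalog source is not on iib:18812"`, where `$image` holds the jsonpath output.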

Comment 7 Fabien Dupont 2020-10-08 10:25:33 UTC
No need to redeploy the cluster, you can simply update it.

Delete the HyperConverged deployment:

$ oc delete hyperconverged kubevirt-hyperconverged -n openshift-cnv

Wait for the non-operator pods in openshift-cnv to be deleted.
Delete the Kubevirt subscription:

$ oc delete subscription kubevirt-hyperconverged -n openshift-cnv

Wait for all the pods in openshift-cnv to be deleted.
Delete the openshift-cnv project to be sure nothing is left over:

$ oc delete project openshift-cnv

Wait for the project to be deleted.
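
The "wait for ..." steps in the teardown above can be scripted rather than watched by hand. A minimal polling sketch; wait_until and project_gone are hypothetical helper names, and the timeout and interval values are placeholders:

```shell
# Poll a command until it succeeds or the timeout (in seconds) expires.
# Usage: wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# INTERVAL should be at least 1 so that the elapsed time advances.
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed_s=0
  until "$@"; do
    if [ "$elapsed_s" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
    elapsed_s=$((elapsed_s + interval_s))
  done
}

# True once the openshift-cnv project no longer exists.
project_gone() {
  ! oc get project openshift-cnv >/dev/null 2>&1
}
```

For example, `wait_until 600 10 project_gone || echo "openshift-cnv still present after 10 minutes"`.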

Mirror the IIB manifests and apply the ImageContentSourcePolicy:

$ oc adm catalog mirror "registry-proxy.engineering.redhat.com/rh-osbs/iib:18812" registry-proxy.engineering.redhat.com/rh-osbs --manifests-only --insecure
$ oc apply -f iib-manifests/imageContentSourcePolicy.yaml

Wait for the nodes to reboot. It takes up to 30 minutes.
Create the catalog source from the IIB:

$ cat << EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-osbs-18812
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: registry-proxy.engineering.redhat.com/rh-osbs/iib:18812
  displayName: Red Hat OSBS 18812
  publisher: Red Hat
EOF

Create the openshift-cnv project:

$ cat << EOF | oc apply -f -
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  name: openshift-cnv
EOF

Create the operator group for CNV:

$ cat << EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cnv
  namespace: openshift-cnv
spec:
  targetNamespaces:
    - openshift-cnv
EOF

Create the subscription for Kubevirt:

$ cat << EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  channel: "stable"
  installPlanApproval: Automatic
  name: kubevirt-hyperconverged
  source: redhat-osbs-18812
  sourceNamespace: openshift-marketplace
  startingCSV: "kubevirt-hyperconverged-operator.v2.5.0"
EOF

Create the HyperConverged deployment:

$ cat << EOF | oc apply -f -
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec: {}
EOF

Comment 8 Fabien Dupont 2020-10-08 15:51:10 UTC
It works with CNV >= 2.5.0-299. Closing.
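
For scripted environment checks, this threshold can be expressed as a comparison on the build suffix. A minimal sketch, assuming build strings of the form v2.5.0-NNN as used in this bug; the helper name is hypothetical, and it only compares builds within the 2.5.0 series:

```shell
# Succeeds when a build string such as v2.5.0-307 is at or above
# build 299, the lowest build reported to work in this bug.
build_has_fix() {
  build_number=${1##*-}
  [ "$build_number" -ge 299 ]
}
```

For example, `build_has_fix v2.5.0-307 && echo "build includes the fix"`.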
