Description of problem:
After data migration, the VM can't be accessed via `virtctl console`. The VMI is in Scheduled status, not Running, with error message:
server error. command SyncVMI failed: "neither found block device nor regular file for volume datavolumevolume"

Can be reproduced with these conditions before and after migration:
1. SC name: kubevirt-hostpath-provisioner -> kubevirt-hostpath-provisioner
2. SC name: kubevirt-hostpath-provisioner -> hostpath-provisioner
3. VM running: True -> True
4. VM running: False -> True

Version-Release number of selected component (if applicable):
OCP 3.11 + CNV 1.4.1
OCP 4.3 + CNV 2.3

How reproducible:
Always

Steps to Reproduce:
1. Prepare two clusters

[cloud-user@cnv-executor-cnv14-master-e6a2cb-1 ~]$ oc get node
NAME                                             STATUS    ROLES          AGE   VERSION
cnv-executor-cnv14-master-e6a2cb-1.example.com   Ready     infra,master   1d    v1.11.0+d4cacc0
cnv-executor-cnv14-node-e6a2cb-1.example.com     Ready     compute        1d    v1.11.0+d4cacc0
cnv-executor-cnv14-node-e6a2cb-2.example.com     Ready     compute        1d    v1.11.0+d4cacc0

[cnv-qe-jenkins@cnv-executor-qwang-cnv22 ~]$ oc get node
NAME                                     STATUS   ROLES    AGE   VERSION
cnv-executor-qwang-cnv22-rhel-worker-0   Ready    worker   10d   v1.16.2
cnv-executor-qwang-cnv22-rhel-worker-1   Ready    worker   10d   v1.16.2
host-172-16-0-14                         Ready    master   10d   v1.16.2
host-172-16-0-17                         Ready    master   10d   v1.16.2
host-172-16-0-19                         Ready    worker   10d   v1.16.2
host-172-16-0-23                         Ready    worker   10d   v1.16.2
host-172-16-0-44                         Ready    worker   10d   v1.16.2
host-172-16-0-48                         Ready    master   10d   v1.16.2

2. Prepare VM/PVC/DV/PV on OCP 3.11
3. Check resources on OCP 3.11
4. Get repo from https://gitlab.cee.redhat.com/awels/hostpath-provisioner-upgrade
5. Export these resources

[cloud-user@cnv-executor-cnv14-master-e6a2cb-1 hostpath-provisioner-upgrade-master]$ ./export cnv-executor-cnv14-node-e6a2cb-2.example.com

6.
Copy the hostpath-provisioner-upgrade-master directory from OCP 3.11 to OCP 4.3

[cnv-qe-jenkins@cnv-executor-qwang-cnv22 ~]$ scp -r cloud-user.97.252:/home/cloud-user/hostpath-provisioner-upgrade-master/ .

7. Modify the node name, SC name, and directory name.

[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ mv cnv-executor-cnv14-node-e6a2cb-2.example.com host-172-16-0-19
[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ sed -i 's/cnv-executor-cnv14-node-e6a2cb-2.example.com/host-172-16-0-19/g' host-172-16-0-19/export.json
[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ sed -i 's/kubevirt-hostpath-provisioner/hostpath-provisioner/g' host-172-16-0-19/export.json
[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ export KUBEVIRT_NS=openshift-cnv
[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ export CDI_NS=openshift-cnv

8. Create the same namespaces as on OCP 3.11

[cnv-qe-jenkins@cnv-executor-qwang-cnv22 ~]$ for i in 3 4 5; do oc new-project test-migration-$i; done

9. Import resources

[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ ./import.sh host-172-16-0-19

10. Check resources on OCP 4.3
11. Access VM via `virtctl console <VM>`

Actual results:
2.
[cloud-user@cnv-executor-cnv14-master-e6a2cb-1 ~]$ oc get dv -n test-migration-4
NAME          AGE
cirros-dv-4   22h
fedora-dv-4   22h
rhel-dv-4     22h

[cloud-user@cnv-executor-cnv14-master-e6a2cb-1 ~]$ oc get pvc -n test-migration-4
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                    AGE
cirros-dv-4   Bound    pvc-eaccbff7-4e6f-11ea-81d2-fa163ed49b51   39Gi       RWO            kubevirt-hostpath-provisioner   22h
fedora-dv-4   Bound    pvc-eb296829-4e6f-11ea-81d2-fa163ed49b51   39Gi       RWO            kubevirt-hostpath-provisioner   22h
rhel-dv-4     Bound    pvc-06707c4b-4e70-11ea-81d2-fa163ed49b51   39Gi       RWO            kubevirt-hostpath-provisioner   22h

[cloud-user@cnv-executor-cnv14-master-e6a2cb-1 ~]$ oc get vm -n test-migration-4
NAME             AGE   RUNNING   VOLUME
vm-cirros-dv-4   22h   true
vm-fedora-dv-4   22h   false
vm-rhel-dv-4     22h   true

[cloud-user@cnv-executor-cnv14-master-e6a2cb-1 ~]$ oc get vmi -n test-migration-4
NAME             AGE   PHASE     IP               NODENAME
vm-cirros-dv-4   22h   Running   10.130.0.23      cnv-executor-cnv14-node-e6a2cb-2.example.com
vm-rhel-dv-4     22h   Running   10.130.0.25/23   cnv-executor-cnv14-node-e6a2cb-2.example.com

[cloud-user@cnv-executor-cnv14-master-e6a2cb-1 ~]$ oc get pod -o wide -n test-migration-4
NAME                                 READY   STATUS    RESTARTS   AGE   IP            NODE                                           NOMINATED NODE
virt-launcher-vm-cirros-dv-4-2x4js   1/1     Running   0          22h   10.130.0.23   cnv-executor-cnv14-node-e6a2cb-2.example.com   <none>
virt-launcher-vm-rhel-dv-4-kqzb7     1/1     Running   0          22h   10.130.0.25   cnv-executor-cnv14-node-e6a2cb-2.example.com   <none>

9.
[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ ./import.sh host-172-16-0-19
Python 3 is installed, continuing
Kubevirt namespace openshift-cnv, CDI namespace openshift-cnv
Checking node: host-172-16-0-19
Found node host-172-16-0-19
Verifying input directory exists.
Verifying input file exists.
Current CVO requested replicas: 0
Bringing down CVO and OLM, warning this will generate cluster health alerts!!!
deployment.extensions/cluster-version-operator scaled
Current OLM requested replicas: 0
deployment.extensions/olm-operator scaled
Verified CVO and OLM are down, bringing down kubevirt and CDI
Current Kubevirt operator requested replicas: 2
deployment.extensions/virt-operator scaled
Current virt controller requested replicas: 2
deployment.extensions/virt-controller scaled
Current CDI operator requested replicas: 1
deployment.extensions/cdi-operator scaled
Current CDI controller requested replicas: 1
deployment.extensions/cdi-deployment scaled
Kubevirt and CDI are down, importing Virtual Machines
Created 9 VirtualMachines
Created 9 DataVolumes
Created 9 PVCs
Created 9 PVs
deployment.extensions/cdi-deployment scaled
deployment.extensions/cdi-operator scaled
deployment.extensions/virt-controller scaled
deployment.extensions/virt-operator scaled
Kubevirt and CDI restored
deployment.extensions/olm-operator scaled
deployment.extensions/cluster-version-operator scaled
Finished restoring cluster operations

10.
Take one namespace for example:

[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ oc get pvc -n test-migration-4
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
cirros-dv-4   Bound    pvc-eaccbff7-4e6f-11ea-81d2-fa163ed49b51   39Gi       RWO            hostpath-provisioner   10m
fedora-dv-4   Bound    pvc-eb296829-4e6f-11ea-81d2-fa163ed49b51   39Gi       RWO            hostpath-provisioner   10m
rhel-dv-4     Bound    pvc-06707c4b-4e70-11ea-81d2-fa163ed49b51   39Gi       RWO            hostpath-provisioner   10m

[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ oc get vm -n test-migration-4
NAME             AGE     RUNNING   VOLUME
vm-cirros-dv-4   7m50s   true
vm-fedora-dv-4   7m50s   false
vm-rhel-dv-4     7m54s   true

[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ oc get vmi -n test-migration-4
NAME             AGE    PHASE       IP           NODENAME
vm-cirros-dv-4   100s   Scheduled   10.130.1.7   host-172-16-0-19
vm-rhel-dv-4     99s    Scheduled   10.130.1.8   host-172-16-0-19

[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ oc get pod -n test-migration-4
NAME                                 READY   STATUS    RESTARTS   AGE
virt-launcher-vm-cirros-dv-4-bwhvv   1/1     Running   0          2m26s
virt-launcher-vm-fedora-dv-4-fmr68   1/1     Running   0          29s
virt-launcher-vm-rhel-dv-4-r2hlf     1/1     Running   0          2m21s

11.
[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ virtctl start vm-fedora-dv-4 -n test-migration-4
VM vm-fedora-dv-4 was scheduled to start
[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ virtctl console vm-fedora-dv-4 -n test-migration-4
Can't connect to websocket (404): VirtualMachineInstance vm-fedora-dv-4 in namespace test-migration-4 not found.

12.
[cnv-qe-jenkins@cnv-executor-qwang-cnv22 hostpath-provisioner-upgrade-master]$ oc describe vmi vm-fedora-dv-4 -n test-migration-4
Name:         vm-fedora-dv-4
Namespace:    test-migration-4
Labels:       kubevirt.io/nodeName=host-172-16-0-19
              kubevirt.io/vm=vm-datavolume
Annotations:  kubevirt.io/latest-observed-api-version: v1alpha3
              kubevirt.io/storage-observed-api-version: v1alpha3
API Version:  kubevirt.io/v1alpha3
Kind:         VirtualMachineInstance
Metadata:
  Creation Timestamp:  2020-02-14T13:30:43Z
  Finalizers:
    foregroundDeleteVirtualMachine
  Generate Name:  vm-fedora-dv-4
  Generation:     136
  Owner References:
    API Version:           kubevirt.io/v1alpha3
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  VirtualMachine
    Name:                  vm-fedora-dv-4
    UID:                   cf4ca233-d6c7-4257-957d-92d0190b2330
  Resource Version:        11309993
  Self Link:               /apis/kubevirt.io/v1alpha3/namespaces/test-migration-4/virtualmachineinstances/vm-fedora-dv-4
  UID:                     10f8e2e6-c261-4377-b749-74eb62780d6a
Spec:
  Domain:
    Devices:
      Disks:
        Disk:
          Bus:  virtio
        Name:   datavolumevolume
      Interfaces:
        Bridge:
        Name:  default
    Features:
      Acpi:
        Enabled:  true
    Firmware:
      Uuid:  cfb23348-4c44-584d-b9a7-193316d7f960
    Machine:
      Type:  q35
    Resources:
      Requests:
        Cpu:     100m
        Memory:  512M
  Networks:
    Name:  default
    Pod:
  Termination Grace Period Seconds:  0
  Volumes:
    Data Volume:
      Name:  fedora-dv-4
    Name:    datavolumevolume
Status:
  Conditions:
    Last Probe Time:       <nil>
    Last Transition Time:  <nil>
    Message:               cannot migrate VMI with non-shared PVCs
    Reason:                DisksNotLiveMigratable
    Status:                False
    Type:                  LiveMigratable
    Last Probe Time:       <nil>
    Last Transition Time:  <nil>
    Message:               cannot migrate VMI with a bridge interface connected to a pod network
    Reason:                InterfaceNotLiveMigratable
    Status:                False
    Type:                  LiveMigratable
    Last Probe Time:       <nil>
    Last Transition Time:  2020-02-14T13:33:38Z
    Message:               server error. command SyncVMI failed: "neither found block device nor regular file for volume datavolumevolume"
    Reason:                Synchronizing with the Domain failed.
    Status:                False
    Type:                  Synchronized
  Guest OS Info:
  Interfaces:
    Ip Address:      10.130.1.13
    Name:            default
  Migration Method:  BlockMigration
  Node Name:         host-172-16-0-19
  Phase:             Scheduled
  Qos Class:         Burstable
Events:
  Type     Reason            Age                     From                            Message
  ----     ------            ----                    ----                            -------
  Normal   SuccessfulCreate  2m51s                   virtualmachine-controller       Created virtual machine pod virt-launcher-vm-fedora-dv-4-7ddd7
  Warning  SyncFailed        2m19s (x25 over 2m35s)  virt-handler, host-172-16-0-19  server error. command SyncVMI failed: "neither found block device nor regular file for volume datavolumevolume"

Expected results:
10. VMI should be in Running status
11. VM can be accessed.

Additional info:
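Two quick checks can narrow down the "neither found block device nor regular file" error: verify the step-7 rewrite left no stale names in export.json, and verify the hostpath PV directory on the node actually contains a disk.img. A minimal sketch; the helper names are mine, and the assumption that the provisioner stores volumes as `<pv-dir>/disk.img` matches the `hpvolumes` listings in this bug:

```shell
# Hedged sketch: two sanity checks for the SyncVMI failure above.

# 1. No stale node/SC names should survive the step-7 sed rewrite.
check_rewrite() {
    local file=$1
    ! grep -q 'cnv-executor-cnv14-node-e6a2cb-2.example.com' "$file" &&
    ! grep -q 'kubevirt-hostpath-provisioner' "$file"
}

# 2. The PV directory on the node must contain a regular disk.img file;
#    otherwise virt-handler reports "neither found block device nor
#    regular file for volume".
check_disk() {
    local pvdir=$1
    [ -f "${pvdir}/disk.img" ]
}
```

Run `check_rewrite host-172-16-0-19/export.json` after the sed commands, and `check_disk` against each PV directory on the destination node.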
Alexander please take a look.
Qixuan, please provide Alexander with environment details so he can look into this.
Alexander has got the environment.
Tested with the latest https://gitlab.cee.redhat.com/awels/hostpath-provisioner-upgrade. Still can't access the console, but for a different reason.

Check disk image on the CNV 1.4 node:

[cloud-user@cnv-executor-cnv14-node-e6a2cb-2 hpvolumes]$ ls -ls pvc-0ca13e4b-5883-11ea-81d2-fa163ed49b51
total 28148
28148 -rw-r--r--. 1 107 107 104857600 Feb 26 05:32 disk.img

[cloud-user@ocp-psi-executor hostpath-provisioner-upgrade-master]$ oc get pvc -n test-migration-3
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
cirros-dv-3   Bound    pvc-0ca13e4b-5883-11ea-81d2-fa163ed49b51   39Gi       RWO            hostpath-provisioner   21m
fedora-dv-3   Bound    pvc-f7df7271-5882-11ea-81d2-fa163ed49b51   39Gi       RWO            hostpath-provisioner   21m
rhel-dv-3     Bound    pvc-f81cc5a2-5882-11ea-81d2-fa163ed49b51   39Gi       RWO            hostpath-provisioner   21m

Create a dir with the same PV name and copy disk.img to it:

[core@qwang-23-fbfr6-worker-hk6rd hpvolumes]$ ls -ls pvc-0ca13e4b-5883-11ea-81d2-fa163ed49b51
total 102400
102400 -rw-r--r--. 1 core core 104857600 Feb 28 11:00 disk.img

Import data to CNV 2.3 and start the VM:

[cloud-user@ocp-psi-executor hostpath-provisioner-upgrade-master]$ oc get vm
NAME             AGE   RUNNING   VOLUME
vm-cirros-dv-3   27m   false
vm-fedora-dv-3   27m   false
vm-rhel-dv-3     27m   false

[cloud-user@ocp-psi-executor hostpath-provisioner-upgrade-master]$ virtctl start vm-cirros-dv-3
VM vm-cirros-dv-3 was scheduled to start

[cloud-user@ocp-psi-executor hostpath-provisioner-upgrade-master]$ oc get vmi
NAME             AGE     PHASE     IP            NODENAME
vm-cirros-dv-3   2m19s   Running   10.131.0.34   qwang-23-fbfr6-worker-hk6rd

[cloud-user@ocp-psi-executor hostpath-provisioner-upgrade-master]$ oc describe vmi vm-cirros-dv-3
Name:         vm-cirros-dv-3
Namespace:    test-migration-3
Labels:       kubevirt.io/nodeName=qwang-23-fbfr6-worker-hk6rd
              kubevirt.io/vm=vm-datavolume
Annotations:  kubevirt.io/latest-observed-api-version: v1alpha3
              kubevirt.io/storage-observed-api-version: v1alpha3
API Version:  kubevirt.io/v1alpha3
Kind:         VirtualMachineInstance
Metadata:
  Creation Timestamp:
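The manual copy described above can be sketched as a small helper. The `/var/hpvolumes` root and the staging path in the usage line are assumptions for illustration; note that `cp -p` preserves mode and timestamps, but the file still ends up owned by the copying user (here `core`, versus 107:107 on the CNV 1.4 node) unless the copy runs as root:

```shell
# Hedged sketch of the manual PV copy above (helper name is mine).
copy_pv_image() {
    # Copy one exported disk image into a directory named after its PV.
    local src_img=$1 hp_root=$2 pv=$3
    mkdir -p "${hp_root}/${pv}"
    # -p keeps mode/timestamps; ownership becomes the copying user
    # unless run as root.
    cp -p "$src_img" "${hp_root}/${pv}/disk.img"
}

# Usage (hypothetical staging path, run on the destination node):
# copy_pv_image /tmp/exported/disk.img /var/hpvolumes pvc-0ca13e4b-5883-11ea-81d2-fa163ed49b51
```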
2020-02-28T10:37:52Z
  Finalizers:
    foregroundDeleteVirtualMachine
  Generate Name:  vm-cirros-dv-3
  Generation:     474
  Owner References:
    API Version:           kubevirt.io/v1alpha3
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  VirtualMachine
    Name:                  vm-cirros-dv-3
    UID:                   5130d9c7-fa81-4b1e-8727-78da67f12a3d
  Resource Version:        3451430
  Self Link:               /apis/kubevirt.io/v1alpha3/namespaces/test-migration-3/virtualmachineinstances/vm-cirros-dv-3
  UID:                     847f353e-d414-46ea-945e-57d30040ccdf
Spec:
  Domain:
    Devices:
      Disks:
        Disk:
          Bus:  virtio
        Name:   datavolumevolume
      Interfaces:
        Bridge:
        Name:  default
    Features:
      Acpi:
        Enabled:  true
    Firmware:
      Uuid:  ee510f58-6805-5d71-8465-f6f1b55198ef
    Machine:
      Type:  q35
    Resources:
      Requests:
        Cpu:     100m
        Memory:  64M
  Networks:
    Name:  default
    Pod:
  Termination Grace Period Seconds:  0
  Volumes:
    Data Volume:
      Name:  cirros-dv-3
    Name:    datavolumevolume
Status:
  Conditions:
    Last Probe Time:       <nil>
    Last Transition Time:  <nil>
    Message:               cannot migrate VMI with non-shared PVCs
    Reason:                DisksNotLiveMigratable
    Status:                False
    Type:                  LiveMigratable
    Last Probe Time:       <nil>
    Last Transition Time:  <nil>
    Message:               cannot migrate VMI which does not use masquerade to connect to the pod network
    Reason:                InterfaceNotLiveMigratable
    Status:                False
    Type:                  LiveMigratable
    Last Probe Time:       <nil>
    Last Transition Time:  2020-02-28T10:37:59Z
    Status:                True
    Type:                  Ready
  Guest OS Info:
  Interfaces:
    Ip Address:      10.131.0.34
    Mac:             52:54:00:df:d6:e4
    Name:            default
  Migration Method:  BlockMigration
  Node Name:         qwang-23-fbfr6-worker-hk6rd
  Phase:             Running
  Qos Class:         Burstable
Events:
  Type     Reason            Age    From                                       Message
  ----     ------            ----   ----                                       -------
  Normal   SuccessfulCreate  2m21s  virtualmachine-controller                  Created virtual machine pod virt-launcher-vm-cirros-dv-3-dt6wp
  Warning  SyncFailed        2m13s  virt-handler, qwang-23-fbfr6-worker-hk6rd  server error.
command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-02-28T10:37:59.971943Z qemu-kvm: -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=drive-ua-datavolumevolume,id=ua-datavolumevolume,bootindex=1,write-cache=on: Could not reopen file: Permission denied')"
  Normal   Started           2m13s                   virt-handler, qwang-23-fbfr6-worker-hk6rd  VirtualMachineInstance started.
  Normal   Created           2m11s (x23 over 2m13s)  virt-handler, qwang-23-fbfr6-worker-hk6rd  VirtualMachineInstance defined.

[cloud-user@ocp-psi-executor hostpath-provisioner-upgrade-master]$ virtctl console vm-cirros-dv-3
Successfully connected to vm-cirros-dv-3 console. The escape sequence is ^]

You were disconnected from the console. This has one of the following reasons:
- another user connected to the console of the target vm
- network issues
http: websocket: close 1006 (abnormal closure): unexpected EOF
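The "Could not reopen file: Permission denied" lines up with the ownership difference visible in this comment: 107:107 on the CNV 1.4 node versus core:core after the manual copy. A small sketch for spotting such a mismatch; the 107:107 expected value is an assumption taken from that node's listing, not a documented constant:

```shell
# Hedged sketch: compare a disk image's owner against the 107:107 seen on
# the CNV 1.4 node; a mismatch (e.g. core:core) would match qemu's
# "Could not reopen file: Permission denied".
disk_owner_ok() {
    local img=$1 want=${2:-107:107}
    [ "$(stat -c '%u:%g' "$img")" = "$want" ]
}

# Usage on the node, e.g.:
# disk_owner_ok /var/hpvolumes/pvc-0ca13e4b-5883-11ea-81d2-fa163ed49b51/disk.img || echo "ownership mismatch"
```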
Can you show me the YAML of the VM? The describe output complains about live migration not working, and for hostpath-provisioner storage live migration will not be an option. In particular, the evictionStrategy of the VM should not be set to LiveMigrate.
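For reference, a minimal sketch of where evictionStrategy sits in a VM spec; the values here are illustrative and not taken from the reporter's actual VM:

```yaml
# Hypothetical VM fragment: with hostpath (non-shared) storage this field
# should be omitted rather than set to LiveMigrate.
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vm-fedora-dv-4
spec:
  template:
    spec:
      evictionStrategy: LiveMigrate   # <- remove for non-migratable storage
```

Whether the field is set can be checked with `oc get vm <name> -o jsonpath='{.spec.template.spec.evictionStrategy}'`.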
It's weird that I can't reproduce it with CNV 2.2 anymore.

[cnv-qe-jenkins@cnv-executor-qwang hostpath-provisioner-upgrade-master]$ virtctl console vm-cirros-dv-3 -n test-migration-3
Successfully connected to vm-cirros-dv-3 console. The escape sequence is ^]

login as 'cirros' user. default password: 'gocubsgo'. use 'sudo' for root.
cnv-executor-qwang-worker-0 login:

[cnv-qe-jenkins@cnv-executor-qwang hostpath-provisioner-upgrade-master]$ oc describe vmi vm-cirros-dv-3 -n test-migration-3
Name:         vm-cirros-dv-3
Namespace:    test-migration-3
Labels:       kubevirt.io/nodeName=host-172-16-0-34
              kubevirt.io/vm=vm-datavolume
Annotations:  kubevirt.io/latest-observed-api-version: v1alpha3
              kubevirt.io/storage-observed-api-version: v1alpha3
API Version:  kubevirt.io/v1alpha3
Kind:         VirtualMachineInstance
Metadata:
  Creation Timestamp:  2020-03-03T10:31:41Z
  Finalizers:
    foregroundDeleteVirtualMachine
  Generate Name:  vm-cirros-dv-3
  Generation:     8
  Owner References:
    API Version:           kubevirt.io/v1alpha3
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  VirtualMachine
    Name:                  vm-cirros-dv-3
    UID:                   87329303-0855-4d0b-892e-3685fda10a22
  Resource Version:        5814580
  Self Link:               /apis/kubevirt.io/v1alpha3/namespaces/test-migration-3/virtualmachineinstances/vm-cirros-dv-3
  UID:                     5b7d79df-cec7-4214-a9bd-8cbae297cc8d
Spec:
  Domain:
    Devices:
      Disks:
        Disk:
          Bus:  virtio
        Name:   datavolumevolume
      Interfaces:
        Bridge:
        Name:  default
    Features:
      Acpi:
        Enabled:  true
    Firmware:
      Uuid:  ee510f58-6805-5d71-8465-f6f1b55198ef
    Machine:
      Type:  q35
    Resources:
      Requests:
        Cpu:     100m
        Memory:  64M
  Networks:
    Name:  default
    Pod:
  Termination Grace Period Seconds:  0
  Volumes:
    Data Volume:
      Name:  cirros-dv-3
    Name:    datavolumevolume
Status:
  Conditions:
    Last Probe Time:       <nil>
    Last Transition Time:  <nil>
    Message:               cannot migrate VMI with non-shared PVCs
    Reason:                DisksNotLiveMigratable
    Status:                False
    Type:                  LiveMigratable
    Last Probe Time:       <nil>
    Last Transition Time:  <nil>
    Message:               cannot migrate VMI with a bridge interface connected to a pod network
    Reason:
InterfaceNotLiveMigratable
    Status:                False
    Type:                  LiveMigratable
    Last Probe Time:       <nil>
    Last Transition Time:  2020-03-03T10:31:54Z
    Status:                True
    Type:                  Ready
  Guest OS Info:
  Interfaces:
    Ip Address:      10.129.0.212
    Mac:             0a:58:0a:81:00:d4
    Name:            default
  Migration Method:  BlockMigration
  Node Name:         host-172-16-0-34
  Phase:             Running
  Qos Class:         Burstable
Events:
  Type    Reason            Age                    From                            Message
  ----    ------            ----                   ----                            -------
  Normal  SuccessfulCreate  3m31s                  virtualmachine-controller       Created virtual machine pod virt-launcher-vm-cirros-dv-3-4cttm
  Normal  Created           3m19s (x3 over 3m19s)  virt-handler, host-172-16-0-34  VirtualMachineInstance defined.
  Normal  Started           3m19s                  virt-handler, host-172-16-0-34  VirtualMachineInstance started.
Also can't reproduce it on CNV 2.3 IPI now. Perhaps there was something wrong when I copied the PVC data. I'm going to close the bug.
https://bugzilla.redhat.com/show_bug.cgi?id=1805627 addressed the problem.
*** This bug has been marked as a duplicate of bug 1805627 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days