Bug 2135381 - Live migration of OpenShift Virtualization VMs with ODF (ceph storage) based disks is failing consistently
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.10.5
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.14.0
Assignee: Jed Lejosne
QA Contact: zhe peng
URL:
Whiteboard:
Duplicates: 2152909
Depends On: 2016584 2092271 2174226
Blocks:
Reported: 2022-10-17 13:03 UTC by pbunev@redhat.com
Modified: 2023-11-08 14:05 UTC
CC List: 14 users

Fixed In Version: v4.14.0.rhel9-1569
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2016584
Environment:
Last Closed: 2023-11-08 14:05:03 UTC
Target Upstream Version:
Embargoed:



Links
System                  ID                           Private  Priority  Status  Summary                                                     Last Updated
Github                  kubevirt kubevirt pull 9246  0        None      open    migration: match SELinux level of source pod on target pod  2023-05-08 10:32:48 UTC
Red Hat Issue Tracker   CNV-21866                    0        None      None    None                                                        2022-11-03 06:59:16 UTC
Red Hat Product Errata  RHSA-2023:6817               0        None      None    None                                                        2023-11-08 14:05:28 UTC

Description pbunev@redhat.com 2022-10-17 13:03:29 UTC
Description of problem:

Live migration of OpenShift Virtualization VMs with ODF-based shared disks fails consistently.

Version-Release number of selected component (if applicable):

OCP: 4.10.26
ODF: 4.10.6
OCP-virt: 4.10.5

How reproducible: 100%


Steps to Reproduce:
1. Create an OpenShift Virtualization VM (Fedora OS) with ODF file storage (storage class ocs-storagecluster-cephfs) and an underlying PVC with volume mode 'Filesystem'.
2. Start a live migration of the VM.
3. Observe that a new virt-launcher pod is created, but the old one never completes and the VM stays paused indefinitely.
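Step 2 can be started with `virtctl migrate <vm-name>` or declaratively by creating a KubeVirt VirtualMachineInstanceMigration object. A minimal sketch of the declarative route follows, assuming the `kubevirt.io/v1` API is served by the cluster; the VM name and namespace are taken from the virt-launcher log quoted in this bug, while the migration name and the helper function are hypothetical.

```python
import json


def migration_manifest(vmi_name: str, namespace: str, migration_name: str) -> dict:
    """Build a VirtualMachineInstanceMigration object (kubevirt.io/v1) for one VMI."""
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachineInstanceMigration",
        "metadata": {"name": migration_name, "namespace": namespace},
        # spec.vmiName names the running VMI, which shares the name of its VM.
        "spec": {"vmiName": vmi_name},
    }


if __name__ == "__main__":
    # Hypothetical migration name; VM name and namespace come from the log in this bug.
    manifest = migration_manifest("fedora-cephfs", "vm-testproj", "fedora-cephfs-migration")
    # oc accepts JSON manifests, e.g.: python this_script.py | oc create -f -
    print(json.dumps(manifest, indent=2))
```

Progress can then be followed with `oc get virtualmachineinstancemigrations.kubevirt.io`, which reports the migration's PHASE.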

Actual results:

The VM stays paused indefinitely.

Expected results:

The VM is migrated to another OCP node with its state preserved.

Additional info:

Log files and screenshots will be attached to the Bugzilla.

Comment 9 Ying Cui 2022-11-03 06:57:24 UTC
This bug is reported against 4.10, and it is not clear why the target version is set to 4.8.4, so let's re-target it in the bug scrub meeting.

And from the old virt-launcher pod's log:
{"component":"virt-launcher","kind":"","level":"error","msg":"Recevied a live migration error. Will check the latest migration status.","name":"fedora-cephfs","namespace":"vm-testproj","pos":"live-migration-source.go:805","reason":"error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Failed to get \"write\" lock')","timestamp":"2022-10-17T09:55:42.993889Z","uid":"2a34c2ae-71af-4d4e-a116-5ce9621ce88a"}
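The virt-launcher log is one JSON object per line, so the libvirt failure above (`Failed to get "write" lock` while QEMU executes `cont`) can be located by filtering on the `level` and `reason` fields. A minimal sketch follows, assuming the old virt-launcher pod's log has been saved or piped in (for example from `oc logs`); the function names are hypothetical, and the script only finds the message, it does not diagnose the cause.

```python
import json
import sys
from typing import Iterable, List, Optional

# Substring of the libvirt/QEMU error reported in this bug (after JSON unescaping).
LOCK_FAILURE = 'Failed to get "write" lock'


def error_reason(log_line: str) -> Optional[str]:
    """Return the 'reason' field of an error-level log entry, else None."""
    entry = json.loads(log_line)
    if entry.get("level") != "error":
        return None
    return entry.get("reason")


def is_write_lock_failure(log_line: str) -> bool:
    reason = error_reason(log_line)
    return reason is not None and LOCK_FAILURE in reason


def find_lock_failures(lines: Iterable[str]) -> List[str]:
    """Collect log lines that report the write-lock failure."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON output
        try:
            if is_write_lock_failure(line):
                hits.append(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed entries
    return hits


if __name__ == "__main__":
    for hit in find_lock_failures(sys.stdin):
        print(hit)
```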

Comment 13 Kedar Bidarkar 2023-01-04 13:35:34 UTC
*** Bug 2152909 has been marked as a duplicate of this bug. ***

Comment 14 Antonio Cardace 2023-03-03 15:26:32 UTC
Deferring to 4.13.1 due to capacity.

Comment 15 Antonio Cardace 2023-03-03 16:45:01 UTC
Deferring to 4.14 due to priority.

Comment 16 zhe peng 2023-08-16 07:31:26 UTC
Verified with build: CNV-v4.14.0.rhel9-1632

Steps:
1. Create a VM with storage class ocs-storagecluster-cephfs:
...
        storage:
          resources:
            requests:
              storage: 30Gi
          storageClassName: ocs-storagecluster-cephfs
...
Check the PVC:
...
    resources:
      requests:
        storage: "34087042032"
    storageClassName: ocs-storagecluster-cephfs
    volumeMode: Filesystem
    volumeName: pvc-19fd569a-a750-4298-99fc-81e0767ea167
...
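In the PVC check above, the claim requests "34087042032" bytes rather than exactly 30Gi (30 x 1024^3 = 32212254720 bytes). That figure equals the 30Gi request divided by (1 - 0.055) and rounded up, which is consistent with CDI reserving its default 5.5% filesystem overhead on Filesystem-mode volumes; treat that attribution as an assumption. The sketch reproduces the arithmetic in exact integer math; the function name is hypothetical.

```python
GIB = 1024 ** 3


def pvc_size_with_overhead(requested_bytes: int, overhead_permille: int = 55) -> int:
    """Inflate a requested size by a filesystem-overhead fraction, rounding up.

    Computes ceil(requested / (1 - overhead)), with the overhead given in
    thousandths (55 = 5.5%) so no floating-point rounding is involved.
    """
    denominator = 1000 - overhead_permille
    # Integer ceiling division of requested_bytes * 1000 by denominator.
    return (requested_bytes * 1000 + denominator - 1) // denominator


if __name__ == "__main__":
    print(pvc_size_with_overhead(30 * GIB))
```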

2. Start the VM:
$ oc get pods
NAME                            READY   STATUS    RESTARTS   AGE
virt-launcher-vm-fedora-9pgkv   1/1     Running   0          2m41s

3. Perform a live migration:

$ oc get pods
NAME                            READY   STATUS      RESTARTS   AGE
virt-launcher-vm-fedora-6tgzl   1/1     Running     0          16s
virt-launcher-vm-fedora-9pgkv   0/1     Completed   0          3m22s

$ oc get vm
NAME        AGE     STATUS    READY
vm-fedora   4m31s   Running   True

$ oc get virtualmachineinstancemigrations.kubevirt.io 
NAME                        PHASE       VMI
vm-fedora-migration-o514o   Succeeded   vm-fedora

move to verified.
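The verification above boils down to two checks: the migration object reaches PHASE `Succeeded`, and the old virt-launcher pod ends up `Completed` while exactly one launcher is `Running` (in the failing case the old pod never completes). A minimal sketch that applies both checks to the default `oc get` column output follows, assuming no cell contains whitespace, which holds for the tables shown; the function names are hypothetical, and `-o json` output would be a more robust input for real automation.

```python
from typing import Dict, List


def parse_oc_table(text: str) -> List[Dict[str, str]]:
    """Parse default `oc get` column output into one dict per row."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def migration_succeeded(vmim_output: str, vmi: str) -> bool:
    """True if some migration of the given VMI reports PHASE Succeeded."""
    return any(
        row.get("VMI") == vmi and row.get("PHASE") == "Succeeded"
        for row in parse_oc_table(vmim_output)
    )


def launcher_handoff_ok(pods_output: str) -> bool:
    """True if exactly one virt-launcher pod is Running and at least one is Completed."""
    statuses = [
        row.get("STATUS", "")
        for row in parse_oc_table(pods_output)
        if row.get("NAME", "").startswith("virt-launcher-")
    ]
    return statuses.count("Running") == 1 and "Completed" in statuses
```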

Comment 19 errata-xmlrpc 2023-11-08 14:05:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817

