Bug 2152909

Summary: Unable to perform node to node migration
Product: Container Native Virtualization (CNV)
Component: Virtualization
Version: 4.10.6
Hardware: x86_64
OS: Linux
Reporter: Prince Sarvaiya <prince.tcet>
Assignee: sgott
QA Contact: Kedar Bidarkar <kbidarka>
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Type: Bug
Last Closed: 2023-01-04 13:35:34 UTC
Attachments:
virshdump
libvirtd
virt-launcher pod logs

Description Prince Sarvaiya 2022-12-13 14:26:18 UTC
Created attachment 1932370 [details]
virshdump

Description of problem:

We want to migrate a VM from node to node, but the migration fails with the errors below.

Linux:
VirtualMachineInstance migration uid 29acdc88-b8d7-4ab2-add2-1131a6d8868a failed. reason:Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Failed to get "write" lock')

Windows:
server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Failed to get \"consistent read\" lock')"
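
For reference, the migration was initiated from the UI; the equivalent API object would be a VirtualMachineInstanceMigration along these lines (a sketch only; the object name is made up, and the VMI name is inferred from the virsh domain name shown later in this report):

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-test-job        # placeholder name
spec:
  vmiName: migration-test-10-1-100-181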


> I am using Ceph (not Red Hat Ceph) as my storage class, with RWX mode.

> If I use a node selector in my YAML, the VM runs on that particular node without any error (a sketch of such a spec follows the log excerpt below).

> On initiating node-to-node migration from the UI, one more virt-launcher pod is created on another node, but it goes to Completed instead of staying in Running. The migration then fails with the error above.

> In the second virt-launcher pod, virsh list shows no VMs running.

> In the existing virt-launcher pod, we observe:

2022-12-13 13:14:22.884+0000: initiating migration
2022-12-13T13:14:25.532764Z qemu-kvm: warning: Failed to unlock byte 201
2022-12-13T13:14:25.532834Z qemu-kvm: warning: Failed to unlock byte 201
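
For context, a minimal sketch of the kind of VM spec referred to above, with a node selector and an RWX PVC-backed disk (all names and the hostname label value are placeholders, not taken from the affected cluster):

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: migration-test            # placeholder name
spec:
  running: true
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: worker-1        # pins the VMI to one node
      domain:
        resources:
          requests:
            memory: 2Gi
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
      volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: migration-test-rootdisk  # RWX PVC on the Ceph storage class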


Version-Release number of selected component (if applicable):

4.10.6

How reproducible:

75%


Actual results:

The migration fails with the errors shown above.

Expected results:

Node-to-node live migration completes without any error.


Additional info:

Attached virsh dumpxml, libvirtd, and virt-launcher pod logs.

virsh # list
 Id   Name                                    State
------------------------------------------------------
 1    migration_migration-test-10-1-100-181   paused

virsh # resume migration_migration-test-10-1-100-181
error: Failed to resume domain 'migration_migration-test-10-1-100-181'
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetBlockInfo)

Comment 1 Prince Sarvaiya 2022-12-13 14:26:47 UTC
Created attachment 1932371 [details]
libvirtd

Comment 2 Prince Sarvaiya 2022-12-13 14:28:25 UTC
Created attachment 1932372 [details]
virt-launcher pod logs

Comment 3 Kedar Bidarkar 2022-12-14 13:15:12 UTC
This bug could possibly be the same as bug https://bugzilla.redhat.com/show_bug.cgi?id=2135381.

Comment 4 sgott 2022-12-14 13:19:12 UTC
Per comment #3, this could be a duplicate of the mentioned BZ. Are you using Ceph in filesystem mode or block mode? If filesystem mode, we suspect it is a duplicate.
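
For reference, the distinction in practice: a CephFS-backed RWX PVC uses volumeMode: Filesystem, while Ceph RBD block mode looks roughly like the sketch below (the PVC name, storage class name, and size are assumptions, not taken from the reporter's cluster):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: migration-test-rootdisk
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block                               # block mode, as opposed to Filesystem (CephFS)
  storageClassName: ocs-storagecluster-ceph-rbd   # assumption: an RBD-backed storage class
  resources:
    requests:
      storage: 30Gi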

Comment 5 Prince Sarvaiya 2022-12-14 17:11:14 UTC
Hi team,

Yes, I am using CephFS (filesystem mode). I will try Ceph block storage with RWX mode and update you on the results.

Comment 7 Prince Sarvaiya 2022-12-22 08:17:04 UTC
Hi team,

It works fine with NFS in RWX mode. Thanks for your help.

Comment 8 Kedar Bidarkar 2023-01-04 13:35:34 UTC
Closing this as a duplicate per comment 5 and comment 7 above.

*** This bug has been marked as a duplicate of bug 2135381 ***