Bug 2152909 - Unable to perform node to node migration
Summary: Unable to perform node to node migration
Keywords:
Status: CLOSED DUPLICATE of bug 2135381
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.10.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: sgott
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-12-13 14:26 UTC by Prince Sarvaiya
Modified: 2023-01-04 13:35 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-04 13:35:34 UTC
Target Upstream Version:
Embargoed:


Attachments
virshdump (8.57 KB, text/plain), 2022-12-13 14:26 UTC, Prince Sarvaiya
libvirtd (4.46 KB, text/plain), 2022-12-13 14:26 UTC, Prince Sarvaiya
virt-launcher pod logs (69.74 KB, text/plain), 2022-12-13 14:28 UTC, Prince Sarvaiya


Links
Red Hat Issue Tracker CNV-23372 (Last Updated: 2022-12-13 14:27:08 UTC)

Description Prince Sarvaiya 2022-12-13 14:26:18 UTC
Created attachment 1932370 [details]
virshdump

Description of problem:

We want to migrate a VM from node to node, but the migration fails with the errors below.

Linux:
VirtualMachineInstance migration uid 29acdc88-b8d7-4ab2-add2-1131a6d8868a failed. reason:Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Failed to get "write" lock')

Windows:
server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Failed to get \"consistent read\" lock')"


> I am using Ceph (not Red Hat Ceph) as my storage class with RWX mode.

> Now, if a node selector is used in my YAML, the VM runs on that particular node without any error.

> On initiating node-to-node migration from the UI, one more virt-launcher pod is created on another node, but that pod goes to Completed rather than staying in Running. The migration then fails with the above error (a sketch of the equivalent migration object is shown after the log excerpt below).

> In the second virt-launcher pod, running virsh list shows no VMs running.

> In the existing virt-launcher pod, we observe:

2022-12-13 13:14:22.884+0000: initiating migration
2022-12-13T13:14:25.532764Z qemu-kvm: warning: Failed to unlock byte 201
2022-12-13T13:14:25.532834Z qemu-kvm: warning: Failed to unlock byte 201
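
For reference, a node-to-node migration started from the console corresponds to a VirtualMachineInstanceMigration object. A minimal sketch follows; the object name is hypothetical, and the namespace and VMI name are assumptions inferred from the virsh domain name shown further below (migration_migration-test-10-1-100-181, i.e. namespace_vminame):

# Minimal illustrative sketch; names are assumptions, not taken verbatim from this report.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-test-node-to-node      # hypothetical object name
  namespace: migration                   # assumed from the virsh domain name prefix
spec:
  vmiName: migration-test-10-1-100-181   # assumed VMI name

Creating such an object (for example with oc create -f) should have the same effect as clicking Migrate in the console; the target virt-launcher pod is expected to stay in Running until the migration completes or fails.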


Version-Release number of selected component (if applicable):

4.10.6

How reproducible:

75%


Expected results:

Node-to-node live migration should complete without any error.


Additional info:

Attached the virsh dumpxml output, libvirtd log, and virt-launcher pod logs.

virsh # list
 Id   Name                                    State
------------------------------------------------------
 1    migration_migration-test-10-1-100-181   paused

virsh # resume migration_migration-test-10-1-100-181
error: Failed to resume domain 'migration_migration-test-10-1-100-181'
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetBlockInfo)

Comment 1 Prince Sarvaiya 2022-12-13 14:26:47 UTC
Created attachment 1932371 [details]
libvirtd

Comment 2 Prince Sarvaiya 2022-12-13 14:28:25 UTC
Created attachment 1932372 [details]
virt-launcher pod logs

Comment 3 Kedar Bidarkar 2022-12-14 13:15:12 UTC
This bug could possibly be the same as bug 2135381: https://bugzilla.redhat.com/show_bug.cgi?id=2135381

Comment 4 sgott 2022-12-14 13:19:12 UTC
Per comment 3, this could possibly be a duplicate of the mentioned BZ. Are you using Ceph in filesystem mode or block mode? If filesystem mode, we suspect it is a duplicate.
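
For context, the filesystem-versus-block distinction is visible in the VM disk PVC spec as volumeMode. A minimal sketch of a block-mode RWX claim follows; the claim name, size, and storage class name are placeholders, not values taken from this report:

# Illustrative only; the name, size, and storage class are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vm-disk-rwx-block
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block            # block mode; a CephFS-backed claim would use volumeMode: Filesystem
  resources:
    requests:
      storage: 30Gi
  storageClassName: ceph-rbd   # placeholder for an RBD (block) storage class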

Comment 5 Prince Sarvaiya 2022-12-14 17:11:14 UTC
Hi team,

Yes, I am using the Ceph filesystem (CephFS). I will try Ceph block storage with RWX mode and update you with the observation.

Comment 7 Prince Sarvaiya 2022-12-22 08:17:04 UTC
Hi team,

It works fine with NFS in RWX mode. Thanks for your help.

Comment 8 Kedar Bidarkar 2023-01-04 13:35:34 UTC
Closing this as a duplicate per comment 5 and comment 7 above.

*** This bug has been marked as a duplicate of bug 2135381 ***

