Bug 2152909 - Unable to perform node to node migration
Summary: Unable to perform node to node migration
Keywords:
Status: CLOSED DUPLICATE of bug 2135381
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.10.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: sgott
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-12-13 14:26 UTC by Prince Sarvaiya
Modified: 2023-01-04 13:35 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-04 13:35:34 UTC
Target Upstream Version:
Embargoed:


Attachments
virshdump (8.57 KB, text/plain), 2022-12-13 14:26 UTC, Prince Sarvaiya
libvirtd (4.46 KB, text/plain), 2022-12-13 14:26 UTC, Prince Sarvaiya
virt-launcher pod logs (69.74 KB, text/plain), 2022-12-13 14:28 UTC, Prince Sarvaiya


Links
Red Hat Issue Tracker CNV-23372 (Last Updated: 2022-12-13 14:27:08 UTC)

Description Prince Sarvaiya 2022-12-13 14:26:18 UTC
Created attachment 1932370 [details]
virshdump

Description of problem:

We want to migrate a VM from node to node, but the migration fails with the errors below.

Linux:
VirtualMachineInstance migration uid 29acdc88-b8d7-4ab2-add2-1131a6d8868a failed. reason:Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Failed to get "write" lock')

Windows:
server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Failed to get \"consistent read\" lock')"


> I am using Ceph (not Red Hat Ceph) as my storage class with RWX mode.

> Now, if a node selector is used in my YAML, the VM runs on that particular node without any error.

> On initiating node-to-node migration from the UI, one more virt-launcher pod is created on another node, but that pod goes to Completed rather than staying in Running. The migration then fails with the above error (a sketch of the equivalent migration object is shown after the log excerpt below).

> In the second virt-launcher pod, running virsh list shows no VMs running.

> In the existing virt-launcher pod, we observe:

2022-12-13 13:14:22.884+0000: initiating migration
2022-12-13T13:14:25.532764Z qemu-kvm: warning: Failed to unlock byte 201
2022-12-13T13:14:25.532834Z qemu-kvm: warning: Failed to unlock byte 201
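
For reference, a node-to-node migration started from the console corresponds to a VirtualMachineInstanceMigration object. A minimal sketch follows; the object name is hypothetical, and the namespace and VMI name are assumptions inferred from the virsh domain name shown further below (migration_migration-test-10-1-100-181, i.e. namespace_vminame):

# Minimal illustrative sketch; names are assumptions, not taken verbatim from this report.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-test-node-to-node      # hypothetical object name
  namespace: migration                   # assumed from the virsh domain name prefix
spec:
  vmiName: migration-test-10-1-100-181   # assumed VMI name

Creating such an object (for example with oc create -f) should have the same effect as clicking Migrate in the console; the target virt-launcher pod is expected to stay in Running until the migration completes or fails.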


Version-Release number of selected component (if applicable):

4.10.6

How reproducible:

75%


Expected results:

Node-to-node live migration should complete without any error.


Additional info:

Attached the virsh dumpxml output, libvirtd log, and virt-launcher pod logs.

virsh # list
 Id   Name                                    State
------------------------------------------------------
 1    migration_migration-test-10-1-100-181   paused

virsh # resume migration_migration-test-10-1-100-181
error: Failed to resume domain 'migration_migration-test-10-1-100-181'
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetBlockInfo)

Comment 1 Prince Sarvaiya 2022-12-13 14:26:47 UTC
Created attachment 1932371 [details]
libvirtd

Comment 2 Prince Sarvaiya 2022-12-13 14:28:25 UTC
Created attachment 1932372 [details]
virt-launcher pod logs

Comment 3 Kedar Bidarkar 2022-12-14 13:15:12 UTC
This bug could possibly be the same as bug 2135381: https://bugzilla.redhat.com/show_bug.cgi?id=2135381

Comment 4 sgott 2022-12-14 13:19:12 UTC
Per comment 3, this could possibly be a duplicate of the mentioned BZ. Are you using Ceph in filesystem mode or block mode? If filesystem mode, we suspect it is a duplicate.
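
For context, the filesystem-versus-block distinction is visible in the VM disk PVC spec as volumeMode. A minimal sketch of a block-mode RWX claim follows; the claim name, size, and storage class name are placeholders, not values taken from this report:

# Illustrative only; the name, size, and storage class are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vm-disk-rwx-block
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block            # block mode; a CephFS-backed claim would use volumeMode: Filesystem
  resources:
    requests:
      storage: 30Gi
  storageClassName: ceph-rbd   # placeholder for an RBD (block) storage class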

Comment 5 Prince Sarvaiya 2022-12-14 17:11:14 UTC
Hi team,

Yes, I am using the Ceph filesystem (CephFS). I will try Ceph block storage with RWX mode and update you with the observation.

Comment 7 Prince Sarvaiya 2022-12-22 08:17:04 UTC
Hi team,

It works fine with NFS in RWX mode. Thanks for your help.

Comment 8 Kedar Bidarkar 2023-01-04 13:35:34 UTC
Closing this as a duplicate per comment 5 and comment 7 above.

*** This bug has been marked as a duplicate of bug 2135381 ***

