Bug 2152909

Summary: Unable to perform node to node migration
Product: Container Native Virtualization (CNV)
Component: Virtualization
Version: 4.10.6
Hardware: x86_64
OS: Linux
Reporter: Prince Sarvaiya <prince.tcet>
Assignee: sgott
QA Contact: Kedar Bidarkar <kbidarka>
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Type: Bug
Last Closed: 2023-01-04 13:35:34 UTC
Attachments:
virshdump
libvirtd
virt-launcher pod logs

Description Prince Sarvaiya 2022-12-13 14:26:18 UTC
Created attachment 1932370 [details]
virshdump

Description of problem:

We want to migrate a VM from node to node, but the migration fails with the errors below.

Linux:
VirtualMachineInstance migration uid 29acdc88-b8d7-4ab2-add2-1131a6d8868a failed. reason:Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Failed to get "write" lock')

Windows:
server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'cont': Failed to get \"consistent read\" lock')"
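
For reference, the migration was initiated from the UI; the equivalent API object would be a VirtualMachineInstanceMigration along these lines (a sketch only; the object name is made up, and the VMI name is inferred from the virsh domain name shown later in this report):

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-test-job        # placeholder name
spec:
  vmiName: migration-test-10-1-100-181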


> I am using Ceph (not Red Hat Ceph) as my storage class, with RWX mode.

> If I use a node selector in my YAML, the VM runs on that particular node without any error (a sketch of such a spec follows the log excerpt below).

> On initiating node-to-node migration from the UI, one more virt-launcher pod is created on another node, but it goes to Completed instead of staying in Running. The migration then fails with the error above.

> In the second virt-launcher pod, virsh list shows no VMs running.

> In the existing virt-launcher pod, we observe:

2022-12-13 13:14:22.884+0000: initiating migration
2022-12-13T13:14:25.532764Z qemu-kvm: warning: Failed to unlock byte 201
2022-12-13T13:14:25.532834Z qemu-kvm: warning: Failed to unlock byte 201
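
For context, a minimal sketch of the kind of VM spec referred to above, with a node selector and an RWX PVC-backed disk (all names and the hostname label value are placeholders, not taken from the affected cluster):

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: migration-test            # placeholder name
spec:
  running: true
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: worker-1        # pins the VMI to one node
      domain:
        resources:
          requests:
            memory: 2Gi
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
      volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: migration-test-rootdisk  # RWX PVC on the Ceph storage class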


Version-Release number of selected component (if applicable):

4.10.6

How reproducible:

75%


Actual results:

The migration fails with the errors shown above.

Expected results:

Node-to-node live migration completes without any error.


Additional info:

Attached virsh dumpxml, libvirtd, and virt-launcher pod logs.

virsh # list
 Id   Name                                    State
------------------------------------------------------
 1    migration_migration-test-10-1-100-181   paused

virsh # resume migration_migration-test-10-1-100-181
error: Failed to resume domain 'migration_migration-test-10-1-100-181'
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetBlockInfo)

Comment 1 Prince Sarvaiya 2022-12-13 14:26:47 UTC
Created attachment 1932371 [details]
libvirtd

Comment 2 Prince Sarvaiya 2022-12-13 14:28:25 UTC
Created attachment 1932372 [details]
virt-launcher pod logs

Comment 3 Kedar Bidarkar 2022-12-14 13:15:12 UTC
This bug could possibly be the same as bug https://bugzilla.redhat.com/show_bug.cgi?id=2135381.

Comment 4 sgott 2022-12-14 13:19:12 UTC
Per comment #3, this could be a duplicate of the mentioned BZ. Are you using Ceph in filesystem mode or block mode? If filesystem mode, we suspect it is a duplicate.
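
For reference, the distinction in practice: a CephFS-backed RWX PVC uses volumeMode: Filesystem, while Ceph RBD block mode looks roughly like the sketch below (the PVC name, storage class name, and size are assumptions, not taken from the reporter's cluster):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: migration-test-rootdisk
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block                               # block mode, as opposed to Filesystem (CephFS)
  storageClassName: ocs-storagecluster-ceph-rbd   # assumption: an RBD-backed storage class
  resources:
    requests:
      storage: 30Gi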

Comment 5 Prince Sarvaiya 2022-12-14 17:11:14 UTC
Hi team,

Yes, I am using CephFS (filesystem mode). I will try Ceph block storage with RWX mode and update you on the results.

Comment 7 Prince Sarvaiya 2022-12-22 08:17:04 UTC
Hi team,

It works fine with NFS in RWX mode. Thanks for your help.

Comment 8 Kedar Bidarkar 2023-01-04 13:35:34 UTC
Closing this as a duplicate per comment 5 and comment 7 above.

*** This bug has been marked as a duplicate of bug 2135381 ***