Bug 2047203 - segfault at 68 on disk live migrate
Summary: segfault at 68 on disk live migrate
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-node
Classification: oVirt
Component: Included packages
Version: 4.4.10
Hardware: All
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.5.0
Target Release: 4.5.0
Assignee: Benny Zlotnik
QA Contact: sshmulev
URL:
Whiteboard:
Depends On: 2002607
Blocks:
 
Reported: 2022-01-27 11:54 UTC by Tommaso
Modified: 2022-04-20 06:33 UTC (History)
6 users

Fixed In Version: qemu-kvm-6.2.0-1
Clone Of:
Environment:
Last Closed: 2022-04-20 06:33:59 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?


Attachments (Terms of Use)
core-dump (18.66 MB, application/gzip)
2022-01-27 11:54 UTC, Tommaso
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-44554 0 None None None 2022-01-27 12:02:37 UTC

Description Tommaso 2022-01-27 11:54:59 UTC
Created attachment 1857079 [details]
core-dump

Description of problem:


During a live disk migration between two NFS storage domains, the VM crashes with an error after a few minutes.
In /var/log/messages we see these errors:


Jan 27 11:22:28 host1.server.com kernel: qemu-kvm[154794]: segfault at 68 ip 000055f8bfe5b8e1 sp 00007f45a4229e90 error 6 in qemu-kvm[55f8bf836000+b4c000]
Jan 27 11:22:28 host1.server.com kernel: Code: 48 89 c6 48 8b 47 38 4c 01 c0 4c 01 c8 48 f7 f1 49 39 fc 74 d4 48 83 e8 01 49 39 c6 77 cb 48 39 de 77 c6 48 83 7f 68 00 75 bf <49> 89 7c 24 68 31 f6 48 83 c7 50 e8 8f 82 0d 00 49 c7 44 24 68 00
Jan 27 11:22:31 host1.server.com abrt-hook-ccpp[163412]: Process 154794 (qemu-kvm) of user 107 killed by SIGSEGV - dumping core
Jan 27 11:22:41 host1.server.com vdsm[139234]: WARN executor state: count=5 workers={<Worker name=periodic/4 waiting task#=1844 at 0x7f32787cde48>, <Worker name=periodic/1 waiting task#=2071 at 0x7f3290087eb8>, <Worker name=periodic/5 waiting task#=728 at 0x7f327863a0f0>, <Worker name=periodic/2 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f32900774a8> at 0x7f3290077630> timeout=7.5, duration=7.50 at 0x7f32900779e8> discarded task#=2109 at 0x7f3290087f60>, <Worker name=periodic/6 waiting task#=0 at 0x7f3291ad97b8>}
Jan 27 11:23:50 host1.server.com abrt-hook-ccpp[163543]: Can't generate core backtrace: dwfl_getthread_frames failed: No DWARF information found
Jan 27 11:23:50 host1.server.com abrt-hook-ccpp[163412]: Core backtrace generator exited with error 1
Jan 27 10:25:09 host1.server.com kernel: IO iothread1[142832]: segfault at 68 ip 000055f0949b98e1 sp 00007fa5b77c4e90 error 6 in qemu-kvm[55f094394000+b4c000]
Jan 27 10:25:09 host1.server.com kernel: Code: 48 89 c6 48 8b 47 38 4c 01 c0 4c 01 c8 48 f7 f1 49 39 fc 74 d4 48 83 e8 01 49 39 c6 77 cb 48 39 de 77 c6 48 83 7f 68 00 75 bf <49> 89 7c 24 68 31 f6 48 83 c7 50 e8 8f 82 0d 00 49 c7 44 24 68 00
Jan 27 10:25:09 host1.server.com abrt-hook-ccpp[154174]: Process 142827 (qemu-kvm) of user 107 killed by SIGSEGV - dumping core
Jan 27 10:25:26 host1.server.com vdsm[139234]: WARN executor state: count=5 workers={<Worker name=periodic/4 waiting task#=1049 at 0x7f32787cde48>, <Worker name=periodic/1 waiting task#=1322 at 0x7f3290087eb8>, <Worker name=periodic/5 waiting task#=0 at 0x7f327863a0f0>, <Worker name=periodic/3 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f32900774a8> at 0x7f3290077630> timeout=7.5, duration=7.50 at 0x7f3278706f98> discarded task#=1322 at 0x7f3290087748>, <Worker name=periodic/2 waiting task#=1321 at 0x7f3290087f60>}
Jan 27 10:26:08 host1.server.com abrt-hook-ccpp[154293]: Can't generate core backtrace: dwfl_getthread_frames failed: No DWARF information found
Jan 27 10:26:08 host1.server.com abrt-hook-ccpp[154174]: Core backtrace generator exited with error 1

This error, and the VM reboot that follows it, causes the disk migration to fail.

This kind of issue occurs often on large Windows VMs.
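
For reference, the kernel lines above report a fault at virtual address 0x68 with "error 6" (a user-mode write to an unmapped page), which is the classic signature of writing through a NULL struct pointer to a member that happens to sit at offset 0x68. The following is a minimal, self-contained C sketch of that failure mode; the struct and field names are hypothetical, not the actual QEMU data structures:

#include <stddef.h>
#include <stdio.h>

/* Hypothetical layout: a member placed at offset 0x68 so that writing
 * to it through a NULL pointer faults at virtual address 0x68, exactly
 * as in the "segfault at 68 ... error 6" kernel message above. */
struct op {
    char pad[0x68];        /* first 0x68 bytes of other members */
    void *waiting_for;     /* member at offset 0x68 */
};

int main(void)
{
    struct op *self = NULL;

    printf("offset of waiting_for = 0x%zx\n",
           offsetof(struct op, waiting_for));   /* prints 0x68 */

    /* Uncommenting the next line reproduces the pattern from the log:
     * SIGSEGV with fault address 0x68. */
    /* self->waiting_for = NULL; */

    return 0;
}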



Additional info:

[root@host1 ~]# rpm -qa | grep vdsm
vdsm-http-4.40.100.2-1.el8.noarch
vdsm-api-4.40.100.2-1.el8.noarch
vdsm-network-4.40.100.2-1.el8.x86_64
vdsm-4.40.100.2-1.el8.x86_64
vdsm-python-4.40.100.2-1.el8.noarch
vdsm-yajsonrpc-4.40.100.2-1.el8.noarch
vdsm-client-4.40.100.2-1.el8.noarch
vdsm-jsonrpc-4.40.100.2-1.el8.noarch
vdsm-common-4.40.100.2-1.el8.noarch
[root@host1 ~]# rpm -qa | grep qemu-kvm
qemu-kvm-docs-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
qemu-kvm-block-curl-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
qemu-kvm-core-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
qemu-kvm-ui-spice-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
qemu-kvm-common-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
qemu-kvm-hw-usbredir-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
qemu-kvm-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
qemu-kvm-ui-opengl-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
qemu-kvm-block-rbd-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
qemu-kvm-block-ssh-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
qemu-kvm-block-gluster-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
qemu-kvm-block-iscsi-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
[root@host1 ~]# uname -a
Linux host1.server.com 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Comment 1 RHEL Program Management 2022-01-27 13:25:03 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 2 Tommaso 2022-01-27 15:41:15 UTC
The bug looks like this one on pve-qemu-kvm: https://forum.proxmox.com/threads/proxmox-7-0-14-1-crashes-vm-during-migrate-to-other-host.99678/
Is it possible to have a patch like the one mentioned in that thread, available here: https://git.proxmox.com/?p=pve-qemu.git;a=commit;h=edbcc10a6914c115d9d148f498b3c6c7631820f6 ?
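
If it helps triage: the crash signature (a write through a NULL pointer at a small offset while a block copy job is running) is consistent with the kind of NULL-guard fix referenced in that thread. Below is a simplified, self-contained C model of that style of fix, assuming the problem is an operation pointer that may legitimately be NULL being written to unconditionally; the names (Op, waiting_for, wait_on_conflict_*) are stand-ins and not the actual QEMU code:

#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in type, not QEMU's internal structures. */
typedef struct Op {
    struct Op *waiting_for;   /* which in-flight operation we wait on */
    bool waiting;
} Op;

/* Pre-fix shape: assumes "self" is never NULL and faults when it is. */
static void wait_on_conflict_unsafe(Op *self, Op *other)
{
    self->waiting_for = other;   /* SIGSEGV when self == NULL */
    self->waiting = true;
}

/* Post-fix shape: every access through "self" is guarded, because the
 * caller may have no operation of its own to pass in. */
static void wait_on_conflict_safe(Op *self, Op *other)
{
    if (self) {
        self->waiting_for = other;
        self->waiting = true;
    }
    /* waiting on "other" would continue here in either case */
}

int main(void)
{
    Op a = { 0 }, b = { 0 };

    wait_on_conflict_unsafe(&a, &b);   /* fine with a valid self */
    wait_on_conflict_safe(NULL, &b);   /* survives a NULL self */
    return 0;
}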

Comment 5 sshmulev 2022-04-18 11:54:17 UTC
Verified according to tier2 and tier3 automation runs of the TCs related to live merge.

Versions:
engine-4.5.0-0.237.el8ev
vdsm-4.50.0.10-1.el8ev
qemu-kvm-6.2.0-9.module+el8.6.0+14480+c0a3aa0f

Comment 6 Sandro Bonazzola 2022-04-20 06:33:59 UTC
This bugzilla is included in the oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in the oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

