1330394 – sometimes vm migration failed and generates the core of the qemu process in RHEV.

Keywords:
Status:	CLOSED DUPLICATE of bug 1281455
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	spice
Sub Component:
Version:	7.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Default Assignee for SPICE Bugs
QA Contact:	SPICE QE bug list
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-04-26 06:55 UTC by Sachin Raje
Modified:	2019-12-16 05:42 UTC
CC List:	14 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-05-10 07:00:10 UTC
Target Upstream Version:
Embargoed:

Attachments
full backtrace (11.06 KB, text/plain) 2016-04-29 07:11 UTC, Amit Shah

Description Sachin Raje 2016-04-26 06:55:05 UTC

Description of problem:
During the live migrations triggered by putting one RHEL 7.2 hypervisor into maintenance mode, one of the Windows 2003 VMs crashed and was not restarted on any other active hypervisor.

A core file was generated on the source hypervisor.


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
rhevm-3.5.8-0.1.el6ev.noarch

How reproducible: 
Occurred only once at customer side


Steps to Reproduce:
1. Put the host into maintenance mode so that VM auto-migration starts.
2. Migration fails because the VM's qemu-kvm process is killed.

Actual results:
The VM's qemu-kvm process is sometimes killed during migration.


Expected results:
Migration should complete successfully; at the very least, the underlying qemu-kvm process should not crash.


Additional info:

Migration failed with the following traceback in the vdsm logs:

Thread-96776::ERROR::2016-04-07 20:45:11,461::migration::260::vm.Vm::(run) vmId=`22e0e8cd-7260-4e23-8460-77ec0a89fb67`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 246, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 335, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/virt/vm.py", line 709, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 119, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1825, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: internal error: early end of file from monitor: possible problem:
2016-04-07T18:45:08.064325Z qemu-kvm: load of migration failed: Input/output error

Comment 1 Amit Shah 2016-04-26 07:03:05 UTC

(In reply to Sachin Raje from comment #0)
> libvirtError: internal error: early end of file from monitor: possible
> problem:
> 2016-04-07T18:45:08.064325Z qemu-kvm: load of migration failed: Input/output
> error

This error message isn't too descriptive, but it usually happens when there's a device mismatch (after hotplug operations).  Were any hotplug/unplug operations performed on the VM prior to migration?  What were the qemu command lines on the src and dest machines?
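
A device mismatch of this kind can be spotted by comparing the `-device` arguments of the two command lines. The following is a minimal sketch only: the helper names and both example command lines are made up for illustration and are not taken from this report.

```python
import shlex


def device_args(cmdline):
    """Collect the value that follows each -device option in a qemu command line."""
    tokens = shlex.split(cmdline)
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-device"]


def device_mismatch(src_cmdline, dst_cmdline):
    """Return (devices only on src, devices only on dest)."""
    src, dst = device_args(src_cmdline), device_args(dst_cmdline)
    only_src = [d for d in src if d not in dst]
    only_dst = [d for d in dst if d not in src]
    return only_src, only_dst


# Hypothetical command lines, for illustration only.
src = "qemu-kvm -m 2048 -device virtio-net-pci,id=net0 -device virtio-blk-pci,id=disk0"
dst = "qemu-kvm -m 2048 -device virtio-net-pci,id=net0"

only_src, only_dst = device_mismatch(src, dst)
print("only on src: ", only_src)
print("only on dest:", only_dst)
```

A non-empty result on either side would be worth checking against any hotplug/unplug history of the VM.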

Comment 5 Amit Shah 2016-04-27 13:49:48 UTC

The core dump doesn't seem to belong to the crashed VM (or, the qemu version that produced the dump is different from the one that's mentioned).

Running gdb on the crash, I don't get a proper backtrace; and in fact there are some call sites that are shown to be in TCG (i.e. non-KVM) code, so something is definitely amiss here.

Can you check the qemu version that produced this crash?

Also, this crash was on the src host, right?  So the VM was lost during migration?

Did QEMU output any messages when it crashed?  Logs from src qemu and libvirt could provide clues.

Comment 8 Amit Shah 2016-04-29 07:07:40 UTC

So one thing I see from the provided qemu versions is that the src is on the 7_2.10 version, and dest is on 7_2.4.

Since the 7_2.10 binary doesn't produce a valid gdb backtrace, I gave 7_2.4 a try, and it does work.

So it looks like the src host was in fact running 7_2.4 when the crash happened.

Backtrace is:

(gdb) bt
#0  timer_del (ts=0x2020202020202020) at qemu-timer.c:401
#1  0x00007f435e0ece41 in spice_server_vm_stop (s=<optimized out>) at reds.c:4615
#2  0x00007f4364e5c234 in qemu_spice_display_stop () at ui/spice-core.c:930
#3  vm_change_state_handler (opaque=<optimized out>, running=<optimized out>, state=<optimized out>) at ui/spice-core.c:639
#4  0x00007f4364d72082 in vm_state_notify (running=running@entry=0, state=state@entry=RUN_STATE_FINISH_MIGRATE) at vl.c:1517
#5  0x00007f4364cac8b2 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.3.0/cpus.c:603
#6  vm_stop (state=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.3.0/cpus.c:1297
#7  0x00007f4364cac916 in vm_stop_force_state (state=state@entry=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.3.0/cpus.c:1305
#8  0x00007f4364e33832 in migration_thread (opaque=0x7f4365330fa0 <current_migration.34315>) at migration/migration.c:806
#9  0x00007f43637b6dc5 in start_thread (arg=0x7f414a3fe700) at pthread_create.c:308
#10 0x00007f435d19721d in lseek64 () at ../sysdeps/unix/syscall-template.S:81
#11 0x0000000000000000 in ?? ()

This looks like it's a use-after-free in spice-server.
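
One hint in that direction is the `ts` argument in frame #0: every byte of 0x2020202020202020 is 0x20, the ASCII space character, which would be consistent with the timer pointer being read from memory that had since been reused for text. A minimal Python sketch of that check, assuming a 64-bit little-endian pointer (the helper names are hypothetical):

```python
def pointer_bytes(value, width=8):
    """Return the raw bytes of a pointer-sized integer (little-endian, as on x86_64)."""
    return value.to_bytes(width, "little")


def looks_like_text(value, width=8):
    """True if every byte of the pointer is printable ASCII (0x20 to 0x7e)."""
    return all(0x20 <= b <= 0x7E for b in pointer_bytes(value, width))


ts = 0x2020202020202020       # 'ts' argument of timer_del in frame #0
frame1 = 0x00007F435E0ECE41   # return address in frame #1, for comparison

print(pointer_bytes(ts))        # eight space characters
print(looks_like_text(ts))      # True
print(looks_like_text(frame1))  # False
```

This is only a heuristic: a pointer made up entirely of printable bytes is suspicious, but it does not by itself prove where the corruption came from.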

Re-assigning the bug to Marc-Andre for further investigation.

Can you let us know what the spice-server version is on the src?

Attaching the full backtrace, as the core file is too large to download in a reasonable time.

Comment 9 Amit Shah 2016-04-29 07:11:54 UTC

Created attachment 1152147 [details]
full backtrace

Comment 22 Victor Toso 2016-05-10 07:00:10 UTC

Closing this one, as it seems to be addressed in bug #1281455, with fixes in spice-0.12.4-17.el7.

*** This bug has been marked as a duplicate of bug 1281455 ***
