1147282 – qemu vm guests hard shutdown unexpectedly

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	qemu
Sub Component:
Version:	20
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Fedora Virtualization Maintainers
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1147398
Depends On:
Blocks:

Reported:	2014-09-28 21:28 UTC by Evan Fraser
Modified:	2015-12-08 21:13 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-12-08 21:13:47 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description Evan Fraser 2014-09-28 21:28:09 UTC

Description of problem:
VM guests occasionally shut down hard and unexpectedly with the error:

"qemu-system-x86_64: block.c:2806: bdrv_error_action: Assertion `error >= 0' failed."

In my environment only my Windows 2008R2 guests appear to be affected, although I know of another environment where SLES guests are affected, as described in this email: http://lists.gnu.org/archive/html/qemu-discuss/2014-06/msg00094.html


Version-Release number of selected component (if applicable):
  
  * qemu-system-x86-1.6.2-5.fc20.x86_64

How reproducible:

It is difficult to reproduce; for me it occurs roughly once a week per guest VM.


Additional info:
  * I am running OpenStack Icehouse on Fedora 20 (via packstack)
  * Kernels: kernel-3.14.4-200.fc20.x86_64 / kernel-3.14.8-200.fc20.x86_64
  * libvirt-1.1.3.5-2.fc20.x86_64
  * Guest OS: Windows 2008R2
  * Guest VirtIO driver version 0.1-81
  * Guest storage is an NFS export from a NetApp FAS 6220 cluster.
  * These unexpected shutdowns do not occur for me at busy times for either the guests or the hosts.

Comment 1 Richard W.M. Jones 2014-09-29 08:38:23 UTC

*** Bug 1147398 has been marked as a duplicate of this bug. ***

Comment 2 Richard W.M. Jones 2014-09-29 08:40:15 UTC

bdrv_error_action is called from 3 places.  What is going to
help most of all here is a stack trace.  Easiest thing is
to enable core dumps and make sure the core dump is captured when
qemu fails.
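
As a rough illustration of what "enable core dumps" means at the process level, here is a minimal Python sketch that raises the soft core-file limit to the hard limit for one child process. The helper name run_with_core_dumps is hypothetical. It only affects processes started by the script itself; guests started by libvirtd inherit their limits from the libvirtd service, and raising the limit there is not shown here.

```python
import resource
import subprocess
import sys


def run_with_core_dumps(argv):
    """Run argv with the soft core-file limit raised to the hard limit.

    Hypothetical helper for illustration only; it does not change the
    limits of processes that are started by libvirtd.
    """
    def raise_core_limit():
        # Runs in the child just before exec: lift soft limit to hard limit.
        _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))

    return subprocess.run(
        argv, preexec_fn=raise_core_limit, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Print the core limits the child actually sees.
    child = "import resource; print(resource.getrlimit(resource.RLIMIT_CORE))"
    print(run_with_core_dumps([sys.executable, "-c", child]).stdout.strip())
```

Whether a core file is actually written also depends on the host's core pattern and on the hard limit not being zero.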

Comment 3 Markus Stockhausen 2014-09-29 08:49:23 UTC

Thanks for the idea. It sounds better than my plan to recompile qemu with debug messages. Can you give a hint on how to achieve this in an oVirt/libvirt environment?

Comment 4 QiangGuan 2015-02-09 02:55:54 UTC

I hit this problem with qemu-1.6.1 too; in my case it occurs with Debian 7 guests.

Comment 5 Fedora End Of Life 2015-05-29 12:59:03 UTC

This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 6 Cole Robinson 2015-05-31 19:12:00 UTC

Since F20 will be EOL soon, closing this. If anyone can still reproduce this with F21+, please reopen and I'll take a look.

Comment 7 Matt Riedemann 2015-12-08 20:47:23 UTC

We see this in upstream OpenStack CI testing, viewable here:

http://logs.openstack.org/07/251407/2/check/gate-tempest-dsvm-full/144f7fc/logs/libvirt/libvirtd.txt.gz#_2015-11-30_18_20_18_168

2015-11-30 18:20:18.168+0000: 31539: error : qemuMonitorIO:656 : internal error: End of file from monitor
2015-11-30 18:20:18.168+0000: 31539: debug : qemuMonitorIO:710 : Error on monitor internal error: End of file from monitor
2015-11-30 18:20:18.168+0000: 31539: debug : qemuMonitorIO:731 : Triggering EOF callback
2015-11-30 18:20:18.168+0000: 31539: debug : qemuProcessHandleMonitorEOF:300 : Received EOF on 0x7fa310011240 'instance-00000066'
2015-11-30 18:20:18.168+0000: 31539: debug : qemuProcessHandleMonitorEOF:318 : Monitor connection to 'instance-00000066' closed without SHUTDOWN event; assuming the domain crashed
2015-11-30 18:20:18.168+0000: 31539: debug : virObjectEventNew:643 : obj=0x7fa340aab850
2015-11-30 18:20:18.168+0000: 31539: debug : qemuProcessStop:4235 : Shutting down vm=0x7fa310011240 name=instance-00000066 id=150 pid=17830 flags=0

This was the domain log:

http://logs.openstack.org/07/251407/2/check/gate-tempest-dsvm-full/144f7fc/logs/libvirt/qemu/instance-00000066.txt.gz

I noticed this:

char device redirected to /dev/pts/1 (label charserial1)
qemu-system-x86_64: /build/qemu-5LgLIn/qemu-2.0.0+dfsg/block.c:3491: bdrv_error_action: Assertion `error >= 0' failed.
2015-11-30 18:20:18.168+0000: shutting down

This is a volume-backed VM. I think around the time that this fails, we should be trying to plug a virtual interface.
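
To check other domain logs for the same failure, the assertion line quoted above can be picked out with a small parser. This is only a sketch: the pattern is derived from the two assertion messages quoted in this bug, and the function name find_assertions is hypothetical.

```python
import re

# Matches assertion messages in the form seen in the logs quoted in this bug:
#   <program>: <file>:<line>: <function>: Assertion `<expr>' failed.
ASSERT_RE = re.compile(
    r"^(?P<prog>[^:\s]+): (?P<file>[^:]+):(?P<line>\d+): "
    r"(?P<func>\w+): Assertion `(?P<expr>.+)' failed\.$"
)


def find_assertions(log_text):
    """Return one dict per assertion-failure line found in log_text."""
    hits = []
    for raw in log_text.splitlines():
        # Tolerate surrounding whitespace and quotes around a pasted line.
        match = ASSERT_RE.match(raw.strip().strip('"'))
        if match:
            hit = match.groupdict()
            hit["line"] = int(hit["line"])
            hits.append(hit)
    return hits


if __name__ == "__main__":
    sample = (
        "char device redirected to /dev/pts/1 (label charserial1)\n"
        "qemu-system-x86_64: /build/qemu-5LgLIn/qemu-2.0.0+dfsg/block.c:3491: "
        "bdrv_error_action: Assertion `error >= 0' failed.\n"
        "2015-11-30 18:20:18.168+0000: shutting down\n"
    )
    for hit in find_assertions(sample):
        print(hit["func"], hit["file"], hit["line"], hit["expr"])
```

The file and line fields differ between builds (block.c:2806 in the original report, block.c:3491 here), so grouping results by function and expression is probably more useful than grouping by line number.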

Possibly also helpful:

http://logs.openstack.org/07/251407/2/check/gate-tempest-dsvm-full/144f7fc/logs/screen-n-net.txt.gz#_2015-11-30_18_19_45_252

2015-11-30 18:19:45.251 DEBUG oslo_concurrency.processutils [req-8911e8c7-2466-408f-832e-af4b78e9adec tempest-TestVolumeBootPattern-2142876884 tempest-TestVolumeBootPattern-1970238908] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf ebtables --concurrent -t nat -D PREROUTING --logical-in br100 -p ipv4 --ip-src 10.1.0.3 ! --ip-dst 10.1.0.0/20 -j redirect --redirect-target ACCEPT" returned: 255 in 0.147s execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:297

Comment 8 Matt Riedemann 2015-12-08 20:48:44 UTC

For comment 7, this is OpenStack Mitaka.

libvirt version: 1.2.2

QEMU 2.0.0

Ubuntu 14.04 for the compute host.

Comment 9 Cole Robinson 2015-12-08 21:13:47 UTC

If you are hitting this on Ubuntu, you need to file an Ubuntu bug.
