Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1142857

Summary: [abrt] qemu-kvm: bdrv_error_action(): qemu-kvm killed by SIGABRT
Product: Red Hat Enterprise Linux 7
Reporter: Tomas Dolezal <todoleza>
Component: qemu-kvm
Assignee: Paolo Bonzini <pbonzini>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.0
CC: areis, armbru, dgilbert, famz, gwatson, hhuang, huding, jherrman, jraju, juzhang, michen, mklika, mrezanin, mzheng, pbonzini, rbalakri, todoleza, uobergfe, virt-maint, xfu
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard: abrt_hash:070748678ed842e5f195e7365ca2467ac9f559ab
Fixed In Version: qemu-kvm-1.5.3-93.el7
Doc Type: Bug Fix
Doc Text:
Due to incorrect implementation of portable memory barriers, the QEMU emulator in some cases terminated unexpectedly when a virtual disk was under heavy I/O load. This update fixes the implementation in order to achieve correct synchronization between QEMU's threads. As a result, the described crash no longer occurs.
Story Points: ---
Clone Of:
: 1231335 1233643
Environment:
Last Closed: 2015-11-19 04:56:45 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1231335, 1233643
Attachments:
Description              Flags
File: backtrace          none
File: cgroup             none
File: core_backtrace     none
File: dso_list           none
File: environ            none
File: limits             none
File: maps               none
File: open_fds           none
File: proc_pid_status    none

Description Tomas Dolezal 2014-09-17 14:06:59 UTC
Description of problem:
The crash happened at some point during installation from an HTTP-provided image or a local ISO file.

Version-Release number of selected component:
qemu-kvm-1.5.3-60.el7_0.7

Additional info:
reporter:       libreport-2.1.11
backtrace_rating: 4
cmdline:        /usr/libexec/qemu-kvm -name fcrawhide -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 0c2baf9c-c940-4572-96ed-fcb0ad013d8b -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/fcrawhide.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/fcrawhide.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:bd:12:9a,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7
crash_function: bdrv_error_action
executable:     /usr/libexec/qemu-kvm
kernel:         3.10.0-123.6.3.el7.x86_64
runlevel:       N 5
type:           CCpp
uid:            107

Truncated backtrace:
Thread no. 1 (10 frames)
 #4 bdrv_error_action at block.c:3318
 #5 virtio_blk_handle_rw_error at /usr/src/debug/qemu-1.5.3/hw/block/virtio-blk.c:69
 #6 virtio_blk_rw_complete at /usr/src/debug/qemu-1.5.3/hw/block/virtio-blk.c:81
 #7 bdrv_co_em_bh at block.c:4454
 #8 aio_bh_poll at async.c:81
 #9 aio_poll at aio-posix.c:185
 #10 aio_ctx_dispatch at async.c:194
 #13 glib_pollfds_poll at main-loop.c:187
 #14 os_host_main_loop_wait at main-loop.c:232
 #15 main_loop_wait at main-loop.c:464

Comment 1 Tomas Dolezal 2014-09-17 14:07:04 UTC
Created attachment 938477 [details]
File: backtrace

Comment 2 Tomas Dolezal 2014-09-17 14:07:05 UTC
Created attachment 938478 [details]
File: cgroup

Comment 3 Tomas Dolezal 2014-09-17 14:07:10 UTC
Created attachment 938479 [details]
File: core_backtrace

Comment 4 Tomas Dolezal 2014-09-17 14:07:12 UTC
Created attachment 938480 [details]
File: dso_list

Comment 5 Tomas Dolezal 2014-09-17 14:07:13 UTC
Created attachment 938481 [details]
File: environ

Comment 6 Tomas Dolezal 2014-09-17 14:07:15 UTC
Created attachment 938482 [details]
File: limits

Comment 7 Tomas Dolezal 2014-09-17 14:07:17 UTC
Created attachment 938483 [details]
File: maps

Comment 8 Tomas Dolezal 2014-09-17 14:07:19 UTC
Created attachment 938484 [details]
File: open_fds

Comment 9 Tomas Dolezal 2014-09-17 14:07:20 UTC
Created attachment 938485 [details]
File: proc_pid_status

Comment 11 Tomas Dolezal 2014-09-18 12:38:53 UTC
For the record: I was using virt-manager from a remote el7 machine via qemu+ssh
(remote: virt-manager-0.10.0-20.el7.noarch).

Comment 12 Markus Armbruster 2014-10-31 09:11:49 UTC
The backtrace suggests that bdrv_co_em_bh() called virtio_blk_rw_complete()
through acb->common.cb with a positive ret argument.  That function then
passes -ret through virtio_blk_handle_rw_error() to bdrv_error_action(),
tripping its assertion.

What's the contract for acb->common.cb?  Two possibilities come to
mind:

1. Positive ret argument means success

   virtio_blk_rw_complete() needs to be fixed not to call
   virtio_blk_handle_rw_error() then.

2. Positive ret argument must not happen

   Whatever created the argument needs to be found and fixed.
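
To make the sign convention in question concrete, here is a minimal model of the call chain described above. It is a sketch only: the function names are hypothetical stand-ins rather than the QEMU functions, and the assumption that the error path requires a non-negative errno value is inferred from the "tripping its assertion" remark, not copied from the source.

```c
#include <stdbool.h>

/* Hypothetical model of the error-action entry point. Assumption: it
 * expects a positive errno value (such as EIO) and asserts otherwise. */
bool error_action_accepts(int error)
{
    return error >= 0;
}

/* Hypothetical model of the completion callback. ret == 0 means success;
 * any other value is negated and handed to the error path. */
bool completion_is_safe(int ret)
{
    if (ret == 0) {
        return true;    /* success: the error path is not taken */
    }
    /* A negative errno becomes positive here; a positive ret becomes
     * negative and would trip the assertion modelled above. */
    return error_action_accepts(-ret);
}
```

Under this model a callback argument of 0 or of a negative errno is handled safely, while any positive argument reaches the assertion, which matches both possibilities listed above: either positive values need to be treated as success, or they must never be produced.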

Comment 13 Markus Armbruster 2014-10-31 09:23:54 UTC
A closer look at the backtrace:

#5  0x00007f61da3fbc89 in virtio_blk_handle_rw_error (req=req@entry=0x7f61dfb97e80, error=-1641789906, is_read=true) at /usr/src/debug/qemu-1.5.3/hw/block/virtio-blk.c:69

-1641789906 is not a negative errno.  Did something scribble over the acb?
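
A quick way to express that sanity check in code, as a sketch. The helper name is hypothetical, and the bound of 4095 is an assumption borrowed from the Linux kernel's MAX_ERRNO convention; it is used here only as a generous upper limit for real errno values.

```c
#include <stdbool.h>

/* Assumption: real errno values are small positive integers. 4095 mirrors
 * the Linux kernel's MAX_ERRNO and serves only as a generous upper bound. */
#define MAX_PLAUSIBLE_ERRNO 4095

/* Returns true if value looks like a negated errno, e.g. -EIO. */
bool is_plausible_negative_errno(int value)
{
    return value < 0 && value >= -MAX_PLAUSIBLE_ERRNO;
}
```

The value from frame #5 fails this check with either sign, which is what points towards memory corruption or an ordering problem rather than a wrong error code.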

Comment 14 Markus Armbruster 2015-01-27 13:40:10 UTC
We got a core, but no straightforward reproducer.  Too late for 7.1
without a heroic effort.  The bug looks too exotic to justify that.
Punting to 7.2.

Comment 17 Markus Armbruster 2015-03-19 11:59:07 UTC
Can't say whether it's the same bug without a core to inspect at least.  

If it got triggered the same way, using this BZ to track it is best.

Comment 26 Dr. David Alan Gilbert 2015-05-29 15:45:34 UTC
FaF reports from CentOS users:
https://retrace.fedoraproject.org/faf/problems/670281/

looks like the same bug; 3 reports on versions
   
  10:1.5.3-60.el7_0.7.0.1
  10:1.5.3-86.el7_1.1
  10:1.5.3-86.el7_1.2

Comment 30 Paolo Bonzini 2015-06-03 11:07:42 UTC
More discussion is happening on the mailing list, and it looks like the compiler is reordering the stores despite the barrier.

Note that it's expected that smp_rmb() and smp_wmb() produce no assembly code on x86.  We've contacted the tools team to understand if this is a QEMU bug, a GCC bug, or both.
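
For illustration, a minimal sketch of a barrier that emits no instruction yet still constrains the compiler. The macro name `compiler_barrier()` and the `struct job` type are hypothetical and are not QEMU's actual definitions; the empty `asm` statement with a "memory" clobber is a GCC/Clang extension.

```c
/* Compiler-only barrier (GCC/Clang extension): generates no machine
 * instruction, but the compiler may not move memory accesses across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

struct job {        /* hypothetical structure shared between two threads */
    int done;       /* set to 1 once result is valid */
    int result;
};

/* Store the result first, then the flag that tells a reader it is valid. */
void finish_job(struct job *j, int result)
{
    j->result = result;
    compiler_barrier();     /* keep the result store ahead of the flag store */
    j->done = 1;
}
```

Without the barrier the two plain stores are independent as far as the compiler is concerned, so it is free to emit them in either order. A reader polling `done` from another thread would need a matching barrier between its two loads.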

Comment 31 Ulrich Obergfell 2015-06-03 12:33:40 UTC
Paolo,

it seems to me that there are actually two issues:


- The compiler should not generate a series of machine instructions
  that reorder the sequence in which 'state' and 'ret' are stored in
  the ThreadPoolElement.

        req->state = THREAD_DONE;
                                // r12->state = 2 (THREAD_DONE)
   0x00007fa71f51254b <+235>:   movl   $0x2,0x38(%r12)

        req->ret = ret;
                                // r12->ret = eax (ret)
   0x00007fa71f512554 <+244>:   mov    %eax,0x3c(%r12)

   worker_thread() should store 'ret' _before_ 'state' as intended by
   the C code.

        req->ret = ret;
        /* Write ret before state.  */
        smp_wmb();
        req->state = THREAD_DONE;


- However, even if the compiler generated the intended series of
  machine instructions, i.e.
                                // r12->ret = eax (ret)
                            [1] mov    %eax,0x3c(%r12)
                                // r12->state = 2 (THREAD_DONE)
                            [2] movl   $0x2,0x38(%r12)

  wouldn't we also need an explicit memory barrier instruction between
  [1] and [2] similar to the fix for BZ 804578 to prevent the processor
  from reordering the two stores internally ?


Regards,

Uli
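
One portable way to state the intended ordering is with C11 atomics, sketched below. This is not the fix that went into qemu-kvm; `struct element` is a hypothetical stand-in for ThreadPoolElement, the function names are invented for illustration, and only the value THREAD_DONE = 2 is taken from the disassembly above.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { THREAD_DONE = 2 };   /* value as seen in the disassembly */

struct element {            /* hypothetical stand-in for ThreadPoolElement */
    atomic_int state;
    int ret;
};

/* Worker side: write ret, then publish it with a release store to state.
 * The release ordering forbids both the compiler and the CPU from moving
 * the ret store after the state store. */
void element_complete(struct element *req, int ret)
{
    req->ret = ret;
    atomic_store_explicit(&req->state, THREAD_DONE, memory_order_release);
}

/* Consumer side: read ret only after an acquire load has seen THREAD_DONE. */
bool element_result(struct element *req, int *ret)
{
    if (atomic_load_explicit(&req->state, memory_order_acquire) != THREAD_DONE) {
        return false;       /* result not published yet */
    }
    *ret = req->ret;
    return true;
}
```

In the C11 model the ordering is attached to the atomic access to `state` itself, so it does not depend on a stand-alone fence placed between two plain stores. The compiler then picks whatever instructions the target architecture needs to honour it.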

Comment 32 Ulrich Obergfell 2015-06-03 15:21:00 UTC
Paolo,

I think I found the answer to my question in comment #31 myself.
Intel SDM Vol. 3 states under

 "Memory Ordering in P6 and More Recent Processor Families
  ...
  Writes to memory are not reordered with other writes, with the
  following exceptions:
  - writes executed with the CLFLUSH instruction;
  - streaming stores (writes) executed with the non-temporal move
    instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
  - string operations ..."

Since the 'mov' instructions [1] and [2] in comment #31 don't fall
into one of the above exception categories, we don't need a barrier
between them. Please correct me if I'm wrong.

Regards,

Uli

Comment 34 Ademar Reis 2015-06-03 22:39:12 UTC
Reassigning to Paolo, at least for now.

Comment 41 Ademar Reis 2015-06-12 17:42:26 UTC
Ulrich, Tomas: we'll need GSSApproved in the whiteboard to get the fix to the z-stream. Can you add it somehow?

Comment 46 Miroslav Rezanina 2015-06-24 05:03:46 UTC
Fix included in qemu-kvm-1.5.3-93.el7

Comment 64 errata-xmlrpc 2015-11-19 04:56:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2213.html