Bug 1375520 - qemu core dump when there is an I/O error on AHCI
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: John Snow
QA Contact: Xueqiang Wei
Keywords: ZStream
Depends On: 887844 953062
Blocks: 1227278 1393736
Reported: 2016-09-13 05:57 EDT by jingzhao
Modified: 2017-08-01 23:29 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Due to asynchronous I/O control blocks (AIOCBs) not being properly cleared, guests that use the Advanced Host Controller Interface (AHCI) in some cases terminated unexpectedly when an I/O error occurred. With this update, AIOCBs are cleared properly, and I/O errors on guests with AHCI are handled gracefully.
Story Points: ---
Clone Of: 887844
Clones: 1393736
Environment:
Last Closed: 2017-08-01 19:34:44 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:2392 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2017-08-01 16:04:36 EDT

Comment 2 John Snow 2016-09-14 17:45:06 EDT
jingzhao, can you please do me a favor and try running this reproducer using the fix for BZ #1299876 ?

I haven't been able to reproduce yet, but by removing the obvious source of the segfault, maybe the problem will manifest differently in a way that helps us move forward with this issue.

I have a build based on qemu-kvm-rhev-2.6.0-25.el7 that includes the fixes for #1299876 that I think *might* stop the segfault here. If it does or it doesn't, it will tell us a lot about the nature of the problem that may help diagnose it better.

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11754826

-

In the meantime, I'd like to make sure I have my facts straight about the nature of this bug without the fix posted above:

(1) This is observed under qemu-kvm-rhev-2.6.0-*, most recently #24.

(2) The crash happens when using which guest? RHEL of some version?

(3) It does not appear to happen when using the loopback device, but does appear to happen when using iSCSI.

(4) It happens on both Q35 and PC machines when using the AHCI controller.

(5) When it happens, there is no opportunity to resume the VM, as it crashes before it pauses.

(6) No "STOP" event is emitted via the QMP stream.

(7) The crash appears to happen immediately after the disk becomes FULL with no further interaction from the user.

Would it be correct to say that the only difference you can observe is the different backing storage technique?


Sorry if I am being redundant, but I thank you for your patience and diligence.
--John
Comment 5 jingzhao 2016-09-21 05:21:52 EDT
(In reply to John Snow from comment #2)
> jingzhao, can you please do me a favor and try running this reproducer using
> the fix for BZ #1299876 ?
> 
> I haven't been able to reproduce yet, but by removing the obvious source of
> the segfault, maybe the problem will manifest differently in a way that
> helps us move forward with this issue.
> 
> I have a build based on qemu-kvm-rhev-2.6.0-25.el7 that includes the fixes
> for #1299876 that I think *might* stop the segfault here. If it does or it
> doesn't, it will tell us a lot about the nature of the problem that may help
> diagnose it better.
> 
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11754826

Sorry for the late reply.

I can also reproduce both this bz and bz1299876 with qemu-kvm-rhev-2.6.0-25.el7.x86_64 (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=513032). Could you share the private build with me again? I lost it.

> 
> -
> 
> In the meantime, I'd like to make sure I have my facts straight about the
> nature of this bug without the fix posted above:
> 
> (1) This is observed under qemu-kvm-rhev-2.6.0-*, most recently #24.
> 
> (2) The crash happens when using which guest? RHEL of some version?
--A RHEL guest (kernel-3.10.0-481.el7.x86_64).
> 
> (3) It does not appear to happen when using the loopback device, but does
> appear to happen when using iSCSI.
--yep
> 
> (4) It happens on both Q35 and PC machines when using the AHCI controller.
--yep
> 
> (5) When it happens, there is no opportunity to resume the VM, as it crashes
> before it pauses.
--yep
> 
> (6) No "STOP" event is emitted via the QMP stream.

--Actually, the guest does seem to stop, because a stop action is reported via QMP:

{"timestamp": {"seconds": 1474449608, "microseconds": 538541}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "nospace": true, "__com.redhat_reason": "enospc", "reason": "No space left on device", "operation": "write", "action": "stop"}}

> 
> (7) The crash appears to happen immediately after the disk becomes FULL with
> no further interaction from the user.
--yep
> 
> Would it be correct to say that the only difference you can observe is the
> different backing storage technique?
--Yes, the backing storage differs, and I also didn't run "system_reset".


Thanks
Jing Zhao
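
For reference, the BLOCK_IO_ERROR event quoted in this comment can be checked mechanically rather than by eye. The following is only a minimal sketch: the helper name `parse_block_io_error` is hypothetical and not part of QEMU or any QMP library, and the sample line is the event copied from the QMP output above.

```python
import json


def parse_block_io_error(line):
    """Return a summary dict if `line` is a BLOCK_IO_ERROR event, else None.

    Hypothetical helper for illustration; it only reads fields that appear
    in the event quoted in this comment.
    """
    msg = json.loads(line)
    if msg.get("event") != "BLOCK_IO_ERROR":
        return None
    data = msg.get("data", {})
    return {
        "device": data.get("device"),
        "operation": data.get("operation"),
        "action": data.get("action"),
        "nospace": bool(data.get("nospace", False)),
        "reason": data.get("reason"),
    }


# Event line copied verbatim from the QMP stream quoted above.
EVENT = (
    '{"timestamp": {"seconds": 1474449608, "microseconds": 538541}, '
    '"event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", '
    '"nospace": true, "__com.redhat_reason": "enospc", '
    '"reason": "No space left on device", "operation": "write", '
    '"action": "stop"}}'
)

summary = parse_block_io_error(EVENT)
print(summary)
```

An "action" of "stop" together with "nospace" set to true is what the werror=stop policy is expected to produce when the backing storage fills up, so an event like this suggests that the stop was requested before the crash occurred.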
Comment 6 John Snow 2016-09-21 14:00:58 EDT
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11792798

http://download-node-02.eng.bos.redhat.com/brewroot/work/tasks/2802/11792802/

This is the fix for #1299876 applied on top of qemu-kvm-rhev-2.6.0-26.el7.

I've re-hosted the files at http://file.bos.redhat.com/jhuston/11792802/ this time so they don't disappear on us.
Comment 7 jingzhao 2016-09-22 02:18:21 EDT
(In reply to John Snow from comment #6)
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11792798
> 
> http://download-node-02.eng.bos.redhat.com/brewroot/work/tasks/2802/11792802/
> 
> This is the fix for #1299876 applied on top of qemu-kvm-rhev-2.6.0-26.el7.
> 
> I've re-hosted the files at http://file.bos.redhat.com/jhuston/11792802/
> this time so they don't disappear on us.

Hi John

1. I can still reproduce the core dump with the test build (http://file.bos.redhat.com/jhuston/11792802/).

2. I also tried the bz1299876 reproducer with the test build, following https://bugzilla.redhat.com/show_bug.cgi?id=1299876#c3, and did not hit the core dump.

Thanks
Jing Zhao
Comment 8 John Snow 2016-09-22 15:52:02 EDT
Aaaaaaah ... !!

Please try http://file.bos.redhat.com/jhuston/11801934/ instead, the fix for AHCI was incomplete.

Sorry for the inconvenience.
Comment 9 jingzhao 2016-09-22 22:59:49 EDT
(In reply to John Snow from comment #8)
> Aaaaaaah ... !!
> 
> Please try http://file.bos.redhat.com/jhuston/11801934/ instead, the fix for
> AHCI was incomplete.
> 
> Sorry for the inconvenience.

Hi John

I tested with http://file.bos.redhat.com/jhuston/11801934/ and could not reproduce this bz.

1. Boot the guest with the iSCSI backend.
2. In the guest, run: dd if=/dev/zero of=/dev/sda bs=1M count=8192
3. Check the status through HMP:
(qemu) info status
VM status: paused (io-error)
4. In HMP:
(qemu) c
(qemu) info status
VM status: running
(qemu) system_reset
The guest responded to the reset.

Additional info:
/usr/libexec/qemu-kvm \
-M pc \
-cpu SandyBridge \
-nodefaults -rtc base=utc \
-m 4G \
-smp 2,sockets=2,cores=1,threads=1 \
-enable-kvm \
-name rhel7.3 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
-k en-us \
-nodefaults \
-serial unix:/tmp/serial0,server,nowait \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/seabios.log,id=seabios \
-device isa-debugcon,chardev=seabios,iobase=0x402 \
-qmp tcp:0:6666,server,nowait \
-device VGA,id=video \
-vnc :2 \
-drive file=/home/bug/rhel73.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \
-device virtio-blk-pci,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-device virtio-net-pci,netdev=tap10,mac=9a:6a:6b:6c:6d:6e -netdev tap,id=tap10 \
-device ahci,id=ahci0 \
-drive file=/mnt/test.qcow2,if=none,id=drive-virtio-disk1,format=qcow2,werror=stop,rerror=stop \
-device ide-hd,drive=drive-virtio-disk1,id=virtio-disk1,bus=ahci0.0 \
-monitor stdio


Is this enough? Please tell me if more testing is needed.

Thanks
Jing Zhao
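
The HMP status check in the steps above could also be scripted against the QMP socket that the command line opens with `-qmp tcp:0:6666,server,nowait`. This is a rough sketch under assumptions, not a tested harness: the function names are hypothetical, the host `127.0.0.1` is assumed, and the only QMP commands used are `qmp_capabilities` and `query-status` (the QMP counterpart of `info status`).

```python
import json
import socket


def qmp_request(command):
    """Encode a QMP command as a single JSON line."""
    return (json.dumps({"execute": command}) + "\r\n").encode("utf-8")


def read_reply(stream):
    """Read lines until a command reply arrives, skipping async events.

    Replies carry a "return" or "error" key; events such as
    BLOCK_IO_ERROR carry an "event" key and are ignored here.
    """
    for line in stream:
        if not line.strip():
            continue
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg
    raise ConnectionError("QMP stream closed before a reply arrived")


def query_status(host="127.0.0.1", port=6666):
    """Connect, negotiate capabilities, and return the query-status result."""
    with socket.create_connection((host, port)) as sock:
        stream = sock.makefile("rwb")
        json.loads(stream.readline())  # greeting sent by QEMU on connect
        stream.write(qmp_request("qmp_capabilities"))
        stream.flush()
        read_reply(stream)
        stream.write(qmp_request("query-status"))
        stream.flush()
        return read_reply(stream)["return"]
```

With a guest started as shown in this comment, calling `query_status()` after the `dd` run should report the same state that `info status` prints, for example a "status" of "io-error" while the VM is paused. The same pattern would apply to `cont` and `system_reset`, which also exist as QMP commands.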
Comment 10 John Snow 2016-09-23 00:20:40 EDT
If the guest didn't report any IO errors and everything appears to have worked correctly, I'll submit my patches downstream and move the bug into POST.

Thanks for your patience!
Comment 12 Ademar Reis 2016-09-27 21:49:33 EDT
For reference, this is the cluster of BZs related to this issue: bug 1281713, bug 1299876, bug 1299875, bug 1361487, bug 1361490, bug 1361488, bug 1375520
Comment 17 Xueqiang Wei 2017-06-07 02:32:46 EDT
Following https://bugzilla.redhat.com/show_bug.cgi?id=887844#c11, I reproduced this bug on:
host kernel: 3.10.0-496.el7.x86_64
qemu-kvm-rhev-2.6.0-22.el7.x86_64


Retested on the latest RHEL 7.3.z and did not hit this issue:
host kernel: 3.10.0-514.25.2.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.10


After running "dd" in the guest:
(qemu) info status 
VM status: paused (io-error)
(qemu) system_reset
(qemu) info status 
VM status: paused (prelaunch)
(qemu) info status 
VM status: running

So I am marking this bug as verified.
Comment 18 Xueqiang Wei 2017-06-07 03:21:17 EDT
Retested on the latest RHEL 7.4 and did not hit this issue:
host kernel: 3.10.0-679.el7.x86_64
qemu-kvm-rhev-2.9.0-8.el7
Comment 20 errata-xmlrpc 2017-08-01 19:34:44 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392