Bug 1375520
Summary: | qemu core dump when there is an I/O error on AHCI | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | jingzhao <jinzhao> | |
Component: | qemu-kvm-rhev | Assignee: | John Snow <jsnow> | |
Status: | CLOSED ERRATA | QA Contact: | Xueqiang Wei <xuwei> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 7.3 | CC: | chayang, coli, jen, jherrman, jinzhao, jsnow, juzhang, knoel, kraxel, kwolf, mrezanin, nerijus, pbonzini, virt-bugs, virt-maint, xfu | |
Target Milestone: | rc | Keywords: | ZStream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Due to asynchronous I/O control blocks (AIOCBs) not being properly cleared, guests that use the Advanced Host Controller Interface (AHCI) in some cases terminated unexpectedly when an I/O error occurred. With this update, AIOCBs are cleared properly, and I/O errors on guests with AHCI are resolved gracefully.
|
Story Points: | --- | |
Clone Of: | 887844 | |||
: | 1393736 (view as bug list) | Environment: | ||
Last Closed: | 2017-08-01 23:34:44 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 887844, 953062 | |||
Bug Blocks: | 1227278, 1393736 |
Comment 2
John Snow
2016-09-14 21:45:06 UTC
(In reply to John Snow from comment #2)
> jingzhao, can you please do me a favor and try running this reproducer using
> the fix for BZ #1299876 ?
>
> I haven't been able to reproduce yet, but by removing the obvious source of
> the segfault, maybe the problem will manifest differently in a way that
> helps us move forward with this issue.
>
> I have a build based on qemu-kvm-rhev-2.6.0-25.el7 that includes the fixes
> for #1299876 that I think *might* stop the segfault here. If it does or it
> doesn't, it will tell us a lot about the nature of the problem that may help
> diagnose it better.
>
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11754826

Sorry for the late reply. I can also reproduce this bz and bz1299876 on qemu-kvm-rhev-2.6.0-25.el7.x86_64 (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=513032). Could you share the private build with me again? I lost it.

>
> -
>
> In the meantime, I'd like to make sure I have my facts straight about the
> nature of this bug without the fix posted above:
>
> (1) This is observed under qemu-kvm-rhev-2.6.0-*, most recently #24.
>
> (2) The crash happens when using which guest? RHEL of some version?

-- A RHEL guest (kernel-3.10.0-481.el7.x86_64).

>
> (3) It does not appear to happen when using the loopback device, but does
> appear to happen when using iSCSI.

-- Yes.

>
> (4) It happens on both Q35 and PC machines when using the AHCI controller.

-- Yes.

>
> (5) When it happens, there is no opportunity to resume the VM, as it crashes
> before it pauses.

-- Yes.

>
> (6) No "STOP" event is emitted via the QMP stream.

-- Actually, the guest does seem to stop, because a stop event is reported via QMP:

{"timestamp": {"seconds": 1474449608, "microseconds": 538541}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "nospace": true, "__com.redhat_reason": "enospc", "reason": "No space left on device", "operation": "write", "action": "stop"}}

>
> (7) The crash appears to happen immediately after the disk becomes FULL with
> no further interaction from the user.

-- Yes.

>
> Would it be correct to say that the only difference you can observe is the
> different backing storage technique?

-- The backing storage, and I didn't do the "system_reset".

Thanks
Jing Zhao

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11792798

http://download-node-02.eng.bos.redhat.com/brewroot/work/tasks/2802/11792802/

This is the fix for #1299876 applied on top of qemu-kvm-rhev-2.6.0-26.el7.

I've re-hosted the files at http://file.bos.redhat.com/jhuston/11792802/ this time so they don't disappear on us.

(In reply to John Snow from comment #6)
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11792798
>
> http://download-node-02.eng.bos.redhat.com/brewroot/work/tasks/2802/11792802/
>
> This is the fix for #1299876 applied on top of qemu-kvm-rhev-2.6.0-26.el7.
>
> I've re-hosted the files at http://file.bos.redhat.com/jhuston/11792802/
> this time so they don't disappear on us.

Hi John

1. I also reproduced the core dump issue using the test build (http://file.bos.redhat.com/jhuston/11792802/).
2. I also tried bz1299876 with the test build, following https://bugzilla.redhat.com/show_bug.cgi?id=1299876#c3, and didn't hit the core dump issue.

Thanks
Jing Zhao

Aaaaaaah ... !!

Please try http://file.bos.redhat.com/jhuston/11801934/ instead, the fix for AHCI was incomplete.

Sorry for the inconvenience.

(In reply to John Snow from comment #8)
> Aaaaaaah ... !!
>
> Please try http://file.bos.redhat.com/jhuston/11801934/ instead, the fix for
> AHCI was incomplete.
>
> Sorry for the inconvenience.

Hi John

I tested with http://file.bos.redhat.com/jhuston/11801934/ and did not reproduce the bz.

1. Boot the guest with the iSCSI backend.
2. In the guest, run: dd if=/dev/zero of=/dev/sda bs=1M count=8192
3. Check the status through HMP:
   (qemu) info status
   VM status: paused (io-error)
4. In HMP:
   (qemu) c
   (qemu) info status
   VM status: running
   (qemu) system_reset
   and the guest responded.

Additional info:

/usr/libexec/qemu-kvm \
    -M pc \
    -cpu SandyBridge \
    -nodefaults -rtc base=utc \
    -m 4G \
    -smp 2,sockets=2,cores=1,threads=1 \
    -enable-kvm \
    -name rhel7.3 \
    -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
    -smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
    -k en-us \
    -nodefaults \
    -serial unix:/tmp/serial0,server,nowait \
    -boot menu=on \
    -bios /usr/share/seabios/bios.bin \
    -chardev file,path=/home/seabios.log,id=seabios \
    -device isa-debugcon,chardev=seabios,iobase=0x402 \
    -qmp tcp:0:6666,server,nowait \
    -device VGA,id=video \
    -vnc :2 \
    -drive file=/home/bug/rhel73.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \
    -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
    -device virtio-net-pci,netdev=tap10,mac=9a:6a:6b:6c:6d:6e -netdev tap,id=tap10 \
    -device ahci,id=ahci0 \
    -drive file=/mnt/test.qcow2,if=none,id=drive-virtio-disk1,format=qcow2,werror=stop,rerror=stop \
    -device ide-hd,drive=drive-virtio-disk1,id=virtio-disk1,bus=ahci0.0 \
    -monitor stdio

Is this enough? Please tell me if more testing is needed.

Thanks
Jing Zhao

If the guest didn't report any IO errors and everything appears to have worked correctly, I'll submit my patches downstream and move the bug into POST.

Thanks for your patience!
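A note for anyone scripting this reproducer: the BLOCK_IO_ERROR event quoted earlier in this thread already carries enough information to tell whether the guest was stopped by the werror=stop policy instead of crashing. The following is a minimal Python sketch and not part of the original report. The function names summarize_block_io_error and guest_stopped_on_error are hypothetical, and the event text is copied from the QMP output in this thread.

```python
import json

# Event text copied verbatim from the QMP stream quoted in this report.
RAW_EVENT = (
    '{"timestamp": {"seconds": 1474449608, "microseconds": 538541}, '
    '"event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", '
    '"nospace": true, "__com.redhat_reason": "enospc", '
    '"reason": "No space left on device", "operation": "write", '
    '"action": "stop"}}'
)


def summarize_block_io_error(raw):
    """Return (device, operation, action, nospace) for a BLOCK_IO_ERROR
    event, or None if the message is some other event (hypothetical helper)."""
    event = json.loads(raw)
    if event.get("event") != "BLOCK_IO_ERROR":
        return None
    data = event.get("data", {})
    return (
        data.get("device"),
        data.get("operation"),
        data.get("action"),
        bool(data.get("nospace")),
    )


def guest_stopped_on_error(raw):
    """True when the event says the error policy stopped the guest."""
    summary = summarize_block_io_error(raw)
    return summary is not None and summary[2] == "stop"


if __name__ == "__main__":
    print(summarize_block_io_error(RAW_EVENT))
    print(guest_stopped_on_error(RAW_EVENT))
```

An "action" of "stop" together with "nospace": true matches what the report describes: the write failed with ENOSPC and the error policy paused the guest, so a later `info status` is expected to show "paused (io-error)".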
For reference, this is the cluster of BZ related to this issue:

bug 1281713, bug 1299876, bug 1299875, bug 1361487, bug 1361490, bug 1361488, bug 1375520

According to https://bugzilla.redhat.com/show_bug.cgi?id=887844#c11 , reproduced this bug on:
host kernel: 3.10.0-496.el7.x86_64
qemu-kvm-rhev-2.6.0-22.el7.x86_64

Retested on the latest RHEL7.3.z and did not hit this issue:
host kernel: 3.10.0-514.25.2.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.10

After "dd" in guest:
(qemu) info status
VM status: paused (io-error)
(qemu) system_reset
(qemu) info status
VM status: paused (prelaunch)
(qemu) info status
VM status: running

So the fix is verified.

Retested on the latest RHEL7.4 and did not hit this issue:
host kernel: 3.10.0-679.el7.x86_64
qemu-kvm-rhev-2.9.0-8.el7

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
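The verification runs in this report compare `info status` output by eye. As an illustration only, the same check could be scripted roughly as below. This is a Python sketch under assumptions: it handles only the "VM status: ..." lines exactly as they appear in this report, and parse_status and recovered_after_io_error are hypothetical helper names, not QEMU APIs.

```python
import re

# Matches HMP lines such as "VM status: paused (io-error)" or
# "VM status: running", in the form shown in this report.
_STATUS_RE = re.compile(r"VM status: (running|paused)(?: \(([\w-]+)\))?")


def parse_status(line):
    """Return (state, reason); reason is None when no parenthesised
    detail is present (hypothetical helper)."""
    match = _STATUS_RE.search(line)
    if match is None:
        raise ValueError("not an 'info status' line: %r" % line)
    return match.group(1), match.group(2)


def recovered_after_io_error(lines):
    """True if the first status is paused (io-error) and the last one
    is running, as in the verification transcript."""
    states = [parse_status(line) for line in lines]
    return (
        bool(states)
        and states[0] == ("paused", "io-error")
        and states[-1] == ("running", None)
    )


if __name__ == "__main__":
    transcript = [
        "VM status: paused (io-error)",
        "VM status: paused (prelaunch)",
        "VM status: running",
    ]
    print(recovered_after_io_error(transcript))
```

The transcript in the example is the status sequence from the RHEL7.3.z retest. A run that still crashed would never reach the final "running" line, so a check like this would fail instead of silently passing.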