Bug 1605026

Summary: Quitting the VM causes a qemu core dump once the block mirror job has paused because the target ran out of space
Product: Red Hat Enterprise Linux 7
Reporter: Gu Nini <ngu>
Component: qemu-kvm-rhev
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED ERRATA
QA Contact: Gu Nini <ngu>
Severity: high
Docs Contact:
Priority: medium
Version: 7.6
CC: chayang, coli, juzhang, kwolf, michen, mrezanin, mtessun, qzhang, virt-maint, xianwang
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.12.0-13.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1635583
Environment:
Last Closed: 2018-11-01 11:13:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1635583
Attachments:
Description Flags
gdb_debug_info_all_threads-07202018 none

Description Gu Nini 2018-07-20 03:26:19 UTC
Description of problem:
Start a block mirror with '"on-source-error": "stop"' and '"on-target-error": "stop"' set in the QMP command. When the mirror target runs out of space, the job pauses. Quitting the guest at that point makes qemu abort (core dumped).

Version-Release number of selected component (if applicable):
Host kernel: 3.10.0-918.el7.x86_64
Qemu-kvm-rhev: qemu-kvm-rhev-2.12.0-7.el7.x86_6

How reproducible:
100%

Steps to Reproduce:
1. Create a small LV to use as the mirror target:
# qemu-img create -f raw /home/disk.img 10G
# losetup /dev/loop0 /home/disk.img
# pvcreate /dev/loop0
# vgcreate vg1 /dev/loop0
# lvcreate -L 256M -n lvlv1 vg1

2. Start a guest with a data disk 'drive_image2'.
3. Mirror that data disk to the LV created in step 1, setting '"on-source-error": "stop"' and '"on-target-error": "stop"':
{ "execute": "drive-mirror", "arguments": { "device": "drive_image2", "target": "/dev/vg1/lvlv1", "mode": "absolute-paths", "format":"qcow2","sync": "full", "on-source-error": "stop", "on-target-error": "stop"}}
{"timestamp": {"seconds": 1532055583, "microseconds": 991329}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "drive_image2"}}
{"timestamp": {"seconds": 1532055583, "microseconds": 991447}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "drive_image2"}}
{"return": {}}
{"timestamp": {"seconds": 1532055586, "microseconds": 253020}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive_image2", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1532055586, "microseconds": 253229}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive_image2", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1532055586, "microseconds": 253371}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive_image2", "operation": "write", "action": "stop"}}
......
{"timestamp": {"seconds": 1532055586, "microseconds": 363679}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive_image2", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1532055586, "microseconds": 363712}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "drive_image2"}}
4. After the block job has paused, quit the guest:
{ "execute": "quit"}
{"return": {}}
{"timestamp": {"seconds": 1532055603, "microseconds": 934305}, "event": "SHUTDOWN", "data": {"guest": false}}
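The QMP exchange in steps 3 and 4 can be scripted. The following is a minimal sketch, not a tested reproducer: it assumes QEMU was started with a QMP Unix socket at the hypothetical path /tmp/qmp.sock (for example via -qmp unix:/tmp/qmp.sock,server,nowait), and the helper names mirror_command, is_job_paused and reproduce are made up for illustration. The job id is taken to be the device name, as in the JOB_STATUS_CHANGE events above.

```python
import json
import socket

# Hypothetical socket path; adjust to match the -qmp option used for the guest.
QMP_SOCKET = "/tmp/qmp.sock"


def mirror_command(device, target):
    """Build the drive-mirror command from step 3 as a dict."""
    return {
        "execute": "drive-mirror",
        "arguments": {
            "device": device,
            "target": target,
            "mode": "absolute-paths",
            "format": "qcow2",
            "sync": "full",
            "on-source-error": "stop",
            "on-target-error": "stop",
        },
    }


def is_job_paused(message, job_id):
    """Return True if a QMP message reports that job_id entered 'paused'."""
    data = message.get("data", {})
    return (
        message.get("event") == "JOB_STATUS_CHANGE"
        and data.get("id") == job_id
        and data.get("status") == "paused"
    )


def reproduce(device="drive_image2", target="/dev/vg1/lvlv1"):
    """Start the mirror, wait for the job to pause, then send quit."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(QMP_SOCKET)
        stream = sock.makefile("rw", encoding="utf-8")

        def send(command):
            stream.write(json.dumps(command) + "\n")
            stream.flush()

        json.loads(stream.readline())          # QMP greeting
        send({"execute": "qmp_capabilities"})  # leave capabilities negotiation
        send(mirror_command(device, target))
        for line in stream:
            if not line.strip():
                continue
            if is_job_paused(json.loads(line), device):
                break
        send({"execute": "quit"})
```

Calling reproduce() against an affected build would be expected to end with the abort shown under "Actual results"; on a fixed build QEMU should simply exit.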


Actual results:
qemu aborts and dumps core:
./vm2.sh rhel76.qcow2 
QEMU 2.12.0 monitor - type 'help' for more information
(qemu) 
(qemu)
(qemu) Formatting '/dev/vg1/lvlv1', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
qemu-kvm: blockjob.c:437: block_job_iostatus_reset: Assertion `job->job.user_paused && job->job.pause_count > 0' failed.
./vm2.sh: line 29:   931 Aborted                 (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm 


Expected results:
qemu quits cleanly, without a core dump.

Additional info:

Comment 2 Gu Nini 2018-07-20 03:32:02 UTC
Created attachment 1464849
gdb_debug_info_all_threads-07202018

Comment 3 Jeff Cody 2018-08-15 16:00:57 UTC
We are clearing the 'user_paused' flag too soon; the user_resume handler for the job should run first.

Patch submitted to qemu-devel.
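The ordering problem can be illustrated with a toy model. This is a sketch in Python, not QEMU source: only the field names user_paused and pause_count and the asserted condition come from the assertion message in the description; the class, the method names, and the assumption that the error-stopped job has user_paused set are hypothetical.

```python
class ToyJob:
    """Toy stand-in for a paused block job; not QEMU code."""

    def __init__(self):
        # Assumed state of a job that was paused after the target write error.
        self.user_paused = True
        self.pause_count = 1

    def iostatus_reset(self):
        # Same condition as the failed assertion quoted in the description.
        if not (self.user_paused and self.pause_count > 0):
            raise AssertionError("user_paused && pause_count > 0")

    def user_resume_handler(self):
        # Hypothetical handler: resuming resets the I/O status.
        self.iostatus_reset()

    def resume_flag_cleared_first(self):
        # Problematic order: the flag is cleared before the handler runs.
        self.user_paused = False
        self.user_resume_handler()
        self.pause_count -= 1

    def resume_handler_first(self):
        # Order suggested in the comment: handler first, then clear the flag.
        self.user_resume_handler()
        self.user_paused = False
        self.pause_count -= 1


def trips_assertion(resume):
    """Run a resume method and report whether the assertion fired."""
    try:
        resume()
    except AssertionError:
        return True
    return False
```

In this model, trips_assertion(ToyJob().resume_flag_cleared_first) reports a failure, while trips_assertion(ToyJob().resume_handler_first) does not.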

Comment 6 Miroslav Rezanina 2018-09-04 14:35:18 UTC
Fix included in qemu-kvm-rhev-2.12.0-13.el7

Comment 8 Gu Nini 2018-09-05 09:27:02 UTC
Verified the bug on qemu-kvm-rhev-2.12.0-13.el7.x86_64 using the same steps as in the bug description.
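To check whether a given build is at or past the fixed version, the version-release strings can be compared chunk by chunk. This is a simplified sketch and not a reimplementation of RPM's own comparison; the helper names version_key and has_fix are hypothetical, and the only inputs taken from this report are the two version-release strings.

```python
import re

# "Fixed In Version" from this report, without the package name.
FIXED_VERSION_RELEASE = "2.12.0-13.el7"


def version_key(version_release):
    """Split a version-release string into comparable chunks.

    Numeric chunks compare as integers and sort above alphabetic chunks.
    This is a simplification of RPM's version comparison rules.
    """
    key = []
    for chunk in re.findall(r"\d+|[A-Za-z]+", version_release):
        if chunk.isdigit():
            key.append((1, int(chunk)))
        else:
            key.append((0, chunk))
    return key


def has_fix(version_release):
    """Return True if version_release is at or past the fixed build."""
    return version_key(version_release) >= version_key(FIXED_VERSION_RELEASE)
```

For example, has_fix("2.12.0-7.el7") is false for the build the bug was reported against, and has_fix("2.12.0-13.el7") is true for the build it was verified on.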

Comment 10 errata-xmlrpc 2018-11-01 11:13:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443