Bug 1285928 - linux-aio aborts on io_submit() failure
linux-aio aborts on io_submit() failure
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev (Show other bugs)
7.2
x86_64 Linux
medium Severity medium
: rc
: ---
Assigned To: Kevin Wolf
CongLi
:
Depends On:
Blocks: 1401400
  Show dependency treegraph
 
Reported: 2015-11-26 20:45 EST by Han Han
Modified: 2017-08-01 23:24 EDT (History)
25 users (show)

See Also:
Fixed In Version: qemu-kvm-rhev-2.8.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 19:29:42 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
logs (119.18 KB, application/x-xz)
2015-11-26 20:45 EST, Han Han
no flags Details
The dmesg info (150.44 KB, text/plain)
2015-11-29 22:41 EST, Han Han
no flags Details
Backtrace and logs on latest qemu-kvm (4.97 KB, application/x-gzip)
2016-09-12 07:12 EDT, Han Han
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:2392 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2017-08-01 16:04:36 EDT

  None (edit)
Description Han Han 2015-11-26 20:45:16 EST
Created attachment 1099511 [details]
logs

Description of problem:
In libvirt-lockd environment, start 2 guests pointing the same scsi device. The start of the second guest will cause the first guest abort when then first guest is booting.

Version:
libvirt-1.2.17-13.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7.x86_64
kernel-3.10.0-330.el7.x86_64

Steps to Reproduce:
1. Setup lockd environment
# vim /etc/libvirt/qemu-lockd.conf
auto_disk_leases = 1
require_lease_for_disks = 1
file_lockspace_dir = "/var/lib/libvirt/lockd/scsivolumes"
# mkdir -p /var/lib/libvirt/lockd/scsivolumes
# systemctl restart virtlockd; systemctl restart libvirtd


2. Two guests attach the same scsi device.
# vim disk.xml
 <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/sdb'/>
      <target dev='hda' bus='ide'/>
    </disk>
#  for i in n{3,4};do virsh attach-device $i disk.xml --config;done
Device attached successfully
Device attached successfully

3. Start 2 guest one by one.
# for i in n{3,4};do virsh start $i;sleep 1;done
Domain n3 started

error: Failed to start domain n4
error: resource busy: Lockspace resource '97177c7bd790b481f854131c62cd658a8adceb6d71532de0b609c064fc1d7c2a' is locked

The ABRT has detected 1 problem:
# abrt-cli list --since 1448532246
id 5bb9ab8a439e2fd1e6a0bae5334e7ef76dca6940
reason:         qemu-kvm killed by SIGABRT
time:           Thu 26 Nov 2015 06:06:16 PM CST
cmdline:        /usr/libexec/qemu-kvm -name n3 -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Penryn -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 0b895613-c1e3-4e10-ace1-d111e61be00b -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-n3/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/var/lib/libvirt/images/n3.qcow2,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/sdb,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=native -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ba:ab:e5,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-n3/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
package:        qemu-kvm-rhev-2.3.0-31.el7
uid:            107 (qemu)
Directory:      /var/spool/abrt/ccpp-2015-11-26-18:06:16-27014
Run 'abrt-cli report /var/spool/abrt/ccpp-2015-11-26-18:06:16-27014' for creating a case in Red Hat Customer Portal


Additional info:
The libvirtd.log n3.log n4.log files are in attachment.
When n3 has booted, the start of n4 don't cause n3 abort immediately.
But when you execute reboot in n3, it also aborts.
The core file back trace:
(gdb) bt
#0  0x00007f8ff512f5f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f8ff5130ce8 in __GI_abort () at abort.c:90
#2  0x00007f900072ba36 in ioq_submit (s=s@entry=0x7f9001d50800) at block/linux-aio.c:208
#3  0x00007f900072be1c in laio_submit (bs=bs@entry=0x7f90025b4000, aio_ctx=0x7f9001d50800, fd=16, sector_num=sector_num@entry=0, qiov=qiov@entry=0x7f9003770bf8,
    nb_sectors=nb_sectors@entry=1, cb=cb@entry=0x7f90006ed960 <bdrv_co_io_em_complete>, opaque=opaque@entry=0x7f9007365c30, type=type@entry=1) at block/linux-aio.c:277
#4  0x00007f9000729af6 in raw_aio_submit (bs=0x7f90025b4000, sector_num=0, qiov=0x7f9003770bf8, nb_sectors=1, cb=0x7f90006ed960 <bdrv_co_io_em_complete>,
    opaque=0x7f9007365c30, type=type@entry=1) at block/raw-posix.c:1317
#5  0x00007f9000729b40 in raw_aio_readv (bs=<optimized out>, sector_num=<optimized out>, qiov=<optimized out>, nb_sectors=<optimized out>, cb=<optimized out>,
    opaque=<optimized out>) at block/raw-posix.c:1361
#6  0x00007f90006eda16 in bdrv_co_io_em (bs=0x7f90025b4000, sector_num=0, nb_sectors=1, iov=0x7f9003770bf8, is_write=<optimized out>) at block.c:5153
#7  0x00007f90006f2a7d in bdrv_aligned_preadv (bs=bs@entry=0x7f90025b4000, req=req@entry=0x7f9007365d70, offset=offset@entry=0, bytes=bytes@entry=512,
    align=align@entry=512, qiov=qiov@entry=0x7f9003770bf8, flags=flags@entry=0) at block.c:3090
#8  0x00007f90006f2d83 in bdrv_co_do_preadv (bs=0x7f90025b4000, offset=0, bytes=512, qiov=0x7f9003770bf8, flags=(unknown: 0)) at block.c:3193
#9  0x00007f90006f2a7d in bdrv_aligned_preadv (bs=bs@entry=0x7f9001d48800, req=req@entry=0x7f9007365f00, offset=offset@entry=0, bytes=bytes@entry=512,
    align=align@entry=512, qiov=qiov@entry=0x7f9003770bf8, flags=flags@entry=0) at block.c:3090
#10 0x00007f90006f2d83 in bdrv_co_do_preadv (bs=bs@entry=0x7f9001d48800, offset=0, bytes=512, qiov=0x7f9003770bf8, flags=(unknown: 0)) at block.c:3193
#11 0x00007f90006f3e6f in bdrv_co_do_readv (flags=<optimized out>, qiov=<optimized out>, nb_sectors=<optimized out>, sector_num=<optimized out>, bs=0x7f9001d48800)
    at block.c:3215
#12 bdrv_co_do_rw (opaque=0x7f9001c57ad0) at block.c:4994
#13 0x00007f90006fdfea in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:80
#14 0x00007f8ff5141110 in ?? () from /usr/lib64/libc-2.17.so
#15 0x00007f8fea4f8f30 in ?? ()
#16 0x0000000000000000 in ?? ()

So I guess it is something wrong in qemu boot queue functions.
Comment 2 Jaroslav Suchanek 2015-11-27 04:43:15 EST
Moving to qemu-kvm-rhev for further analysis of the qemu-kvm crash.
Comment 3 Han Han 2015-11-29 21:04:01 EST
I'm sorry that the lock space in comment1 should be scsi_lockspace_dir = "/var/lib/libvirt/lockd/scsivolumes". But the bug can be reproduced,too.
Comment 4 Han Han 2015-11-29 22:38:20 EST
One more issue is related to the bug. But it is no always reproduced.
1. # vim /etc/libvirt/qemu-lockd.conf
auto_disk_leases = 1
require_lease_for_disks = 1
file_lockspace_dir = "/var/lib/libvirt/lockd/files"

2. # systemctl restart virtlockd;systemctl restart libvirtd
prepare 2 guest n1 n2
Cold plug a disk file to n1 and n2:
<disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/test.img'/>
      <backingStore/>
      <target dev='hda' bus='ide'/>
      <alias name='ide0-0-0'/>
    </disk>
start n1 n2, n2 will be start failure due to lockd

# virsh start n1;virsh start n2           
Domain n1 started

error: Failed to start domain n2
error: resource busy: Lockspace resource '06bbb3c2da705d3532bcf1066e5058c5d29ff9d154697073b3818ec1da3b6fd3' is locked

login n1, check dmesg info, if there is ABRT or error

3. # dmesg|grep ata -i
get error like this:
[    1.572718] libata version 3.00 loaded.
[    1.657973] ata_piix 0000:00:01.1: version 2.13
[    1.663004] scsi host0: ata_piix
[    1.666029] scsi host1: ata_piix
[    1.666393] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc120 irq 14
[    1.667082] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc128 irq 15
[    1.819109] ata1.01: NODEV after polling detection
[    1.819329] ata1.00: ATA-7: QEMU HARDDISK, 2.3.0, max UDMA/100
[    1.819958] ata1.00: 1024000 sectors, multi 16: LBA48 
[    1.820938] ata1.00: configured for MWDMA2
[    1.821442] scsi 0:0:0:0: Direct-Access     ATA      QEMU HARDDISK    0    PQ: 0 ANSI: 5
[    1.848895] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    1.849614] ata1.00: BMDMA stat 0x4
[    1.850010] ata1.00: failed command: READ DMA
:


[    1.853789] ata1.00: configured for MWDMA2
[    1.854272] ata1: EH complete
[    1.864919] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    1.865656] ata1.00: BMDMA stat 0x4
[    1.866039] ata1.00: failed command: READ DMA
n
[    1.868118] ata1.00: status: { DRDY ERR }
[    1.868536] ata1.00: error: { ABRT }
[    1.869656] ata1.00: configured for MWDMA2
[    1.870257] ata1: EH complete
[    1.880942] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    1.881653] ata1.00: BMDMA stat 0x4
[    1.882045] ata1.00: failed command: READ DMA

And the disk newly added cannot read and write.

Using lvm lockspace, not reproduced the problem.
Comment 5 Han Han 2015-11-29 22:41 EST
Created attachment 1100342 [details]
The dmesg info
Comment 6 Kevin Wolf 2016-01-12 06:40:55 EST
Paolo, do you remember why we abort() on io_submit() failure?  Of course, I
also can't see why it would fail in the first place (other than -EAGAIN), but
maybe abort() is a bit too much.

Han, can you please provide the output of "bt full", so that the local
variables are included in the stack trace? I'm afraid that the interesting part
will be optimised out anyway, but it's worth a try.

If this doesn't reveal the io_submit() error code because 'ret' is optimised
out, attaching strace to qemu just before it fails could help us to understand
why io_submit() fails.
Comment 7 Han Han 2016-01-13 05:40:14 EST
(gdb) bt full
#0  0x00007f5b3c9af5f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 5430
        selftid = 5456
#1  0x00007f5b3c9b0ce8 in __GI_abort () at abort.c:90
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007f5b4478ffc6 in ioq_submit (s=s@entry=0x7f5b47302000) at block/linux-aio.c:208
        ret = <optimized out>
        len = 1
        aiocb = <optimized out>
        iocbs = {0x7f5b46a33030, 0x0 <repeats 95 times>, 0x7f0, 0x7f5b3fca6048 <(anonymous namespace)::do_memalign(size_t, size_t)+40>, 0x0, 0x7f0, 0x7f5b4c8f7a90, 0x800, 
          0xa0, 0x1, 0xa, 0x7f5b3fcc6772 <tc_posix_memalign(void**, size_t, size_t)+66>, 0xa0, 0x48, 0x800, 0x7f5b42965e88 <slab_allocator_alloc_chunk+136>, 
          0x7f5b46a33000, 0x844cd79ec949b600, 0x0, 0x7f5b46a32000, 0x98, 0x7f5b46a320a0, 0xa0, 0x7f5b429b096d <g_slice_alloc+1357>, 0x0, 0x7f5b46a33000, 0x90, 
          0x7f5b46a33500, 0x0, 0x7f5b44bc4e60 <laio_aiocb_info>, 0x7f5b472f8000, 0x7f5b44751ef0 <bdrv_co_io_em_complete>, 0x7f5b4c8f7c30, 0x0}
        completed = {sqh_first = 0x0, sqh_last = 0x0}
#3  0x00007f5b447903ac in laio_submit (bs=bs@entry=0x7f5b472f8000, aio_ctx=0x7f5b47302000, fd=20, sector_num=sector_num@entry=0, qiov=qiov@entry=0x7f5b483dd7f8, 
    nb_sectors=nb_sectors@entry=1, cb=cb@entry=0x7f5b44751ef0 <bdrv_co_io_em_complete>, opaque=opaque@entry=0x7f5b4c8f7c30, type=type@entry=1) at block/linux-aio.c:277
        s = 0x7f5b47302000
        laiocb = 0x7f5b46a33000
        iocbs = 0x7f5b46a33030
        offset = 0
        __func__ = "laio_submit"
#4  0x00007f5b4478e086 in raw_aio_submit (bs=0x7f5b472f8000, sector_num=0, qiov=0x7f5b483dd7f8, nb_sectors=1, cb=0x7f5b44751ef0 <bdrv_co_io_em_complete>, 
    opaque=0x7f5b4c8f7c30, type=type@entry=1) at block/raw-posix.c:1317
        s = 0x7f5b469a6580
#5  0x00007f5b4478e0d0 in raw_aio_readv (bs=<optimized out>, sector_num=<optimized out>, qiov=<optimized out>, nb_sectors=<optimized out>, cb=<optimized out>, 
    opaque=<optimized out>) at block/raw-posix.c:1361
No locals.
#6  0x00007f5b44751fa6 in bdrv_co_io_em (bs=0x7f5b472f8000, sector_num=0, nb_sectors=1, iov=0x7f5b483dd7f8, is_write=<optimized out>) at block.c:5153
        co = {coroutine = 0x7f5b4730a880, ret = 0}
        acb = <optimized out>
#7  0x00007f5b4475700d in bdrv_aligned_preadv (bs=bs@entry=0x7f5b472f8000, req=req@entry=0x7f5b4c8f7d70, offset=offset@entry=0, bytes=bytes@entry=512, 
    align=align@entry=512, qiov=qiov@entry=0x7f5b483dd7f8, flags=flags@entry=0) at block.c:3090
        total_sectors = 83886080
        max_nb_sectors = 83886080
        drv = 0x7f5b44c0f1e0 <bdrv_host_device>
        ret = <optimized out>
        sector_num = 0
        nb_sectors = 1
        __PRETTY_FUNCTION__ = "bdrv_aligned_preadv"
#8  0x00007f5b44757313 in bdrv_co_do_preadv (bs=0x7f5b472f8000, offset=0, bytes=512, qiov=0x7f5b483dd7f8, flags=(unknown: 0)) at block.c:3193
        drv = <optimized out>
        req = {bs = 0x7f5b472f8000, offset = 0, bytes = 512, is_write = false, serialising = false, overlap_offset = 0, overlap_bytes = 512, list = {le_next = 0x0, 
            le_prev = 0x7f5b472fb290}, co = 0x7f5b4730a880, wait_queue = {entries = {tqh_first = 0x0, tqh_last = 0x7f5b4c8f7db0}}, waiting_for = 0x0}
        align = 512
        head_buf = 0x0
        tail_buf = 0x0
        local_qiov = {iov = 0x0, niov = 0, nalloc = 0, size = 0}
        use_local_qiov = false
        ret = <optimized out>
#9  0x00007f5b4475700d in bdrv_aligned_preadv (bs=bs@entry=0x7f5b46a8c800, req=req@entry=0x7f5b4c8f7f00, offset=offset@entry=0, bytes=bytes@entry=512, 
    align=align@entry=512, qiov=qiov@entry=0x7f5b483dd7f8, flags=flags@entry=0) at block.c:3090
        total_sectors = 83886080
        max_nb_sectors = 83886080
        drv = 0x7f5b44c0b540 <bdrv_raw>
        ret = <optimized out>
        sector_num = 0
        nb_sectors = 1
        __PRETTY_FUNCTION__ = "bdrv_aligned_preadv"
#10 0x00007f5b44757313 in bdrv_co_do_preadv (bs=bs@entry=0x7f5b46a8c800, offset=0, bytes=512, qiov=0x7f5b483dd7f8, flags=(unknown: 0)) at block.c:3193
        drv = <optimized out>
        req = {bs = 0x7f5b46a8c800, offset = 0, bytes = 512, is_write = false, serialising = false, overlap_offset = 0, overlap_bytes = 512, list = {le_next = 0x0, 
            le_prev = 0x7f5b46a8fa90}, co = 0x7f5b4730a880, wait_queue = {entries = {tqh_first = 0x0, tqh_last = 0x7f5b4c8f7f40}}, waiting_for = 0x0}
        align = 512
        head_buf = 0x0
        tail_buf = 0x0
        local_qiov = {iov = 0x0, niov = 0, nalloc = 0, size = 0}
        use_local_qiov = false
        ret = <optimized out>
#11 0x00007f5b447583ff in bdrv_co_do_readv (flags=<optimized out>, qiov=<optimized out>, nb_sectors=<optimized out>, sector_num=<optimized out>, bs=0x7f5b46a8c800)
    at block.c:3215
No locals.
#12 bdrv_co_do_rw (opaque=0x7f5b469dd2d0) at block.c:4994
        acb = 0x7f5b469dd2d0
        bs = 0x7f5b46a8c800
#13 0x00007f5b4476257a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:80
        self = 0x7f5b4730a880
        co = 0x7f5b4730a880
#14 0x00007f5b3c9c1110 in ?? () from /usr/lib64/libc-2.17.so
No symbol table info available.
#15 0x00007f5b3200ffb0 in ?? ()
No symbol table info available.
#16 0x0000000000000000 in ?? ()
No symbol table info available.

More info:
If there is a long time interval between the starting of two guests, the problem cannot be reproduced.
For example:
# virsh start n3; sleep 30;virsh start n4
Comment 8 Paolo Bonzini 2016-01-27 13:17:48 EST
It's just because we couldn't "see why it would fail in the first place", indeed.  We should prepare a new RPM that logs the error (e.g. with g_assert_cmpint).
Comment 9 Kevin Wolf 2016-08-05 11:41:16 EDT
Can you please try to reproduce with the following scratch build? Before it
aborts, it should print some debug information to stderr (so it should end up
in libvirt's qemu log). Please report back with this output.

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11512442
Comment 10 Han Han 2016-08-07 22:04:24 EDT
I installed the scratch version and try to reproduce it by:
# for i in n{1,2};do virsh start $i;sleep 1;done 
Domain n1 started

error: Failed to start domain n2
error: resource busy: Lockspace resource '97177c7bd790b481f854131c62cd658a8adceb6d71532de0b609c064fc1d7c2a' is locked

n1 keeps running for a while after n2 start failed and then get SIGABRT.
The log of n1:
2016-08-08 02:00:24.082+0000: starting up libvirt version: 2.0.0, package: 4.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-08-02-09:15:12, x86-034.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-19.el7.bz1285928), hostname: lab.test.me
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=n1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-n1/master-key.aes -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off,vmport=off -cpu Penryn -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 660932f2-45a5-4756-880b-bd19345dbd46 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-n1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/n1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/sdb,format=raw,if=none,id=drive-ide0-0-0,cache=none,aio=native -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=28,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:a7:e8:75,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,tls-port=5901,addr=0.0.0.0,disable-ticketing,x509-dir=/etc/pki/libvirt-spice,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
char device redirected to /dev/pts/3 (label charserial0)
io_submit(len = 1)
aiocb[0] = read of 0x1 bytes at offset 0
aiocb[0].qiov[0] = 0x7f7f65451000 + 0x200
2016-08-08 02:00:32.670+0000: shutting down

The log of n2:
2016-08-08 02:00:25.298+0000: starting up libvirt version: 2.0.0, package: 4.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-08-02-09:15:12, x86-034.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-19.el7.bz1285928), hostname: lab.test.me
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=n2,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-4-n2/master-key.aes -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off,vmport=off -cpu Penryn -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 0df3dcb5-4fb2-4dec-b51d-926c9b07df6e -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-4-n2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/n2.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/sdb,format=raw,if=none,id=drive-ide0-0-0,cache=none,aio=native -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=29,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:36:f4:2c,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5902,tls-port=5903,addr=0.0.0.0,disable-ticketing,x509-dir=/etc/pki/libvirt-spice,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
char device redirected to /dev/pts/4 (label charserial0)
2016-08-08 02:00:25.381+0000: shutting down
2016-08-08T02:00:25.382691Z qemu-kvm: terminating on signal 15 from pid 6654
Comment 11 Kevin Wolf 2016-08-08 08:53:24 EDT
(In reply to Han Han from comment #10)
> io_submit(len = 1)
> aiocb[0] = read of 0x1 bytes at offset 0
> aiocb[0].qiov[0] = 0x7f7f65451000 + 0x200
> 2016-08-08 02:00:32.670+0000: shutting down

Thanks, this is the information we're looking for. Unfortunately, I messed it
up and forgot to log the most important part (the error code). Sorry about
that. Can you try again with a new scratch build and post those lines?

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11517005
Comment 12 Han Han 2016-08-08 21:31:50 EDT
# cat /var/log/libvirt/qemu/n1.log
2016-08-09 01:29:47.114+0000: starting up libvirt version: 2.0.0, package: 4.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-08-02-09:15:12, x86-034.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-19.el7.bz1285928_v2), hostname: lab.test.me
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=n1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-18-n1/master-key.aes -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off,vmport=off -cpu Penryn -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 660932f2-45a5-4756-880b-bd19345dbd46 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-18-n1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/n1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/sdb,format=raw,if=none,id=drive-ide0-0-0,cache=none,aio=native -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=34,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:a7:e8:75,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5904,tls-port=5905,addr=0.0.0.0,disable-ticketing,x509-dir=/etc/pki/libvirt-spice,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
char device redirected to /dev/pts/13 (label charserial0)
io_submit(len = 1) = -13 (Permission denied)
aiocb[0] = read of 0x200 bytes at offset 0
aiocb[0] = fd 21, opcode 7, iov = 0x7f7978821490, qiov = 0x7f7978821490, niov = 1
aiocb[0].qiov[0] = 0x7f7977dfe000 + 0x200
2016-08-09 01:29:56.378+0000: shutting down

# cat /var/log/libvirt/qemu/n2.log
2016-08-09 01:29:48.374+0000: starting up libvirt version: 2.0.0, package: 4.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-08-02-09:15:12, x86-034.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-19.el7.bz1285928_v2), hostname: lab.test.me
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=n2,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-19-n2/master-key.aes -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off,vmport=off -cpu Penryn -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 0df3dcb5-4fb2-4dec-b51d-926c9b07df6e -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-19-n2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/n2.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/sdb,format=raw,if=none,id=drive-ide0-0-0,cache=none,aio=native -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=35,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:36:f4:2c,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5906,tls-port=5907,addr=0.0.0.0,disable-ticketing,x509-dir=/etc/pki/libvirt-spice,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
char device redirected to /dev/pts/14 (label charserial0)
2016-08-09 01:29:48.461+0000: shutting down
2016-08-09T01:29:48.462249Z qemu-kvm: terminating on signal 15 from pid 6654
Comment 13 Kevin Wolf 2016-08-09 06:13:23 EDT
(In reply to Han Han from comment #12)
> io_submit(len = 1) = -13 (Permission denied)

I am not completely sure, but such permission problems could be related to
SELinux; generally EACCES is not an error code that is expected or documented
for io_submit(), which operates on an already opened file descriptor. On the
other hand, I'd rather expect EPERM from SELinux, but who knows.

Is it possible that libvirt revokes the first VM's access permissions to the
device as soon as the second one accessing the same device is started? If so,
shouldn't libvirt rather refuse to start the second VM?

I'm moving this to libvirt for further investigation.

Note that qemu should probably be fixed, too, to deal more gracefully with the
problem instead of aborting, but a qemu fix alone would still result in I/O
errors, so I think we need to fix the root cause primarily. We can create a new
qemu BZ for the abort() if we confirm this to be really a libvirt bug.
Comment 14 Ján Tomko 2016-08-09 06:33:28 EDT
Yes, libvirt cuts off both the VMs from accessing the device.

The goal of virtlockd is to prevent simultaneous writes to the same disk by two different domains:
https://bugzilla.redhat.com/show_bug.cgi?id=1191802#c8

We have the long-standing bug 547546 against the security driver to remember the security labels. After that is fixed, starting a new domain should fail before changing other domains' disks' labels.

Unless you want to keep this bug for tracking qemu's abort(), I'll close it as a duplicate of bug 547546.
Comment 15 Kevin Wolf 2016-08-09 07:31:46 EDT
Thanks, Ján. I'll use it for my purposes then.

Han, I proposed a patch upstream to handle io_submit() failure gracefully. I'm
building a backport of this patch for you to test at:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11527226
Comment 16 Han Han 2016-08-22 21:37:33 EDT
Hi Kevin,
I cannot find the download link of you scratch build. Pls check it.
Comment 17 Kevin Wolf 2016-08-23 04:23:42 EDT
Yes, looks like it was already old enough for it to expire. I made a new build
with the same patch for you:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11644607
Comment 18 Han Han 2016-08-26 02:41:09 EDT
(In reply to Kevin Wolf from comment #17)
> Yes, looks like it was already old enough for it to expire. I made a new
> build
> with the same patch for you:
> 
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11644607

After installing your scratch build, did as comment0 does. The VM n4 start failed due to locking and n3 continue running without SIGABRT.

logs:
# cat /var/log/libvirt/qemu/n3.log 
2016-08-26 06:36:14.966+0000: starting up libvirt version: 2.0.0, package: 5.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-08-10-12:57:06, x86-019.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-19.el7.bz1285928_v3), hostname: lab.work.me
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=n3,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-4-n3/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,vmport=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 35047947-6221-4001-87ef-440f2f5fe1fe -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-4-n3/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0xb -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xa -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/n3.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/sdb,format=raw,if=none,id=drive-ide0-0-0,cache=none,aio=native -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:83:9d:15,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-4-n3/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -device virtio-mouse-pci,id=input1,bus=pci.0,addr=0xc -device virtio-keyboard-pci,id=input2,bus=pci.0,addr=0xd -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x9 -msg timestamp=on 

# cat /var/log/libvirt/qemu/n4.log
2016-08-26 06:36:16.392+0000: shutting down
Comment 19 Kevin Wolf 2016-08-31 03:56:30 EDT
Thanks, posting the backport now so that it can be merged.
Comment 21 Miroslav Rezanina 2016-09-06 06:32:42 EDT
Fix included in qemu-kvm-rhev-2.6.0-23.el7
Comment 23 Han Han 2016-09-12 06:59:36 EDT
Did it as bug report on libvirt-2.0.0-8.el7.x86_64 qemu-kvm-rhev-2.6.0-24.el7.x86_64, bug was reproduced.

id 5d2a593f91ccb7e768ed66f43c3685ab3b5f0468
reason:         qemu-kvm killed by SIGABRT
time:           Mon 12 Sep 2016 18:50:59 CST
cmdline:        /usr/libexec/qemu-kvm -name guest=n1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-10-n1/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,vmport=off -cpu Westmere -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 152945f4-476d-4159-be56-e1a2e84ebc0f -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-10-n1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/n1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/sdb,format=raw,if=none,id=drive-ide0-0-0,cache=none,aio=native -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3e:b1:ea,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-10-n1/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
package:        qemu-kvm-rhev-2.6.0-24.el7
uid:            107 (qemu)
Directory:      /var/spool/abrt/ccpp-2016-09-12-18:50:59-25644
Run 'abrt-cli report /var/spool/abrt/ccpp-2016-09-12-18:50:59-25644' for creating a case in Red Hat Customer Portal

The Autoreporting feature is disabled. Please consider enabling it by issuing
'abrt-auto-reporting enabled' as a user with root privileges
Comment 24 Han Han 2016-09-12 07:12 EDT
Created attachment 1200188 [details]
Backtrace and logs on latest qemu-kvm
Comment 25 Kevin Wolf 2016-09-12 07:36:31 EDT
Hm, indeed, I think I see the problem. For a complete fix, we would have to use
a bottom half in the error handling path to avoid entering the request coroutine
recursively.

I'm afraid this is too late for 7.3 then.
Comment 28 Paolo Bonzini 2017-03-10 10:33:57 EST
Should be fixed in 2.8.0 (commit fe121b9, "linux-aio: fix re-entrant completion processing", 2016-09-28).
Comment 29 Ademar Reis 2017-03-20 14:06:32 EDT
(In reply to Paolo Bonzini from comment #28)
> Should be fixed in 2.8.0 (commit fe121b9, "linux-aio: fix re-entrant
> completion processing", 2016-09-28).

I'm changing it to POST then. Kevin, any comments? (I'm leaving the needinfo(kwolf) here)
Comment 30 Kevin Wolf 2017-04-03 08:09:13 EDT
(In reply to Paolo Bonzini from comment #28)
> Should be fixed in 2.8.0 (commit fe121b9, "linux-aio: fix re-entrant
> completion processing", 2016-09-28).

Yes, had a look, this should be the one.
Comment 32 Han Han 2017-05-05 06:12:07 EDT
Remember to set lock_manager = "lockd" in /etc/libvirt/qemu.conf and then restart virtlockd and libvirtd.
Comment 33 CongLi 2017-05-05 06:16:03 EDT
Thanks Han Han's help.

Verified this bug on:
qemu-kvm-rhev-2.9.0-2.el7.x86_64

Steps:
1. Set lock_manager = "lockd" in /etc/libvirt/qemu.conf

2. Setup lockd environment
# vim /etc/libvirt/qemu-lockd.conf
auto_disk_leases = 1
require_lease_for_disks = 1
scsi_lockspace_dir = "/var/lib/libvirt/lockd/scsivolumes"
# mkdir -p /var/lib/libvirt/lockd/scsivolumes
# systemctl restart virtlockd; systemctl restart libvirtd

3. Two guests attach the same scsi device.
# vim disk.xml
 <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/sdb'/>
      <target dev='hda' bus='ide'/>
    </disk>
#  for i in n{3,4};do virsh attach-device $i disk.xml --config;done
Device attached successfully
Device attached successfully

4. Start 2 guest one by one.
# for i in n{3,4};do virsh start $i;sleep 1;done


Result:
n3 start successfully, and n4 locked, which is as expected.

# for i in n{3,4};do virsh start $i;sleep 1;done
Domain n3 started

error: Failed to start domain n4
error: resource busy: Lockspace resource '35333333000002ee0' is locked


Set bug to 'VERIFIED' based on the test result above.

Thanks.
Comment 35 errata-xmlrpc 2017-08-01 19:29:42 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
Comment 36 errata-xmlrpc 2017-08-01 21:07:21 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
Comment 37 errata-xmlrpc 2017-08-01 21:59:20 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
Comment 38 errata-xmlrpc 2017-08-01 22:40:06 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
Comment 39 errata-xmlrpc 2017-08-01 23:04:50 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
Comment 40 errata-xmlrpc 2017-08-01 23:24:58 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Note You need to log in before you can comment on or make changes to this bug.