Bug 1456086 - Random SIGABRT when migrate from 7.4 to 7.3 !(bs->open_flags & 0x0800)
Summary: Random SIGABRT when migrate from 7.4 to 7.3 !(bs->open_flags & 0x0800)
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Dr. David Alan Gilbert
QA Contact: huiqingding
URL:
Whiteboard:
Depends On:
Blocks: 1376765
 
Reported: 2017-05-27 02:00 UTC by Han Han
Modified: 2017-07-11 10:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-08 11:04:41 UTC


Attachments
all thread backtrace (11.55 KB, text/plain)
2017-05-27 02:00 UTC, Han Han

Description Han Han 2017-05-27 02:00:19 UTC
Created attachment 1282799
all thread backtrace

Description of problem:
As subject

Version-Release number of selected component (if applicable):
src host:
qemu-kvm-rhev-2.9.0-6.el7.x86_64
libvirt-3.2.0-6.el7.x86_64
kernel-3.10.0-668.el7.x86_64
dst host:
qemu-kvm-rhev-2.6.0-28.el7_3.10.x86_64
libvirt-2.0.0-10.el7_3.9.x86_64
kernel-3.10.0-514.21.1.el7.x86_64

How reproducible:
Seldom

Steps to Reproduce:
1. Start a guest as follows on a RHEL7.4 host:
/usr/libexec/qemu-kvm -name guest=dominfo,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1876-dominfo/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 48df3ffd-0d6d-4977-95bc-9b5c48413cb1 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1876-dominfo/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -no-acpi -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device pxb,bus_nr=10,id=pci.1,bus=pci.0,addr=0x8 -device pxb,bus_nr=50,id=pci.2,bus=pci.0,addr=0x9 -device pxb,bus_nr=100,id=pci.3,bus=pci.0,addr=0xa -device pci-bridge,chassis_nr=4,id=pci.4,bus=pci.1,addr=0x0 -device pci-bridge,chassis_nr=5,id=pci.5,bus=pci.2,addr=0x0 -device pci-bridge,chassis_nr=6,id=pci.6,bus=pci.3,addr=0x0 -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/exports/dominfo.qcow2,format=qcow2,if=none,id=drive-ide0-0-0 -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=27,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:86:ba:d7,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

2. Migrate to a RHEL7.3 host
# virsh migrate dominfo qemu+ssh://intel-e5530-8-2.lab.eng.pek2.redhat.com/system --verbose --unsafe

Actual results:
Sometimes the migration fails and qemu-kvm is killed by SIGABRT on the dst host:
# abrt-cli ls
id a5f8397fe659fcd68103b7105838c24aca3d80e1
reason:         qemu-kvm killed by SIGABRT
time:           Fri 26 May 2017 05:49:11
cmdline:        /usr/libexec/qemu-kvm -name guest=dominfo,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-dominfo/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 48df3ffd-0d6d-4977-95bc-9b5c48413cb1 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-21-dominfo/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -no-acpi -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device pxb,bus_nr=10,id=pci.1,bus=pci.0,addr=0x8 -device pxb,bus_nr=50,id=pci.2,bus=pci.0,addr=0x9 -device pxb,bus_nr=100,id=pci.3,bus=pci.0,addr=0xa -device pci-bridge,chassis_nr=4,id=pci.4,bus=pci.1,addr=0x0 -device pci-bridge,chassis_nr=5,id=pci.5,bus=pci.2,addr=0x0 -device pci-bridge,chassis_nr=6,id=pci.6,bus=pci.3,addr=0x0 -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/exports/dominfo.qcow2,format=qcow2,if=none,id=drive-ide0-0-0 -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=37,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:86:ba:d7,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5902,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
package:        qemu-kvm-rhev-2.6.0-28.el7_3.10
uid:            107 (qemu)
count:          1
Directory:      /var/spool/abrt/ccpp-2017-05-26-05:49:11-28336
Run 'abrt-cli report /var/spool/abrt/ccpp-2017-05-26-05:49:11-28336' for creating a case in Red Hat Customer Portal

Expected results:
No SIGABRT

Additional info:
The backtrace:
(gdb) bt fu
#0  0x00007fa30ee781d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 28336
        selftid = 28336
#1  0x00007fa30ee798c8 in __GI_abort () at abort.c:90
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7ffe6f99f5f8, sa_sigaction = 0x7ffe6f99f5f8}, sa_mask = {__val = {140338307790064, 
              140338727754896, 1342, 140338785189312, 140338306427555, 4, 140338785189152, 47244640257, 128, 0, 0, 0, 0, 21474836480, 
              140338307790064, 140338307802088}}, sa_flags = 665579520, sa_restorer = 0x7fa30efc23e8}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007fa30ee71146 in __assert_fail_base (fmt=0x7fa30efc23e8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7fa328041d23 "!(bs->open_flags & 0x0800)", file=file@entry=0x7fa328041c90 "block/io.c", line=line@entry=1342, 
    function=function@entry=0x7fa328042110 <__PRETTY_FUNCTION__.34660> "bdrv_co_do_pwritev") at assert.c:92
        str = 0x7fa32abea5b0 "Ф\276*\243\177"
        total = 4096
#3  0x00007fa30ee711f2 in __GI___assert_fail (assertion=assertion@entry=0x7fa328041d23 "!(bs->open_flags & 0x0800)", 
    file=file@entry=0x7fa328041c90 "block/io.c", line=line@entry=1342, 
    function=function@entry=0x7fa328042110 <__PRETTY_FUNCTION__.34660> "bdrv_co_do_pwritev") at assert.c:101
No locals.
#4  0x00007fa327f2eaf7 in bdrv_co_do_pwritev (bs=0x7fa32ac95400, offset=<optimized out>, bytes=65536, qiov=0x7ffe6f99cfc0, flags=(unknown: 0))
    at block/io.c:1342
        req = {bs = 0x7fa32ac95400, offset = 11119296512, bytes = 65536, type = BDRV_TRACKED_READ, serialising = false, 
          overlap_offset = 11119296512, overlap_bytes = 65536, list = {le_next = 0x0, le_prev = 0x7fa32ac985b8}, co = 0x7fa32ac2c880, 
          wait_queue = {entries = {tqh_first = 0x7fa327a41c38, tqh_last = 0x2}}, waiting_for = 0x7fa32ac95400}
        align = 512
        head_buf = 0x0
        tail_buf = 0x0
        local_qiov = {iov = 0x7fa32ac01000, niov = 725630976, nalloc = 32675, size = 512}
        use_local_qiov = false
        ret = <optimized out>
        __PRETTY_FUNCTION__ = "bdrv_co_do_pwritev"
#5  0x00007fa327f2ebc2 in bdrv_rw_co_entry (opaque=0x7ffe6f99cf70) at block/io.c:588
        rwco = 0x7ffe6f99cf70
#6  0x00007fa327f9218a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
        self = 0x7fa32ac2c880
        co = 0x7fa32ac2c880
#7  0x00007fa30ee89cf0 in ?? () from /usr/lib64/libc-2.17.so
No symbol table info available.
#8  0x00007ffe6f99c690 in ?? ()
No symbol table info available.
#9  0x0000000000000000 in ?? ()
No symbol table info available.


Since the bug is hard to reproduce (I hit it only once while running the test), could you analyze the backtrace and suggest a reliable way to reproduce it? Thanks.
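
A note on reading the assertion in frame #2: the mask 0x0800 can be matched against QEMU's BDRV_O_* open flags. The sketch below is only an aid for that lookup. The flag values are quoted from memory of include/block/block.h in QEMU 2.6 and are an assumption that should be checked against the source of the exact build. The helper names decode_open_flags and write_would_assert are made up for illustration. If the mapping is right, 0x0800 is BDRV_O_INACTIVE, which would mean that a write reached bdrv_co_do_pwritev while the destination image was still marked inactive.

```python
# Subset of QEMU 2.6 BDRV_O_* open flags. ASSUMPTION: values recalled from
# include/block/block.h; verify them against the source of the exact build.
BDRV_O_FLAGS = {
    0x0002: "BDRV_O_RDWR",
    0x0020: "BDRV_O_NOCACHE",
    0x0800: "BDRV_O_INACTIVE",
}


def decode_open_flags(value):
    """Return (names of known flags set in value, remaining unknown bits)."""
    names = []
    leftover = value
    for bit, name in sorted(BDRV_O_FLAGS.items()):
        if value & bit:
            names.append(name)
            leftover &= ~bit
    return names, leftover


def write_would_assert(open_flags):
    """Mirror the failed check: the assertion requires bit 0x0800 to be clear."""
    return bool(open_flags & 0x0800)
```

For example, in a core dump one could print bs->open_flags in frame #4 and pass the value to decode_open_flags to see which of these flags were set at the time of the abort.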

Comment 2 Dr. David Alan Gilbert 2017-05-30 12:49:05 UTC
Hi,
  Can you confirm - is it the 7.3 destination that's aborting?

Dave

Comment 3 Han Han 2017-05-31 01:55:02 UTC
Sure. The destination is RHEL7.3.z.

Comment 4 Dr. David Alan Gilbert 2017-05-31 11:00:28 UTC
Hi,
  1) Can you tell me what the 'dominfo' image you're running does - is it some type of stress test or what?
  2) What guest OS is it running?
  3) Please repeat the test until you hit a failure, when that happens please check on the state of the source VM - is it still running?
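
One way to automate request 3) is sketched below, as a rough outline rather than a tested procedure. It migrates the guest to the destination and back in a loop, and when a forward migration fails it records the guest state on the source with virsh domstate. The guest name and destination URI come from the report. SRC_URI is a hypothetical placeholder for the source host, and the function names and the injectable run parameter are illustrative choices only.

```python
import subprocess

GUEST = "dominfo"
DST_URI = "qemu+ssh://intel-e5530-8-2.lab.eng.pek2.redhat.com/system"
# Hypothetical placeholder: the source host name is not given in the report.
SRC_URI = "qemu+ssh://SRC-HOST.example.com/system"


def run_cmd(argv):
    """Run a command and return (exit status, combined output)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


def migrate_until_failure(max_rounds, run=run_cmd):
    """Migrate src -> dst and back until a forward migration fails.

    Returns (round number, guest state on the source) for the first failed
    forward migration, or None if all rounds succeeded.
    """
    forward = ["virsh", "migrate", GUEST, DST_URI, "--verbose", "--unsafe"]
    back = ["virsh", "-c", DST_URI, "migrate", GUEST, SRC_URI,
            "--verbose", "--unsafe"]
    for n in range(1, max_rounds + 1):
        status, _ = run(forward)
        if status != 0:
            # Forward migration failed: check whether the source guest survived.
            _, state = run(["virsh", "domstate", GUEST])
            return n, state.strip()
        status, out = run(back)
        if status != 0:
            raise RuntimeError("migration back to source failed: " + out)
    return None
```

Run on the source host, migrate_until_failure(100) would stop at the first failed migration; the abrt directory on the destination can then be collected for that round.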

Dave

Comment 5 Dr. David Alan Gilbert 2017-06-06 17:13:00 UTC
Hi - can you repeat this one?

Comment 6 Juan Quintela 2017-06-07 08:18:01 UTC
Hi

you are missing cache=none on the disk.

Once you change that, if you can still reproduce it, could you check if it also happens when you use virtio-blk for the disk instead of ide?

Thanks, Juan.

Comment 7 Han Han 2017-06-08 02:56:12 UTC
Hi, I didn't hit the issue again.
I also tried virtio-blk with cache=none, but could not reproduce it.

Comment 8 Dr. David Alan Gilbert 2017-06-08 10:22:58 UTC
(In reply to Han Han from comment #7)
> Hi, I didn't hit the issue again.
> I also tried virtio-blk with cache=none, but could not reproduce it.

Hmm, OK. I guess we'll need to close it if we can't reproduce it; however, please answer the questions about the guest OS from comment 4.

Comment 9 Han Han 2017-06-08 10:56:26 UTC
(In reply to Dr. David Alan Gilbert from comment #4)
> Hi,
>   1) Can you tell me what the 'dominfo' image you're running does - is
> it some type of stress test or what?
>   2) What guest OS is it running?
>   3) Please repeat the test until you hit a failure, when that happens
> please check on the state of the source VM - is it still running?
> 
> Dave

1)
As far as I remember, I didn't run any stress test. However, the dst host could not resolve the src host's hostname during migration.
2)
Guest OS is RHEL7.4 with kernel-3.10.0-675.el7.x86_64.
3)
I didn't hit the bug again. As far as I remember, the VM kept running on the src host after the failure.

Comment 10 Dr. David Alan Gilbert 2017-06-08 11:04:41 UTC
OK, then let's close it, but keep an eye out for anything similar.

