Bug 1456086

Summary: Random SIGABRT when migrating from 7.4 to 7.3: !(bs->open_flags & 0x0800)
Product: Red Hat Enterprise Linux 7
Reporter: Han Han <hhan>
Component: qemu-kvm-rhev
Assignee: Dr. David Alan Gilbert <dgilbert>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: huiqingding <huding>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.3
CC: chayang, dyuan, fjin, hhan, huding, juzhang, knoel, michen, peterx, quintela, virt-maint, xfu, xuzhang
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-06-08 11:04:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:
Bug Blocks: 1376765
Attachments:
  all thread backtrace (flags: none)

Description Han Han 2017-05-27 02:00:19 UTC
Created attachment 1282799
all thread backtrace

Description of problem:
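During migration from a RHEL 7.4 host to a RHEL 7.3 host, the destination qemu-kvm process is occasionally killed by SIGABRT after the assertion !(bs->open_flags & 0x0800) fails in bdrv_co_do_pwritev (block/io.c:1342).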

Version-Release number of selected component (if applicable):
src host:
qemu-kvm-rhev-2.9.0-6.el7.x86_64
libvirt-3.2.0-6.el7.x86_64
kernel-3.10.0-668.el7.x86_64
dst host:
qemu-kvm-rhev-2.6.0-28.el7_3.10.x86_64
libvirt-2.0.0-10.el7_3.9.x86_64
kernel-3.10.0-514.21.1.el7.x86_64

How reproducible:
Seldom (hit only once so far)

Steps to Reproduce:
1. Start a guest with the following command line on the RHEL 7.4 host:
/usr/libexec/qemu-kvm -name guest=dominfo,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1876-dominfo/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 48df3ffd-0d6d-4977-95bc-9b5c48413cb1 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1876-dominfo/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -no-acpi -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device pxb,bus_nr=10,id=pci.1,bus=pci.0,addr=0x8 -device pxb,bus_nr=50,id=pci.2,bus=pci.0,addr=0x9 -device pxb,bus_nr=100,id=pci.3,bus=pci.0,addr=0xa -device pci-bridge,chassis_nr=4,id=pci.4,bus=pci.1,addr=0x0 -device pci-bridge,chassis_nr=5,id=pci.5,bus=pci.2,addr=0x0 -device pci-bridge,chassis_nr=6,id=pci.6,bus=pci.3,addr=0x0 -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/exports/dominfo.qcow2,format=qcow2,if=none,id=drive-ide0-0-0 -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=27,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:86:ba:d7,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

2. Migrate the guest to a RHEL 7.3 host:
# virsh migrate dominfo qemu+ssh://intel-e5530-8-2.lab.eng.pek2.redhat.com/system --verbose --unsafe
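Because the failure is rare, repeating the migration until it fails may make it easier to catch. The following is a hypothetical sketch, not a tested procedure: repeat_until_failure, round_trip, DOM and SRC_URI are illustrative names, SRC_HOST is a placeholder for the 7.4 host, and the return leg (7.3 back to 7.4) assumes the 7.3 host can reach the 7.4 host by name, which may not hold here (comment 9 notes a hostname resolution problem between the hosts). After a failure, check virsh domstate on the source and abrt-cli ls on the destination.

```shell
#!/usr/bin/env bash
# Run a command up to MAX times, stopping at the first failure.
# Prints the failing iteration and returns 1, or returns 0 if every run passed.
repeat_until_failure() {
    local max=$1; shift
    local i
    for (( i = 1; i <= max; i++ )); do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
    done
    echo "no failure in $max iterations"
    return 0
}

# Hypothetical settings; DST_URI is the destination from step 2,
# SRC_URI is a placeholder for the 7.4 source host.
DOM=dominfo
DST_URI=qemu+ssh://intel-e5530-8-2.lab.eng.pek2.redhat.com/system
SRC_URI=qemu+ssh://SRC_HOST/system

# One round: migrate 7.4 -> 7.3 with the same flags as step 2, then back.
round_trip() {
    virsh migrate "$DOM" "$DST_URI" --verbose --unsafe &&
        virsh -c "$DST_URI" migrate "$DOM" "$SRC_URI" --verbose --unsafe
}

# Uncomment to run on the source host:
# repeat_until_failure 50 round_trip
```

Keeping the loop generic (it takes any command) means the same helper can wrap a one-way migration if the return leg turns out to be impractical.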

Actual results:
Sometimes the migration fails and qemu-kvm on the destination host is killed by SIGABRT:
# abrt-cli ls
id a5f8397fe659fcd68103b7105838c24aca3d80e1
reason:         qemu-kvm killed by SIGABRT
time:           Fri 26 May 2017 05:49:11
cmdline:        /usr/libexec/qemu-kvm -name guest=dominfo,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-dominfo/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 48df3ffd-0d6d-4977-95bc-9b5c48413cb1 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-21-dominfo/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -no-acpi -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device pxb,bus_nr=10,id=pci.1,bus=pci.0,addr=0x8 -device pxb,bus_nr=50,id=pci.2,bus=pci.0,addr=0x9 -device pxb,bus_nr=100,id=pci.3,bus=pci.0,addr=0xa -device pci-bridge,chassis_nr=4,id=pci.4,bus=pci.1,addr=0x0 -device pci-bridge,chassis_nr=5,id=pci.5,bus=pci.2,addr=0x0 -device pci-bridge,chassis_nr=6,id=pci.6,bus=pci.3,addr=0x0 -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/exports/dominfo.qcow2,format=qcow2,if=none,id=drive-ide0-0-0 -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=37,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:86:ba:d7,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5902,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
package:        qemu-kvm-rhev-2.6.0-28.el7_3.10
uid:            107 (qemu)
count:          1
Directory:      /var/spool/abrt/ccpp-2017-05-26-05:49:11-28336
Run 'abrt-cli report /var/spool/abrt/ccpp-2017-05-26-05:49:11-28336' for creating a case in Red Hat Customer Portal

Expected results:
Migration completes without SIGABRT on the destination host.

Additional info:
The backtrace:
(gdb) bt fu
#0  0x00007fa30ee781d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 28336
        selftid = 28336
#1  0x00007fa30ee798c8 in __GI_abort () at abort.c:90
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7ffe6f99f5f8, sa_sigaction = 0x7ffe6f99f5f8}, sa_mask = {__val = {140338307790064, 
              140338727754896, 1342, 140338785189312, 140338306427555, 4, 140338785189152, 47244640257, 128, 0, 0, 0, 0, 21474836480, 
              140338307790064, 140338307802088}}, sa_flags = 665579520, sa_restorer = 0x7fa30efc23e8}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007fa30ee71146 in __assert_fail_base (fmt=0x7fa30efc23e8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7fa328041d23 "!(bs->open_flags & 0x0800)", file=file@entry=0x7fa328041c90 "block/io.c", line=line@entry=1342, 
    function=function@entry=0x7fa328042110 <__PRETTY_FUNCTION__.34660> "bdrv_co_do_pwritev") at assert.c:92
        str = 0x7fa32abea5b0 "Ф\276*\243\177"
        total = 4096
#3  0x00007fa30ee711f2 in __GI___assert_fail (assertion=assertion@entry=0x7fa328041d23 "!(bs->open_flags & 0x0800)", 
    file=file@entry=0x7fa328041c90 "block/io.c", line=line@entry=1342, 
    function=function@entry=0x7fa328042110 <__PRETTY_FUNCTION__.34660> "bdrv_co_do_pwritev") at assert.c:101
No locals.
#4  0x00007fa327f2eaf7 in bdrv_co_do_pwritev (bs=0x7fa32ac95400, offset=<optimized out>, bytes=65536, qiov=0x7ffe6f99cfc0, flags=(unknown: 0))
    at block/io.c:1342
        req = {bs = 0x7fa32ac95400, offset = 11119296512, bytes = 65536, type = BDRV_TRACKED_READ, serialising = false, 
          overlap_offset = 11119296512, overlap_bytes = 65536, list = {le_next = 0x0, le_prev = 0x7fa32ac985b8}, co = 0x7fa32ac2c880, 
          wait_queue = {entries = {tqh_first = 0x7fa327a41c38, tqh_last = 0x2}}, waiting_for = 0x7fa32ac95400}
        align = 512
        head_buf = 0x0
        tail_buf = 0x0
        local_qiov = {iov = 0x7fa32ac01000, niov = 725630976, nalloc = 32675, size = 512}
        use_local_qiov = false
        ret = <optimized out>
        __PRETTY_FUNCTION__ = "bdrv_co_do_pwritev"
#5  0x00007fa327f2ebc2 in bdrv_rw_co_entry (opaque=0x7ffe6f99cf70) at block/io.c:588
        rwco = 0x7ffe6f99cf70
#6  0x00007fa327f9218a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
        self = 0x7fa32ac2c880
        co = 0x7fa32ac2c880
#7  0x00007fa30ee89cf0 in ?? () from /usr/lib64/libc-2.17.so
No symbol table info available.
#8  0x00007ffe6f99c690 in ?? ()
No symbol table info available.
#9  0x0000000000000000 in ?? ()
No symbol table info available.


Since the bug is hard to reproduce (I hit it only once while running the test), could you analyze the backtrace and suggest a way to reproduce it more reliably? Thanks.
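For reading frames #2 to #4: the failed assertion tests bit 0x0800 of the block device's open_flags inside bdrv_co_do_pwritev, so the destination attempted a disk write while that bit was set. The helper below is a hypothetical sketch that decodes an open_flags value into flag names. The bit values are an assumption based on QEMU's include/block/block.h of the 2.6 era, where 0x0800 is BDRV_O_INACTIVE (an image not yet handed over to this QEMU by migration); they should be checked against the qemu-kvm-rhev-2.6.0 source before relying on the decoding.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a QEMU block-layer open_flags value.
# ASSUMPTION: bit values follow QEMU 2.6-era include/block/block.h;
# verify them against the exact qemu-kvm-rhev source in use.
decode_open_flags() {
    local flags=$(( $1 ))
    local -a names=()
    (( flags & 0x0002 )) && names+=(BDRV_O_RDWR)
    (( flags & 0x0020 )) && names+=(BDRV_O_NOCACHE)   # set by cache=none
    (( flags & 0x0200 )) && names+=(BDRV_O_NO_FLUSH)  # set by cache=unsafe
    (( flags & 0x0800 )) && names+=(BDRV_O_INACTIVE)  # image not yet owned by this QEMU
    echo "${names[*]}"
}

# The bit tested by the failed assertion in bdrv_co_do_pwritev:
decode_open_flags 0x0800
```

For example, decode_open_flags 0x0822 prints "BDRV_O_RDWR BDRV_O_NOCACHE BDRV_O_INACTIVE".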

Comment 2 Dr. David Alan Gilbert 2017-05-30 12:49:05 UTC
Hi,
  Can you confirm - is it the 7.3 destination that's aborting?

Dave

Comment 3 Han Han 2017-05-31 01:55:02 UTC
Sure. The destination is RHEL7.3.z.

Comment 4 Dr. David Alan Gilbert 2017-05-31 11:00:28 UTC
Hi,
  1) Can you tell me what the 'dominfo' image you're running does - is it some type of stress test or something else?
  2) What guest OS is it running?
  3) Please repeat the test until you hit a failure; when that happens, please check the state of the source VM - is it still running?

Dave

Comment 5 Dr. David Alan Gilbert 2017-06-06 17:13:00 UTC
Hi - can you repeat this one?

Comment 6 Juan Quintela 2017-06-07 08:18:01 UTC
Hi

you are missing cache=none on the disk.

Once you change that, if you can still reproduce it, could you check if it also happens when you use virtio-blk for the disk instead of ide?

Thanks, Juan.
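The retest suggested above could look roughly like the following on the QEMU command line from step 1. This is a hypothetical sketch, not a tested configuration: the drive/device ids are illustrative, and omitting addr= leaves QEMU to pick a free PCI slot. For a libvirt-managed guest the same change would normally be made in the domain XML instead (cache='none' on the disk's <driver> element and bus='virtio' on its <target>), since libvirt generates these arguments. As far as I know, libvirt refuses to migrate a disk whose cache mode is neither none nor directsync unless --unsafe is given, which is presumably why step 2 uses --unsafe.

```shell
#!/usr/bin/env bash
# Hypothetical disk arguments for the retest suggested in comment 6:
# cache=none on the -drive, and virtio-blk-pci instead of ide-hd.
# They would replace the "-drive ... -device ide-hd,..." pair in step 1.
DISK_ARGS=(
    -drive file=/exports/dominfo.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none
    -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
)
# Print one argument per line for inspection.
printf '%s\n' "${DISK_ARGS[@]}"
```

With virtio-blk the guest sees the disk as /dev/vda rather than /dev/sda, so a guest that mounts by device name rather than by UUID or LVM name may need adjusting.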

Comment 7 Han Han 2017-06-08 02:56:12 UTC
Hi, I haven't hit the issue again.
I also tried virtio-blk with cache=none, but could not reproduce it.

Comment 8 Dr. David Alan Gilbert 2017-06-08 10:22:58 UTC
(In reply to Han Han from comment #7)
> Hi, I haven't hit the issue again.
> I also tried virtio-blk with cache=none, but could not reproduce it.

Hmm, OK. I guess we'll need to close it if we can't reproduce it; however, please answer the questions about the guest OS from comment 4.

Comment 9 Han Han 2017-06-08 10:56:26 UTC
(In reply to Dr. David Alan Gilbert from comment #4)
> Hi,
>   1) Can you tell me what the 'dominfo' image you're running does - is
> it some type of stress test or something else?
>   2) What guest OS is it running?
>   3) Please repeat the test until you hit a failure; when that happens,
> please check the state of the source VM - is it still running?
> 
> Dave

1)
As far as I remember, I didn't run any stress test. However, the destination host could not resolve the source host's hostname during the migration.
2)
Guest OS is RHEL7.4 with kernel-3.10.0-675.el7.x86_64.
3)
I haven't hit the bug again. As far as I remember, the VM kept running on the source host after the failure.

Comment 10 Dr. David Alan Gilbert 2017-06-08 11:04:41 UTC
OK, then let's close it, but keep an eye out for anything similar.