Bug 820194

Summary: guest hangs after migration on "Intel(R) Core(TM)2 Quad CPU Q9500 @ 2.83GHz" hosts
Product: Red Hat Enterprise Linux 6 Reporter: langfang <flang>
Component: qemu-kvm Assignee: Juan Quintela <quintela>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: medium
Version: 6.3 CC: acathrow, bsarathy, chayang, dyasny, flang, juzhang, knoel, michen, mkenneth, owasserm, quintela, qzhang, shu, virt-maint, wdai, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2012-05-18 11:24:44 UTC Type: Bug

Description langfang 2012-05-09 11:32:31 UTC
Description of problem:
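Two test machines, A and B:
1. A --> B: migration succeeds and the guest works well.
2. B --> A: migration succeeds, but the guest hangs.

Attention: scenario 2 is not the return leg of a ping-pong migration; the guest is booted on B and migrated to A.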

Version-Release number of selected component (if applicable):
Both hosts:

model name	: Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz

# uname -r
2.6.32-269.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.290.el6.x86_64

Guest:
# uname -r
2.6.32-269.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest on host B:
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Penryn -m 2048 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name rhel6.3 -uuid 6dae16a9-08c1-4ee5-97c3-98d1480f3666 -k en-us -rtc base=localtime,driftfix=slew -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x3 -chardev file,id=charchannel0,path=/tmp/serial-socket -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -drive file=/root/nfs/RHEL-Server-6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=koTUXQrb,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=63:95:48:97:95:21,bus=pci.0,addr=0x5,bootindex=3 -monitor stdio -qmp tcp:0:6666,server,nowait -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -vnc :10  -chardev socket,id=serial0,path=/var/test1,server,nowait -device isa-serial,chardev=serial0 

2. Start the destination guest in listening mode on host A:
 /usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Penryn -m 2048 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name rhel6.3 -uuid 6dae16a9-08c1-4ee5-97c3-98d1480f3666 -k en-us -rtc base=localtime,driftfix=slew -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x3 -chardev file,id=charchannel0,path=/tmp/serial-socket -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -drive file=/root/nfs/RHEL-Server-6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=koTUXQrb,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=63:95:48:97:95:21,bus=pci.0,addr=0x5,bootindex=3 -monitor stdio -qmp tcp:0:6666,server,nowait -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -vnc :10  -chardev socket,id=serial0,path=/var/test1,server,nowait -device isa-serial,chardev=serial0 -incoming tcp:0:5999
3. On host B, start the migration:
(qemu) migrate -d tcp:10.66.65.90:5999
  
Actual results:
After migration completes, the RHEL guest hangs.

On host A:
(qemu) info status
running
(qemu) info registers   (RIP and CR3 do not change)
RAX=ffff88007fa72200 RBX=ffff8800022116a0 RCX=0000003378144efc RDX=0000000000000000
RSI=00000000000000c1 RDI=00000033e672bf76 RBP=ffff880002203f58 RSP=ffff880002203f38
R8 =0000000000000000 R9 =0000000000000000 R10=00000033b3af9c7a R11=0000000000000001
R12=0000000000000001 R13=ffff8800022116a0 R14=00000033e672ae13 R15=0000000000093780
RIP=ffffffff810a2413 RFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0000 0000000000000000 ffffffff 00000000
GS =0000 ffff880002200000 ffffffff 00000000
LDT=0000 0000000000000000 ffffffff 00000000
TR =0040 ffff880002214200 00002087 00008b00 DPL=0 TSS64-busy
GDT=     ffff880002204000 0000007f
IDT=     ffffffff81dd7000 00000fff
CR0=8005003b CR2=00007fdffd419000 CR3=000000007d3a1000 CR4=000006f0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
XMM08=00000000000000000000000000000000 XMM09=00000000000000000000000000000000
XMM10=00000000000000000000000000000000 XMM11=00000000000000000000000000000000
XMM12=00000000000000000000000000000000 XMM13=00000000000000000000000000000000
XMM14=00000000000000000000000000000000 XMM15=00000000000000000000000000000000
# top -p `pidof qemu-kvm` -H
15946 root      20   0 2465m 555m 4428 R 99.9  7.6  21:16.71 qemu-kvm                                                                                                                                                                        
15947 root      20   0 2465m 555m 4428 R 99.9  7.6  21:16.65 qemu-kvm                                                                                                                                                                        
15935 root      20   0 2465m 555m 4428 S  0.0  7.6   0:05.52 qemu-kvm  


Expected results:
After migration, the guest works well.

Additional info:
Two test machines:
Scenario 1: A --> B: migration succeeds and the guest works well.
Scenario 2: B --> A: migration succeeds, but the guest hangs. (Attention: this is not a ping-pong migration.)
After scenario 2 completes and the guest hangs, migrating it back (A --> B) brings the hung guest back to life.

Other info:
The problem also occurs with an external NFS server.
The problem also occurs when the NFS share is mounted with the sync option.

Comment 1 langfang 2012-05-09 12:15:28 UTC
    After the guest hangs, run this command on the host:
    # kvm_stat -1
    efer_reload                    0         0
    exits                 2813005057    669565
    fpu_reload                     2         0
    halt_exits                     0         0
    halt_wakeup                    0         0
    host_state_reload         770585       202
    hypercalls                     0         0
    insn_emulation                 0         0
    insn_emulation_fail            0         0
    invlpg                         0         0
    io_exits                       0         0
    irq_exits                 678890       154
    irq_injections        1348782579    330421
    irq_window                     0         0
    largepages                     5         0
    mmio_exits                     0         0
    mmu_cache_miss                28         0
    mmu_flooded                    0         0
    mmu_pde_zapped                 0         0
    mmu_pte_updated                0         0
    mmu_pte_write                  0         0
    mmu_recycled                   0         0
    mmu_shadow_zapped             23         0
    mmu_unsync                     0         0
    nmi_injections                 0         0
    nmi_window                     0         0
    pf_fixed              2812325301    669399
    pf_guest                       0         0
    remote_tlb_flush               8         0
    request_irq                    0         0
    signal_exits                   4         0
    tlb_flush                      0         0

    Additional info: currently reproducible on hosts with:
     Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz

    Hosts with Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz do not hit this problem.
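The kvm_stat -1 output in comment 1 prints each counter's running total followed by its count over the last sampling interval (this reading of the two columns is an assumption based on the numbers shown). A minimal awk sketch, with a hypothetical helper name pf_share, that computes what share of the exits in that interval were pf_fixed page faults:

```shell
# Hypothetical helper: read `kvm_stat -1` output on stdin and print the
# percentage of exits in the last interval (third column) that were
# pf_fixed page faults.
pf_share() {
    awk '$1 == "exits"    { e = $3 }
         $1 == "pf_fixed" { p = $3 }
         END { if (e > 0) printf "%.2f%%\n", 100 * p / e }'
}

# Usage on a live host: kvm_stat -1 | pf_share
```

For the numbers above this gives 99.98% (669399 of 669565 exits), while halt_exits and io_exits are 0, so the vCPUs appear to be spinning on page faults rather than idling, consistent with the two qemu-kvm threads at 99.9% CPU in top.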

Comment 2 Juan Quintela 2012-05-15 11:46:29 UTC
One of the machines has "nx" enabled and the other does not. Could you enable it on both and retest?
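One way to spot the mismatch described above before migrating is to compare the nx flag that each host reports in /proc/cpuinfo. A minimal sketch, assuming bash and a copy of the peer host's cpuinfo; has_nx and check_nx_match are hypothetical helpers, not part of qemu-kvm or any existing tool:

```shell
# Hypothetical helper: succeed if the first "flags" line of a cpuinfo file
# contains the whole word "nx". Defaults to this host's /proc/cpuinfo.
has_nx() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo" | grep -qw nx
}

# Hypothetical helper: compare nx between two cpuinfo files.
# Prints a warning to stderr and returns 1 on a mismatch.
check_nx_match() {
    local a="$1" b="$2" nx_a=no nx_b=no
    has_nx "$a" && nx_a=yes
    has_nx "$b" && nx_b=yes
    if [ "$nx_a" != "$nx_b" ]; then
        echo "nx mismatch: $a=$nx_a $b=$nx_b" >&2
        return 1
    fi
    echo "nx consistent: $nx_a"
}
```

For example, on host A: `ssh hostB cat /proc/cpuinfo > /tmp/cpuinfo.B` followed by `check_nx_match /proc/cpuinfo /tmp/cpuinfo.B` (hostB is a placeholder name). The peer file is fetched with cat rather than copied, because /proc files report a size of 0.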

Comment 3 langfang 2012-05-16 06:26:12 UTC
Hi Juan, the results are as follows:

nx enabled on both hosts:  A --> B (migration succeeds, guest works well)
                           B --> A (migration succeeds, guest works well)


nx disabled on both hosts: A --> B (migration succeeds, guest works well)
                           B --> A (migration succeeds, guest works well)

Comment 4 Juan Quintela 2012-05-18 11:24:44 UTC
So it is fixed. (It is not trivial to detect this while the guest is running, but I will try to think of something to detect that misconfiguration.)

Closing.