Bug 820194 - guest hang after migration on "Intel(R) Core(TM)2 Quad CPU Q9500 @ 2.83GHz" host
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Juan Quintela
QA Contact: Virtualization Bugs
 
Reported: 2012-05-09 07:32 EDT by langfang
Modified: 2012-05-18 07:24 EDT
CC List: 16 users

Doc Type: Bug Fix
Last Closed: 2012-05-18 07:24:44 EDT
Type: Bug


Attachments: None

Description langfang 2012-05-09 07:32:31 EDT
Description of problem:
Two test machines, A and B:
1. A --> B: migration succeeds and the guest works well.
2. B --> A: migration succeeds, but the guest hangs.

Attention: in scenario 2, B --> A is not the return leg of a ping-pong migration.

Version-Release number of selected component (if applicable):
Both hosts:

model name	: Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz

#uname -r
2.6.32-269.el6.x86_64
# rpm -q  qemu-kvm
qemu-kvm-0.12.1.2-2.290.el6.x86_64

guest:
#uname -r
2.6.32-269.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest on host B:
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Penryn -m 2048 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name rhel6.3 -uuid 6dae16a9-08c1-4ee5-97c3-98d1480f3666 -k en-us -rtc base=localtime,driftfix=slew -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x3 -chardev file,id=charchannel0,path=/tmp/serial-socket -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -drive file=/root/nfs/RHEL-Server-6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=koTUXQrb,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=63:95:48:97:95:21,bus=pci.0,addr=0x5,bootindex=3 -monitor stdio -qmp tcp:0:6666,server,nowait -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -vnc :10  -chardev socket,id=serial0,path=/var/test1,server,nowait -device isa-serial,chardev=serial0 

2. Start the guest in listening (incoming) mode on host A:
 /usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Penryn -m 2048 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name rhel6.3 -uuid 6dae16a9-08c1-4ee5-97c3-98d1480f3666 -k en-us -rtc base=localtime,driftfix=slew -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x3 -chardev file,id=charchannel0,path=/tmp/serial-socket -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -drive file=/root/nfs/RHEL-Server-6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=koTUXQrb,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=63:95:48:97:95:21,bus=pci.0,addr=0x5,bootindex=3 -monitor stdio -qmp tcp:0:6666,server,nowait -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -vnc :10  -chardev socket,id=serial0,path=/var/test1,server,nowait -device isa-serial,chardev=serial0 -incoming tcp:0:5999
3. On host B, start the migration:
(qemu)migrate -d tcp:10.66.65.90:5999
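The same migration could also be driven over the QMP socket that both command lines open with `-qmp tcp:0:6666,server,nowait`. The following is a minimal sketch, not part of the original report: it assumes this qemu-kvm build's QMP accepts `qmp_capabilities`, `migrate` with a `uri` argument, and `query-migrate`, and the helper names `qmp_migrate_commands` and `run_qmp` are hypothetical.

```python
import json
import socket


def qmp_migrate_commands(dest_host, dest_port):
    """Hypothetical helper: QMP messages roughly matching `migrate -d tcp:HOST:PORT`."""
    uri = "tcp:%s:%d" % (dest_host, dest_port)
    return [
        {"execute": "qmp_capabilities"},  # capabilities negotiation must come first
        {"execute": "migrate", "arguments": {"uri": uri}},  # returns while migration runs
        {"execute": "query-migrate"},  # asks for the current migration status
    ]


def run_qmp(qmp_host, qmp_port, commands):
    """Hypothetical helper: send commands over a QMP TCP socket.

    Returns one reply (a "return" or "error" object) per command;
    asynchronous event messages that arrive in between are skipped.
    """
    replies = []
    with socket.create_connection((qmp_host, qmp_port)) as sock:
        with sock.makefile("rw") as stream:
            stream.readline()  # discard the {"QMP": {...}} greeting line
            for cmd in commands:
                stream.write(json.dumps(cmd) + "\n")
                stream.flush()
                while True:
                    line = stream.readline()
                    if not line:
                        raise ConnectionError("QMP connection closed")
                    msg = json.loads(line)
                    if "return" in msg or "error" in msg:
                        replies.append(msg)
                        break
    return replies


# Example, run on host B (QMP listening on port 6666):
# print(run_qmp("127.0.0.1", 6666, qmp_migrate_commands("10.66.65.90", 5999)))
```

Polling `query-migrate` until it reports completion would mirror waiting for the `migrate -d` run to finish in the human monitor.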
  
Actual results:
After the migration completed, the RHEL guest hung.

On host A, the monitor reports the guest as running, but RIP and CR3 stay constant:
(qemu) info status
running
(qemu) info registers
RAX=ffff88007fa72200 RBX=ffff8800022116a0 RCX=0000003378144efc RDX=0000000000000000
RSI=00000000000000c1 RDI=00000033e672bf76 RBP=ffff880002203f58 RSP=ffff880002203f38
R8 =0000000000000000 R9 =0000000000000000 R10=00000033b3af9c7a R11=0000000000000001
R12=0000000000000001 R13=ffff8800022116a0 R14=00000033e672ae13 R15=0000000000093780
RIP=ffffffff810a2413 RFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0000 0000000000000000 ffffffff 00000000
GS =0000 ffff880002200000 ffffffff 00000000
LDT=0000 0000000000000000 ffffffff 00000000
TR =0040 ffff880002214200 00002087 00008b00 DPL=0 TSS64-busy
GDT=     ffff880002204000 0000007f
IDT=     ffffffff81dd7000 00000fff
CR0=8005003b CR2=00007fdffd419000 CR3=000000007d3a1000 CR4=000006f0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
XMM08=00000000000000000000000000000000 XMM09=00000000000000000000000000000000
XMM10=00000000000000000000000000000000 XMM11=00000000000000000000000000000000
XMM12=00000000000000000000000000000000 XMM13=00000000000000000000000000000000
XMM14=00000000000000000000000000000000 XMM15=00000000000000000000000000000000
# top -p `pidof qemu-kvm` -H
15946 root      20   0 2465m 555m 4428 R 99.9  7.6  21:16.71 qemu-kvm                                                                                                                                                                        
15947 root      20   0 2465m 555m 4428 R 99.9  7.6  21:16.65 qemu-kvm                                                                                                                                                                        
15935 root      20   0 2465m 555m 4428 S  0.0  7.6   0:05.52 qemu-kvm  
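One way to confirm the "RIP and CR3 stay constant" observation is to capture `info registers` several times and compare the values. A minimal sketch, assuming the dumps have been saved as text; `parse_registers` and `looks_stuck` are hypothetical helpers, and an unchanged RIP/CR3 pair only hints that the vCPUs are spinning in one place, it does not identify the cause.

```python
import re

# Matches "NAME=hexvalue" pairs, allowing the padding printed before "=" (e.g. "R8 =").
_REG = re.compile(r"\b([A-Z][A-Z0-9]*)\s*=\s*([0-9a-fA-F]+)\b")


def parse_registers(dump):
    """Hypothetical helper: map register name -> int from `info registers` text."""
    regs = {}
    for name, value in _REG.findall(dump):
        regs.setdefault(name, int(value, 16))  # keep the first occurrence of a name
    return regs


def looks_stuck(samples, keys=("RIP", "CR3")):
    """Hypothetical helper: True if all samples share the same values for `keys`."""
    if len(samples) < 2:
        raise ValueError("need at least two register dumps to compare")
    parsed = [parse_registers(s) for s in samples]
    first = parsed[0]
    return all(p[k] == first[k] for p in parsed[1:] for k in keys)
```

With both vCPU threads at 99.9% CPU in `top`, identical samples point at a tight loop rather than an idle (halted) guest.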


Expected results:
After migration, the guest works well.

Additional info:
Two test machines:
Scenario 1: A --> B, migration succeeds and the guest works well.
Scenario 2: B --> A, migration succeeds, but the guest hangs (attention: this is not a ping-pong migration).
After scenario 2 completes, the guest is hung, but when it is migrated back (ping-pong) from A to B, the hung guest comes back to life.

Other info:
The problem also occurs when testing with an external NFS server.
The problem also occurs when the NFS share is mounted with the sync option.
Comment 1 langfang 2012-05-09 08:15:28 EDT
    After the guest hung, I ran the following command on the host:
    #kvm_stat -1
    efer_reload                    0         0
    exits                 2813005057    669565
    fpu_reload                     2         0
    halt_exits                     0         0
    halt_wakeup                    0         0
    host_state_reload         770585       202
    hypercalls                     0         0
    insn_emulation                 0         0
    insn_emulation_fail            0         0
    invlpg                         0         0
    io_exits                       0         0
    irq_exits                 678890       154
    irq_injections        1348782579    330421
    irq_window                     0         0
    largepages                     5         0
    mmio_exits                     0         0
    mmu_cache_miss                28         0
    mmu_flooded                    0         0
    mmu_pde_zapped                 0         0
    mmu_pte_updated                0         0
    mmu_pte_write                  0         0
    mmu_recycled                   0         0
    mmu_shadow_zapped             23         0
    mmu_unsync                     0         0
    nmi_injections                 0         0
    nmi_window                     0         0
    pf_fixed              2812325301    669399
    pf_guest                       0         0
    remote_tlb_flush               8         0
    request_irq                    0         0
    signal_exits                   4         0
    tlb_flush                      0         0

    Additional info: currently, the problem reproduces on hosts with:
     Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz

    On a host with an Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, the problem does not occur.
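In the `kvm_stat -1` output above, nearly all exits in the sampled interval are counted as `pf_fixed` (669399 of 669565), while `io_exits`, `mmio_exits` and `halt_exits` are zero. That suggests the vCPUs are repeatedly taking page faults that KVM handles; this reading is an inference, not something stated in the report. A minimal sketch of the arithmetic, assuming the two numeric columns are the running total and the change over the sampling interval; `parse_kvm_stat` and `exit_share` are hypothetical helpers.

```python
def parse_kvm_stat(text):
    """Hypothetical helper: parse `kvm_stat -1` text into {counter: (total, delta)}.

    Assumes each data line is "name total delta"; other lines are ignored.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            stats[fields[0]] = (int(fields[1]), int(fields[2]))
    return stats


def exit_share(stats, counter):
    """Hypothetical helper: fraction of the interval's exits attributed to `counter`."""
    exits = stats["exits"][1]
    return stats[counter][1] / exits if exits else 0.0
```

Feeding the full output above to `parse_kvm_stat` and calling `exit_share(stats, "pf_fixed")` gives a value just under 1.0.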
Comment 2 Juan Quintela 2012-05-15 07:46:29 EDT
One of the machines has "nx" enabled and the other does not. Could you enable it on both and retest?
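The mismatch described here can be checked by comparing the `flags` line of /proc/cpuinfo on both hosts; `nx` is listed there when the kernel sees the feature, which on Intel hosts is commonly controlled by the BIOS Execute Disable (XD) setting. A minimal sketch, assuming each host's /proc/cpuinfo has been copied as text; `cpu_flags` and `flag_mismatch` are hypothetical helpers, not a QEMU or libvirt feature.

```python
def cpu_flags(cpuinfo_text):
    """Hypothetical helper: set of feature flags from the first "flags" line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flag_mismatch(host_a_text, host_b_text, flags=("nx",)):
    """Hypothetical helper: {flag: (on_a, on_b)} for each flag that differs."""
    a, b = cpu_flags(host_a_text), cpu_flags(host_b_text)
    return {f: (f in a, f in b) for f in flags if (f in a) != (f in b)}
```

For example, `flag_mismatch(open("cpuinfo-A").read(), open("cpuinfo-B").read())` returns an empty dict when both hosts agree on `nx`; the file names are placeholders.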
Comment 3 langfang 2012-05-16 02:26:12 EDT
Hi Juan Quintela, the results are as follows:

NX enabled on both hosts:  A --> B (migration successful, guest works well)
                           B --> A (migration successful, guest works well)

NX disabled on both hosts: A --> B (migration successful, guest works well)
                           B --> A (migration successful, guest works well)
Comment 4 Juan Quintela 2012-05-18 07:24:44 EDT
So, it is fixed. (This is not trivial to detect at run time, but I will try to think of something to detect that misconfiguration.)

Closing.
