Bug 645188

Summary: guest migration turns failed by the end (16G + stress load)
Product: Red Hat Enterprise Linux 6 Reporter: Glauber Costa <gcosta>
Component: qemu-kvm Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.0 CC: cpelland, cww, iheim, lihuang, lilu, llim, lpeer, mkenneth, mshao, ndai, plyons, qwan, Rhev-m-bugs, rwu, syeghiay, tburke, virt-maint, xinsun, yeylon, ykaul
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 601045 Environment:
Last Closed: 2010-10-21 09:16:13 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 513765, 599330, 601045    
Bug Blocks: 568128, 643970    

Description Glauber Costa 2010-10-20 23:21:11 UTC
+++ This bug was initially created as a clone of Bug #601045 +++

clone to kvm. 

More Test:
1. 16g v-mem guest + stress load ( stress -c 3 --vm 12 --vm-bytes 1G )
=> FAILED
remaining ram stuck at 1842200 kbytes 

Migration status: active
transferred ram: 14956920 kbytes
remaining ram: 1842200 kbytes
total ram: 16797708 kbytes

2. 16g v-mem guest + stress load ( stress -c 3 --vm 12 --vm-bytes 256M )
=> FAILED
remaining ram stuck at 2987840 kbytes 
Migration status: active
transferred ram: 29963348 kbytes
remaining ram: 2987840 kbytes
total ram: 16797708 kbytes 

There is a similar bug about migration without load: bug 513765.

3. 8g v-mem guest + stress load ( stress -c 3 --vm 12 --vm-bytes 1G )
==> PASS
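
The two failing status dumps above can be compared numerically. The following is a minimal sketch, not part of the original report: it parses the "ram" lines and expresses the transferred and remaining amounts as fractions of total guest RAM. The helper names parse_status and summarize are hypothetical.

```python
import re

def parse_status(text):
    """Return {field: kbytes} for the 'ram' lines of a migration status dump."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*(transferred|remaining|total) ram:\s*(\d+)\s*kbytes", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def summarize(fields):
    """Express transferred and remaining RAM as fractions of total guest RAM."""
    total = fields["total"]
    return {
        "sent_ratio": fields["transferred"] / total,
        "remaining_fraction": fields["remaining"] / total,
    }

# Status dumps copied from tests 1 and 2 in this report.
TEST1 = """Migration status: active
transferred ram: 14956920 kbytes
remaining ram: 1842200 kbytes
total ram: 16797708 kbytes"""

TEST2 = """Migration status: active
transferred ram: 29963348 kbytes
remaining ram: 2987840 kbytes
total ram: 16797708 kbytes"""

for name, dump in (("test 1", TEST1), ("test 2", TEST2)):
    s = summarize(parse_status(dump))
    print(f"{name}: sent {s['sent_ratio']:.2f}x guest RAM, "
          f"{s['remaining_fraction']:.1%} still remaining")
```

In test 2 the transferred amount is roughly 1.78 times the guest's total RAM while about 18% is still remaining, which suggests pages are being re-sent after the guest dirties them again. That reading is an interpretation of the numbers, not something the report states.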




+++ This bug was initially created as a clone of Bug #599330 +++

Description of problem:
While trying to migrate a 4vcpu/16g guest with stress load running on it, the migration started OK but ended in failure.

Version-Release number of selected component (if applicable):
sm71 (rhevh-hypervisor-5.5-2.2.4, rhevm-2.2.0.46140)

How reproducible:
always

Steps to Reproduce:
1- Create a guest with 4 vCPUs and 16g memory.
2- Install any OS on it and boot the guest.
3- Load the guest with memory stress equal to 75% of its memory.
   e.g. # stress -m 48   (loads 12g of memory stress on RHEL)
4- Migrate the guest to another host.
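
The 75% figure in step 3 can be checked with a short calculation. This sketch assumes that each stress memory worker allocates 256 MB when --vm-bytes is not given, which is the documented default of the stress tool as far as I know; the function name stress_footprint_gb is hypothetical.

```python
# Assumption: a `stress` memory worker allocates 256 MB unless --vm-bytes is set.
DEFAULT_VM_BYTES_MB = 256

def stress_footprint_gb(workers, vm_bytes_mb=DEFAULT_VM_BYTES_MB):
    """Approximate memory touched by `stress -m <workers>`, in GB."""
    return workers * vm_bytes_mb / 1024

guest_ram_gb = 16
load_gb = stress_footprint_gb(48)
print(f"{load_gb:.0f} GB of {guest_ram_gb} GB = {load_gb / guest_ram_gb:.0%}")
```

Under that assumption, 48 workers account for the 12g load quoted in the step, which is 75% of the 16g guest.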
  
Actual results:
Migration ended in failure. The guest is still running on the source host.

Expected results:
Migration succeeds and the guest runs on the target host.
OR
A warning is given, without starting the migration, if conditions are not suitable for migration.

Additional info:
1- The source host and target host are in the same cluster, and each host has 8 CPUs and 32gb of memory.
2- The migration fails regardless of the guest OS.
3- Two screenshots are attached, taken when the migration started and when it ended. Hope they help.
4- Some info from vdc-log.txt below (for two failed migrations):
------------------------
02Jun 09:45:50 [3424] INFO  - Running command: MigrateVmToServerCommand
02Jun 09:45:50 [5768] INFO  - IncreasePendingVms::MigrateVmIncreasing vds intel-5310-32-2 pending vcpu count, now 4. Vm: IIS_win08r2_64
02Jun 09:55:58 [2756] ERROR - Rerun vm 66b6a11f-3063-4dc7-a825-f585c9037326. Called from vds intel-5310-32-1
-------------------
03Jun 03:26:33 [5520] INFO  - Running command: MigrateVmToServerCommand
03Jun 03:26:34 [4116] INFO  - IncreasePendingVms::MigrateVmIncreasing vds intel-5310-32-2 pending vcpu count, now 4. Vm: Mysql_rhel5u5_64
------------------------
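
To see how long the first migration ran before it was given up, the timestamps in the excerpt above can be subtracted. This is a minimal sketch: the log lines carry no year, so 2010 is assumed from the dates of the comments in this report, and the helper name parse_stamp is hypothetical. The %b directive is locale dependent and expects English month abbreviations here.

```python
from datetime import datetime

LOG_YEAR = 2010  # assumption: the log omits the year; taken from the comment dates

def parse_stamp(line):
    """Parse the leading 'DDMon HH:MM:SS' stamp of a vdc-log line."""
    day_month, clock = line.split()[:2]
    return datetime.strptime(f"{LOG_YEAR} {day_month} {clock}", "%Y %d%b %H:%M:%S")

# Lines copied from the first failed migration in the excerpt.
start = "02Jun 09:45:50 [3424] INFO  - Running command: MigrateVmToServerCommand"
error = ("02Jun 09:55:58 [2756] ERROR - Rerun vm "
         "66b6a11f-3063-4dc7-a825-f585c9037326. Called from vds intel-5310-32-1")

elapsed = parse_stamp(error) - parse_stamp(start)
print(f"migration ran for {elapsed.total_seconds():.0f} s before the rerun error")
```

For the first failure this gives a little over ten minutes between the migrate command and the "Rerun vm" error. The second excerpt has no matching error line, so no duration can be derived for it.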


(In reply to comment #7)
> I assume this bug is on the reporting issue to user, and you opened another bug
> to kvm on the fact the migration is failing?    

Lingqing Lu and I didn't file the other migration bug for kvm.

--- Additional comment from lihuang on 2010-06-07 01:24:47 EDT ---

kvm status in test 1


kvm statistics

 efer_reload                 30       0
 exits                198251092  390809
 fpu_reload              633903     504
 halt_exits             3874161       0
 halt_wakeup             184263       0
 host_state_reload    6951987    1663
 hypercalls                   0       0
 insn_emulation         9582288    5507
 insn_emulation_fail          0       0
 invlpg                  117224       0
 io_exits               1222678     590
 irq_exits              2283870    2787
 irq_injections         7821956    5100
 irq_window             1908481    1521
 kvm_request_irq              0       0
 largepages                   0       0
 mmio_exits              372732       0
 mmu_cache_miss          268363     374
 mmu_flooded              28435       0
 mmu_pde_zapped          279824     373
 mmu_pte_updated            901       0
 mmu_pte_write           333181     373
 mmu_recycled                 0       0
 mmu_shadow_zapped     259495       0
 mmu_unsync                 118      -4
 mmu_unsync_global            0       0
 nmi_injections               0       0
 nmi_window                   0       0
 pf_fixed              93753433  191083
 pf_guest              84925057  189560
 remote_tlb_flush        770846     425
 request_nmi                  0       0
 signal_exits                 1       0
 tlb_flush              1531492     942 
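
The statistics above are easier to read when ranked. The sketch below assumes the usual kvm_stat layout, where the second column is a running total and the third is the change over the last sampling interval. It parses a subset of the rows and compares the page-fault counters with the overall exit rate. The function name parse_kvm_stat is hypothetical.

```python
def parse_kvm_stat(text):
    """Return {counter: (total, delta)} from kvm_stat-style rows."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip headers and blank lines
        name, total, delta = parts
        try:
            stats[name] = (int(total), int(delta))
        except ValueError:
            continue  # skip rows whose columns are not integers
    return stats

# Subset of the rows reported for test 1.
SAMPLE = """
 exits                198251092  390809
 insn_emulation         9582288    5507
 irq_injections         7821956    5100
 mmu_unsync                 118      -4
 pf_fixed              93753433  191083
 pf_guest              84925057  189560
"""

stats = parse_kvm_stat(SAMPLE)
exit_rate = stats["exits"][1]

# Rank counters by their per-interval delta, highest first.
for name, (total, delta) in sorted(stats.items(), key=lambda kv: -kv[1][1]):
    print(f"{name:16s} {delta:8d}  ({delta / exit_rate:.1%} of exit rate)")
```

On these numbers pf_fixed and pf_guest are each close to half of the exit rate, so guest page-fault handling appears to dominate the exits during the test. Treating both counters as a share of exits is a rough reading rather than an exact accounting.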



qemu-kvm command line :
/usr/libexec/qemu-kvm -no-hpet -no-kvm-pit-reinjection -usbdevice tablet -rtc-td-hack -startdate 2010-06-04T17:02:11 -name Mysql_rhel5u5_64 -smp 4,cores=1 -k en-us -m 16384 -boot c -net nic,vlan=1,macaddr=00:1a:4a:42:46:00,model=virtio -net tap,vlan=1,ifname=virtio_10_1,script=no -drive file=/rhev/data-center/ea8dd427-53d4-441c-8bdf-8eb4c205ff15/6df2e9d8-1366-4f28-aac2-380a7954e738/images/09d33ef8-104d-438f-81f3-a7a398407e28/f81c19f0-c0af-494e-b221-bc1847256711,media=disk,if=virtio,cache=off,serial=8f-81f3-a7a398407e28,boot=on,format=raw,werror=stop -pidfile /var/vdsm/7d73dc91-4f55-46d7-82e2-5cae180487c4.pid -vnc 0:10,password -cpu qemu64,+sse2,+cx16,+ssse3 -M rhel5.5.0 -notify all -balloon none -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=5.5-2.2-4,serial=44454C4C-5900-1051-8031-C3C04F4D3258_00:22:19:bb:4a:d3,uuid=7d73dc91-4f55-46d7-82e2-5cae180487c4 -vmchannel di:0200,unix:/var/vdsm/7d73dc91-4f55-46d7-82e2-5cae180487c4.guest.socket,server -monitor unix:/var/vdsm/7d73dc91-4f55-46d7-82e2-5cae180487c4.monitor.socket,server

Host cpuinfo: 
processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz
stepping        : 11
cpu MHz         : 1595.926
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 3191.91
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

host meminfo :
MemTotal:     32809788 kB
MemFree:      13321836 kB
Buffers:         40188 kB
Cached:       19249952 kB
SwapCached:          0 kB
Active:         200504 kB
Inactive:     19146268 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     32809788 kB
LowFree:      13321836 kB
SwapTotal:     1023992 kB
SwapFree:      1023992 kB
Dirty:              56 kB
Writeback:           0 kB
AnonPages:       56844 kB
Mapped:          11008 kB
Slab:            93472 kB
PageTables:       4008 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  17428884 kB
Committed_AS:   485296 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    272716 kB
VmallocChunk: 34359464619 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

--- Additional comment from lihuang on 2010-06-07 01:27:33 EDT ---

Created attachment 421714 [details]
kvmtrace in test 1

kvmtrace in test 1

--- Additional comment from lihuang on 2010-06-07 01:28:25 EDT ---

Created attachment 421716 [details]
kvmtrace in test 2

kvmtrace in test 2

--- Additional comment from llim on 2010-06-07 04:32:16 EDT ---

Is this specific to OS? Or applicable to all OS?

--- Additional comment from xinsun on 2010-06-07 05:25:04 EDT ---

*** Bug 599330 has been marked as a duplicate of this bug. ***

--- Additional comment from lihuang on 2010-06-07 10:49:39 EDT ---

FYI.


same test run on another host :
1. RHEV Hypervisor 5.5-2.2 (0.10). RHEL5.4 i386 PAE, 16g v-mem, 75% load, with npt
   --> PASS

2. RHEV Hypervisor 5.5-2.2 (4). RHEL5.5 x86_64, 16g v-mem, 75% load, with npt
   --> PASS

3. RHEV Hypervisor 5.5-2.2 (4). RHEL5.5 x86_64, 16g v-mem, 75% load, without npt
   --> PASS


processor       : 11
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 8
model name      : Six-Core AMD Opteron(tm) Processor 2427
stepping        : 0
cpu MHz         : 2200.026
cache size      : 512 KB
physical id     : 1
siblings        : 6
core id         : 5
cpu cores       : 6
apicid          : 13
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips        : 4399.42
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]


[root@amd-2427-32-1 ~]# cat /proc/meminfo 
MemTotal:     32835876 kB
MemFree:      32166204 kB
Buffers:         53688 kB
Cached:         464344 kB
SwapCached:          0 kB
Active:         155556 kB
Inactive:       410880 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     32835876 kB
LowFree:      32166204 kB
SwapTotal:    24809464 kB
SwapFree:     24809464 kB
Dirty:               0 kB
Writeback:           0 kB
AnonPages:       48428 kB
Mapped:          13204 kB
Slab:            45488 kB
PageTables:       2840 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  41227400 kB
Committed_AS:   418440 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    546184 kB
VmallocChunk: 34359190131 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

Comment 1 Dor Laor 2010-10-21 09:16:13 UTC
It was already cloned to rhel6

*** This bug has been marked as a duplicate of bug 643970 ***