Bug 643970 - guest migration fails near the end (16G + stress load)
Summary: guest migration fails near the end (16G + stress load)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 6.1
Assignee: Juan Quintela
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 632557 645188
Depends On: 513765 599330 601045 645188
Blocks: 568128 Rhel6KvmTier1 675199
 
Reported: 2010-10-18 17:03 UTC by Chris Pelland
Modified: 2013-01-11 03:24 UTC
CC List: 22 users

Fixed In Version: qemu-kvm-0.12.1.2-2.138.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 601045
Environment:
Last Closed: 2011-05-19 11:30:12 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID: Red Hat Product Errata RHSA-2011:0534
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: qemu-kvm security, bug fix, and enhancement update
Last Updated: 2011-05-19 11:20:36 UTC

Description Chris Pelland 2010-10-18 17:03:42 UTC
+++ This bug was initially created as a clone of Bug #601045 +++

clone to kvm. 

More tests (see the monitor sketch after test 3 for how these figures were collected):
1. 16G v-mem guest + stress load ( stress -c 3 --vm 12 --vm-bytes 1G )
=> FAILED
remaining ram stuck at 1842200 kbytes

Migration status: active
transferred ram: 14956920 kbytes
remaining ram: 1842200 kbytes
total ram: 16797708 kbytes

2. 16G v-mem guest + stress load ( stress -c 3 --vm 12 --vm-bytes 256M )
=> FAILED
remaining ram stuck at 2987840 kbytes
Migration status: active
transferred ram: 29963348 kbytes
remaining ram: 2987840 kbytes
total ram: 16797708 kbytes 

There is a similar bug about migration (without load): bug 513765.

3. 8G v-mem guest + stress load ( stress -c 3 --vm 12 --vm-bytes 1G )
==> PASS
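For reference, the figures above come from polling the qemu human monitor during a detached migration; a minimal sketch of such a session (the destination hostname and port here are hypothetical):

(qemu) migrate -d tcp:dst-host:4444
(qemu) info migrate
Migration status: active
transferred ram: ... kbytes
remaining ram: ... kbytes
total ram: ... kbytes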




+++ This bug was initially created as a clone of Bug #599330 +++

Description of problem:
While migrating a 4-vcpu/16 GB guest with a stress load on it, the migration started OK but ended in failure.

Version-Release number of selected component (if applicable):
sm71 (rhevh-hypervisor-5.5-2.2.4, rhevm-2.2.0.46140)

How reproducible:
always

Steps to Reproduce:
1- create a guest with 4 vcpus and 16 GB memory.
2- install any OS on it and boot the guest up.
3- load a 75% memory stress on the guest (see the command sketch after these steps),
   e.g. #stress -m 48   (loads ~12 GB of memory stress on RHEL)
4- migrate this guest to another host
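A minimal reproduction sketch, assuming the guest runs Linux with the stress tool installed and that migration is driven from the qemu human monitor (the target hostname and port are hypothetical):

# on the guest: 48 vm workers x 256 MB (stress's default --vm-bytes) ~= 12 GB
stress -m 48

# on the source host's qemu monitor: start a detached migration, then poll it
(qemu) migrate -d tcp:target-host:4444
(qemu) info migrate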
  
Actual results:
Migration ended in failure; the guest remains running on the source host.

Expected results:
migration succeeds and the guest runs on the target host,
OR
a warning is given, without starting migration, if conditions are not suitable for it.

Additional info:
1- the source host and target host are in the same cluster, and each host has 8 CPUs and 32 GB of memory.
2- the failure occurs regardless of the guest OS.
3- there are 2 screenshots, taken at migration start and end; hopefully they help.
4- some info from vdc-log.txt below (for two failed migrations):
------------------------
02Jun 09:45:50 [3424] INFO  - Running command: MigrateVmToServerCommand
02Jun 09:45:50 [5768] INFO  - IncreasePendingVms::MigrateVmIncreasing vds intel-5310-32-2 pending vcpu count, now 4. Vm: IIS_win08r2_64
02Jun 09:55:58 [2756] ERROR - Rerun vm 66b6a11f-3063-4dc7-a825-f585c9037326. Called from vds intel-5310-32-1
-------------------
03Jun 03:26:33 [5520] INFO  - Running command: MigrateVmToServerCommand
03Jun 03:26:34 [4116] INFO  - IncreasePendingVms::MigrateVmIncreasing vds intel-5310-32-2 pending vcpu count, now 4. Vm: Mysql_rhel5u5_64
------------------------


(In reply to comment #7)
> I assume this bug is about reporting the issue to the user, and you opened
> another bug against kvm for the fact that the migration is failing?

Lingqing Lu and I didn't file the other migration bug for kvm.

--- Additional comment from lihuang on 2010-06-07 01:24:47 EDT ---

kvm status in test 1


kvm statistics

 efer_reload                 30       0
 exits                198251092  390809
 fpu_reload              633903     504
 halt_exits             3874161       0
 halt_wakeup             184263       0
 host_state_reload      6951987    1663
 hypercalls                   0       0
 insn_emulation         9582288    5507
 insn_emulation_fail          0       0
 invlpg                  117224       0
 io_exits               1222678     590
 irq_exits              2283870    2787
 irq_injections         7821956    5100
 irq_window             1908481    1521
 kvm_request_irq              0       0
 largepages                   0       0
 mmio_exits              372732       0
 mmu_cache_miss          268363     374
 mmu_flooded              28435       0
 mmu_pde_zapped          279824     373
 mmu_pte_updated            901       0
 mmu_pte_write           333181     373
 mmu_recycled                 0       0
 mmu_shadow_zapped       259495       0
 mmu_unsync                 118      -4
 mmu_unsync_global            0       0
 nmi_injections               0       0
 nmi_window                   0       0
 pf_fixed              93753433  191083
 pf_guest              84925057  189560
 remote_tlb_flush        770846     425
 request_nmi                  0       0
 signal_exits                 1       0
 tlb_flush              1531492     942 



qemu-kvm command line :
/usr/libexec/qemu-kvm -no-hpet -no-kvm-pit-reinjection -usbdevice tablet -rtc-td-hack -startdate 2010-06-04T17:02:11 -name Mysql_rhel5u5_64 -smp 4,cores=1 -k en-us -m 16384 -boot c -net nic,vlan=1,macaddr=00:1a:4a:42:46:00,model=virtio -net tap,vlan=1,ifname=virtio_10_1,script=no -drive file=/rhev/data-center/ea8dd427-53d4-441c-8bdf-8eb4c205ff15/6df2e9d8-1366-4f28-aac2-380a7954e738/images/09d33ef8-104d-438f-81f3-a7a398407e28/f81c19f0-c0af-494e-b221-bc1847256711,media=disk,if=virtio,cache=off,serial=8f-81f3-a7a398407e28,boot=on,format=raw,werror=stop -pidfile /var/vdsm/7d73dc91-4f55-46d7-82e2-5cae180487c4.pid -vnc 0:10,password -cpu qemu64,+sse2,+cx16,+ssse3 -M rhel5.5.0 -notify all -balloon none -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=5.5-2.2-4,serial=44454C4C-5900-1051-8031-C3C04F4D3258_00:22:19:bb:4a:d3,uuid=7d73dc91-4f55-46d7-82e2-5cae180487c4 -vmchannel di:0200,unix:/var/vdsm/7d73dc91-4f55-46d7-82e2-5cae180487c4.guest.socket,server -monitor unix:/var/vdsm/7d73dc91-4f55-46d7-82e2-5cae180487c4.monitor.socket,server

Host cpuinfo: 
processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz
stepping        : 11
cpu MHz         : 1595.926
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 3191.91
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

host meminfo :
MemTotal:     32809788 kB
MemFree:      13321836 kB
Buffers:         40188 kB
Cached:       19249952 kB
SwapCached:          0 kB
Active:         200504 kB
Inactive:     19146268 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     32809788 kB
LowFree:      13321836 kB
SwapTotal:     1023992 kB
SwapFree:      1023992 kB
Dirty:              56 kB
Writeback:           0 kB
AnonPages:       56844 kB
Mapped:          11008 kB
Slab:            93472 kB
PageTables:       4008 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  17428884 kB
Committed_AS:   485296 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    272716 kB
VmallocChunk: 34359464619 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

--- Additional comment from lihuang on 2010-06-07 01:27:33 EDT ---

Created attachment 421714 [details]
kvmtrace in test 1

kvmtrace in test 1

--- Additional comment from lihuang on 2010-06-07 01:28:25 EDT ---

Created attachment 421716 [details]
kvmtrace in test 2

kvmtrace in test 2

--- Additional comment from llim on 2010-06-07 04:32:16 EDT ---

Is this specific to one guest OS, or does it apply to all of them?

--- Additional comment from xinsun on 2010-06-07 05:25:04 EDT ---

*** Bug 599330 has been marked as a duplicate of this bug. ***

--- Additional comment from lihuang on 2010-06-07 10:49:39 EDT ---

FYI.


The same test was run on another host:
1. RHEV Hypervisor 5.5-2.2 (0.10), RHEL5.4 i386 PAE, 16G v-mem, 75% load, with npt
   --> PASS

2. RHEV Hypervisor 5.5-2.2 (4), RHEL5.5 x86_64, 16G v-mem, 75% load, with npt
   --> PASS

3. RHEV Hypervisor 5.5-2.2 (4), RHEL5.5 x86_64, 16G v-mem, 75% load, without npt
   --> PASS


processor       : 11
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 8
model name      : Six-Core AMD Opteron(tm) Processor 2427
stepping        : 0
cpu MHz         : 2200.026
cache size      : 512 KB
physical id     : 1
siblings        : 6
core id         : 5
cpu cores       : 6
apicid          : 13
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips        : 4399.42
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]


[root@amd-2427-32-1 ~]# cat /proc/meminfo 
MemTotal:     32835876 kB
MemFree:      32166204 kB
Buffers:         53688 kB
Cached:         464344 kB
SwapCached:          0 kB
Active:         155556 kB
Inactive:       410880 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     32835876 kB
LowFree:      32166204 kB
SwapTotal:    24809464 kB
SwapFree:     24809464 kB
Dirty:               0 kB
Writeback:           0 kB
AnonPages:       48428 kB
Mapped:          13204 kB
Slab:            45488 kB
PageTables:       2840 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  41227400 kB
Committed_AS:   418440 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    546184 kB
VmallocChunk: 34359190131 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

Comment 1 Dor Laor 2010-10-21 09:16:13 UTC
*** Bug 645188 has been marked as a duplicate of this bug. ***

Comment 3 Juan Quintela 2011-02-04 12:47:23 UTC
*** Bug 632557 has been marked as a duplicate of this bug. ***

Comment 10 Mike Cao 2011-02-18 08:26:01 UTC
Tested on qemu-kvm-0.12.1.2-2.146.el6 & kernel 2.6.32-113.el6.x86_64

How reproducible:
100%

steps:
1. start a VM with 16GB mem on the src host,
eg:/usr/libexec/qemu-kvm -enable-kvm -m 16G -smp 4 -name rhel5U6 -uuid ddcbfb49-3411-1701-3c36-6bdbc00bedb9 -rtc base=utc,clock=host,driftfix=slew -boot c -drive file=/mnt/rhel6.raw,if=none,id=drive-ide0-0-0,boot=on,format=raw,cache=none -device virtio-blk-pci,drive=drive-ide0-0-0,id=drive-ide0-0-0 -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:50:a4:c2:c1 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc :2 -device virtio-balloon-pci,id=ballooning -monitor stdio
2. run stress in the guest,
e.g.: #stress --cpu --vm 16 --vm-byte 256M --verbose
3. start the listening qemu on the dst host (see the sketch after these steps).
4. do live migration; after 1 hour, stop stress and wait for the migration to complete.
5. check the guest's status.
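For step 3, a minimal sketch of the destination ("listening") side; the port number is hypothetical, and the command line should otherwise mirror the source's qemu-kvm invocation from step 1:

# on the dst host: same qemu-kvm command line as step 1, plus -incoming
/usr/libexec/qemu-kvm ... -incoming tcp:0:5800

# then, on the src host's qemu monitor:
(qemu) migrate -d tcp:dst-host:5800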

Actual Results:
Migration completed successfully, but the guest's network was DOWN after migration.

Additional info:
Tried on the same image with 4GB mem; cannot reproduce.

Based on the above, re-assigning this bug.

Comment 11 Juan Quintela 2011-02-21 13:24:33 UTC
This is a different bug; it is related to these two bugzillas (they are for rhel5, but rhel6 has the same problem):

 https://bugzilla.redhat.com/show_bug.cgi?id=586643
 https://bugzilla.redhat.com/show_bug.cgi?id=647189

Basically, you can't expect a 16GB host to run a 16GB guest; problems are going to happen sooner or later. Migration is especially problematic because it needs extra memory for the buffers used to do the saves. My experience is that you need to leave around 1GB of memory free for the host for everything to work smoothly.
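A rough memory budget illustrates the point; the overhead figures below are illustrative assumptions, not measurements from this bug:

  guest RAM                16.0 GB
  qemu device model, etc.  ~0.3 GB
  migration save buffers   ~0.3 GB
  host kernel + services   ~0.5 GB
  ----------------------------------
  total                    ~17.1 GB  (more than a 16GB host has, so swapping or the OOM killer kicks in mid-migration)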

Comment 12 Juan Quintela 2011-02-21 14:13:01 UTC
Several other things:
a- I can't reproduce, it works for me.
b- nitpick: I guess you mean --cpu 1 (or some other number); that command fails for me as written.

Later, Juan.

Comment 13 Mike Cao 2011-02-21 15:44:36 UTC
(In reply to comment #11)
> This is a different bug; it is related to these two bugzillas (they are for
> rhel5, but rhel6 has the same problem):
>  https://bugzilla.redhat.com/show_bug.cgi?id=586643
>  https://bugzilla.redhat.com/show_bug.cgi?id=647189
> Basically, you can't expect a 16GB host to run a 16GB guest; problems are
> going to happen sooner or later. Migration is especially problematic because
> it needs extra memory for the buffers used to do the saves. My experience is
> that you need to leave around 1GB of memory free for the host for everything
> to work smoothly.

I am using 2 hosts, each with 512GB of memory. I will reserve the machines again tomorrow for further testing.

Comment 14 Mike Cao 2011-02-22 10:28:16 UTC
Re-tried; the guest network did not go down. The earlier run of the steps in comment #10 had used a wrong network config.

Based on the above, this issue has been fixed.

Comment 15 errata-xmlrpc 2011-05-19 11:30:12 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html

Comment 16 errata-xmlrpc 2011-05-19 12:49:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html

