Bug 893350 - 4k 'remaining ram' left during migration and can not finish with xbzrle enabled (mem r/w generator running inside guest)
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Hai Huang
QA Contact: Virtualization Bugs
Duplicates: 916968
Depends On:
Blocks:
 
Reported: 2013-01-09 02:57 EST by Qunfang Zhang
Modified: 2014-06-17 23:20 EDT (History)
7 users

See Also:
Fixed In Version: QEMU 1.4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 05:36:26 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Qunfang Zhang 2013-01-09 02:57:43 EST
Description of problem:
Running a simple r/w generator inside the guest to dirty pages and then migrating the guest with xbzrle enabled. However, the migration never finishes and only 4 KB of 'remaining ram' is left in the end. If the RAM r/w generator is not running, this issue is not hit.

Version-Release number of selected component (if applicable):
kernel-3.6.0-0.29.el7.x86_64
qemu-kvm-1.3.0-3.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Boot a guest on the src host, and also boot it in listening mode (-incoming tcp:0:5800) on the dst host.

2. Run a memory r/w generator inside the guest:

#cat test.c

#include <stdlib.h>

/* Example program to keep dirtying guest memory and delay migration for
 * demo purposes: it rewrites a 1 GB buffer over and over with changing
 * values, so pages are re-dirtied while migration is in progress. */
int main(void)
{
    unsigned char *array;
    long int i, j, k;
    unsigned char c;
    long int loop = 0;

    array = malloc(1024 * 1024 * 1024);    /* 1 GB working set */
    if (array == NULL)
        return 1;

    while (1) {
        for (i = 0; i < 1024; i++) {
            c = 0;
            for (j = 0; j < 1024; j++) {
                c++;
                for (k = 0; k < 1024; k++) {
                    /* touch every byte of the current 1 KB chunk */
                    array[i * 1024 * 1024 + j * 1024 + k] = c;
                }
            }
        }
        loop++;    /* count full passes over the buffer */
    }
}

#gcc test.c -o test
#./test

3. On dst host:
(qemu) migrate_set_capability xbzrle on 

4. On source host:
(qemu) migrate_set_capability xbzrle on 
(qemu) migrate_set_cache_size 2G
(qemu) migrate -d tcp:$dst_host_ip:5800
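
For reference, the same source-side setup can be driven over QMP instead of the human monitor; a minimal sketch, assuming a QMP monitor socket is configured and using the command names from QEMU 1.3 ($dst_host_ip is a placeholder as above):

{ "execute": "qmp_capabilities" }
{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities": [ { "capability": "xbzrle", "state": true } ] } }
{ "execute": "migrate-set-cache-size", "arguments": { "value": 2147483648 } }
{ "execute": "migrate", "arguments": { "uri": "tcp:$dst_host_ip:5800" } }
{ "execute": "query-migrate" }

The xbzrle capability still has to be enabled on the destination (step 3) before the migrate command is issued.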
  
Actual results:

On source host:

(qemu) info migrate
capabilities: xbzrle: on 
Migration status: active
total time: 576818 milliseconds
expected downtime: 24552 milliseconds
transferred ram: 1642267 kbytes
remaining ram: 4 kbytes
total ram: 2105728 kbytes
duplicate: 116037 pages
normal: 410489 pages
normal bytes: 1641956 kbytes
dirty pages rate: 94295 pages
cache size: 2147483648 bytes
xbzrle transferred: 198 kbytes
xbzrle pages: 1138 pages
xbzrle cache miss: 410486
xbzrle overflow : 3

Migration never finishes; only 4 KB of RAM is left to transfer.
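
A rough sanity check on the counters above (assuming 4 KiB pages and that "dirty pages rate" is reported per second):

  dirty rate    ~ 94295 pages/s * 4 KiB ~ 368 MiB/s
  transfer rate ~ 1642267 kbytes / 576.8 s ~ 2.8 MB/s

Memory is being dirtied far faster than it is transferred, yet "remaining ram" stays pinned at 4 kbytes instead of growing, which may indicate the remaining/dirty accounting itself is off rather than the migration simply failing to converge.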

Expected results:
Guest should finish migration and work well.

Additional info: (src and dst hosts have the same hardware configuration, 4 CPUs in total on each host)
Host info:
processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping	: 7
microcode	: 0x28
cpu MHz		: 1600.000
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6186.50
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

#free -m
             total       used       free     shared    buffers     cached
Mem:          7556       4366       3189          0         43        688
-/+ buffers/cache:       3635       3921
Swap:         7823          0       7823


Host top info:

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND                                              
14186 root      20   0 5167m 3.3g 6204 S 101.8 44.7  18:28.05 qemu-kvm                                             
    1 root      20   0 50728 6868 2052 S   0.0  0.1   0:03.77 systemd            


Guest top info:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 2774 root      20   0 1027m 1.0g  296 R 99.8 54.5  17:41.30 test               
    9 root      20   0     0    0    0 S 12.6  0.0   0:27.30 ksoftirqd/1
Comment 2 Qunfang Zhang 2013-01-09 03:53:01 EST
This bug can be reproduced with the default migration downtime (30 ms, IIRC) as well as with a downtime of 1 s, 2 s, or 3 s. If I set the downtime to a higher value such as 5 s, the migration can finish.
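
For reference, the downtime can be raised from the source monitor before retrying the migration; a minimal example, assuming the HMP migrate_set_downtime command shipped with this qemu-kvm build (the value is in seconds):

(qemu) migrate_set_downtime 5
(qemu) migrate -d tcp:$dst_host_ip:5800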
Comment 3 Orit Wasserman 2013-01-10 04:46:12 EST
This looks like a bug in the dirty page counting that was fixed in QEMU 1.4:
http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg01297.html
Comment 4 Shaolong Hu 2013-03-01 05:32:01 EST
*** Bug 916968 has been marked as a duplicate of this bug. ***
Comment 8 mazhang 2014-01-16 04:32:03 EST
Reproduced this bug with qemu-kvm-1.3.0-3.el7.x86_64.

Host:
qemu-kvm-1.3.0-3.el7.x86_64
kernel-3.7.0-0.36.el7.x86_64

Guest:
RHEL6U6-64
kernel-2.6.32-438.el6.x86_64

CLI:
gdb --args /usr/libexec/qemu-kvm \
-name 'vm1' \
-nodefaults \
-chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20130218-133213-tne4yYwu,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-drive file=/home/rhel6u6-64.raw,if=none,id=drive-virtio-disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-netdev tap,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:39:13:2c \
-m 2048 \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-cpu 'SandyBridge',hv_relaxed \
-M pc \
-rtc base=localtime,clock=host,driftfix=slew \
-boot menu=on \
-enable-kvm \
-monitor stdio \
-vga qxl \
-spice port=5900,disable-ticketing \

Steps:
Run the application provided in comment #0, then migrate the guest with xbzrle enabled.

Result:
Migration can't finish; only 4 KB of RAM is left to transfer.

(qemu) info migrate
capabilities: xbzrle: on 
Migration status: active
total time: 438139 milliseconds
expected downtime: 20558 milliseconds
transferred ram: 1645552 kbytes
remaining ram: 4 kbytes
total ram: 2228688 kbytes
duplicate: 152313 pages
normal: 410383 pages
normal bytes: 1641532 kbytes
dirty pages rate: 116850 pages
cache size: 2147483648 bytes
xbzrle transferred: 3871 kbytes
xbzrle pages: 2458 pages
xbzrle cache miss: 410239
xbzrle overflow : 144
(qemu) info migrate
capabilities: xbzrle: on 
Migration status: active
total time: 438667 milliseconds
expected downtime: 20558 milliseconds
transferred ram: 1645552 kbytes
remaining ram: 4 kbytes
total ram: 2228688 kbytes
duplicate: 152313 pages
normal: 410383 pages
normal bytes: 1641532 kbytes
dirty pages rate: 116850 pages
cache size: 2147483648 bytes
xbzrle transferred: 3871 kbytes
xbzrle pages: 2458 pages
xbzrle cache miss: 410239
xbzrle overflow : 144
(qemu) info migrate
capabilities: xbzrle: on 
Migration status: active
total time: 439179 milliseconds
expected downtime: 20558 milliseconds
transferred ram: 1645552 kbytes
remaining ram: 4 kbytes
total ram: 2228688 kbytes
duplicate: 152313 pages
normal: 410383 pages
normal bytes: 1641532 kbytes
dirty pages rate: 116850 pages
cache size: 2147483648 bytes
xbzrle transferred: 3871 kbytes
xbzrle pages: 2458 pages
xbzrle cache miss: 410239
xbzrle overflow : 144
(qemu) info migrate
capabilities: xbzrle: on 
Migration status: active
total time: 439666 milliseconds
expected downtime: 20558 milliseconds
transferred ram: 1645552 kbytes
remaining ram: 4 kbytes
total ram: 2228688 kbytes
duplicate: 152313 pages
normal: 410383 pages
normal bytes: 1641532 kbytes
dirty pages rate: 116850 pages
cache size: 2147483648 bytes
xbzrle transferred: 3871 kbytes
xbzrle pages: 2458 pages
xbzrle cache miss: 410239
xbzrle overflow : 144
Comment 11 Ludek Smid 2014-06-13 05:36:26 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
