Bug 586643
| Summary: | Redhat 5.5 KVM save/restore VM fails when guest consumes large memory |
|---|---|
| Product: | Red Hat Enterprise Linux 5 |
| Component: | kvm |
| Version: | 5.5 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | urgent |
| Priority: | low |
| Reporter: | Chong Chen <cchen317> |
| Assignee: | Andrea Arcangeli <aarcange> |
| QA Contact: | Virtualization Bugs <virt-bugs> |
| CC: | aarcange, cchen317, closms, gcosta, jdenemar, juzhang, llim, michen, plyons, tburke, virt-maint |
| Target Milestone: | rc |
| Keywords: | Triaged |
| Doc Type: | Bug Fix |
| Last Closed: | 2011-10-04 14:20:09 UTC |
| Bug Blocks: | 580948 |
| Attachments: | kernel panic in the VM when restoring VM (attachment 409652) |
Can you check by running qemu directly? I'd like to see the specific error message. Thanks

Can you provide the qemu command line? Thanks

The error message is "migration failed". Here is what I did to reproduce it:

- Start a VM with the following qemu command line:

/usr/libexec/qemu-kvm \
-S \
-M rhel5.4.0 \
-m 3000 \
-smp 1 \
-name vm0 \
-uuid fc8b3336-5b4d-024c-fa8f-f60ee9fd235f \
-pidfile /var/run/libvirt/qemu//vm0.pid \
-boot c \
-drive file=/dev/MyVolGroup/vm0,if=virtio,index=0,boot=on,cache=none \
-serial pty \
-parallel none \
-usb \
-k en-us

- When it boots, start the memjob program.
- Save the VM to a file with the monitor commands:
  stop
  migrate "exec:cat > STATEFILE"
- Restore the VM with the qemu command line:

/usr/libexec/qemu-kvm \
-S \
-M rhel5.4.0 \
-m 3000 \
-smp 1 \
-name vm0 \
-uuid fc8b3336-5b4d-024c-fa8f-f60ee9fd235f \
-pidfile /var/run/libvirt/qemu//vm0.pid \
-boot c \
-drive file=/dev/MyVolGroup/vm0,if=virtio,index=0,boot=on,cache=none \
-serial pty \
-parallel none \
-usb \
-k en-us \
-incoming "exec:cat < STATEFILE"

- The VM starts and memjob continues to run.
- Try again to save the VM:
  stop
  migrate "exec: cat > STATEFILE2"
- The qemu monitor prints the error message "migration failed".

I have some questions that might help figure out what is happening:
- If you run the same load but use live migration, does it fail?
- Does the host swap (check vmstat 1)?
- What happens if you do not use the -M flag?
- Are you using the latest rhel5.5 (or even rhel5.6 candidate code)?

Hey Dor,
- Live migration works OK.
- There is no swapping during the save or restore (swap si/so are all zero).
- The problem still happens if I omit the -M flag.
- The problem also happens with RHEL6 beta. (Is that the same thing as rhel5.6 candidate code?) I haven't applied any updates to the base rhel5.5 installation.

I installed the debuginfo rpms for KVM, attached GDB, and set a breakpoint in do_migrate(). The code fails in popen(). errno is 12: ENOMEM. This is surprising to me.
Here are the details:
#0 exec_start_outgoing_migration (command=0xeacfd5 "cat > STATE2", bandwidth_limit=33554432, async=0) at migration-exec.c:65
#1 0x000000000046b4d3 in do_migrate (detach=0, uri=0xeacfd0 "exec:cat > STATE2") at migration.c:66
#2 0x00000000004107eb in monitor_handle_command (opaque=<value optimized out>, cmdline=<value optimized out>)
at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/monitor.c:2705
#3 monitor_handle_command1 (opaque=<value optimized out>, cmdline=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/monitor.c:3076
#4 0x0000000000464212 in readline_handle_byte (ch=<value optimized out>) at readline.c:398
#5 0x000000000040ecff in term_read (opaque=<value optimized out>, buf=0x2000000 <Address 0x2000000 out of bounds>, size=1)
at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/monitor.c:3069
#6 0x0000000000465841 in kbd_send_chars (opaque=<value optimized out>) at console.c:1098
#7 0x00000000004659c3 in kbd_put_keysym (keysym=<value optimized out>) at console.c:1151
#8 0x000000000047dac1 in sdl_refresh (ds=0xb4bce0) at sdl.c:439
#9 0x00000000004081e4 in gui_update (opaque=0xeacfd5) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:3684
#10 0x00000000004071bc in qemu_run_timers (ptimer_head=0xb38e00, current_time=1188340664) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:1271
#11 0x0000000000409577 in main_loop_wait (timeout=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:4021
#12 0x00000000004ff1ea in kvm_main_loop () at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:596
#13 0x000000000040e425 in main_loop (argc=15, argv=0x7fffffffe8b8, envp=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:4040
    f = popen(command, "w");
    if (f == NULL) {
        dprintf("Unable to popen exec target\n");
        goto err_after_alloc;
    }
(gdb) p f
$12 = (FILE *) 0x0
(gdb) p errno
$13 = 12
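The strace output further down shows that glibc's popen() here ends up in a plain clone(..., SIGCHLD, ...), i.e. fork() semantics rather than vfork(). A minimal sketch of the failing pattern, independent of qemu (the 3 GB allocation and the cat command are stand-ins for the guest RAM and the migration target; on a host with less than roughly 3 GB of free RAM plus free swap, popen() should fail with ENOMEM just as in the backtrace above):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Touch a large private allocation so it is resident and accounted,
     * the way qemu-kvm's guest RAM is. */
    size_t size = 3UL * 1024 * 1024 * 1024;
    char *mem = malloc(size);
    if (mem == NULL) {
        perror("malloc");
        return 1;
    }
    memset(mem, 1, size);

    /* popen() fork()s the caller; the child starts as a copy-on-write
     * duplicate, and the kernel must be willing to commit that copy. */
    FILE *f = popen("cat > /dev/null", "w");
    if (f == NULL) {
        fprintf(stderr, "popen: %s (errno=%d)\n", strerror(errno), errno);
        return 1;
    }
    pclose(f);
    return 0;
}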
Try with -incoming exec:"cat<file"
^^^ The " should go after the "exec:" (i.e. exec:"cat < file", not "exec:cat < file").
The result is the same. I tried without the memory intensive job and I also got the same error.
I ran qemu under strace. Here is the instance that worked.
read(20, 0x7fff7fa583a0, 128) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {1117600, 410184995}) = 0
select(15, [14], NULL, NULL, {0, 0}) = 1 (in [14], left {0, 0})
ioctl(14, FIONREAD, [32]) = 0
read(14, "\2$\342\0\241n\201\23Z\1\0\0\3\0\340\5\r\0\340\5\201\3\30\1\363\1u\0\20\0\1\0", 32) = 32
select(15, [14], NULL, NULL, {0, 0}) = 0 (Timeout)
write(15, "H\2\206\0\r\0\340\5\16\0\340\5\10\0\20\0(\0010\0\0\30\1\0\0\0\0\377\0\0\0\377"..., 536) = 536
write(15, "H\2\206\0\r\0\340\5\16\0\340\5\10\0\20\0\0\0@\0\0\30\1\0\252\252\252\377\252\252\252\377"..., 536) = 536
pipe([22, 23]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b6fb4e7f020) = 19783
close(22) = 0
fcntl(23, F_SETFD, 0x800 /* FD_??? */) = 0
clock_gettime(CLOCK_MONOTONIC, {1117600, 540088995}) = 0
ioctl(8, 0x4020ae46, 0x7fff7fa57a70) = 0
ioctl(8, 0x4020ae46, 0x7fff7fa57a70) = 0
ioctl(8, 0x4020ae46, 0x7fff7fa57a70) = 0
ioctl(8, 0x4020ae46, 0x7fff7fa57a70) = 0
ioctl(8, 0x4020ae46, 0x7fff7fa57a70) = 0
ioctl(8, 0x4020ae46, 0x7fff7fa57a70) = 0
And here is the save after the restore; the call to clone() fails.
20128 clock_gettime(CLOCK_MONOTONIC, {1117850, 528054995}) = 0
20128 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
20128 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
20128 clock_gettime(CLOCK_MONOTONIC, {1117850, 528156995}) = 0
20128 select(15, [14], NULL, NULL, {0, 0}) = 1 (in [14], left {0, 0})
20128 ioctl(14, FIONREAD, [32]) = 0
20128 read(14, "\2$\244\0\237?\205\23Z\1\0\0\3\0\340\5\r\0\340\5\256\3\277\1\261\1\342\0\20\0\1\0", 32) = 32
20128 select(15, [14], NULL, NULL, {0, 0}) = 0 (Timeout)
20128 write(15, "H\2\206\0\r\0\340\5\16\0\340\5\10\0\20\0000\0010\0\0\30\1\0\0\0\0\377\0\0\0\377"..., 536) = 536
20128 write(15, "H\2\206\0\r\0\340\5\16\0\340\5\10\0\20\0\0\0@\0\0\30\1\0\252\252\252\377\252\252\252\377"..., 536) = 536
20128 pipe([22, 23]) = 0
20128 clone( <unfinished ...>
20159 <... rt_sigtimedwait resumed> {si_signo=SIGALRM, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value={int=0, ptr=0}}, 0, 8) = 14
20159 write(21, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
20159 rt_sigtimedwait([ALRM IO], <unfinished ...>
20128 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2acf7b348020) = -1 ENOMEM
(Cannot allocate memory)
20128 close(22) = 0
20128 close(23) = 0
20128 write(15, "H\2\206\0\r\0\340\5\16\0\340\5\10\0\20\0\0\0@\0\0\30\1\0\0\0\0\377\0\0\0\377"..., 536) = 536
20128 write(15, "H\2\206\0\r\0\340\5\16\0\340\5\10\0\20\0\0\0@\0\0\30\1\0\0\0\0\377\0\0\0\377"..., 536) = 536
20128 write(15, "H\2\206\0\r\0\340\5\16\0\340\5\10\0\20\0\10\0@\0\0\30\1\0\0\0\0\377\0\0\0\377"..., 536) = 536
20128 write(15, "H\2\206\0\r\0\340\5\16\0\340\5\10\0\20\0\20\0@\0\0\30\1\0\0\0\0\377\0\0\0\377"..., 536) = 536
The hypervisor has 4GB and the VM is 3GB. /var/log/messages doesn't show anything interesting.
Jun 29 13:08:43 delint06 gconfd (root-18385): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
Jun 29 13:09:46 delint06 kernel: kvm: 18440: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x130079
Jun 29 13:09:46 delint06 kernel: kvm: 18440: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xffe18f0a
Jun 29 13:09:46 delint06 kernel: kvm: 18440: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x530079
Jun 29 13:15:05 delint06 kernel: device tap0 entered promiscuous mode
Jun 29 13:15:07 delint06 kernel: br0: topology change detected, propagating
Jun 29 13:15:07 delint06 kernel: br0: port 2(tap0) entering forwarding state
Jun 29 13:15:56 delint06 kernel: kvm: 19182: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x130079
Jun 29 13:15:56 delint06 kernel: kvm: 19182: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xffe18f0a
Jun 29 13:15:56 delint06 kernel: kvm: 19182: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x530079
Is there any way to determine why clone() failed?
clone() returned -ENOMEM. This means there is not enough memory on the host. Please use a larger host or increase the swap files. Sending /proc/meminfo will help too.

I can also reproduce with a 3GB VM on a 4GB hypervisor, if the VM is running a large memory job. It just seems odd that the system can't fork() when there is still over 300MB of available RAM.
[root@delint06 ~]# free
total used free shared buffers cached
Mem: 4043172 3729836 313336 0 30344 285684
-/+ buffers/cache: 3413808 629364
Swap: 2096472 136 2096336
[root@delint06 ~]# ps -eF | grep qemu
root 29911 12071 38 847246 3169380 0 10:21 pts/3 00:02:44 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 3072 -boot c -drive file=/dev/MyVolGroup/vm0,if=virtio,index=0,boot=on,cache=none -net nic -net tap,ifname=tap0,script=no,downscript=no
root 30802 14924 0 15295 732 2 10:28 pts/0 00:00:00 grep qemu
[root@delint06 ~]# cat /proc/meminfo
MemTotal: 4043172 kB
MemFree: 314536 kB
Buffers: 30636 kB
Cached: 285764 kB
SwapCached: 0 kB
Active: 3254860 kB
Inactive: 292572 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 4043172 kB
LowFree: 314536 kB
SwapTotal: 2096472 kB
SwapFree: 2096336 kB
Dirty: 12 kB
Writeback: 0 kB
AnonPages: 3230984 kB
Mapped: 21784 kB
Slab: 109264 kB
PageTables: 12220 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 4118056 kB
Committed_AS: 3602380 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 270924 kB
VmallocChunk: 34359467403 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
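A rough reading of the numbers above (assuming the host is at the default heuristic overcommit, vm.overcommit_memory = 0): fork() must be able to back a copy-on-write duplicate of the parent's private writable memory. qemu-kvm accounts for most of AnonPages, about 3,230,984 kB ≈ 3.1 GB, while MemFree + SwapFree is only 314,536 kB + 2,096,336 kB ≈ 2.3 GB. The duplicate cannot be guaranteed, so clone() fails with ENOMEM even though 300 MB appears free. This is consistent with the later finding that enlarging swap makes the save succeed.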
Citrix Xen (probably RH Xen also) can save/restore the same configuration: 4G hypervisor, 3G VM, large memory job.

Dor, does popen() use fork() or vfork()?

Mike C

This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.

> This request was erroneously denied for the current release of
> Red Hat Enterprise Linux. The error has been fixed and this
> request has been re-proposed for the current release.

Does this mean that the bug has been fixed in the latest 6.0 release?

Chong

No, all comment 15 does is cancel out a mistake made in comment 14. And this BZ is for RHEL 5, not RHEL 6.

*** Bug 647189 has been marked as a duplicate of this bug. ***

> No, all comment 15 does is cancel out a mistake made in comment 14. And this BZ
> is for RHEL 5, not RHEL 6.

[Chong] So, are you saying the latest RHEL 6 does not have this problem?

(In reply to comment #20)
> > No, all comment 15 does is cancel out a mistake made in comment 14. And this BZ
> > is for RHEL 5, not RHEL 6.
>
> [Chong] So, are you saying the latest RHEL 6 does not have this problem?

No, it might or might not work on rhel6. The above comment just fixes an automatic bot that changed the bugzilla state.

Tested in kvm-83-224.el5 with the following steps; cannot reproduce it.
steps:
1.start a vm with 3.5g MEM in a 4G host:
# /usr/libexec/qemu-kvm -rtc-td-hack -no-hpet -M rhel5.6.0 -m 3500 -smp 1 -name rhel56-64 -uuid `uuidgen` -monitor stdio -drive file=rhel56-64-virtio.qcow2,if=virtio,boot=on,format=qcow2,cache=none -net nic,macaddr=20:20:20:14:56:18,model=virtio,vlan=0 -net tap,script=/etc/qemu-ifup,vlan=0 -usb -vnc :1
2. run the 3.5G memory-consuming program provided in #Description, then in guest:
# free -lm
total used free shared buffers cached
Mem: 3359 3338 20 0 7 181
Low: 3359 3338 20
High: 0 0 0
-/+ buffers/cache: 3150 209
Swap: 4959 293 4666
3. Save the VM to a file with the monitor commands
- stop
- migrate "exec:cat > STATEFILE"
4. shutdown guest
5. Restore the VM with:
# /usr/libexec/qemu-kvm -rtc-td-hack -no-hpet -M rhel5.6.0 -m 3500 -smp 1 -name rhel56-64 -uuid `uuidgen` -monitor stdio -drive file=rhel56-64-virtio.qcow2,if=virtio,boot=on,format=qcow2,cache=none -net nic,macaddr=20:20:20:14:56:18,model=virtio,vlan=0 -net tap,script=/etc/qemu-ifup,vlan=0 -usb -vnc :1 -incoming "exec:cat < statefile"
Actual result:
Did save/restore 2 times; no failure found.
michen --> Chong Chen: can you still hit this problem? Can you give any suggestion about how to reproduce it? Thanks.
Can you try save/restore through the virsh interface? It may be a libvirt problem rather than qemu-kvm.

Chong

Can reproduce with:
kvm-83-164.el5
kvm-qemu-img-83-164.el5
kmod-kvm-83-164.el5

Where can I get kvm-224?

(In reply to comment #24)
> Can reproduce with:
> kvm-83-164.el5
> kvm-qemu-img-83-164.el5
> kmod-kvm-83-164.el5
>
> Where can I get kvm-224?

Upgrade to rhel5.6.

I am able to reproduce on RHEL 5.6.
[root@hb06b07 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
[root@hb06b07 ~]# rpm -qa | grep kvm
etherboot-zroms-kvm-5.4.4-13.el5
kvm-83-224.el5
kmod-kvm-83-224.el5
Hypervisor:
[root@hb06b07 ~]# free
total used free shared buffers cached
Mem: 4045524 458848 3586676 0 30020 221716
-/+ buffers/cache: 207112 3838412
Swap: 2097144 16 2097128
VM:
[root@hb06b07 ~]# cat startVM.sh
/usr/libexec/qemu-kvm \
-S \
-M rhel5.4.0 \
-m 3000 \
-smp 1 \
-boot c \
-drive file=/dev/vmvg/rhel55tmpl4kvm,if=virtio,index=0,boot=on,cache=none \
-net nic,macaddr=DE:AD:BE:EF:26:8F,model=virtio -net tap,script=/root/qemu-ifup
The memory-intensive program is the same, with SIZE set as:
#define SIZE 3000000000L
After starting memjob, wait for it to print out the result of the first iteration, so that the hypervisor allocates all of the guest's memory. It took about 5 minutes in my environment.
mem on hypervisor before trying to save:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31660 root 25 0 4752m 3.4g 4388 R 100.2 87.6 13:10.10 qemu-kvm
[root@hb06b07 ~]# cat /proc/meminfo
MemTotal: 4045524 kB
MemFree: 30828 kB
Buffers: 31764 kB
Cached: 226164 kB
SwapCached: 16 kB
Active: 3714808 kB
Inactive: 155596 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 4045524 kB
LowFree: 30828 kB
SwapTotal: 2097144 kB
SwapFree: 2097128 kB
Dirty: 28 kB
Writeback: 0 kB
AnonPages: 3612456 kB
Mapped: 26092 kB
Slab: 92728 kB
PageTables: 12816 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 4119904 kB
Committed_AS: 5028648 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 264380 kB
VmallocChunk: 34359473783 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
mem in VM:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2990 root 25 0 2864m 2.8g 380 R 8.2 97.1 8:44.26 memjob
[root@localhost ~]# cat /proc/meminfo
MemTotal: 3016480 kB
MemFree: 13096 kB
Buffers: 724 kB
Cached: 10284 kB
SwapCached: 1924 kB
Active: 2520996 kB
Inactive: 437892 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 3016480 kB
LowFree: 13096 kB
SwapTotal: 2096472 kB
SwapFree: 2084076 kB
Dirty: 156 kB
Writeback: 0 kB
AnonPages: 2946676 kB
Mapped: 8912 kB
Slab: 15136 kB
PageTables: 9488 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 3604712 kB
Committed_AS: 3141632 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 1868 kB
VmallocChunk: 34359736491 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
strace of the kvm process when I try to save:
pipe([11, 12]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2af3d7874b90) = -1 ENOMEM (Cannot allocate memory)
close(11) = 0
close(12) = 0
Is this lack of MADV_DONTFORK in qemu-kvm? I added it in RHEL6, but not yet on RHEL5. So it shouldn't happen on RHEL6. If we don't want to mess with exec.c and we ignore qemu tcg, it's enough to remove "&& !kvm_has_sync_mmu()" in qemu-kvm.c:kvm_setup_guest_memory to fix it.

(In reply to comment #28)
> Is this lack of MADV_DONTFORK in qemu-kvm? I added it in RHEL6, but not yet on
> RHEL5. So it shouldn't happen on RHEL6. If we don't want to mess with exec.c
> and we ignore qemu tcg, it's enough to remove "&& !kvm_has_sync_mmu()" in
> qemu-kvm.c:kvm_setup_guest_memory to fix it.

Go ahead and try it; looks like you nailed it.

I tried the same test with RHEL6 as host _and_ guest OS. Surprisingly, the RES of the memjob process in the guest was much smaller than anticipated.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1524 root 20 0 2864m 781m 220 D 17.3 78.3 1:09.88 memjob

Save also fails.

[root@hb06b07 ~]# virsh save 3 STATEFILE
error: Failed to save domain 3 to STATEFILE
error: operation failed: Migration unexpectedly failed

I tried starting the VM with qemu-kvm, but the SDL window doesn't come up and the network didn't work as it did in RHEL5 (I started the VM the same way as shown above, but there was no eth0 in the VM). Any ideas? I'll keep working on it.

This bug could be related: https://bugzilla.redhat.com/show_bug.cgi?id=639305
The comments there focus on the save side, but the restore also fails (see the first comment).

A 16 GB hypervisor with a 10 GB VM also fails. On RHEL 5.5, a 4 GB VM on the same host saves OK.

top - 12:45:33 up 9 days, 18:03, 1 user, load average: 1.17, 0.97, 0.76
Tasks: 130 total, 1 running, 129 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.1%us, 35.3%sy, 0.0%ni, 63.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16508580k total, 11374800k used, 5133780k free, 90784k buffers
Swap: 2048276k total, 657500k used, 1390776k free, 929596k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20545 root 15 0 10.1g 8.9g 2516 S 138.4 56.5 6:26.29 qemu-kvm

I increased the swap on the machine to 32GB and was then able to save a 15GB VM. Also, after applying the change in #639305, I was able to restore the 15GB VM. From my perspective this bug can be closed.
1) The SAVE problem was fixed by adding more swap.
2) The RESTORE problem was fixed in RHEL56.
We found three more problems with RHEL56. I'll log them as separate issues.
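For reference, a minimal sketch of the MADV_DONTFORK approach described in the comments above. This is a hypothetical stand-in for qemu-kvm's kvm_setup_guest_memory(), not the actual qemu source: once the guest RAM is marked MADV_DONTFORK, fork()/popen() no longer duplicates that region into the child, so the overcommit check no longer has to account for a multi-gigabyte copy.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

static int setup_guest_memory(void *start, size_t size)
{
    /* Exclude the guest RAM from fork(): the child neither inherits nor
     * is charged for this mapping. Safe here because the monitor's
     * popen() children never touch guest memory. */
    if (madvise(start, size, MADV_DONTFORK) != 0) {
        perror("madvise(MADV_DONTFORK)");
        return -1;
    }
    return 0;
}

int main(void)
{
    size_t size = 512UL * 1024 * 1024;  /* stand-in for the guest RAM size */
    void *ram = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    return setup_guest_memory(ram, size) != 0;
}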
Created attachment 409652 [details]
kernel panic in the VM when restoring VM

Description of problem:
We are evaluating RHEL5.5 KVM to save a VM on one host and then restore it on another host. The feature is very important for us. We have hit a problem where, if the state file is over 2GB, the restore sometimes fails. The larger the state file, the more likely the failure. We are not sure if this is a KVM problem or just an environment problem, and we need some assistance.

Version-Release number of selected component (if applicable):
Redhat 5.5 official release

How reproducible:
Very easy to reproduce. I was able to do it on two different machines, Intel and AMD.

Steps to Reproduce:
For example, with a 4GB VM running a 1GB memory-intensive application, save and restore seem to be reliable:

[root@hb06b11 XM]# /usr/bin/time virsh save 1 STATE
Domain 1 saved to STATE
0.00user 0.00system 0:44.22elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+539minor)pagefaults 0swaps
[root@hb06b11 XM]# ls -lh
total 1.3G
-rw------- 1 root root 1.3G Apr 23 14:39 STATE
[root@hb06b11 XM]# /usr/bin/time virsh restore STATE
Domain restored from STATE
0.00user 0.00system 0:05.56elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+536minor)pagefaults 0swaps

But with a 3.5GB application in the same 4GB VM, the restore fails:

[root@hb06b11 XM]# /usr/bin/time virsh save 3 STATE
Domain 3 saved to STATE
0.00user 0.00system 1:57.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+540minor)pagefaults 0swaps
[root@hb06b11 XM]# ls -lh
total 3.6G
-rw------- 1 root root 3.6G Apr 23 14:49 STATE
[root@hb06b11 XM]# /usr/bin/time virsh restore STATE
error: Failed to restore domain from STATE
error: operation failed: failed to start VM
Command exited with non-zero status 1
0.00user 0.00system 0:10.12elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+545minor)pagefaults 0swaps

It seems to fail under the following conditions:
- The size of the state file on disk seems to be the key variable. If the size is > 2GB, the restore action fails.

We have tried migration (cold and live), and both seem to be reliable. It is just the save and restore actions that have trouble.

After some memory usage in a VM, the virsh restore action causes a kernel panic in the VM and gives the kernel oops, see attached. Not sure if this is related.

Additional info:
Here is our test environment:
HP ProLiant BL465c G5:
CPU: Quad-core AMD Opteron(tm) Processor 2382 (8 cores)
RAM: DDR2 800MHz 16G (2G * 8)
Network: 1GbE
RHEL 5.5 official release

Here is the test program that consumes 3.5G of memory:

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <sys/time.h>

#define SIZE 3500000000L

int main()
{
    long ix, i;
    char *A = malloc(SIZE);
    struct timeval last, curr;
    double d;

    assert(A);
    gettimeofday(&last, NULL);
    for (i = 0;; i++) {
        for (ix = 0; ix < SIZE; ix++) {
            A[ix] = (char)random();
        }
        gettimeofday(&curr, NULL);
        d = (curr.tv_sec - last.tv_sec);
        d += ((((double)curr.tv_usec) - ((double)last.tv_usec)) / 1000000.);
        printf("%ld %lf\n", i, d);
        last = curr;
    }
    return 0;
}
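The program above is the memjob program referred to throughout the thread. Assuming it is saved as memjob.c (the file name is not given in the report), it can be built and run with gcc -O2 -o memjob memjob.c && ./memjob. It must be built as a 64-bit binary, since SIZE exceeds what a 32-bit long (and a 32-bit malloc) can represent.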