Bug 703704 - Cannot start up VM when using 0.8.2 libvirtd
Summary: Cannot start up VM when using 0.8.2 libvirtd
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: urgent
Target Milestone: rc
Assignee: Virtualization Maintenance
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-05-11 05:36 UTC by Guangya Liu
Modified: 2018-11-14 11:44 UTC (History)
6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-13 16:29:17 UTC
Target Upstream Version:



Description Guangya Liu 2011-05-11 05:36:48 UTC
Description of problem:

I have seen this behavior in the past, but not with this frequency and
failure rate. Previously no more than one VM failed on each compute node;
now all VMs fail to start on more than one compute node.

We recently upgraded libvirt on our compute nodes and are now running
version 0.8.2, but I'm not sure whether that is related. We never had this
problem with OpenNebula.

In our production instance each compute node has 24 GB of memory and each
VM has 2 GB (8 VMs x 2 GB = 16 GB), so there is enough memory for them.

[root@lxbst0501 ~]# free
              total       used       free     shared    buffers     cached
Mem:      24676304    8191072   16485232          0     305408    5255460
-/+ buffers/cache:    2630204   22046100
Swap:      4192956          0    4192956

If I try to start the VMs by hand, I get the same error:
[root@lxbst0501 ~]# virsh start vmbst050107
error: Failed to start domain vmbst050107
error: internal error Process exited while reading console log output: 
Could not allocate physical memory

Version-Release number of selected component (if applicable):


How reproducible:
Repeatedly power off and start the VM

Steps to Reproduce:
1. Repeatedly power off and start the VM
  
Actual results:


Expected results:


Additional info:

Comment 2 Michael Closson 2011-07-06 18:41:06 UTC
Here is some more information about this bug.  The root problem is the message "Could not allocate physical memory".

Running qemu-kvm directly (under strace) shows that an mmap() call made by qemu-kvm fails. A simple C program that calls mmap() with the same arguments produces the same failure.

Here is the C program.

[mclosson@delint07 ~]$ cat mem.c
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#include <assert.h>

int
main()
{
    size_t s = 2168475648;    /* the same size the failing qemu-kvm mmap() requested */
    char * ptr;
    assert(sizeof(size_t) == 8);
    /* Note: the return value is not checked; if mmap() fails it returns
     * MAP_FAILED, and the memset() below then faults. That is the SIGSEGV
     * seen in the failing trace further down. */
    ptr = mmap(NULL, s, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    memset(ptr, 0xff, s);
    munmap(ptr, s);

    return 0;
}

Running the program on a machine that doesn't exhibit the bug shows the following:

[mclosson@delint07 ~]$ gcc mem.c
[mclosson@delint07 ~]$ strace -tt ./a.out
10:37:36.760124 execve("./a.out", ["./a.out"], [/* 51 vars */]) = 0
10:37:36.761018 brk(0)                  = 0xffd6000
10:37:36.761075 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aead8ca5000
10:37:36.761136 uname({sys="Linux", node="delint07", ...}) = 0
10:37:36.761234 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
10:37:36.761300 open("/etc/ld.so.cache", O_RDONLY) = 3
10:37:36.761355 fstat(3, {st_mode=S_IFREG|0644, st_size=106135, ...}) = 0
10:37:36.761426 mmap(NULL, 106135, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2aead8ca6000
10:37:36.761466 close(3)                = 0
10:37:36.761516 open("/lib64/libc.so.6", O_RDONLY) = 3
10:37:36.761562 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332\241\0033\0\0\0"..., 832) = 832
10:37:36.761614 fstat(3, {st_mode=S_IFREG|0755, st_size=1717800, ...}) = 0
10:37:36.761678 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aead8cc0000
10:37:36.761729 mmap(0x3303a00000, 3498328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3303a00000
10:37:36.761771 mprotect(0x3303b4e000, 2093056, PROT_NONE) = 0
10:37:36.761814 mmap(0x3303d4d000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14d000) = 0x3303d4d000
10:37:36.761868 mmap(0x3303d52000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3303d52000
10:37:36.761913 close(3)                = 0
10:37:36.761963 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aead8cc1000
10:37:36.762008 arch_prctl(ARCH_SET_FS, 0x2aead8cc1210) = 0
10:37:36.762108 mprotect(0x3303d4d000, 16384, PROT_READ) = 0
10:37:36.762155 mprotect(0x330381b000, 4096, PROT_READ) = 0
10:37:36.762195 munmap(0x2aead8ca6000, 106135) = 0
10:37:36.762269 mmap(NULL, 2168475648, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aead8cc2000
10:37:38.680874 munmap(0x2aead8cc2000, 2168475648) = 0
10:37:38.840559 exit_group(0)           = ?


Running the same program on a machine that does exhibit the bug shows the following:

[root@lxbst0501 ~]# strace -tt ./mem
10:35:32.122432 execve("./mem", ["./mem"], [/* 22 vars */]) = 0
10:35:32.125134 brk(0)                  = 0x17b53000
10:35:32.125294 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b53ab36d000
10:35:32.125386 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b53ab36e000
10:35:32.125455 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
10:35:32.125542 open("/etc/ld.so.cache", O_RDONLY) = 3
10:35:32.125607 fstat(3, {st_mode=S_IFREG|0644, st_size=29122, ...}) = 0
10:35:32.125707 mmap(NULL, 29122, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b53ab36f000
10:35:32.125770 close(3)                = 0
10:35:32.125837 open("/lib64/libc.so.6", O_RDONLY) = 3
10:35:32.125901 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332\301\0174\0\0\0"..., 832) = 832
10:35:32.125971 fstat(3, {st_mode=S_IFREG|0755, st_size=1722304, ...}) = 0
10:35:32.126131 mmap(0x340fc00000, 3502424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x340fc00000
10:35:32.126197 mprotect(0x340fd4e000, 2097152, PROT_NONE) = 0
10:35:32.126260 mmap(0x340ff4e000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14e000) = 0x340ff4e000
10:35:32.126328 mmap(0x340ff53000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x340ff53000
10:35:32.126386 close(3)                = 0
10:35:32.126454 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b53ab377000
10:35:32.126517 arch_prctl(ARCH_SET_FS, 0x2b53ab3776e0) = 0
10:35:32.126644 mprotect(0x340ff4e000, 16384, PROT_READ) = 0
10:35:32.126709 mprotect(0x340fa1b000, 4096, PROT_READ) = 0
10:35:32.126763 munmap(0x2b53ab36f000, 29122) = 0
10:35:32.126848 mmap(NULL, 2168475648, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
10:35:32.126906 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
10:35:32.127056 +++ killed by SIGSEGV +++
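
The SIGSEGV above is just the unchecked mmap() failure: mmap() returns
MAP_FAILED and the memset() then writes through it. As a sketch (not part
of the original report), a variant that checks the return value reports
the ENOMEM directly instead of crashing:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int
main()
{
    size_t s = 2168475648;    /* the same size as in mem.c above */
    char * ptr;

    ptr = mmap(NULL, s, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) {
        /* On an affected host this prints "mmap: Cannot allocate memory". */
        perror("mmap");
        return 1;
    }
    memset(ptr, 0xff, s);
    munmap(ptr, s);

    return 0;
}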



It is unlikely that the machine is really out of memory:

[root@lxbst0501 ~]# cat /proc/meminfo
MemTotal:     24676304 kB
MemFree:      15206136 kB
Buffers:        329892 kB
Cached:        8340008 kB
SwapCached:          0 kB
Active:        5178052 kB
Inactive:      3828624 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     24676304 kB
LowFree:      15206136 kB
SwapTotal:     4192956 kB
SwapFree:      4192956 kB
Dirty:             708 kB
Writeback:           0 kB
AnonPages:      336768 kB
Mapped:          24760 kB
Slab:           374172 kB
PageTables:       7644 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  16531108 kB
Committed_AS:   737512 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    302652 kB
VmallocChunk: 34359399243 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

Comment 5 Ronen Hod 2011-07-13 16:29:17 UTC
Closing, as the bug can be reproduced with a simple host program
https://bugzilla.redhat.com/show_bug.cgi?id=703704#c2
and it looks like a per-process limit on virtual memory
https://bugzilla.redhat.com/show_bug.cgi?id=703704#c4

Comment 6 Michael Closson 2011-07-13 16:59:37 UTC
Ronen, can you explain why there is a limit on virtual memory?  According to comment #4 there are 15 GB of free memory and 4 GB of swap, yet 2 GB cannot be mmap'd.

Comment 7 Ronen Hod 2011-07-13 18:08:14 UTC
Comment 4 includes the line
[virtual memory (kbytes, -v) 1536000]
which is part of the output of "ulimit -a". The requested mapping of
2168475648 bytes (2117652 kB) exceeds that 1536000 kB address-space limit,
so mmap() fails with ENOMEM. My guess is that the local sysadmin put a
limit on the users' resources. Somebody did.
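
To confirm such a limit from code (a minimal sketch, not from the original
report), a process can print its own RLIMIT_AS, the kernel resource limit
behind the "virtual memory" line of "ulimit -a":

#include <stdio.h>
#include <sys/resource.h>

int
main()
{
    struct rlimit rl;

    /* RLIMIT_AS is the per-process address-space limit shown by "ulimit -v". */
    if (getrlimit(RLIMIT_AS, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }

    if (rl.rlim_cur == RLIM_INFINITY)
        printf("virtual memory limit: unlimited\n");
    else
        printf("virtual memory limit: %llu kB\n",
               (unsigned long long)(rl.rlim_cur / 1024));

    return 0;
}

On an affected host this should print 1536000 kB. Conversely, running
"ulimit -v 1536000" in a shell on a healthy machine before starting
./a.out should reproduce the ENOMEM failure, and raising or removing the
limit (e.g. "ulimit -v unlimited", or the matching entry in
/etc/security/limits.conf) should let the 2 GB mmap() succeed.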

Comment 8 Michael Closson 2011-07-13 20:18:37 UTC
Thanks!

