| Summary: | Can not start up VM when using 0.8.2 libvirtd | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Guangya Liu <tramper2008> |
| Component: | kvm | Assignee: | Virtualization Maintenance <virt-maint> |
| Status: | CLOSED NOTABUG | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | medium | ||
| Version: | 5.6 | CC: | closms, dallan, mkenneth, rdassen, rhod, virt-maint |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-07-13 16:29:17 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Here is some more information about this bug. The root problem is the error message "Could not allocate physical memory".
Running qemu-kvm directly under strace shows that an mmap() call made by qemu-kvm fails. A simple C program that calls mmap() with the same arguments produces the same failure.
Here is the C program.
[mclosson@delint07 ~]$ cat mem.c
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#include <assert.h>
int
main()
{
    /* Same size as the failing qemu-kvm allocation (~2 GB). */
    size_t s = 2168475648;
    char *ptr;

    assert(sizeof(size_t) == 8);

    /* The return value is not checked against MAP_FAILED, so when mmap()
     * fails the memset() below writes through the bad pointer and the
     * program dies with SIGSEGV, as seen in the failing trace further down. */
    ptr = mmap(NULL, s, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    memset(ptr, 0xff, s);
    munmap(ptr, s);
    return 0;
}
Running the program on a machine that doesn't exhibit the bug shows the following:
[mclosson@delint07 ~]$ gcc mem.c
[mclosson@delint07 ~]$ strace -tt ./a.out
10:37:36.760124 execve("./a.out", ["./a.out"], [/* 51 vars */]) = 0
10:37:36.761018 brk(0) = 0xffd6000
10:37:36.761075 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aead8ca5000
10:37:36.761136 uname({sys="Linux", node="delint07", ...}) = 0
10:37:36.761234 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
10:37:36.761300 open("/etc/ld.so.cache", O_RDONLY) = 3
10:37:36.761355 fstat(3, {st_mode=S_IFREG|0644, st_size=106135, ...}) = 0
10:37:36.761426 mmap(NULL, 106135, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2aead8ca6000
10:37:36.761466 close(3) = 0
10:37:36.761516 open("/lib64/libc.so.6", O_RDONLY) = 3
10:37:36.761562 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332\241\0033\0\0\0"..., 832) = 832
10:37:36.761614 fstat(3, {st_mode=S_IFREG|0755, st_size=1717800, ...}) = 0
10:37:36.761678 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aead8cc0000
10:37:36.761729 mmap(0x3303a00000, 3498328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3303a00000
10:37:36.761771 mprotect(0x3303b4e000, 2093056, PROT_NONE) = 0
10:37:36.761814 mmap(0x3303d4d000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14d000) = 0x3303d4d000
10:37:36.761868 mmap(0x3303d52000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3303d52000
10:37:36.761913 close(3) = 0
10:37:36.761963 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aead8cc1000
10:37:36.762008 arch_prctl(ARCH_SET_FS, 0x2aead8cc1210) = 0
10:37:36.762108 mprotect(0x3303d4d000, 16384, PROT_READ) = 0
10:37:36.762155 mprotect(0x330381b000, 4096, PROT_READ) = 0
10:37:36.762195 munmap(0x2aead8ca6000, 106135) = 0
10:37:36.762269 mmap(NULL, 2168475648, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aead8cc2000
10:37:38.680874 munmap(0x2aead8cc2000, 2168475648) = 0
10:37:38.840559 exit_group(0) = ?
Running the same program on a machine that does exhibit the bug shows the following:
[root@lxbst0501 ~]# strace -tt ./mem
10:35:32.122432 execve("./mem", ["./mem"], [/* 22 vars */]) = 0
10:35:32.125134 brk(0) = 0x17b53000
10:35:32.125294 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b53ab36d000
10:35:32.125386 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b53ab36e000
10:35:32.125455 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
10:35:32.125542 open("/etc/ld.so.cache", O_RDONLY) = 3
10:35:32.125607 fstat(3, {st_mode=S_IFREG|0644, st_size=29122, ...}) = 0
10:35:32.125707 mmap(NULL, 29122, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b53ab36f000
10:35:32.125770 close(3) = 0
10:35:32.125837 open("/lib64/libc.so.6", O_RDONLY) = 3
10:35:32.125901 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332\301\0174\0\0\0"..., 832) = 832
10:35:32.125971 fstat(3, {st_mode=S_IFREG|0755, st_size=1722304, ...}) = 0
10:35:32.126131 mmap(0x340fc00000, 3502424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x340fc00000
10:35:32.126197 mprotect(0x340fd4e000, 2097152, PROT_NONE) = 0
10:35:32.126260 mmap(0x340ff4e000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14e000) = 0x340ff4e000
10:35:32.126328 mmap(0x340ff53000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x340ff53000
10:35:32.126386 close(3) = 0
10:35:32.126454 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b53ab377000
10:35:32.126517 arch_prctl(ARCH_SET_FS, 0x2b53ab3776e0) = 0
10:35:32.126644 mprotect(0x340ff4e000, 16384, PROT_READ) = 0
10:35:32.126709 mprotect(0x340fa1b000, 4096, PROT_READ) = 0
10:35:32.126763 munmap(0x2b53ab36f000, 29122) = 0
10:35:32.126848 mmap(NULL, 2168475648, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
10:35:32.126906 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
10:35:32.127056 +++ killed by SIGSEGV +++
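The SIGSEGV itself is a side effect of the test program: when mmap() fails it returns MAP_FAILED rather than a usable pointer, and the subsequent memset() writes through that value and crashes. A variant that checks the return value (a sketch for illustration only, not the original reproducer) would report the ENOMEM cleanly instead:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int
main(void)
{
    size_t s = 2168475648;   /* same ~2 GB request as mem.c */
    char *ptr;

    ptr = mmap(NULL, s, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) {
        /* A failing mmap() with ENOMEM would be reported here as
         * "mmap: Cannot allocate memory" instead of crashing. */
        perror("mmap");
        return 1;
    }

    memset(ptr, 0xff, s);
    munmap(ptr, s);
    return 0;
}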
It is unlikely that the machine is really out of memory:
[root@lxbst0501 ~]# cat /proc/meminfo
MemTotal: 24676304 kB
MemFree: 15206136 kB
Buffers: 329892 kB
Cached: 8340008 kB
SwapCached: 0 kB
Active: 5178052 kB
Inactive: 3828624 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 24676304 kB
LowFree: 15206136 kB
SwapTotal: 4192956 kB
SwapFree: 4192956 kB
Dirty: 708 kB
Writeback: 0 kB
AnonPages: 336768 kB
Mapped: 24760 kB
Slab: 374172 kB
PageTables: 7644 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 16531108 kB
Committed_AS: 737512 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 302652 kB
VmallocChunk: 34359399243 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
Closing, as the bug is reproduced using a simple host program (https://bugzilla.redhat.com/show_bug.cgi?id=703704#c2) and it looks like a limit on virtual memory (https://bugzilla.redhat.com/show_bug.cgi?id=703704#c4).

Ronen, can you explain why there is a limit on virtual memory? According to comment #4 there are 15 GB of free memory and 4 GB of swap, yet 2 GB cannot be mmap'd.

Comment #4 contains the line [virtual memory (kbytes, -v) 1536000], which is part of the output of "ulimit -a". My guess is that the local sysadmin put a limit on the resources of the users.

Somebody did. Thanks!
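For reference, the address-space limit that "ulimit -v" reports can also be confirmed from inside a process. The following is a minimal illustrative sketch (not part of the original report) that queries RLIMIT_AS, the resource limit behind "ulimit -v":

#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
    struct rlimit rl;

    /* RLIMIT_AS is the per-process virtual memory limit ("ulimit -v"). */
    if (getrlimit(RLIMIT_AS, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }

    if (rl.rlim_cur == RLIM_INFINITY) {
        printf("RLIMIT_AS: unlimited\n");
    } else {
        /* With "ulimit -v 1536000" (kbytes) in effect this would print
         * 1572864000 bytes, well below the ~2 GB mapping attempted above. */
        printf("RLIMIT_AS: %llu bytes\n", (unsigned long long)rl.rlim_cur);
    }

    return 0;
}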
Description of problem:
I saw this behavior in the past, but not with this frequency and failure rate. In the past no more than one VM failed on each compute node. Now all VMs fail to start on more than one compute node.

We recently upgraded libvirt on our compute nodes and are now running version 0.8.2, but I'm not sure if that is related. We never had this problem with OpenNebula before.

In our production instance each compute node has 24 GB of memory and each VM has 2 GB of memory (8 VMs x 2 GB), so there is enough space for them.

[root@lxbst0501 ~]# free
             total       used       free     shared    buffers     cached
Mem:      24676304    8191072   16485232          0     305408    5255460
-/+ buffers/cache:    2630204   22046100
Swap:      4192956          0    4192956

If I try to start the VMs by hand, I get the same error:

[root@lxbst0501 ~]# virsh start vmbst050107
error: Failed to start domain vmbst050107
error: internal error Process exited while reading console log output: Could not allocate physical memory

Version-Release number of selected component (if applicable):

How reproducible:
Repeat powering off and starting a VM frequently.

Steps to Reproduce:
Repeat powering off and starting a VM frequently.

Actual results:

Expected results:

Additional info: