Bug 903432
| Summary: | Qemu process is oom killed despite 11GB of free memory, cgroup limit needs to be relaxed | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | William Brown <william> |
| Component: | libvirt | Assignee: | Libvirt Maintainers <libvirt-maint> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 18 | CC: | amit.shah, berrange, cfergeau, clalancette, crobinso, dwmw2, itamar, jforbes, jyang, laine, libvirt-maint, pbonzini, rjones, scottt.tw, veillard, virt-maint |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-03-04 17:16:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
It appears that setting disk cache mode to "None" corrects this issue.
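For reference, the cache mode is set on each disk's <driver> element in the domain XML. A minimal sketch (the image path and target device here are placeholders, not taken from this report):

```xml
<disk type='file' device='disk'>
  <!-- cache='none' bypasses the host page cache for this disk -->
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/var/lib/libvirt/images/Server2008r2.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

This matters here because, with a cached disk mode, the host page cache used on the guest's behalf is charged against the same memory cgroup as the qemu process itself.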
There was a bug in libvirt (fixed upstream) where it set the cgroup memory limit too low, causing qemu to die under load. I believe the fix for this is:
commit 3c83df679e8feab939c08b1f97c48f9290a5b8cd
Author: Michal Privoznik <mprivozn>
Date: Tue Jan 8 10:15:49 2013 +0100
qemu: Relax hard RSS limit
Currently, if there's no hard memory limit defined for a domain,
libvirt tries to calculate one, based on the domain definition and a
magic equation, and sets it when the domain starts. The rationale was
that if there's a memory leak or exploit in qemu, we should prevent
the host system from thrashing. However, the equation was too tight,
as it didn't reflect what the kernel counts toward the memory used by
a process. Since many hosts have swap, nobody noticed anything,
because once the hard memory limit is reached, the process can
continue allocating memory in swap. However, if there is no swap on
the host, the process gets killed by the OOM killer; in our case,
that is the qemu process.
To prevent this, we need to relax the hard RSS limit. Moreover, we
should reflect more precisely the kernel's way of accounting memory
for a process. That is, even kernel caches are counted toward the
memory used by a process (within cgroups, at least). Hence the magic
equation has to be changed:

    limit = 1.5 * (domain memory + total video memory)
            + (32MB of cache per disk) + 200MB
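As a rough sketch of what this new equation computes (this mirrors the published formula above, not the exact libvirt source; sizes in KiB, matching libvirt's internal accounting):

```c
#include <stdio.h>

/* Sketch of the relaxed hard RSS limit from the commit above.
 * All values are in KiB. Not the actual libvirt code. */
static unsigned long long
hard_rss_limit(unsigned long long domain_mem_kib,
               unsigned long long video_mem_kib,
               unsigned int ndisks)
{
    unsigned long long limit = domain_mem_kib + video_mem_kib;

    limit += limit / 2;                               /* 1.5 * (domain + video memory) */
    limit += (unsigned long long)ndisks * 32 * 1024;  /* 32MB of cache per disk */
    limit += 200 * 1024;                              /* plus a 200MB cushion */
    return limit;
}

int main(void)
{
    /* The reporter's guest: 4096MB of RAM, one disk, video RAM ignored. */
    printf("%llukB\n", hard_rss_limit(4096 * 1024ULL, 0, 1));
    return 0;
}
```

For the reporter's 4096MB guest this comes out to 6529024kB, comfortably above the 4549836kB cgroup limit the domain was actually running under in the OOM log below.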
(By the way -- the description of this bug is wrong. It *does*
affect you even if you have swap).
There's also bug 891324 in libvirtd (actually, in libselinux),
where a memory leak causes libvirtd to grow and grow in size
until there's no memory left to run guests; however, that's
probably not the bug you are hitting.
Thanks Rich, that sounds like the culprit.

libvirt-0.10.2.3-1.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/libvirt-0.10.2.3-1.fc18

libvirt-0.10.2.3-1.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

Possibly this bug has started to happen again (bug 966939).
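To check whether the fixed build is installed (a minimal sketch, assuming a stock Fedora 18 host):

```
# Query the installed libvirt package; the fix shipped in 0.10.2.3-1.fc18
rpm -q libvirt

# Pull the update if an older build is reported
yum update libvirt
```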
Description of problem:

Jan 24 11:17:02 franky. kernel: qemu-kvm invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Jan 24 11:17:02 franky. kernel: qemu-kvm cpuset=emulator mems_allowed=0
Jan 24 11:17:02 franky. kernel: Pid: 13888, comm: qemu-kvm Tainted: GF W 3.7.2-204.fc18.x86_64 #1
Jan 24 11:17:02 franky. kernel: Call Trace:
Jan 24 11:17:02 franky. kernel: [<ffffffff810d2a81>] ? cpuset_print_task_mems_allowed+0x91/0xa0
Jan 24 11:17:02 franky. kernel: [<ffffffff8162c8b4>] dump_header+0x79/0x1b3
Jan 24 11:17:02 franky. kernel: [<ffffffff811330e7>] ? find_lock_task_mm+0x27/0x70
Jan 24 11:17:02 franky. kernel: [<ffffffff8118af70>] ? try_get_mem_cgroup_from_mm+0x50/0x70
Jan 24 11:17:02 franky. kernel: [<ffffffff812ec783>] ? ___ratelimit+0xa3/0x120
Jan 24 11:17:02 franky. kernel: [<ffffffff811334b7>] oom_kill_process+0x1c7/0x310
Jan 24 11:17:02 franky. kernel: [<ffffffff8106acd5>] ? has_ns_capability_noaudit+0x15/0x20
Jan 24 11:17:02 franky. kernel: [<ffffffff8118dcdf>] mem_cgroup_out_of_memory+0x29f/0x2c0
Jan 24 11:17:02 franky. kernel: [<ffffffff8118e473>] __mem_cgroup_try_charge+0x773/0x8d0
Jan 24 11:17:02 franky. kernel: [<ffffffff814c265a>] ? dm_get_live_table+0x4a/0x60
Jan 24 11:17:02 franky. kernel: [<ffffffff8118d6c0>] ? mem_cgroup_write+0x3b0/0x3b0
Jan 24 11:17:02 franky. kernel: [<ffffffff8118ed9e>] mem_cgroup_charge_common+0x8e/0x120
Jan 24 11:17:02 franky. kernel: [<ffffffff8118f86a>] mem_cgroup_cache_charge+0x7a/0xa0
Jan 24 11:17:02 franky. kernel: [<ffffffff81130447>] add_to_page_cache_locked+0x67/0x1a0
Jan 24 11:17:02 franky. kernel: [<ffffffff8113059a>] add_to_page_cache_lru+0x1a/0x40
Jan 24 11:17:02 franky. kernel: [<ffffffff812fb7e1>] ? list_del+0x11/0x40
Jan 24 11:17:02 franky. kernel: [<ffffffff811d0845>] mpage_readpages+0xa5/0x140
Jan 24 11:17:02 franky. kernel: [<ffffffff81219b80>] ? noalloc_get_block_write+0x30/0x30
Jan 24 11:17:02 franky. kernel: [<ffffffff81219b80>] ? noalloc_get_block_write+0x30/0x30
Jan 24 11:17:02 franky. kernel: [<ffffffff8112f7c7>] ? __page_cache_alloc+0xd7/0x100
Jan 24 11:17:02 franky. kernel: [<ffffffff812158fd>] ext4_readpages+0x1d/0x20
Jan 24 11:17:02 franky. kernel: [<ffffffff8113c0e7>] __do_page_cache_readahead+0x1b7/0x240
Jan 24 11:17:02 franky. kernel: [<ffffffff8113c491>] ra_submit+0x21/0x30
Jan 24 11:17:02 franky. kernel: [<ffffffff8113c5b5>] ondemand_readahead+0x115/0x240
Jan 24 11:17:02 franky. kernel: [<ffffffff8113c7b3>] page_cache_sync_readahead+0x33/0x50
Jan 24 11:17:02 franky. kernel: [<ffffffff81131a10>] generic_file_aio_read+0x540/0x6c0
Jan 24 11:17:02 franky. kernel: [<ffffffff81194b27>] do_sync_read+0xa7/0xe0
Jan 24 11:17:02 franky. kernel: [<ffffffff81195279>] vfs_read+0xa9/0x180
Jan 24 11:17:02 franky. kernel: [<ffffffff8119552a>] sys_pread64+0x9a/0xb0
Jan 24 11:17:02 franky. kernel: [<ffffffff8163ded9>] system_call_fastpath+0x16/0x1b
Jan 24 11:17:02 franky. kernel: Task in /libvirt/qemu/Server2008r2 killed as a result of limit of /libvirt/qemu/Server2008r2
Jan 24 11:17:02 franky. kernel: memory: usage 4549836kB, limit 4549836kB, failcnt 565605
Jan 24 11:17:02 franky. kernel: memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0
Jan 24 11:17:02 franky. kernel: Mem-Info:
Jan 24 11:17:02 franky. kernel: Node 0 DMA per-cpu:
Jan 24 11:17:02 franky. kernel: CPU 0: hi: 0, btch: 1 usd: 0
Jan 24 11:17:02 franky. kernel: CPU 1: hi: 0, btch: 1 usd: 0
Jan 24 11:17:02 franky. kernel: CPU 2: hi: 0, btch: 1 usd: 0
Jan 24 11:17:02 franky. kernel: CPU 3: hi: 0, btch: 1 usd: 0
Jan 24 11:17:02 franky. kernel: CPU 4: hi: 0, btch: 1 usd: 0
Jan 24 11:17:02 franky. kernel: CPU 5: hi: 0, btch: 1 usd: 0
Jan 24 11:17:02 franky. kernel: CPU 6: hi: 0, btch: 1 usd: 0
Jan 24 11:17:02 franky. kernel: CPU 7: hi: 0, btch: 1 usd: 0
Jan 24 11:17:02 franky. kernel: Node 0 DMA32 per-cpu:
Jan 24 11:17:02 franky. kernel: CPU 0: hi: 186, btch: 31 usd: 195
Jan 24 11:17:02 franky. kernel: CPU 1: hi: 186, btch: 31 usd: 192
Jan 24 11:17:02 franky. kernel: CPU 2: hi: 186, btch: 31 usd: 0
Jan 24 11:17:02 franky. kernel: CPU 3: hi: 186, btch: 31 usd: 18
Jan 24 11:17:02 franky. kernel: CPU 4: hi: 186, btch: 31 usd: 30
Jan 24 11:17:02 franky. kernel: CPU 5: hi: 186, btch: 31 usd: 176
Jan 24 11:17:02 franky. kernel: CPU 6: hi: 186, btch: 31 usd: 0
Jan 24 11:17:02 franky. kernel: CPU 7: hi: 186, btch: 31 usd: 30
Jan 24 11:17:02 franky. kernel: Node 0 Normal per-cpu:
Jan 24 11:17:02 franky. kernel: CPU 0: hi: 186, btch: 31 usd: 71
Jan 24 11:17:02 franky. kernel: CPU 1: hi: 186, btch: 31 usd: 91
Jan 24 11:17:02 franky. kernel: CPU 2: hi: 186, btch: 31 usd: 146
Jan 24 11:17:02 franky. kernel: CPU 3: hi: 186, btch: 31 usd: 130
Jan 24 11:17:02 franky. kernel: CPU 4: hi: 186, btch: 31 usd: 110
Jan 24 11:17:02 franky. kernel: CPU 5: hi: 186, btch: 31 usd: 130
Jan 24 11:17:02 franky. kernel: CPU 6: hi: 186, btch: 31 usd: 49
Jan 24 11:17:02 franky. kernel: CPU 7: hi: 186, btch: 31 usd: 52
Jan 24 11:17:02 franky. kernel: [291B blob data]
Jan 24 11:17:02 franky. kernel: Node 0 DMA free:15888kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):...nreclaimable? no
Jan 24 11:17:02 franky. kernel: lowmem_reserve[]: 0 2139 15997 15997
Jan 24 11:17:02 franky. kernel: Node 0 DMA32 free:2175560kB min:9028kB low:11284kB high:13540kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB iso...nreclaimable? no
Jan 24 11:17:02 franky. kernel: lowmem_reserve[]: 0 0 13858 13858
Jan 24 11:17:02 franky. kernel: Node 0 Normal free:4780740kB min:58488kB low:73108kB high:87732kB active_anon:5275760kB inactive_anon:206408kB active_file:1650712kB inactive_file:1709500kB unevictable:4...nreclaimable? no
Jan 24 11:17:02 franky. kernel: lowmem_reserve[]: 0 0 0 0
Jan 24 11:17:02 franky. kernel: Node 0 DMA: 2*4kB 1*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15888kB
Jan 24 11:17:02 franky. kernel: Node 0 DMA32: 14*4kB 10*8kB 10*16kB 9*32kB 16*64kB 8*128kB 12*256kB 6*512kB 4*1024kB 6*2048kB 525*4096kB = 2175560kB
Jan 24 11:17:02 franky. kernel: Node 0 Normal: 368*4kB 591*8kB 283*16kB 157*32kB 421*64kB 601*128kB 237*256kB 162*512kB 177*1024kB 23*2048kB 1047*4096kB = 4780104kB
Jan 24 11:17:02 franky. kernel: 862236 total pagecache pages
Jan 24 11:17:02 franky. kernel: 7631 pages in swap cache
Jan 24 11:17:02 franky. kernel: Swap cache stats: add 18439, delete 10808, find 3479/3479
Jan 24 11:17:02 franky. kernel: Free swap = 8145088kB
Jan 24 11:17:02 franky. kernel: Total swap = 8175612kB
Jan 24 11:17:02 franky. kernel: 4193776 pages RAM
Jan 24 11:17:02 franky. kernel: 107207 pages reserved
Jan 24 11:17:02 franky. kernel: 2680404 pages shared
Jan 24 11:17:02 franky. kernel: 1862358 pages non-shared
Jan 24 11:17:02 franky. kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jan 24 11:17:02 franky. kernel: [13623] 107 13623 1275536 1059312 2236 6839 0 qemu-kvm
Jan 24 11:17:02 franky. kernel: Memory cgroup out of memory: Kill process 13993 (qemu-kvm) score 335 or sacrifice child
Jan 24 11:17:02 franky. kernel: Killed process 13993 (qemu-kvm) total-vm:5102144kB, anon-rss:4226008kB, file-rss:11240kB

Free reports:

                     total       used       free     shared    buffers     cached
Mem:              16346276    5134592   11211684          0     276768    3143840
-/+ buffers/cache:            1713984   14632292
Swap:              8175612          0    8175612

The VM in question was allocated 4096MB of RAM. Where is this cgroup limit coming from?
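For reference, the limit in the log is the per-domain memory cgroup that libvirt creates. A sketch of how to inspect it, assuming cgroup v1 paths as used on Fedora 18 (the domain name comes from the log above):

```
# Read the hard limit libvirt set for this domain (value in bytes)
cat /sys/fs/cgroup/memory/libvirt/qemu/Server2008r2/memory.limit_in_bytes

# Or query the same tunables through libvirt itself
virsh memtune Server2008r2
```

To pin an explicit limit instead of relying on libvirt's computed one, the domain XML accepts a <memtune> block (the value here is an example sized from the relaxed equation discussed above, not a recommendation):

```xml
<memtune>
  <!-- example value only: 1.5 * 4096MB + 32MB per disk + 200MB -->
  <hard_limit unit='KiB'>6529024</hard_limit>
</memtune>
```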