Description of problem:

An Ubuntu guest predictably runs out of memory and (the whole qemu process) is OOM-killed when the guest is running the libguestfs test suite. It always happens at a specific stage in the test suite.

Host kernel:

[1075912.618514] qemu-system-x86 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
[1075912.618518] qemu-system-x86 cpuset=emulator mems_allowed=0
[1075912.618520] Pid: 13084, comm: qemu-system-x86 Not tainted 3.8.11-200.fc18.x86_64 #1
[1075912.618521] Call Trace:
[1075912.618527] [<ffffffff810d57a6>] ? cpuset_print_task_mems_allowed+0x96/0xc0
[1075912.618530] [<ffffffff8164af13>] dump_header+0x7a/0x1b7
[1075912.618534] [<ffffffff81135b37>] ? find_lock_task_mm+0x27/0x70
[1075912.618537] [<ffffffff81192da8>] ? try_get_mem_cgroup_from_mm+0x48/0x60
[1075912.618540] [<ffffffff812fe1d3>] ? ___ratelimit+0xa3/0x120
[1075912.618542] [<ffffffff81135f17>] oom_kill_process+0x1c7/0x310
[1075912.618545] [<ffffffff8106a9d5>] ? has_ns_capability_noaudit+0x15/0x20
[1075912.618547] [<ffffffff81195d89>] __mem_cgroup_try_charge+0xb39/0xb90
[1075912.618549] [<ffffffff81196640>] ? mem_cgroup_charge_common+0x120/0x120
[1075912.618550] [<ffffffff811965ae>] mem_cgroup_charge_common+0x8e/0x120
[1075912.618552] [<ffffffff8119814a>] mem_cgroup_cache_charge+0x7a/0x90
[1075912.618554] [<ffffffff8117a5e8>] ? alloc_pages_current+0xb8/0x190
[1075912.618556] [<ffffffff81132fe7>] add_to_page_cache_locked+0x67/0x1a0
[1075912.618558] [<ffffffff8113313a>] add_to_page_cache_lru+0x1a/0x40
[1075912.618560] [<ffffffff811331f2>] grab_cache_page_write_begin+0x92/0xf0
[1075912.618562] [<ffffffff811d4290>] ? I_BDEV+0x10/0x10
[1075912.618564] [<ffffffff811d1598>] block_write_begin+0x38/0xa0
[1075912.618566] [<ffffffff811323c1>] ? unlock_page+0x31/0x50
[1075912.618568] [<ffffffff811d4723>] blkdev_write_begin+0x23/0x30
[1075912.618570] [<ffffffff81131d36>] generic_file_buffered_write+0x116/0x280
[1075912.618573] [<ffffffff8109afd2>] ? dequeue_task_fair+0x332/0x500
[1075912.618576] [<ffffffff811b6813>] ? file_update_time+0xa3/0xf0
[1075912.618578] [<ffffffff81133d71>] __generic_file_aio_write+0x1d1/0x3d0
[1075912.618580] [<ffffffff811d4e76>] blkdev_aio_write+0x56/0xc0
[1075912.618582] [<ffffffff811d4e20>] ? bd_may_claim+0x50/0x50
[1075912.618584] [<ffffffff8119e103>] do_sync_readv_writev+0xa3/0xe0
[1075912.618586] [<ffffffff8119e3e4>] do_readv_writev+0xd4/0x1e0
[1075912.618588] [<ffffffff8119e525>] vfs_writev+0x35/0x60
[1075912.618589] [<ffffffff8119e8d2>] sys_pwritev+0xc2/0xe0
[1075912.618592] [<ffffffff8165be19>] system_call_fastpath+0x16/0x1b
[1075912.618594] Task in /machine/ubuntu1304.libvirt-qemu killed as a result of limit of /machine/ubuntu1304.libvirt-qemu
[1075912.618596] memory: usage 3397120kB, limit 3397120kB, failcnt 635235
[1075912.618596] memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0
[1075912.618597] kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
[1075912.618598] Mem-Info:
[1075912.618599] Node 0 DMA per-cpu:
[1075912.618600] CPU 0: hi: 0, btch: 1 usd: 0
[1075912.618601] CPU 1: hi: 0, btch: 1 usd: 0
[1075912.618602] CPU 2: hi: 0, btch: 1 usd: 0
[1075912.618603] CPU 3: hi: 0, btch: 1 usd: 0
[1075912.618603] Node 0 DMA32 per-cpu:
[1075912.618605] CPU 0: hi: 186, btch: 31 usd: 9
[1075912.618605] CPU 1: hi: 186, btch: 31 usd: 3
[1075912.618606] CPU 2: hi: 186, btch: 31 usd: 3
[1075912.618607] CPU 3: hi: 186, btch: 31 usd: 3
[1075912.618608] Node 0 Normal per-cpu:
[1075912.618608] CPU 0: hi: 186, btch: 31 usd: 131
[1075912.618609] CPU 1: hi: 186, btch: 31 usd: 100
[1075912.618610] CPU 2: hi: 186, btch: 31 usd: 134
[1075912.618611] CPU 3: hi: 186, btch: 31 usd: 131
[1075912.618613] active_anon:688815 inactive_anon:372401 isolated_anon:0
[1075912.618613] active_file:1098909 inactive_file:1003745 isolated_file:0
[1075912.618613] unevictable:875 dirty:157530 writeback:0 unstable:0
[1075912.618613] free:668807 slab_reclaimable:115948 slab_unreclaimable:28172
[1075912.618613] mapped:41634 shmem:89449 pagetables:15090 bounce:0
[1075912.618613] free_cma:0
[1075912.618615] Node 0 DMA free:15852kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15644kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:48kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[1075912.618618] lowmem_reserve[]: 0 3238 15813 15813
[1075912.618620] Node 0 DMA32 free:1475044kB min:13828kB low:17284kB high:20740kB active_anon:207532kB inactive_anon:178652kB active_file:653540kB inactive_file:634708kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3316504kB managed:3256344kB mlocked:0kB dirty:20kB writeback:0kB mapped:12176kB shmem:12896kB slab_reclaimable:130968kB slab_unreclaimable:7884kB kernel_stack:88kB pagetables:2964kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[1075912.618623] lowmem_reserve[]: 0 0 12574 12574
[1075912.618624] Node 0 Normal free:1184332kB min:53688kB low:67108kB high:80532kB active_anon:2547728kB inactive_anon:1310952kB active_file:3742096kB inactive_file:3380272kB unevictable:3500kB isolated(anon):0kB isolated(file):0kB present:12876192kB managed:12820908kB mlocked:3500kB dirty:630100kB writeback:0kB mapped:154360kB shmem:344900kB slab_reclaimable:332824kB slab_unreclaimable:104756kB kernel_stack:4032kB pagetables:57396kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[1075912.618627] lowmem_reserve[]: 0 0 0 0
[1075912.618628] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15852kB
[1075912.618635] Node 0 DMA32: 54351*4kB (UEMR) 37143*8kB (UEMR) 26291*16kB (UEMR) 8352*32kB (UEMR) 1471*64kB (UEMR) 484*128kB (UEMR) 159*256kB (UEMR) 66*512kB (UM) 39*1024kB (UR) 1*2048kB (U) 0*4096kB = 1475044kB
[1075912.618642] Node 0 Normal: 16981*4kB (UEM) 10296*8kB (UEM) 18767*16kB (UEM) 3257*32kB (UEM) 1091*64kB (UEM) 1177*128kB (UEM) 676*256kB (UEM) 287*512kB (UM) 87*1024kB (UEM) 0*2048kB 0*4096kB = 1184356kB
[1075912.618648] 2220715 total pagecache pages
[1075912.618649] 28062 pages in swap cache
[1075912.618650] Swap cache stats: add 65922, delete 37860, find 351114/351217
[1075912.618651] Free swap = 3639684kB
[1075912.618651] Total swap = 3751932kB
[1075912.654042] 4122096 pages RAM
[1075912.654044] 89948 pages reserved
[1075912.654045] 2668760 pages shared
[1075912.654045] 1848270 pages non-shared
[1075912.654046] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[1075912.654092] [ 7499] 107 7499 1196253 535186 1317 3340 0 qemu-system-x86
[1075912.654097] Memory cgroup out of memory: Kill process 13089 (qemu-system-x86) score 302 or sacrifice child
[1075912.654100] Killed process 13089 (qemu-system-x86) total-vm:4785012kB, anon-rss:2127684kB, file-rss:13060kB
[1075921.413224] virbr0: port 2(vnet0) entered disabled state
[1075921.413563] device vnet0 left promiscuous mode
[1075921.413570] virbr0: port 2(vnet0) entered disabled state

In the guest:

make check-TESTS
make[4]: Entering directory `/home/rjones/d/libguestfs/tests/md'
121 seconds: ./test-inspect-fstab.sh
PASS: test-inspect-fstab.sh   <-- guest killed here

This looks like a recurrence of bug 903432. It might also be caused by a memory leak in qemu.
Version-Release number of selected component (if applicable):
libvirt-daemon-qemu-1.0.5.1-1.fc20.x86_64 (self-compiled from Fedora)
qemu-1.5.0-1.fc18.x86_64 (self-compiled from Fedora)
kernel 3.8.11-200.fc18.x86_64
Fedora 18-ish base with several packages upgraded to F19/Rawhide

How reproducible:
100%

Steps to Reproduce:
1. Ubuntu 13.04 guest.
2. Clone libguestfs from https://github.com/libguestfs/libguestfs/tree/ubuntu
3. Run 'debuild -i -uc -us -b'

Actual results:
Guest is killed.
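A note on reading the OOM log in the description above: the kill comes from the per-VM memory cgroup hitting its own cap, not from host memory pressure. The arithmetic below uses figures copied straight from that log (cgroup usage/limit and the per-zone "free:" totals) to show usage is exactly at the limit while the host still has gigabytes free.

```shell
# Figures copied from the OOM log above, all in kB.
usage_kb=3397120                           # "memory: usage 3397120kB"
limit_kb=3397120                           # "... limit 3397120kB"
free_kb=$(( 15852 + 1475044 + 1184332 ))   # DMA + DMA32 + Normal "free:" totals

echo "cgroup headroom: $(( limit_kb - usage_kb )) kB"   # 0 -> at the cgroup cap
echo "host free memory: $(( free_kb / 1024 )) MB"       # well over 2 GB still free
```

The failcnt of 635235 in the same log is consistent with this: the cgroup had already bounced off its limit many times before the kill.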
So I can pretty reliably make qemu be killed by running `make -C tests/md check' over and over inside the guest. I'm not sure if it's this specific test that is causing it to die or just any use of libguestfs.

Here is 'top' on the host showing qemu every 10 seconds until it died. It doesn't appear to grow very much.

  PID USER  PR NI VIRT  RES  SHR S %CPU %MEM  TIME+    COMMAND
14456 qemu  20  0 4641m 1.8g 12m S 101.9 11.7  2:44.89 qemu-system-x86
14456 qemu  20  0 4641m 1.8g 12m S  97.4 11.7  2:54.59 qemu-system-x86
14456 qemu  20  0 4641m 1.8g 12m S  97.3 11.7  3:04.50 qemu-system-x86
14456 qemu  20  0 4641m 1.8g 12m S  94.3 11.7  3:15.01 qemu-system-x86
14456 qemu  20  0 4641m 1.8g 12m S   6.5 11.7  3:23.54 qemu-system-x86
14456 qemu  20  0 4625m 1.8g 12m S  97.3 11.7  3:25.58 qemu-system-x86
14456 qemu  20  0 4625m 1.8g 12m S 102.2 11.7  3:34.90 qemu-system-x86
14456 qemu  20  0 4625m 1.8g 12m S  97.3 11.7  3:44.76 qemu-system-x86
14456 qemu  20  0 4625m 1.8g 12m S  64.8 11.7  3:54.48 qemu-system-x86
14456 qemu  20  0 4625m 1.8g 12m S  97.4 11.7  4:03.68 qemu-system-x86
14456 qemu  20  0 4625m 1.8g 12m S  97.3 11.7  4:13.14 qemu-system-x86
14456 qemu  20  0 4625m 1.8g 12m S 103.5 11.7  4:23.24 qemu-system-x86
14456 qemu  20  0 4625m 1.8g 12m S  97.3 11.7  4:32.95 qemu-system-x86
14456 qemu  20  0 4625m 1.8g 12m S  96.7 11.7  4:42.87 qemu-system-x86
14456 qemu  20  0 4625m 1.8g 12m S  97.0 11.7  4:52.02 qemu-system-x86
14456 qemu  20  0 4625m 1.8g 12m S  97.3 11.7  5:01.97 qemu-system-x86
[... omitted lots of uninteresting lines in the middle ...]
14456 qemu  20  0 4697m 2.0g 12m S   0.0 13.3 30:18.25 qemu-system-x86
14456 qemu  20  0 4673m 2.0g 12m S   0.0 13.3 30:18.33 qemu-system-x86
14456 qemu  20  0 4673m 2.0g 12m S   6.2 13.3 30:18.48 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S   0.0 13.3 30:18.63 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S   0.0 13.3 30:18.71 qemu-system-x86
14456 qemu  20  0 4625m 2.0g 12m S   0.0 13.3 30:18.85 qemu-system-x86
14456 qemu  20  0 4625m 2.0g 12m S   0.0 13.3 30:18.96 qemu-system-x86
14456 qemu  20  0 4625m 2.0g 12m S   0.0 13.3 30:19.11 qemu-system-x86
14456 qemu  20  0 4625m 2.0g 12m S   0.0 13.3 30:19.19 qemu-system-x86
14456 qemu  20  0 4665m 2.0g 12m S   0.0 13.3 30:19.26 qemu-system-x86
14456 qemu  20  0 4625m 2.0g 12m S   0.0 13.3 30:19.33 qemu-system-x86
14456 qemu  20  0 4625m 2.0g 12m S   0.0 13.3 30:19.46 qemu-system-x86
14456 qemu  20  0 4625m 2.0g 12m S   0.0 13.3 30:19.59 qemu-system-x86
14456 qemu  20  0 4625m 2.0g 12m S   0.0 13.3 30:19.73 qemu-system-x86
14456 qemu  20  0 4625m 2.0g 12m S   0.0 13.3 30:19.84 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S  97.0 13.3 30:25.40 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S  97.0 13.3 30:34.78 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S  12.4 13.3 30:44.45 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S  97.2 13.3 30:54.40 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S 103.9 13.3 31:04.36 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S  97.3 13.3 31:13.93 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S  97.3 13.3 31:23.94 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S  96.1 13.3 31:33.97 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S 103.3 13.3 31:43.98 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S  97.4 13.3 31:54.12 qemu-system-x86
14456 qemu  20  0 4649m 2.0g 12m S  97.3 13.3 32:03.28 qemu-system-x86
14456 qemu  20  0 4641m 2.0g 12m S  97.4 13.3 32:13.35 qemu-system-x86
14456 qemu  20  0 4689m 2.0g 12m S 103.4 13.3 32:23.58 qemu-system-x86
14456 qemu  20  0 4689m 2.0g 12m S  97.3 13.3 32:33.25 qemu-system-x86
14456 qemu  20  0 4689m 2.0g 12m S  97.3 13.3 32:43.46 qemu-system-x86
14456 qemu  20  0     0    0   0 Z   0.0  0.0 32:50.19 qemu-system-x86 <--- killed here by oom-killer
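For anyone wanting to repeat this sampling, a sketch of the host-side loop is below. It reads VmRSS from /proc rather than scraping top output; QEMU_PID (14456 here, taken from the table above) must be substituted with the real qemu-system-x86 pid, and the 10-second interval matches the comment.

```shell
# Sketch: sample the qemu process's resident set every 10 s until the
# process exits (e.g. is OOM-killed). QEMU_PID is the pid from the top
# output above; substitute your own.
QEMU_PID=14456
while kill -0 "$QEMU_PID" 2>/dev/null; do
    grep -H VmRSS "/proc/$QEMU_PID/status"
    sleep 10
done
```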
Created attachment 752549 [details] guest XML
Created attachment 752560 [details] /proc/<PID>/maps from qemu
I reran it, running top every 1 second instead of every 10 seconds. It's not very interesting, but here are the top results right before the crash:

16578 qemu  20  0 4643m 1.8g 12m S 103.9 12.0 12:30.27 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S 114.8 12.0 12:31.64 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m R  96.1 12.0 12:32.89 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S 102.2 12.0 12:34.20 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S  96.6 12.0 12:35.44 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S 102.1 12.0 12:36.60 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S 102.1 12.0 12:37.76 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S  97.3 12.0 12:38.91 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S 102.4 12.0 12:40.07 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S  95.2 12.0 12:41.22 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S 102.7 12.0 12:42.38 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S 103.8 12.0 12:43.55 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S  97.3 12.0 12:44.70 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S  97.3 12.0 12:45.76 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S  97.4 12.0 12:46.90 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S 103.5 12.0 12:48.06 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S  97.3 12.0 12:49.21 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S 103.7 12.0 12:50.37 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S  12.6 12.0 12:51.25 qemu-system-x86
16578 qemu  20  0 4643m 1.8g 12m S 103.8 12.0 12:51.94 qemu-system-x86
16578 qemu  20  0     0    0   0 R  24.8  0.0 12:52.95 qemu-system-x86
16578 qemu  20  0     0    0   0 D   0.0  0.0 12:53.30 qemu-system-x86
16578 qemu  20  0     0    0   0 D   0.0  0.0 12:53.31 qemu-system-x86
16578 qemu  20  0     0    0   0 D   0.0  0.0 12:53.31 qemu-system-x86
16578 qemu  20  0     0    0   0 D   0.0  0.0 12:53.31 qemu-system-x86
16578 qemu  20  0     0    0   0 D   0.0  0.0 12:53.31 qemu-system-x86
16578 qemu  20  0     0    0   0 D   0.0  0.0 12:53.31 qemu-system-x86
16578 qemu  20  0     0    0   0 D   0.0  0.0 12:53.31 qemu-system-x86
16578 qemu  20  0     0    0   0 D   0.0  0.0 12:53.31 qemu-system-x86
16578 qemu  20  0     0    0   0 D   0.0  0.0 12:53.31 qemu-system-x86
16578 qemu  20  0     0    0   0 D   0.0  0.0 12:53.31 qemu-system-x86
16578 qemu  20  0     0    0   0 D   0.0  0.0 12:53.31 qemu-system-x86
16578 qemu  20  0     0    0   0 D   0.0  0.0 12:53.31 qemu-system-x86
Created attachment 752563 [details] /proc/<PID>/smaps from qemu
Created attachment 752564 [details] /proc/<PID>/status from qemu
The limits were introduced as a fix for bug 771424. Maybe we shouldn't introduce them at all.
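For reference, the limit in question is the memory cgroup hard limit that libvirt applies to the whole qemu process. The same knob can be set explicitly in the domain XML via `<memtune>`, which also overrides any automatically chosen value. A hypothetical fragment follows; the sizes are examples only, not recommendations:

```
<domain type='kvm'>
  <!-- ... -->
  <memory unit='KiB'>2097152</memory>
  <memtune>
    <!-- hard_limit is the memory cgroup cap enforced on the entire qemu
         process (guest RAM plus qemu's own overhead); exceeding it
         triggers the cgroup OOM killer seen in the logs above. -->
    <hard_limit unit='KiB'>4194304</hard_limit>
  </memtune>
  <!-- ... -->
</domain>
```

Because qemu's overhead beyond guest RAM is workload-dependent, any automatically guessed hard_limit risks exactly the kills reported here.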
This bug appears to have been reported against 'rawhide' during the Fedora 20 development cycle. Changing version to '20'. More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora20
The VIRT for the qemu process has always been *significantly* larger than the RAM limit I set for the VM (like, ~3500MiB vs. 2048MiB), and I don't know why, and I'd sure like to, but that's off topic for this ticket. :)

What is on topic is that I'm running Fedora 19, and I've been overtaxing my poor hypervisor for years, and everything was fine until I noticed that it was swapping the qemus with plenty of RAM left and set swappiness to 0 (and later to 1; the following happened when it was at 1), thus leading to:

Nov 5 14:20:18 basti kernel: [110395.859780] qemu-system-x86 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Nov 5 14:20:18 basti kernel: [110395.909095] qemu-system-x86 cpuset=emulator mems_allowed=0
Nov 5 14:20:18 basti kernel: [110395.956727] CPU: 1 PID: 15602 Comm: qemu-system-x86 Not tainted 3.10.3-300.fc19.x86_64 #1
Nov 5 14:20:18 basti kernel: [110396.004113] Hardware name: Dell Inc. Precision WorkStation 390 /0DN075, BIOS 2.1.2 11/30/2006
Nov 5 14:20:18 basti kernel: [110396.051614] ffff88021edc1000 ffff880209d0f9c8 ffffffff81643216 ffff880209d0fa30
Nov 5 14:20:18 basti kernel: [110396.098018] ffffffff81640274 ffff880209d0f9f8 ffffffff8118c058 ffff880230136320
Nov 5 14:20:18 basti kernel: [110396.142619] 0000000000000000 ffff880200000000 0000000000000206 ffff88022f4d16e0
Nov 5 14:20:18 basti kernel: [110396.186463] Call Trace:
Nov 5 14:20:18 basti kernel: [110396.229494] [<ffffffff81643216>] dump_stack+0x19/0x1b
Nov 5 14:20:18 basti kernel: [110396.272976] [<ffffffff81640274>] dump_header+0x7a/0x1b6
Nov 5 14:20:18 basti kernel: [110396.315418] [<ffffffff8118c058>] ? try_get_mem_cgroup_from_mm+0x28/0x60
Nov 5 14:20:18 basti kernel: [110396.358233] [<ffffffff811319ce>] oom_kill_process+0x1be/0x310
Nov 5 14:20:18 basti kernel: [110396.400675] [<ffffffff8118c227>] ? mem_cgroup_iter+0x197/0x2f0
Nov 5 14:20:18 basti kernel: [110396.442810] [<ffffffff8118f15e>] __mem_cgroup_try_charge+0xade/0xb60
Nov 5 14:20:18 basti kernel: [110396.484348] [<ffffffff8118fa40>] ? mem_cgroup_charge_common+0x120/0x120
Nov 5 14:20:18 basti kernel: [110396.525722] [<ffffffff8118f9a6>] mem_cgroup_charge_common+0x86/0x120
Nov 5 14:20:18 basti kernel: [110396.566974] [<ffffffff8119172a>] mem_cgroup_cache_charge+0x7a/0xa0
Nov 5 14:20:18 basti kernel: [110396.608311] [<ffffffff8112e958>] add_to_page_cache_locked+0x58/0x1d0
Nov 5 14:20:18 basti kernel: [110396.649999] [<ffffffff8112eaea>] add_to_page_cache_lru+0x1a/0x40
Nov 5 14:20:18 basti kernel: [110396.691548] [<ffffffff812fb61d>] ? list_del+0xd/0x30
Nov 5 14:20:18 basti kernel: [110396.732642] [<ffffffff8113a75d>] __do_page_cache_readahead+0x21d/0x240
Nov 5 14:20:18 basti kernel: [110396.773957] [<ffffffff8113aba6>] ondemand_readahead+0x126/0x250
Nov 5 14:20:18 basti kernel: [110396.815872] [<ffffffff8113ad03>] page_cache_sync_readahead+0x33/0x50
Nov 5 14:20:18 basti kernel: [110396.858487] [<ffffffff8112fcd5>] generic_file_aio_read+0x4b5/0x700
Nov 5 14:20:18 basti kernel: [110396.901378] [<ffffffff811cd5ec>] blkdev_aio_read+0x4c/0x70
Nov 5 14:20:18 basti kernel: [110396.943950] [<ffffffff81196fd0>] do_sync_read+0x80/0xb0
Nov 5 14:20:18 basti kernel: [110396.986213] [<ffffffff811975dc>] vfs_read+0x9c/0x170
Nov 5 14:20:18 basti kernel: [110397.028857] [<ffffffff81198202>] SyS_pread64+0x72/0xb0
Nov 5 14:20:18 basti kernel: [110397.071044] [<ffffffff81651819>] system_call_fastpath+0x16/0x1b
Nov 5 14:20:18 basti kernel: [110397.113626] Task in /machine/stodi.libvirt-qemu killed as a result of limit of /machine/stodi.libvirt-qemu
Nov 5 14:20:18 basti kernel: [110397.158316] memory: usage 2047992kB, limit 2048000kB, failcnt 208905
Nov 5 14:20:18 basti kernel: [110397.203441] memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0
Nov 5 14:20:18 basti kernel: [110397.248441] kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
Nov 5 14:20:19 basti kernel: [110397.294061] Memory cgroup stats for /machine/stodi.libvirt-qemu: cache:6488KB rss:2041400KB rss_huge:651264KB mapped_file:40KB inactive_anon:579124KB active_anon:1462288KB inactive_file:3308KB active_file:3168KB unevictable:0KB
Nov 5 14:20:19 basti kernel: [110397.392164] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Nov 5 14:20:19 basti kernel: [110397.442373] [ 9188] 107 9188 895249 512501 1208 0 0 qemu-system-x86
Nov 5 14:20:19 basti kernel: [110397.494000] Memory cgroup out of memory: Kill process 16089 (qemu-system-x86) score 1003 or sacrifice child
Nov 5 14:20:19 basti kernel: [110397.546583] Killed process 16089 (qemu-system-x86) total-vm:3580996kB, anon-rss:2039868kB, file-rss:10136kB

which I believe to be the issue under discussion. It does not strike me as OK that a silently-chosen limit I didn't know about should cause a VM to be OOMed when there's plenty of RAM left on the hypervisor, just because I tweaked swappiness.

On the plus side, now I know about virsh memtune, and that seems to fix things nicely.

-Robin
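For anyone hitting the same thing, `virsh memtune` can inspect and override the limit at run time without editing the domain XML. A sketch follows; the domain name `stodi` comes from the log above, and the size is an example value only (virsh interprets bare sizes as kibibytes):

```
# Show the current memory cgroup tunables for the domain.
virsh memtune stodi

# Raise the hard limit (example: 4 GiB) so page cache charged to the
# cgroup cannot push qemu over the cap; --live applies it immediately,
# --config persists it across domain restarts.
virsh memtune stodi --hard-limit 4194304 --live --config
```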
This issue has been fixed upstream for a while (and I forgot to update this bug, sorry):

commit 16bcb3b61675a88bff00317336b9610080c31000
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Aug 9 14:46:54 2013 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon Aug 19 11:16:58 2013 +0200

    qemu: Drop qemuDomainMemoryLimit

    This function is to guess the correct limit for maximal memory
    usage by qemu for given domain. This can never be guessed
    correctly, not to mention all the pains and sleepless nights
    this code has caused. Once somebody discovers algorithm to
    solve the Halting Problem, we can compute the limit
    algorithmically. But till then, this code should never see
    the light of the release again.

$ git describe --contains 16bcb3b61675a88bff00317336b9610080c31000
v1.1.2-rc1~86

Therefore, I am closing this bug.
Since comment #9 is talking about Fedora 19, we should backport that patch.
libvirt-1.0.5.7-1.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/libvirt-1.0.5.7-1.fc19
Confirmed that no surprise memtune is added for a host that doesn't have one. Thanks! -Robin
Package libvirt-1.0.5.7-1.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-1.0.5.7-1.fc19'
as soon as you are able to.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-20798/libvirt-1.0.5.7-1.fc19
then log in and leave karma (feedback).
libvirt-1.0.5.7-1.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.