Bug 891653 - Cgroups memory limit are causing the virt to be terminated unexpectedly
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Michal Privoznik
QA Contact: Virtualization Bugs
Keywords: Regression
Depends On:
Blocks: 895654 1013758
Reported: 2013-01-03 09:41 EST by Jaroslav Kortus
Modified: 2013-11-29 03:38 EST
CC: 16 users

See Also:
Fixed In Version: libvirt-0.10.2-15.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1013758
Environment:
Last Closed: 2013-02-21 02:29:30 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments: None
Description Jaroslav Kortus 2013-01-03 09:41:24 EST
Description of problem:
During normal VM operation the virtual machine is killed by the kernel.

Jan  3 08:01:42 marathon-03 kernel: qemu-kvm invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
Jan  3 08:01:42 marathon-03 kernel: qemu-kvm cpuset=vcpu0 mems_allowed=0-1
Jan  3 08:01:42 marathon-03 kernel: Pid: 5740, comm: qemu-kvm Tainted: G           ---------------  T 2.6.32-343.el6.x86_64 #1
Jan  3 08:01:42 marathon-03 kernel: Call Trace:
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff810cb3c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff8111caf0>] ? dump_header+0x90/0x1b0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff81171ef1>] ? task_in_mem_cgroup+0xe1/0x120
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff8111cf72>] ? oom_kill_process+0x82/0x2a0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff8111ce6e>] ? select_bad_process+0x9e/0x120
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff8111d6f2>] ? mem_cgroup_out_of_memory+0x92/0xb0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff81173134>] ? mem_cgroup_handle_oom+0x274/0x2a0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff81170b70>] ? memcg_oom_wake_function+0x0/0xa0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff81173719>] ? __mem_cgroup_try_charge+0x5b9/0x5d0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff81174a97>] ? mem_cgroup_charge_common+0x87/0xd0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff81175250>] ? mem_cgroup_cache_charge+0xc0/0xd0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff81119f2a>] ? add_to_page_cache_locked+0x4a/0x120
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff8111a02a>] ? add_to_page_cache_lru+0x2a/0x50
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff8111ac1d>] ? grab_cache_page_write_begin+0xbd/0xe0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff811b7f18>] ? block_write_begin_newtrunc+0x88/0xd0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff811b82f3>] ? block_write_begin+0x43/0x90
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff811bad50>] ? blkdev_get_block+0x0/0x70
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff811bbf5a>] ? blkdev_write_begin+0x2a/0x30
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff811bad50>] ? blkdev_get_block+0x0/0x70
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff8111a453>] ? generic_file_buffered_write+0x123/0x2e0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff810756c7>] ? current_fs_time+0x27/0x30
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff8111bec0>] ? __generic_file_aio_write+0x260/0x490
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff811bd457>] ? blkdev_aio_write+0x77/0x130
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff811bd3e0>] ? blkdev_aio_write+0x0/0x130
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff811807fb>] ? do_sync_readv_writev+0xfb/0x140
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff81096aa0>] ? autoremove_wake_function+0x0/0x40
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff812285db>] ? selinux_file_permission+0xfb/0x150
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff8121b4b6>] ? security_file_permission+0x16/0x20
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff81181786>] ? do_readv_writev+0xd6/0x1f0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff81087316>] ? group_send_sig_info+0x56/0x70
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff8108736f>] ? kill_pid_info+0x3f/0x60
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff811818e6>] ? vfs_writev+0x46/0x60
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff811819a2>] ? sys_pwritev+0xa2/0xc0
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff810dc345>] ? __audit_syscall_exit+0x265/0x290
Jan  3 08:01:42 marathon-03 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Jan  3 08:01:42 marathon-03 kernel: Task in /libvirt/qemu/marathon-03-node-002 killed as a result of limit of /libvirt/qemu/marathon-03-node-002
Jan  3 08:01:42 marathon-03 kernel: memory: usage 1274348kB, limit 1274348kB, failcnt 62487
Jan  3 08:01:42 marathon-03 kernel: memory+swap: usage 1274348kB, limit 9007199254740991kB, failcnt 0

It looks to me like the default limit is probably too strict. This was not happening on 6.3, but it's regularly happening on 6.4.

Version-Release number of selected component (if applicable):
kernel-2.6.32-343.el6.x86_64
libvirt-0.10.2-9.el6.x86_64


How reproducible:
sometimes

Steps to Reproduce:
1. setup a VM
2. I was doing an installation and/or updates when this happened
3.
  
Actual results:
VM gets killed as cgroup is out of memory

Expected results:
cgroup limits aligned so that this does not happen (or fix libvirt if it's misbehaving)

Additional info:
Comment 3 Michal Privoznik 2013-01-04 05:26:35 EST
Jaroslav,

can you please provide full domain XML (virsh dumpxml <domain-name>) and the qemu version you are seeing this issue with? Thanks in advance!
Comment 4 Jaroslav Kortus 2013-01-04 06:12:27 EST
Here it is:

$ rpm -qva | grep qem
qemu-img-0.12.1.2-2.337.el6.x86_64
qemu-kvm-0.12.1.2-2.337.el6.x86_64
gpxe-roms-qemu-0.9.7-6.9.el6.noarch

<domain type='kvm'>
  <name>marathon-03-node-002</name>
  <uuid>4bb31d29-1adc-5dfe-cb08-86e48c76e29e</uuid>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.3.0'>hvm</type>
    <boot dev='network'/>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' io='native'/>
      <source dev='/dev/vg_free/beaker-root-disk-node002'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:00:01:02'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>
Comment 5 Michal Privoznik 2013-01-04 09:51:49 EST
(In reply to comment #0)
> <snip/>
> Jan  3 08:01:42 marathon-03 kernel: Task in /libvirt/qemu/marathon-03-node-002 killed as a result of limit of /libvirt/qemu/marathon-03-node-002
> Jan  3 08:01:42 marathon-03 kernel: memory: usage 1274348kB, limit 1274348kB, failcnt 62487
> Jan  3 08:01:42 marathon-03 kernel: memory+swap: usage 1274348kB, limit 9007199254740991kB, failcnt 0
> 
> It looks to me that the default limit is probably too strict. This was not
> happening on 6.3 but it's regularly happening on 6.4qq
> 
> Version-Release number of selected component (if applicable):
> kernel-2.6.32-343.el6.x86_64
> libvirt-0.10.2-9.el6.x86_64
> 


I am not so sure this is a libvirt bug. What we can see here is that the RSS limit has been hit roughly 60k times. However, that is no reason for the OOM killer to be invoked, as memory should then be continuously allocated on swap instead; the memory+swap limit hasn't been reached even once. In fact, this is how the whole concept should work (modulo the OOM killer): we want to limit how much RSS a qemu process can use, to prevent thrashing of the whole system in case of a memory leak or exploit.

On the other hand, other things are counted into memory.limit_in_bytes, not just the bare sum of the malloc()s made within the process: system buffers and caches as well. And when speaking about qemu, that means virtual disk caches are counted against the RSS limit too. This was not taken into account when I introduced the automatic setting of the limit. So I guess we should relax it now.
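The cache-vs-anonymous accounting described above can be seen directly in the cgroup's memory.stat file. A minimal sketch for inspecting a running domain's charge, assuming the RHEL 6 default mount point /cgroup/memory and an illustrative domain name (both vary per host):

```shell
# Inspect how much of a domain's memcg charge is page cache vs. anonymous RSS.
# /cgroup/memory is the RHEL 6 default mount point; the domain name is illustrative.
DOM=${1:-marathon-03-node-002}
CG=/cgroup/memory/libvirt/qemu/$DOM
if [ -r "$CG/memory.stat" ]; then
    # 'cache' is page cache charged to the group (including qemu disk caches);
    # 'rss' is anonymous memory. Both count toward memory.limit_in_bytes.
    grep -E '^(cache|rss) ' "$CG/memory.stat"
    echo "limit:   $(cat "$CG/memory.limit_in_bytes")"
    echo "failcnt: $(cat "$CG/memory.failcnt")"
else
    echo "cgroup $CG not found (domain not running, or a different mount point)"
fi
# Note how in the log above usage hit the 1274348 kB limit even though the guest
# has only 1048576 KiB of RAM configured -- the extra ~220 MB is charged cache.
```

If most of the charge shows up under `cache`, the hard limit (not the guest's memory size) is the relevant knob.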

BTW: we can easily strike libvirt out of the equation:

T1 is a root terminal; T2 is just a terminal (can be root, but doesn't have to be):

1) T1: mkdir /sys/fs/cgroup/memory/test
   assuming memory cgroup is mounted under /sys/fs/cgroup
2) T1: echo the PID of T2's shell into /sys/fs/cgroup/memory/test/tasks:
   echo $(pidof T2) >> /sys/fs/cgroup/memory/test/tasks
3) T1: echo 1024000 > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
4) T2: dd if=/dev/urandom bs=1M count=5 | hexdump -C | less
5) T1: cat /sys/fs/cgroup/memory/test/memory{.,.memsw.}usage_in_bytes

Steps 1-3 prepare the environment; in step 4 a process needing more than 1 MB is run; in step 5 you can see whether memory was allocated on swap or not. You may need to run the command in step 4 multiple times so that /sys/fs/cgroup/memory/test/memory.failcnt reaches higher values. If any process in step 4 gets killed, then this is not a libvirt issue.
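The two-terminal procedure above can be collapsed into one script by using a subshell as the workload in place of T2. This is an adaptation of the steps, not a verbatim transcript: `sort` is used as a non-interactive memory hog instead of `hexdump | less`, and a cgroup-v1 memory controller mounted at /sys/fs/cgroup/memory is assumed.

```shell
# One-script version of steps 1-5: create a memcg, cap it, run a >1 MB workload
# inside it, then read back usage/failcnt. Needs root and cgroup v1.
CG=/sys/fs/cgroup/memory/test
if [ -w /sys/fs/cgroup/memory ]; then
    mkdir -p "$CG"
    echo 1024000 > "$CG/memory.limit_in_bytes"     # ~1000 KiB, as in step 3
    # The subshell enters the cgroup first; 'sort' then buffers the whole 5 MB
    # stream in memory, forcing the group past its limit.
    sh -c "echo \$\$ > '$CG/tasks'; dd if=/dev/urandom bs=1M count=5 2>/dev/null | sort > /dev/null" \
        || echo "workload was killed -> not a libvirt issue"
    for f in memory.usage_in_bytes memory.memsw.usage_in_bytes memory.failcnt; do
        [ -r "$CG/$f" ] && echo "$f: $(cat "$CG/$f")"
    done
    rmdir "$CG" 2>/dev/null || true
else
    echo "memory cgroup v1 not writable here; run as root on a cgroup-v1 host"
fi
```

On a swapless host (as in comment 6 below) the workload is OOM-killed; with swap available, memory.memsw.usage_in_bytes grows instead.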
Comment 6 Jaroslav Kortus 2013-01-04 10:45:54 EST
I don't have any swap on the machine, maybe that would explain OOM striking :)
Comment 8 Michal Privoznik 2013-01-08 04:38:46 EST
(In reply to comment #6)
> I don't have any swap on the machine, maybe that would explain OOM striking
> :)

Oh, that's the missing piece that shows the flaw in my reasoning. So yes, it is a libvirt bug then. I've proposed a patch upstream:

https://www.redhat.com/archives/libvir-list/2013-January/msg00426.html
Comment 9 Michal Privoznik 2013-01-08 12:25:03 EST
And this one goes to POST:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2013-January/msg00128.html
Comment 14 Richard W.M. Jones 2013-01-18 03:14:56 EST
Same issue happened to me (F18 host, F19 guest):

[2038698.035269] qemu-kvm invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
[2038698.035273] qemu-kvm cpuset=emulator mems_allowed=0
[2038698.035276] Pid: 16772, comm: qemu-kvm Tainted: G        W    3.6.9-4.fc18.x86_64 #1
[2038698.035277] Call Trace:
[2038698.035283]  [<ffffffff810cfbb1>] ? cpuset_print_task_mems_allowed+0x91/0xa0
[2038698.035287]  [<ffffffff8161aa25>] dump_header+0x80/0x1ba
[2038698.035289]  [<ffffffff8112f457>] ? find_lock_task_mm+0x27/0x70
[2038698.035293]  [<ffffffff81186520>] ? try_get_mem_cgroup_from_mm+0x50/0x70
[2038698.035296]  [<ffffffff812e22e3>] ? ___ratelimit+0xa3/0x120
[2038698.035298]  [<ffffffff8112f827>] oom_kill_process+0x1c7/0x310
[2038698.035301]  [<ffffffff81069245>] ? has_ns_capability_noaudit+0x15/0x20
[2038698.035303]  [<ffffffff8118928f>] mem_cgroup_out_of_memory+0x29f/0x2c0
[2038698.035305]  [<ffffffff81189a24>] __mem_cgroup_try_charge+0x774/0x8e0
[2038698.035307]  [<ffffffff81188c80>] ? mem_cgroup_write+0x3c0/0x3c0
[2038698.035309]  [<ffffffff8118a35e>] mem_cgroup_charge_common+0x8e/0x120
[2038698.035311]  [<ffffffff8118ae1d>] mem_cgroup_cache_charge+0x7d/0xa0
[2038698.035314]  [<ffffffff81171780>] ? alloc_pages_current+0xb0/0x120
[2038698.035318]  [<ffffffff8112c797>] add_to_page_cache_locked+0x67/0x1a0
[2038698.035320]  [<ffffffff8112c8ea>] add_to_page_cache_lru+0x1a/0x40
[2038698.035322]  [<ffffffff8112c9a5>] grab_cache_page_write_begin+0x95/0x100
[2038698.035326]  [<ffffffff811c6e90>] ? blkdev_get_blocks+0xd0/0xd0
[2038698.035328]  [<ffffffff811c3558>] block_write_begin+0x38/0xa0
[2038698.035330]  [<ffffffff8112bb81>] ? unlock_page+0x31/0x50
[2038698.035333]  [<ffffffff811c65d3>] blkdev_write_begin+0x23/0x30
[2038698.035335]  [<ffffffff8112b4f6>] generic_file_buffered_write+0x116/0x280
[2038698.035338]  [<ffffffff811a9e73>] ? file_update_time+0xa3/0xf0
[2038698.035341]  [<ffffffff8112d52d>] __generic_file_aio_write+0x1cd/0x3d0
[2038698.035343]  [<ffffffff811c7242>] blkdev_aio_write+0x72/0xf0
[2038698.035346]  [<ffffffff811c71d0>] ? blkdev_mmap+0x60/0x60
[2038698.035349]  [<ffffffff81190dd3>] do_sync_readv_writev+0xa3/0xe0
[2038698.035352]  [<ffffffff811910b4>] do_readv_writev+0xd4/0x1e0
[2038698.035354]  [<ffffffff811911f5>] vfs_writev+0x35/0x60
[2038698.035356]  [<ffffffff8119156a>] sys_pwritev+0xba/0xd0
[2038698.035360]  [<ffffffff8162b9e9>] system_call_fastpath+0x16/0x1b
[2038698.035362] Task in /libvirt/qemu/f19rawhidex64 killed as a result of limit of /libvirt/qemu/f19rawhidex64
[2038698.035364] memory: usage 2353296kB, limit 2353296kB, failcnt 462213
[2038698.035365] memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0
[2038698.035366] Mem-Info:
[2038698.035368] Node 0 DMA per-cpu:
[2038698.035369] CPU    0: hi:    0, btch:   1 usd:   0
[2038698.035370] CPU    1: hi:    0, btch:   1 usd:   0
[2038698.035372] CPU    2: hi:    0, btch:   1 usd:   0
[2038698.035373] CPU    3: hi:    0, btch:   1 usd:   0
[2038698.035374] Node 0 DMA32 per-cpu:
[2038698.035375] CPU    0: hi:  186, btch:  31 usd:   1
[2038698.035376] CPU    1: hi:  186, btch:  31 usd:   0
[2038698.035377] CPU    2: hi:  186, btch:  31 usd:   0
[2038698.035378] CPU    3: hi:  186, btch:  31 usd:   1
[2038698.035379] Node 0 Normal per-cpu:
[2038698.035380] CPU    0: hi:  186, btch:  31 usd: 161
[2038698.035381] CPU    1: hi:  186, btch:  31 usd: 171
[2038698.035382] CPU    2: hi:  186, btch:  31 usd: 181
[2038698.035384] CPU    3: hi:  186, btch:  31 usd: 166
[2038698.035387] active_anon:2350253 inactive_anon:496834 isolated_anon:0
 active_file:446525 inactive_file:427315 isolated_file:28
 unevictable:879 dirty:31252 writeback:0 unstable:0
 free:113327 slab_reclaimable:111040 slab_unreclaimable:27377
 mapped:32425 shmem:49634 pagetables:18216 bounce:0
[2038698.035389] Node 0 DMA free:15900kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15644kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[2038698.035393] lowmem_reserve[]: 0 3226 15800 15800
[2038698.035395] Node 0 DMA32 free:165432kB min:13784kB low:17228kB high:20676kB active_anon:1234604kB inactive_anon:613552kB active_file:316780kB inactive_file:718552kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3303868kB mlocked:0kB dirty:0kB writeback:0kB mapped:10676kB shmem:3644kB slab_reclaimable:222140kB slab_unreclaimable:16244kB kernel_stack:168kB pagetables:3776kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[2038698.035400] lowmem_reserve[]: 0 0 12574 12574
[2038698.035402] Node 0 Normal free:271976kB min:53728kB low:67160kB high:80592kB active_anon:8166408kB inactive_anon:1373784kB active_file:1469320kB inactive_file:990708kB unevictable:3516kB isolated(anon):0kB isolated(file):112kB present:12876192kB mlocked:3516kB dirty:125008kB writeback:0kB mapped:119024kB shmem:194892kB slab_reclaimable:222020kB slab_unreclaimable:93264kB kernel_stack:4016kB pagetables:69088kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[2038698.035406] lowmem_reserve[]: 0 0 0 0
[2038698.035409] Node 0 DMA: 1*4kB 1*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15900kB
[2038698.035415] Node 0 DMA32: 4402*4kB 4096*8kB 1372*16kB 580*32kB 288*64kB 138*128kB 64*256kB 23*512kB 10*1024kB 0*2048kB 0*4096kB = 165384kB
[2038698.035420] Node 0 Normal: 6203*4kB 30885*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 271908kB
[2038698.035426] 1008158 total pagecache pages
[2038698.035427] 84128 pages in swap cache
[2038698.035429] Swap cache stats: add 255319, delete 171191, find 806489/807347
[2038698.035429] Free swap  = 3408684kB
[2038698.035430] Total swap = 3751932kB
[2038698.070373] 4122096 pages RAM
[2038698.070375] 89949 pages reserved
[2038698.070376] 828718 pages shared
[2038698.070377] 3188207 pages non-shared
[2038698.070379] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[2038698.070439] [ 7773]   107  7773  1069206   529215    1233     1478             0 qemu-kvm
[2038698.070447] Memory cgroup out of memory: Kill process 16782 (qemu-kvm) score 348 or sacrifice child
[2038698.070452] Killed process 16782 (qemu-kvm) total-vm:4276824kB, anon-rss:2106884kB, file-rss:9976kB
[2038701.959463] virbr0: port 1(vnet0) entered disabled state
[2038701.959791] virbr0: port 1(vnet0) entered disabled state
[2038701.961233] device vnet0 left promiscuous mode
[2038701.961256] virbr0: port 1(vnet0) entered disabled state
Comment 15 Marian Csontos 2013-01-23 04:14:42 EST
A workaround is to use memtune after installing and before starting the domain:

    virsh memtune $node --hard-limit=4GiB --config || :
    virsh start $node;
Comment 16 zhpeng 2013-01-29 03:52:26 EST
(In reply to comment #15)
> Workaround is to use memtune after installing and before starting domain:
> 
>     virsh memtune $node --hard-limit=4GiB --config || :
>     virsh start $node;

With -18

[root@zhpeng libvirt]#  virsh memtune bug --hard-limit=4GiB --config || :
[root@zhpeng libvirt]# virsh start bug
Domain bug started
[root@zhpeng libvirt]# cat /cgroup/memory/libvirt/qemu/bug/memory.limit_in_bytes 
4294967296

So I think this works around the problem.
Comment 18 Luwen Su 2013-01-29 04:59:20 EST
Hi Michal,
I can reproduce a similar issue with the latest packages:
# rpm -q libvirt qemu-kvm-rhev kernel
libvirt-0.10.2-18.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6.x86_64
kernel-2.6.32-356.el6.x86_64

Bug 854552 - Host machine gets stuck while rebooting after libvirt failed killing a qemu process

Any suggestions?

Steps:
In two terminals. The first runs:
for i in {1..1000} ; do virsh start test-1 ; sleep 50 ; virsh destroy test-1 ; done

The second runs:
for i in {1..20000} ;do echo $i ;  virsh memtune test-1 --hard-limit 100000 --soft-limit 100000 --swap-hard-limit 100021 --live;  virsh memtune test-1 --hard-limit 100000 --soft-limit 100000 --swap-hard-limit 100092 --live; virsh memtune test-1 --hard-limit 100000 --soft-limit 100000 --swap-hard-limit 100021 --live;done

Get:
message.log
localhost kernel: qemu-kvm invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
Jan 29 17:43:08 localhost kernel: qemu-kvm cpuset=vcpu0 mems_allowed=0
Jan 29 17:43:08 localhost kernel: Pid: 16047, comm: qemu-kvm Not tainted 2.6.32-356.el6.x86_64 #1
Jan 29 17:43:08 localhost kernel: Call Trace:
Jan 29 17:43:08 localhost kernel: [<ffffffff810cb5d1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Jan 29 17:43:08 localhost kernel: [<ffffffff8111cd10>] ? dump_header+0x90/0x1b0
Jan 29 17:43:08 localhost kernel: [<ffffffff81172211>] ? task_in_mem_cgroup+0xe1/0x120
Jan 29 17:43:08 localhost kernel: [<ffffffff8111d192>] ? oom_kill_process+0x82/0x2a0
Jan 29 17:43:08 localhost kernel: [<ffffffff8111d08e>] ? select_bad_process+0x9e/0x120
Jan 29 17:43:08 localhost kernel: [<ffffffff8111d912>] ? mem_cgroup_out_of_memory+0x92/0xb0
Jan 29 17:43:08 localhost kernel: [<ffffffff81173454>] ? mem_cgroup_handle_oom+0x274/0x2a0
Jan 29 17:43:08 localhost kernel: [<ffffffff81170e90>] ? memcg_oom_wake_function+0x0/0xa0
Jan 29 17:43:08 localhost kernel: [<ffffffff81173a39>] ? __mem_cgroup_try_charge+0x5b9/0x5d0
Jan 29 17:43:08 localhost kernel: [<ffffffff81174db7>] ? mem_cgroup_charge_common+0x87/0xd0
Jan 29 17:43:08 localhost kernel: [<ffffffff81174e48>] ? mem_cgroup_newpage_charge+0x48/0x50
Jan 29 17:43:08 localhost kernel: [<ffffffff81143d2c>] ? handle_pte_fault+0x79c/0xb50
Jan 29 17:43:08 localhost kernel: [<ffffffff8104baa7>] ? pte_alloc_one+0x37/0x50
Jan 29 17:43:08 localhost kernel: [<ffffffff8117b469>] ? do_huge_pmd_anonymous_page+0xb9/0x380
Jan 29 17:43:08 localhost kernel: [<ffffffff8114431a>] ? handle_mm_fault+0x23a/0x310
Jan 29 17:43:08 localhost kernel: [<ffffffff8114451a>] ? __get_user_pages+0x12a/0x430
Jan 29 17:43:08 localhost kernel: [<ffffffff811448b9>] ? get_user_pages+0x49/0x50
Jan 29 17:43:08 localhost kernel: [<ffffffff8104c307>] ? get_user_pages_fast+0x157/0x1c0
Jan 29 17:43:08 localhost kernel: [<ffffffffa0387343>] ? hva_to_pfn+0x33/0x1a0 [kvm]
Jan 29 17:43:08 localhost kernel: [<ffffffff8150f276>] ? down_read+0x16/0x30
Jan 29 17:43:08 localhost kernel: [<ffffffffa03a29cb>] ? mapping_level+0x17b/0x1d0 [kvm]
Jan 29 17:43:08 localhost kernel: [<ffffffffa03a74bd>] ? paging64_page_fault+0xbd/0x4b0 [kvm]
Jan 29 17:43:08 localhost kernel: [<ffffffffa03a5da8>] ? paging64_gva_to_gpa+0x48/0x90 [kvm]
Jan 29 17:43:08 localhost kernel: [<ffffffffa03988d1>] ? emulator_read_emulated+0x101/0x240 [kvm]
Jan 29 17:43:08 localhost kernel: [<ffffffffa03a3f08>] ? kvm_mmu_page_fault+0x28/0xc0 [kvm]
Jan 29 17:43:08 localhost kernel: [<ffffffffa03ee558>] ? handle_exception+0x2c8/0x390 [kvm_intel]
Jan 29 17:43:08 localhost kernel: [<ffffffff814e7644>] ? wireless_nlevent_process+0x24/0x80
Jan 29 17:43:08 localhost kernel: [<ffffffff814e7644>] ? wireless_nlevent_process+0x24/0x80
Jan 29 17:43:08 localhost kernel: [<ffffffffa03edef3>] ? vmx_handle_exit+0xc3/0x280 [kvm_intel]
Jan 29 17:43:08 localhost kernel: [<ffffffffa039cfb6>] ? kvm_arch_vcpu_ioctl_run+0x486/0x1040 [kvm]
Jan 29 17:43:08 localhost kernel: [<ffffffffa0385ff4>] ? kvm_vcpu_ioctl+0x434/0x580 [kvm]
Jan 29 17:43:08 localhost kernel: [<ffffffff8105e203>] ? perf_event_task_sched_out+0x33/0x80
Jan 29 17:43:08 localhost kernel: [<ffffffff81194eb2>] ? vfs_ioctl+0x22/0xa0
Jan 29 17:43:08 localhost kernel: [<ffffffff8119537a>] ? do_vfs_ioctl+0x3aa/0x580
Jan 29 17:43:08 localhost kernel: [<ffffffff811955d1>] ? sys_ioctl+0x81/0xa0
Jan 29 17:43:08 localhost kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Jan 29 17:43:08 localhost kernel: Task in /libvirt/qemu/test-1 killed as a result of limit of /libvirt/qemu/test-1
Jan 29 17:43:08 localhost kernel: memory: usage 100000kB, limit 100000kB, failcnt 50
Jan 29 17:43:08 localhost kernel: memory+swap: usage 100000kB, limit 100092kB, failcnt 0
Jan 29 17:43:08 localhost kernel: Mem-Info:
Jan 29 17:43:08 localhost kernel: Node 0 DMA per-cpu:
Jan 29 17:43:08 localhost kernel: CPU    0: hi:    0, btch:   1 usd:   0
Jan 29 17:43:08 localhost kernel: CPU    1: hi:    0, btch:   1 usd:   0
Jan 29 17:43:08 localhost kernel: CPU    2: hi:    0, btch:   1 usd:   0
Jan 29 17:43:08 localhost kernel: CPU    3: hi:    0, btch:   1 usd:   0
Jan 29 17:43:08 localhost kernel: Node 0 DMA32 per-cpu:
Jan 29 17:43:08 localhost kernel: CPU    0: hi:  186, btch:  31 usd: 169
Jan 29 17:43:08 localhost kernel: CPU    1: hi:  186, btch:  31 usd: 171
Jan 29 17:43:08 localhost kernel: CPU    2: hi:  186, btch:  31 usd: 167
Jan 29 17:43:08 localhost kernel: CPU    3: hi:  186, btch:  31 usd: 151
Jan 29 17:43:08 localhost kernel: Node 0 Normal per-cpu:
Jan 29 17:43:08 localhost kernel: CPU    0: hi:  186, btch:  31 usd:  51
Jan 29 17:43:08 localhost kernel: CPU    1: hi:  186, btch:  31 usd: 156
Jan 29 17:43:08 localhost kernel: CPU    2: hi:  186, btch:  31 usd:  41
Jan 29 17:43:08 localhost kernel: CPU    3: hi:  186, btch:  31 usd:  21
Jan 29 17:43:08 localhost kernel: active_anon:48476 inactive_anon:5 isolated_anon:0
Jan 29 17:43:08 localhost kernel: active_file:52647 inactive_file:1571866 isolated_file:0
Jan 29 17:43:08 localhost kernel: unevictable:4722 dirty:6 writeback:0 unstable:0
Jan 29 17:43:08 localhost kernel: free:103124 slab_reclaimable:59743 slab_unreclaimable:16092
Jan 29 17:43:08 localhost kernel: mapped:7192 shmem:71 pagetables:1669 bounce:0
Jan 29 17:43:08 localhost kernel: Node 0 DMA free:15720kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15320kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 29 17:43:08 localhost kernel: lowmem_reserve[]: 0 3510 7519 7519
Jan 29 17:43:08 localhost kernel: Node 0 DMA32 free:167996kB min:31492kB low:39364kB high:47236kB active_anon:1196kB inactive_anon:4kB active_file:5128kB inactive_file:3049956kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3595040kB mlocked:0kB dirty:0kB writeback:0kB mapped:24kB shmem:4kB slab_reclaimable:104224kB slab_unreclaimable:476kB kernel_stack:16kB pagetables:268kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 29 17:43:08 localhost kernel: lowmem_reserve[]: 0 0 4008 4008
Jan 29 17:43:08 localhost kernel: Node 0 Normal free:228780kB min:35956kB low:44944kB high:53932kB active_anon:192708kB inactive_anon:16kB active_file:205460kB inactive_file:3237508kB unevictable:18888kB isolated(anon):0kB isolated(file):0kB present:4104640kB mlocked:6628kB dirty:24kB writeback:0kB mapped:28744kB shmem:280kB slab_reclaimable:134748kB slab_unreclaimable:63892kB kernel_stack:1976kB pagetables:6408kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 29 17:43:08 localhost kernel: lowmem_reserve[]: 0 0 0 0
Jan 29 17:43:08 localhost kernel: Node 0 DMA: 2*4kB 0*8kB 2*16kB 2*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15720kB
Jan 29 17:43:08 localhost kernel: Node 0 DMA32: 687*4kB 456*8kB 204*16kB 112*32kB 60*64kB 27*128kB 14*256kB 15*512kB 9*1024kB 12*2048kB 25*4096kB = 167996kB
Jan 29 17:43:08 localhost kernel: Node 0 Normal: 220*4kB 756*8kB 649*16kB 482*32kB 347*64kB 173*128kB 58*256kB 43*512kB 38*1024kB 1*2048kB 18*4096kB = 228640kB
Jan 29 17:43:08 localhost kernel: 1625136 total pagecache pages
Jan 29 17:43:08 localhost kernel: 0 pages in swap cache
Jan 29 17:43:08 localhost kernel: Swap cache stats: add 0, delete 0, find 0/0
Jan 29 17:43:08 localhost kernel: Free swap  = 0kB
Jan 29 17:43:08 localhost kernel: Total swap = 0kB
Jan 29 17:43:08 localhost kernel: 1957887 pages RAM
Jan 29 17:43:08 localhost kernel: 80558 pages reserved
Jan 29 17:43:08 localhost kernel: 1607282 pages shared
Jan 29 17:43:08 localhost kernel: 178657 pages non-shared
Jan 29 17:43:08 localhost kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Jan 29 17:43:08 localhost kernel: [16027]   107 16027   345941    26166   2       0             0 qemu-kvm
Jan 29 17:43:08 localhost kernel: Memory cgroup out of memory: Kill process 16027 (qemu-kvm) score 1000 or sacrifice child
Jan 29 17:43:08 localhost kernel: Killed process 16027, UID 107, (qemu-kvm) total-vm:1383764kB, anon-rss:99984kB, file-rss:4680kB
Jan 29 17:43:08 localhost kernel: virbr0: port 2(vnet0) entering disabled state
Jan 29 17:43:08 localhost kernel: device vnet0 left promiscuous mode
Jan 29 17:43:08 localhost kernel: virbr0: port 2(vnet0) entering disabled state
Comment 19 zhpeng 2013-01-29 05:21:44 EST
Confirmed this with Michal: this bug only loosens the default limit, and k=0.5 is expected, so changing to VERIFIED.
Comment 20 Jaroslav Kortus 2013-01-30 05:40:26 EST
Since updating to libvirt-0.10.2-16.el6.x86_64 I have not seen the issue again.
Comment 21 errata-xmlrpc 2013-02-21 02:29:30 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html
Comment 22 Jaroslav Kortus 2013-06-13 10:45:10 EDT
So now I've hit this yet again: RHEL 6.4 with many RHEL 5 guests. It seems harder to hit, but it is still there.

My suggestion is to get rid of the limit completely and make it an opt-in feature. Even though the qemu process did not go wild, it was still forced to swap, and after filling up the machine's swap it was killed.

For me this brings far more problems than a leaking qemu would cause ;).
Comment 23 Jaroslav Kortus 2013-06-13 11:47:59 EDT
Use this as a workaround to avoid virts being killed by the OOM killer due to cgroup limits (1 TiB hard limit):
<domain type='kvm'>
  <memtune>
    <hard_limit unit='KiB'>1073741824</hard_limit>
  </memtune>
...
</domain>
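After adding the <memtune> element, it is worth confirming that the limit actually landed in the cgroup. A small check sketch, where the domain name and the RHEL 6 /cgroup/memory mount point are host-specific assumptions:

```shell
# Confirm a domain's configured hard limit is applied, both as libvirt sees it
# and as the kernel enforces it. Domain name and cgroup path are assumptions.
DOM=${1:-marathon-03-node-002}
virsh memtune "$DOM" 2>/dev/null || echo "virsh unavailable or domain not defined"
LIMIT="/cgroup/memory/libvirt/qemu/$DOM/memory.limit_in_bytes"
[ -r "$LIMIT" ] && cat "$LIMIT"
# 1073741824 KiB from the XML above, expressed in bytes (1 TiB):
echo $((1073741824 * 1024))
```

The kernel value should match the <hard_limit> converted from KiB to bytes; a mismatch means the domain was started before the XML change was defined.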
Comment 24 xingxing 2013-09-30 03:19:34 EDT
libvirt-0.10.2-18.el6_4.14.x86_64
still there.
Comment 25 Thomas Lee 2013-09-30 13:11:53 EDT
We recently had what appears to be this bug occur on one of our critical production machines running RHEL 6.4. This issue is marked CLOSED, but if what we've experienced is indeed the same issue, I don't think it's truly resolved.

# rpm -q libvirt
libvirt-0.10.2-18.el6_4.9.x86_64

# virsh dumpxml oasis-replica.1
<domain type='kvm' id='12'>
  <name>oasis-replica.1</name>
  <uuid>bb6756a5-3396-3307-92ca-92e75d453a7b</uuid>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.3.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/libvirt/images/oasis-replica.1-hda.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/libvirt/images/oasis-replica.1-hdb.qcow2'/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/net/nas01/srv/oasis-replica.1-hdc.qcow2'/>
      <target dev='hdc' bus='ide'/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:08:35:a4'/>
      <source bridge='br0'/>
      <target dev='vnet2'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:0a:61:6b'/>
      <source bridge='br1'/>
      <target dev='vnet3'/>
      <model type='virtio'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/3'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/3'>
      <source path='/dev/pts/3'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5901' autoport='yes' listen='127.0.0.1' keymap='en-us'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none'/>
</domain>

Excerpt from /var/log/messages:

Sep 27 12:12:01 vm07 kernel: qemu-kvm invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
Sep 27 12:12:01 vm07 kernel: qemu-kvm cpuset=emulator mems_allowed=0-1
Sep 27 12:12:01 vm07 kernel: Pid: 56054, comm: qemu-kvm Not tainted 2.6.32-358.18.1.el6.x86_64 #1
Sep 27 12:12:01 vm07 kernel: Call Trace:
Sep 27 12:12:01 vm07 kernel: [<ffffffff810cb641>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Sep 27 12:12:01 vm07 kernel: [<ffffffff8111ce40>] ? dump_header+0x90/0x1b0
Sep 27 12:12:01 vm07 kernel: [<ffffffff811725e1>] ? task_in_mem_cgroup+0xe1/0x120
Sep 27 12:12:01 vm07 kernel: [<ffffffff8111d2c2>] ? oom_kill_process+0x82/0x2a0
Sep 27 12:12:01 vm07 kernel: [<ffffffff8111d1be>] ? select_bad_process+0x9e/0x120
Sep 27 12:12:01 vm07 kernel: [<ffffffff8111da42>] ? mem_cgroup_out_of_memory+0x92/0xb0
Sep 27 12:12:01 vm07 kernel: [<ffffffff81173824>] ? mem_cgroup_handle_oom+0x274/0x2a0
Sep 27 12:12:01 vm07 kernel: [<ffffffff81171260>] ? memcg_oom_wake_function+0x0/0xa0
Sep 27 12:12:01 vm07 kernel: [<ffffffff81173e09>] ? __mem_cgroup_try_charge+0x5b9/0x5d0
Sep 27 12:12:01 vm07 kernel: [<ffffffff81175187>] ? mem_cgroup_charge_common+0x87/0xd0
Sep 27 12:12:01 vm07 kernel: [<ffffffff81175218>] ? mem_cgroup_newpage_charge+0x48/0x50
Sep 27 12:12:01 vm07 kernel: [<ffffffff81142ac4>] ? do_wp_page+0x1a4/0x920
Sep 27 12:12:01 vm07 kernel: [<ffffffff81143a3d>] ? handle_pte_fault+0x2cd/0xb50
Sep 27 12:12:01 vm07 kernel: [<ffffffffa0441bdc>] ? nfs_direct_req_free+0x3c/0x50 [nfs]
Sep 27 12:12:01 vm07 kernel: [<ffffffff8127a9e7>] ? kref_put+0x37/0x70
Sep 27 12:12:01 vm07 kernel: [<ffffffffa0441f31>] ? nfs_file_direct_read+0x1c1/0x230 [nfs]
Sep 27 12:12:01 vm07 kernel: [<ffffffff811444fa>] ? handle_mm_fault+0x23a/0x310
Sep 27 12:12:01 vm07 kernel: [<ffffffff810474e9>] ? __do_page_fault+0x139/0x480
Sep 27 12:12:01 vm07 kernel: [<ffffffff810874f6>] ? group_send_sig_info+0x56/0x70
Sep 27 12:12:01 vm07 kernel: [<ffffffff8108754f>] ? kill_pid_info+0x3f/0x60
Sep 27 12:12:01 vm07 kernel: [<ffffffff81513b6e>] ? do_page_fault+0x3e/0xa0
Sep 27 12:12:01 vm07 kernel: [<ffffffff81510f25>] ? page_fault+0x25/0x30
Sep 27 12:12:01 vm07 kernel: Task in /libvirt/qemu/oasis-replica.1 killed as a result of limit of /libvirt/qemu/oasis-replica.1
Sep 27 12:12:01 vm07 kernel: memory: usage 1889792kB, limit 1889792kB, failcnt 90358
Sep 27 12:12:01 vm07 kernel: memory+swap: usage 2879872kB, limit 9007199254740991kB, failcnt 0
Sep 27 12:12:01 vm07 kernel: Mem-Info:
Sep 27 12:12:01 vm07 kernel: Node 0 DMA per-cpu:
Sep 27 12:12:01 vm07 kernel: CPU    0: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU    1: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU    2: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU    3: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU    4: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU    5: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU    6: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU    7: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU    8: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU    9: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU   10: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU   11: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU   12: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU   13: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU   14: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU   15: hi:    0, btch:   1 usd:   0
Sep 27 12:12:01 vm07 kernel: Node 0 DMA32 per-cpu:
Sep 27 12:12:01 vm07 kernel: CPU    0: hi:  186, btch:  31 usd: 184
Sep 27 12:12:01 vm07 kernel: CPU    1: hi:  186, btch:  31 usd: 169
Sep 27 12:12:01 vm07 kernel: CPU    2: hi:  186, btch:  31 usd: 164
Sep 27 12:12:01 vm07 kernel: CPU    3: hi:  186, btch:  31 usd: 160
Sep 27 12:12:01 vm07 kernel: CPU    4: hi:  186, btch:  31 usd: 117
Sep 27 12:12:01 vm07 kernel: CPU    5: hi:  186, btch:  31 usd:  48
Sep 27 12:12:01 vm07 kernel: CPU    6: hi:  186, btch:  31 usd:  60
Sep 27 12:12:01 vm07 kernel: CPU    7: hi:  186, btch:  31 usd:  30
Sep 27 12:12:01 vm07 kernel: CPU    8: hi:  186, btch:  31 usd:  56
Sep 27 12:12:01 vm07 kernel: CPU    9: hi:  186, btch:  31 usd:  23
Sep 27 12:12:01 vm07 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Sep 27 12:12:01 vm07 kernel: Node 0 Normal per-cpu:
Sep 27 12:12:01 vm07 kernel: CPU    0: hi:  186, btch:  31 usd:  54
Sep 27 12:12:01 vm07 kernel: CPU    1: hi:  186, btch:  31 usd:  35
Sep 27 12:12:01 vm07 kernel: CPU    2: hi:  186, btch:  31 usd:  36
Sep 27 12:12:01 vm07 kernel: CPU    3: hi:  186, btch:  31 usd:  57
Sep 27 12:12:01 vm07 kernel: CPU    4: hi:  186, btch:  31 usd: 162
Sep 27 12:12:01 vm07 kernel: CPU    5: hi:  186, btch:  31 usd: 156
Sep 27 12:12:01 vm07 kernel: CPU    6: hi:  186, btch:  31 usd: 143
Sep 27 12:12:01 vm07 kernel: CPU    7: hi:  186, btch:  31 usd:  54
Sep 27 12:12:01 vm07 kernel: CPU    8: hi:  186, btch:  31 usd: 161
Sep 27 12:12:01 vm07 kernel: CPU    9: hi:  186, btch:  31 usd: 167
Sep 27 12:12:01 vm07 kernel: CPU   10: hi:  186, btch:  31 usd: 176
Sep 27 12:12:01 vm07 kernel: CPU   11: hi:  186, btch:  31 usd: 137
Sep 27 12:12:01 vm07 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU   13: hi:  186, btch:  31 usd:  96
Sep 27 12:12:01 vm07 kernel: CPU   14: hi:  186, btch:  31 usd: 173
Sep 27 12:12:01 vm07 kernel: CPU   15: hi:  186, btch:  31 usd:  29
Sep 27 12:12:01 vm07 kernel: Node 1 Normal per-cpu:
Sep 27 12:12:01 vm07 kernel: CPU    0: hi:  186, btch:  31 usd: 161
Sep 27 12:12:01 vm07 kernel: CPU    1: hi:  186, btch:  31 usd: 171
Sep 27 12:12:01 vm07 kernel: CPU    2: hi:  186, btch:  31 usd: 174
Sep 27 12:12:01 vm07 kernel: CPU    3: hi:  186, btch:  31 usd: 162
Sep 27 12:12:01 vm07 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Sep 27 12:12:01 vm07 kernel: CPU    7: hi:  186, btch:  31 usd: 133
Sep 27 12:12:01 vm07 kernel: CPU    8: hi:  186, btch:  31 usd:  22
Sep 27 12:12:01 vm07 kernel: CPU    9: hi:  186, btch:  31 usd:  52
Sep 27 12:12:01 vm07 kernel: CPU   10: hi:  186, btch:  31 usd:  61
Sep 27 12:12:01 vm07 kernel: CPU   11: hi:  186, btch:  31 usd: 100
Sep 27 12:12:01 vm07 kernel: CPU   12: hi:  186, btch:  31 usd: 161
Sep 27 12:12:01 vm07 kernel: CPU   13: hi:  186, btch:  31 usd:  51
Sep 27 12:12:01 vm07 kernel: CPU   14: hi:  186, btch:  31 usd: 178
Sep 27 12:12:01 vm07 kernel: CPU   15: hi:  186, btch:  31 usd: 170
Sep 27 12:12:01 vm07 kernel: active_anon:4527685 inactive_anon:1525557 isolated_anon:0
Sep 27 12:12:01 vm07 kernel: active_file:294282 inactive_file:1265201 isolated_file:0
Sep 27 12:12:01 vm07 kernel: unevictable:0 dirty:2 writeback:0 unstable:0
Sep 27 12:12:01 vm07 kernel: free:315084 slab_reclaimable:22633 slab_unreclaimable:139023
Sep 27 12:12:01 vm07 kernel: mapped:7695 shmem:51 pagetables:17243 bounce:0
Sep 27 12:12:01 vm07 kernel: Node 0 DMA free:15672kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15288kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Sep 27 12:12:01 vm07 kernel: lowmem_reserve[]: 0 2990 16120 16120
Sep 27 12:12:01 vm07 kernel: Node 0 DMA32 free:572920kB min:8348kB low:10432kB high:12520kB active_anon:979480kB inactive_anon:577852kB active_file:86620kB inactive_file:345100kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3062596kB mlocked:0kB dirty:0kB writeback:0kB mapped:124kB shmem:0kB slab_reclaimable:5360kB slab_unreclaimable:100120kB kernel_stack:0kB pagetables:60kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Sep 27 12:12:01 vm07 kernel: lowmem_reserve[]: 0 0 13130 13130
Sep 27 12:12:01 vm07 kernel: Node 0 Normal free:53336kB min:36652kB low:45812kB high:54976kB active_anon:8752624kB inactive_anon:2641696kB active_file:525668kB inactive_file:1182224kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13445120kB mlocked:0kB dirty:8kB writeback:0kB mapped:12888kB shmem:72kB slab_reclaimable:39944kB slab_unreclaimable:169032kB kernel_stack:4640kB pagetables:36948kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Sep 27 12:12:01 vm07 kernel: lowmem_reserve[]: 0 0 0 0
Sep 27 12:12:01 vm07 kernel: Node 1 Normal free:618408kB min:45064kB low:56328kB high:67596kB active_anon:8378636kB inactive_anon:2882680kB active_file:564840kB inactive_file:3533480kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16531680kB mlocked:0kB dirty:0kB writeback:0kB mapped:17768kB shmem:132kB slab_reclaimable:45228kB slab_unreclaimable:286940kB kernel_stack:824kB pagetables:31964kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Sep 27 12:12:01 vm07 kernel: lowmem_reserve[]: 0 0 0 0
Sep 27 12:12:01 vm07 kernel: Node 0 DMA: 0*4kB 1*8kB 1*16kB 1*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15672kB
Sep 27 12:12:01 vm07 kernel: Node 0 DMA32: 13618*4kB 5805*8kB 3368*16kB 1946*32kB 1058*64kB 395*128kB 265*256kB 176*512kB 76*1024kB 1*2048kB 0*4096kB = 573168kB
Sep 27 12:12:01 vm07 kernel: Node 0 Normal: 1951*4kB 502*8kB 179*16kB 58*32kB 17*64kB 12*128kB 6*256kB 19*512kB 22*1024kB 0*2048kB 0*4096kB = 52956kB
Sep 27 12:12:01 vm07 kernel: Node 1 Normal: 304*4kB 1496*8kB 5642*16kB 2618*32kB 1445*64kB 778*128kB 339*256kB 148*512kB 75*1024kB 0*2048kB 0*4096kB = 618656kB
Sep 27 12:12:01 vm07 kernel: 1824637 total pagecache pages
Sep 27 12:12:01 vm07 kernel: 264866 pages in swap cache
Sep 27 12:12:01 vm07 kernel: Swap cache stats: add 1685236, delete 1420370, find 798807/863738
Sep 27 12:12:01 vm07 kernel: Free swap  = 0kB
Sep 27 12:12:01 vm07 kernel: Total swap = 2097144kB
Sep 27 12:12:01 vm07 kernel: 8384511 pages RAM
Sep 27 12:12:01 vm07 kernel: 172522 pages reserved
Sep 27 12:12:01 vm07 kernel: 1130444 pages shared
Sep 27 12:12:01 vm07 kernel: 7610200 pages non-shared
Sep 27 12:12:01 vm07 kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Sep 27 12:12:01 vm07 kernel: [ 3670]   107  3670  1206371   239985   1       0             0 qemu-kvm
Sep 27 12:12:01 vm07 kernel: Memory cgroup out of memory: Kill process 3670 (qemu-kvm) score 611 or sacrifice child
Sep 27 12:12:01 vm07 kernel: Killed process 3670, UID 107, (qemu-kvm) total-vm:4825484kB, anon-rss:955344kB, file-rss:4596kB
Sep 27 12:12:01 vm07 kernel: Kill process 3694 (vhost-3670) sharing same memory
Sep 27 12:12:01 vm07 kernel: Kill process 3695 (vhost-3670) sharing same memory
Sep 27 12:12:01 vm07 kernel: br0: port 3(vnet2) entering disabled state
Sep 27 12:12:01 vm07 kernel: device vnet2 left promiscuous mode
Sep 27 12:12:01 vm07 kernel: br0: port 3(vnet2) entering disabled state
Sep 27 12:12:01 vm07 kernel: br1: port 3(vnet3) entering disabled state
Sep 27 12:12:01 vm07 kernel: device vnet3 left promiscuous mode
Sep 27 12:12:01 vm07 kernel: br1: port 3(vnet3) entering disabled state
Comment 26 Michal Privoznik 2013-09-30 13:41:58 EDT
The problem is that the heuristic for guessing the maximal amount of memory that qemu will ever use is simply incorrect. In fact, it could never be correct. So in RHEL-7 we've decided to drop the heuristic for good (see bug 1001143). But it still lives in 6.4. Anyway, I've opened a new bug for this (it seems bugs in CLOSED ERRATA can't be reopened).
Comment 27 Hansa 2013-11-28 16:32:06 EST
(In reply to Michal Privoznik from comment #26)

Any suggestions for working around it?

   virsh memtune $node --hard-limit=8GiB 

Doesn't seem to do the trick here.
Comment 28 Jaroslav Kortus 2013-11-29 03:38:07 EST
Try setting something far beyond what the VM could actually allocate (I set 1TiB in comment 23).
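
The workaround above can also be made persistent in the domain XML rather than via `virsh memtune`. This is a sketch, assuming the 1 TiB value from comment 23; the element goes inside `<domain>`, alongside `<memory>`/`<currentMemory>`:

```xml
<!-- Pin the memory cgroup hard limit far above anything the guest
     could really allocate, so libvirt's (incorrect) heuristic limit
     never applies and the OOM killer cannot trigger on it. -->
<memtune>
  <hard_limit unit='KiB'>1073741824</hard_limit>  <!-- 1 TiB -->
</memtune>
```

The equivalent one-off change for a running domain is `virsh memtune <domain> --hard-limit 1073741824` (virsh takes the value in KiB when no unit suffix is given). Note that disabling the limit this way trades OOM kills for the possibility of qemu consuming host memory unchecked.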
