Bug 672216 - ksmd high CPU usage
Summary: ksmd high CPU usage
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Andrea Arcangeli
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 541230
Blocks:
 
Reported: 2011-01-24 12:43 UTC by Ales Kozumplik
Modified: 2014-09-30 23:39 UTC
CC List: 58 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 541230
Environment:
Last Closed: 2013-10-01 14:32:39 UTC
Target Upstream Version:
Embargoed:



Description Ales Kozumplik 2011-01-24 12:43:06 UTC
+++ This bug was initially created as a clone of Bug #541230 +++

Description of problem:
When running two KVM virtual machines, ksmd takes a lot of CPU.

Version-Release number of selected component (if applicable):
kernel-2.6.31.5-127.fc12.x86_64

How reproducible:
Not known; the problem currently exists and the VMs are in use.

Steps to Reproduce:
1. Start two kvm VMs.
2. Monitor CPU
  
Actual results:
High CPU usage from ksmd, about 10-25%.

Expected results:
ksmd might search for pages less aggressively.

Additional info:
1)
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6546 qemu      20   0 1396m 1.0g 1468 S 180.1 33.9   2325:48 qemu-kvm
 6645 qemu      20   0 1392m 977m 1468 S 132.7 32.4   1402:57 qemu-kvm
   46 root      25   5     0    0    0 S 12.9  0.0 261:03.42 ksmd

2) The host was upgraded from Fedora 11 to Fedora 12 with yum.

3) I also started ksmtuned

[root@porto-1-fedora trunk]# service ksmtuned status
ksmtuned (pid  1844) is running...

but it did not help after 15 minutes of running.

4) The server has an Intel(R) Xeon(R) CPU E5405 @ 2.00GHz and both guests
have two VCPUs allocated.

--- Additional comment from alexey.pushkin on 2009-11-26 17:31:18 ---

Seeing the same thing on a RHEL 6 host: three VMs, all with low load, and ksmd takes 20% to 40% of the host's CPU. The VMs are visibly slow; for instance, scrolling during boot is sluggish.

Comment 2 RHEL Program Management 2011-02-01 05:49:47 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 3 RHEL Program Management 2011-02-01 19:10:19 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 4 RHEL Program Management 2011-04-04 02:25:46 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 5 Ales Kozumplik 2011-04-13 12:04:29 UTC
Hello,

I am still seeing this with qemu-kvm-0.12.1.2-2.113.el6_0.6.x86_64.

The virtual machines running on the host do not do much, but ksmd is at around 30%.

top output:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                  
   35 root      25   5     0    0    0 S 31.5  0.0   4061:41 ksmd                                                                                     
11732 qemu      20   0 1301m 593m  604 S  5.9 15.5   1362:31 qemu-kvm                                                                                 
11792 qemu      20   0 1342m 655m  548 S  5.9 17.1   1318:23 qemu-kvm                                                                                 
11763 qemu      20   0 1301m 685m  548 S  3.9 17.9   1324:49 qemu-kvm                                                                                 
22102 qemu      20   0 1381m 860m  996 S  3.9 22.5  78:54.88 qemu-kvm                                                                                 
 6403 qemu      20   0 1035m 488m 1592 S  2.0 12.7   2:43.69 qemu-kvm                                                                                 
    1 root      20   0 19244  436  272 S  0.0  0.0   0:00.83 init                                                                                     
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.29 kthreadd                                                                        

Is this bug assigned correctly? The Fedora bug this was cloned from is assigned to the virtualization managers, not kernel.

Comment 6 Gary Gatling 2011-04-28 18:29:48 UTC
Hello,

I am seeing this problem also. I am using RHEL 6, with kernel 2.6.32-71.24.1.el6.x86_64. The version of qemu-kvm is: qemu-kvm-0.12.1.2-2.113.el6_0.8.x86_64.

When I first set up this system I had this problem with 4 VMs. I reduced the number of VMs to 2 and the problem went away, until I rebooted today. Now I see the problem with just 2 VMs idling (one RHEL 5, one RHEL 6, both 64-bit).

It seems that ksmd is using about 25-30% of the CPU.

I have another box (beefier CPU and more RAM) with more VMs that does not appear to have this problem. The machine with the high ksmd CPU percentage has 4 GB of RAM; one VM has 512 MB, the other has 1 GB.

The machine not having problems has 8 GB of memory and 7 VMs on it. The VMs are a mix of Fedora and RHEL releases of various versions. I am pretty sure all those VMs have 512 MB.

Top on machine with high ksmd:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
   35 root      25   5     0    0    0 S 26.3  0.0  55:36.99 ksmd               
 3178 gsgatlin  20   0 1016m 359m  16m S  5.6  9.2   5:01.71 firefox-bin        
 2371 qemu      20   0 1309m 701m 1288 R  4.7 18.0  11:39.93 qemu-kvm           
 2337 qemu      20   0  797m 422m 1304 S  1.3 10.8  19:38.93 qemu-kvm           
 2824 gsgatlin  20   0  240m  24m 7184 S  1.0  0.6   2:18.42 compiz             
 3525 root      20   0  602m  37m  10m S  1.0  1.0   1:29.96 python             
 2482 root      20   0  239m 107m 9480 S  0.7  2.8   3:01.39 Xorg               
 3353 gsgatlin  20   0  104m 5764 2148 S  0.7  0.1   1:15.99 plugin-containe

Comment 7 Andrew Haley 2011-05-03 08:17:08 UTC
The ksmd load can easily be reduced by editing /etc/ksmtuned.conf.

See Comment 11 in https://bugzilla.redhat.com/show_bug.cgi?id=541230

Comment 8 Ales Kozumplik 2011-05-04 11:06:27 UTC
(In reply to comment #7)
> The ksmd load can easily be reduced by editing /etc/ksmtuned.conf
> 
> See Comment 11 in https://bugzilla.redhat.com/show_bug.cgi?id=541230

Yes, thanks for pointing that out, I'll update my config.

I still think this is weird; sometimes it looks like the high CPU usage doesn't start until after a few weeks, even though the number of running machines is constant.

Comment 9 RHEL Program Management 2011-10-07 15:21:18 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 13 Andrea Arcangeli 2013-04-05 12:20:39 UTC
2% does not sound excessive to me either; 400 minutes of ksmd runtime over 15 days is 1.8% of a single core. Not excessive in my view.

However if you've plenty of RAM available and you don't need KSM you can shut it off.

Ideally the KSM scan should slow down more dynamically when the merging rate decreases (a sign that the scan is less worthwhile for the workload), so that if you don't need it, it uses less than 2% of one core; but that is not material for RHEL 6, and it would probably be better implemented as a ksmtuned feature in the kernel.

We can't know automatically whether the admin will benefit from KSM, or whether there's plenty of RAM and KSM will never be needed, so it's up to the admin to shut it off if it is not needed. By default the above scan rate looks OK and not too aggressive.

Comment 14 Oliver Henshaw 2013-04-05 12:38:51 UTC
I can't see any reference to 400 minutes over 15 days in a casual perusal of this bug report; maybe I'm missing it? Nevertheless, ksmd takes a high proportion of CPU time when it is running.

So it is to be hoped that it's doing something useful with that CPU time - bug #541230 comment #35 suggests that this isn't always the case.

Comment 15 Andrea Arcangeli 2013-04-09 15:58:15 UTC
There's no guarantee that the work done by ksmd is beneficial in any way. But the point is that we should spend little CPU on it, because even if it's not beneficial now, it could be later.

The 400 minutes over 15 days are deduced from these numbers:

[root@cobra03 ~]# uptime 
 14:20:06 up 15 days,  1:52,  1 user,  load average: 1.91, 2.51, 2.63

This is the ksmd's top:
   39 root      25   5     0    0    0 S  2.0  0.0 400:58.15 ksmd


uptime says 15 days. The time ksmd has used a single core is 400 minutes, 58 seconds, and 15 hundredths of a second.

400./(15*24*60) = 1.8% of a single core

That to me doesn't look excessive and it matches the "instant" 2% here:

   39 root      25   5     0    0    0 S  2.0  0.0 400:58.15 ksmd
                                          ^^^

The problem is that with KSM there's no exact rule for how much CPU you want to spend. But it's all tunable in /etc/ksmtuned.conf.

If you don't benefit from KSM and you have plenty of memory, you should disable ksmd.

Comment 16 Frank Ch. Eigler 2013-04-09 16:06:33 UTC
"If you don't benefit from KSM ..."

How does a user/admin know if they are benefitting from KSM?  The ksmtuned.log file doesn't appear to help inform the cost/benefit question.

Comment 17 Andrea Arcangeli 2013-04-09 17:19:06 UTC
KSM is a tradeoff between spending CPU to save memory, or not spending CPU and not saving memory. There is no hard rule for deciding whether to turn KSM on or off. If you have many similar guests, the amount of RAM saved can be massive, with overall improved performance thanks to more cache being available in the host, or simply by allowing more efficient consolidation.

2% of one core sounds like a reasonable amount of CPU to spend to potentially save dozens of gigabytes of memory.

It also depends on whether the guests are mostly idle or computing heavily. ksmtuned has reasonable defaults, but the admin can override them if they know the virtual machines' requirements.

During CPU-bound benchmarks, for example, KSM usually should be disabled to save a bit of CPU. With desktop virtualization it is a very good idea to keep KSM enabled, because of the low CPU load and the similarity of the software across the different virtual machines (at least if they all run the same desktop OS).

Comment 18 Frank Ch. Eigler 2013-04-09 17:29:17 UTC
I believe that is understood.  The question is how does an administrator know how well it's working?  Can she do better than gross system-level stats like eyeballing /proc/meminfo?

Comment 19 Tobias Florek 2013-04-09 19:16:26 UTC
Have a look at the files in /sys/kernel/mm/ksm/. At least for recent
kernels the information you need should be there. I don't have my RHEL machine
handy to check.

Comment 20 Frank Ch. Eigler 2013-04-09 19:59:28 UTC
Thank you for the pointer.
On my F18 machine with ksmtuned / ksm running, these were typical results:

/sys/kernel/mm/ksm/full_scans:0
/sys/kernel/mm/ksm/pages_shared:0
/sys/kernel/mm/ksm/pages_sharing:0
/sys/kernel/mm/ksm/pages_to_scan:100
/sys/kernel/mm/ksm/pages_unshared:0
/sys/kernel/mm/ksm/pages_volatile:0
/sys/kernel/mm/ksm/run:0
/sys/kernel/mm/ksm/sleep_millisecs:20

With ksmtuned stopped but ksmd forced on more aggressively:

/sys/kernel/mm/ksm/full_scans:4
/sys/kernel/mm/ksm/pages_shared:32128
/sys/kernel/mm/ksm/pages_sharing:261406
/sys/kernel/mm/ksm/pages_to_scan:10000
/sys/kernel/mm/ksm/pages_unshared:789171
/sys/kernel/mm/ksm/pages_volatile:2947
/sys/kernel/mm/ksm/run:1
/sys/kernel/mm/ksm/sleep_millisecs:20

Is there some documentation to guide a sysadmin to these files, and how to interpret the numbers?  At that pages_to_scan rate, ksmd was consuming 80% CPU in order to save some memory.  How much memory per CPU-second, for example?

Comment 21 Tobias Florek 2013-04-09 20:12:30 UTC
From the documentation in the Linux kernel's Documentation/vm/ksm.txt:

pages_shared     - how many shared pages are being used
pages_sharing    - how many more sites are sharing them i.e. how much saved
pages_unshared   - how many pages unique but repeatedly checked for merging
pages_volatile   - how many pages changing too fast to be placed in a tree
full_scans       - how many times all mergeable areas have been scanned

A high ratio of pages_sharing to pages_shared indicates good sharing, but
a high ratio of pages_unshared to pages_sharing indicates wasted effort.
pages_volatile embraces several different kinds of activity, but a high
proportion there would also indicate poor use of madvise MADV_MERGEABLE.


Your sample looks OK to my eyes (though of course pages_to_scan is way too high).

Comment 22 Andrea Arcangeli 2013-04-09 20:49:20 UTC
ksmtuned doesn't turn on KSM until the system is low on memory. This is to save CPU as well. Eventually stock F18 should also reach the same "pages_shared" and "pages_sharing" levels that you achieved with ksmd forced on aggressively. It is just much quicker to get there with an aggressive ksmd setting and a very big pages_to_scan value, so that is good for testing, to be sure KSM is saving you tons of RAM. But then you can start ksmtuned again and lower the scan rate. There are also ksmtuned tweaks to reduce its aggressiveness if needed, but again I don't see big issues in the current defaults.

Comment 23 Ademar Reis 2013-10-01 14:32:39 UTC
Based on the discussion above, it looks like the current behavior is acceptable and the tradeoffs are well understood. Closing as WONTFIX for RHEL 6. Please reopen if you still have concerns. Thanks.

