Bug 1796356 - ksmd using heavy CPU on AMD sev
Summary: ksmd using heavy CPU on AMD sev
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Dr. David Alan Gilbert
QA Contact: zixchen
URL:
Whiteboard:
Depends On:
Blocks: 1818024
 
Reported: 2020-01-30 09:22 UTC by Dr. David Alan Gilbert
Modified: 2020-11-17 17:47 UTC

Fixed In Version: qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-17 17:46:36 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Description Dr. David Alan Gilbert 2020-01-30 09:22:02 UTC
Description of problem:
When running a large SEV VM, ksmd can be seen chugging away at high CPU usage with no hope of ever actually merging pages.

Version-Release number of selected component (if applicable):
qemu-kvm-common-4.1.0-14.module+el8.1.0+5346+c31201bb.1.x86_64

How reproducible:
100%?

Steps to Reproduce:
1. Start a SEV VM on a host with lots of RAM and give the guest lots of RAM (I used a 200GB guest in my case, but I doubt it needs to be that big)
2. Start 'top' on the host while leaving the guest idle

Actual results:
ksmd constantly uses a considerable amount of CPU. It started off at about 20% for me, but rose to 70% (of a core) and stayed there for over half an hour.

Expected results:
Sane ksm usage

Additional info:
SEV encrypts guest pages, so the host kernel never sees the real page data. What it does see looks random, so KSM can't really merge it. We should probably turn off 'mem-merge' on SEV VMs.
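
To illustrate why KSM has nothing to merge here, the following is a toy Python sketch, not SEV itself. It assumes the ciphertext the host sees depends on both a per-guest key and the page's address, and it stands in for the hardware encryption engine with a SHA-256 keystream. The names `toy_encrypt_page` and `mergeable_pages` are hypothetical.

```python
import hashlib

PAGE_SIZE = 4096


def toy_encrypt_page(key: bytes, page_addr: int, plaintext: bytes) -> bytes:
    """XOR a page with a keystream derived from key and page address (toy cipher, NOT AES)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        seed = key + page_addr.to_bytes(8, "little") + counter.to_bytes(4, "little")
        stream.extend(hashlib.sha256(seed).digest())
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def mergeable_pages(pages) -> int:
    """Count pages whose content duplicates another page, i.e. what a content-based merger could share."""
    return len(pages) - len(set(pages))


# Eight identical zero-filled pages, as an idle guest might hold.
guest_pages = [bytes(PAGE_SIZE)] * 8
# What the host would see under the assumption above: one ciphertext per page address.
host_view = [toy_encrypt_page(b"guest-key", addr, page)
             for addr, page in enumerate(guest_pages)]

plain_dups = mergeable_pages(guest_pages)
encrypted_dups = mergeable_pages(host_view)
print("duplicates in guest view:", plain_dups)
print("duplicates in host view:", encrypted_dups)
```

In the guest's view seven of the eight pages are duplicates. In the host's view none are, so a scanner that compares page contents keeps scanning without ever finding a match.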

Comment 3 Dr. David Alan Gilbert 2020-01-30 17:52:02 UTC
Posted upstream fix:
[PATCH] machine/memory encryption: Disable mem merge

Comment 4 Ademar Reis 2020-02-05 23:14:15 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 5 Dr. David Alan Gilbert 2020-03-20 10:53:19 UTC
This is now merged upstream as 4ba59be1d6d8c57941841a505cb4656628d582d0
Given that disabling KSM manually is an OK workaround, I don't intend to
backport it unless someone requests it.
Moving to POST and marking fixed in 5.0.
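
For anyone checking whether a host is affected before applying the manual workaround, here is a minimal Python sketch that reads the KSM counters from sysfs. The directory /sys/kernel/mm/ksm and the files `run`, `pages_shared`, `pages_sharing` and `full_scans` are the standard KSM interface. The helper names and the "scanning without merging" heuristic are assumptions made for illustration, and the heuristic is only meaningful on a host where the SEV guest is the only KSM candidate.

```python
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")  # standard KSM sysfs directory
FIELDS = ("run", "pages_shared", "pages_sharing", "full_scans")


def read_ksm_stats(ksm_dir=KSM_DIR):
    """Return the integer value of each KSM counter that exists under ksm_dir."""
    stats = {}
    for name in FIELDS:
        path = Path(ksm_dir) / name
        if path.is_file():
            stats[name] = int(path.read_text().strip())
    return stats


def scanning_without_merging(stats):
    """Hypothetical heuristic: ksmd is running, has completed a full scan, and shares nothing."""
    return (stats.get("run") == 1
            and stats.get("full_scans", 0) > 0
            and stats.get("pages_sharing", 0) == 0)


if __name__ == "__main__":
    if KSM_DIR.is_dir():
        current = read_ksm_stats()
        print(current, "scanning without merging:", scanning_without_merging(current))
    else:
        print("KSM sysfs directory not found on this machine")
```

Writing 0 to the `run` file as root stops ksmd, which is one way to apply the workaround by hand. Setting `mem-merge=off` on the QEMU `-machine` option should have the same effect for a single guest.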

Comment 8 zixchen 2020-07-21 13:36:56 UTC
Reproduced the bug with qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64. As the test machine has 64GB of memory, I installed a VM with 50GB of RAM. After 40 mins, ksmd is at about 13% CPU:

Version:
kernel-4.18.0-193.13.2.el8_2.x86_64
qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64

Steps:
1. Start a VM with 50GB RAM and leave it idle.
2. Start and enable the service: systemctl start ksm; systemctl enable ksm
3. Run systemctl status ksm and check that the service is enabled.
4. Start top on the host.
5. Wait for 40 mins.


Results:
After Step 4, ksmd is at around 1.6% CPU, rising to approximately 13% after Step 5.


Verified the bug with qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420.x86_64; top did not show any ksmd CPU usage.

Version:
kernel-4.18.0-224.el8.x86_64
qemu-kvm-5.0.0-2.module+el8.3.0+7379+0505d6ca.x86_64

Steps:
1. Start a VM with 50GB RAM and leave it idle.
2. Start and enable the service: systemctl start ksm; systemctl enable ksm
3. Run systemctl status ksm and check that the service is enabled.
4. Start top on the host.
5. Wait for 40 mins.


Actual Result:
After Step 4 and Step 5, the ksm service status is active, and top shows no ksmd CPU usage either at the beginning or after 40 mins.

● ksm.service - Kernel Samepage Merging
   Loaded: loaded (/usr/lib/systemd/system/ksm.service; enabled; vendor preset: enabled)
   Active: active (exited) since Tue 2020-07-21 08:22:25 EDT; 1min 18s ago
 Main PID: 36612 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 407449)
   Memory: 0B
   CGroup: /system.slice/ksm.service

Jul 21 08:22:25 dell-per7425-02.khw.lab.eng.bos.redhat.com systemd[1]: Starting Kernel Samepage Merging...
Jul 21 08:22:25 dell-per7425-02.khw.lab.eng.bos.redhat.com systemd[1]: Started Kernel Samepage Merging.
***************************************************************************************************************
From the beginning:  
Tasks: 989 total,   2 running, 987 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  63865.6 total,   7913.6 free,  53241.7 used,   2710.3 buff/cache
MiB Swap:  32096.0 total,  32096.0 free,      0.0 used.   9981.1 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                             
  34197 qemu      20   0   55.0g  50.1g  22052 S   1.3  80.3   1:56.46 qemu-kvm                                                                                                                            
  36989 root      20   0   62684   5660   3784 R   1.0   0.0   0:01.74 top                                                                                                                                 
  34252 root      20   0       0      0      0 S   0.3   0.0   0:01.03 kvm-pit/34197                                                                                                                       
      1 root      20   0  247864  14756   9412 S   0.0   0.0   0:06.99 systemd                                                                                                                             
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.06 kthreadd       
****************************************************************************************************************
After 60 mins:
Tasks: 995 total,   1 running, 994 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  63865.6 total,   7879.5 free,  53266.7 used,   2719.4 buff/cache
MiB Swap:  32096.0 total,  32096.0 free,      0.0 used.   9954.3 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                             
  34197 qemu      20   0   55.0g  50.1g  22052 S   2.3  80.3   2:44.95 qemu-kvm                                                                                                                            
  38106 root      20   0   62684   5680   3796 R   0.7   0.0   0:14.98 top                                                                                                                                 
   2304 root      20   0  125380   6028   4908 S   0.3   0.0   0:04.59 irqbalance                                                                                                                          
      1 root      20   0  247864  14756   9412 S   0.0   0.0   0:07.01 systemd                                                                                                                             
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.07 kthreadd   
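
The listings above only show the top few rows, so a missing ksmd row means ksmd used less CPU than the rows shown, not that it was measured at zero. The following Python sketch pulls the %CPU value for one command out of captured top text, for example the output of `top -b -n 1`. The function name `cpu_for_command` is hypothetical, and the sample rows are copied from the first listing.

```python
def cpu_for_command(top_output, command):
    """Return %CPU for the first row whose COMMAND matches, or None if no such row is listed."""
    header = None
    for line in top_output.splitlines():
        fields = line.split()
        if header is None:
            # The process table starts at the line whose first column is "PID".
            if fields and fields[0] == "PID" and "%CPU" in fields and "COMMAND" in fields:
                header = fields
            continue
        # Split into as many columns as the header has; COMMAND takes the remainder.
        row = line.split(None, len(header) - 1)
        if len(row) < len(header):
            continue  # blank or truncated line
        if row[header.index("COMMAND")].strip() == command:
            return float(row[header.index("%CPU")])
    return None


sample = """\
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  34197 qemu      20   0   55.0g  50.1g  22052 S   1.3  80.3   1:56.46 qemu-kvm
  36989 root      20   0   62684   5660   3784 R   1.0   0.0   0:01.74 top
"""

qemu_cpu = cpu_for_command(sample, "qemu-kvm")
ksmd_cpu = cpu_for_command(sample, "ksmd")
print("qemu-kvm:", qemu_cpu, "ksmd:", ksmd_cpu)
```

A result of None for ksmd says only that no ksmd row was present in the captured text. To confirm the value directly, capture the full process list and not just the first screen.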


Expected results:
Sane ksm usage

Needinfo: could you please check that the test steps are correct and that the actual result is as expected, given that ksmd does not show up in the CPU usage? Thank you.

Comment 9 Dr. David Alan Gilbert 2020-07-22 09:41:47 UTC
(In reply to zixchen from comment #8)

> Needinfo: could you please check the test steps and the actual result is as
> expected, as ksmd is not monitored in the cpu usage, thank you.

The test needs to be running the VM with SEV enabled - are you doing that?

Comment 10 zixchen 2020-07-22 10:44:56 UTC
(In reply to Dr. David Alan Gilbert from comment #9)
> The test needs to be running the VM with SEV enabled - are you doing that?

Yes, SEV is enabled in the VM.

Steps: 
1. Log in to the VM over ssh.
2. dmesg | grep -i sev

After Step 2:
[    0.001000] AMD Secure Encrypted Virtualization (SEV) active
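
The kernel prints "SEV" in upper case, so a case-insensitive match is the safer way to script this check. A small Python sketch follows. The message text is taken from the log line quoted above, other kernel versions may word it differently, and `sev_active` is a hypothetical helper name.

```python
import re

# Message text from the guest log line quoted above; matched case-insensitively.
SEV_ACTIVE = re.compile(r"secure encrypted virtualization \(sev\) active", re.IGNORECASE)


def sev_active(dmesg_text):
    """Return True if any line of the guest kernel log reports SEV as active."""
    return any(SEV_ACTIVE.search(line) for line in dmesg_text.splitlines())


log = "[    0.001000] AMD Secure Encrypted Virtualization (SEV) active\n"
print("SEV active:", sev_active(log))
```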

Comment 11 Dr. David Alan Gilbert 2020-07-22 10:58:04 UTC
OK, then great, that test is fine.

Comment 14 errata-xmlrpc 2020-11-17 17:46:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137

