Bug 575585 - memory reported as used (by SwapCache and by Cache) though no process holds it.
Summary: memory reported as used (by SwapCache and by Cache) though no process holds it.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.5
Hardware: All
OS: Linux
Priority: high
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Rik van Riel
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 574348 576939 581764
Reported: 2010-03-21 16:05 UTC by Dan Kenigsberg
Modified: 2013-01-11 02:51 UTC

Fixed In Version: kvm-83-168.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 576939 (view as bug list)
Environment:
Last Closed: 2011-01-13 23:34:25 UTC
Target Upstream Version:
Embargoed:


Attachments
remove old page from swap cache if it was the last mapping (535 bytes, patch)
2010-03-22 17:38 UTC, Andrea Arcangeli
remove old page from swap cache if it was the last mapping rhel5 version (671 bytes, patch)
2010-03-22 17:57 UTC, Andrea Arcangeli
ksm.ko for rhel5 with attachment #401837 applied (340.13 KB, application/octet-stream)
2010-03-23 19:27 UTC, Andrea Arcangeli
rhel5 has ksm as module and the function is not exported, hence some more trick is needed (1.83 KB, application/octet-stream)
2010-03-24 19:47 UTC, Andrea Arcangeli
rebuild (loads with modprobe --force-vermagic) but I can rebuild also on proper host if not enough (340.52 KB, application/octet-stream)
2010-03-24 19:48 UTC, Andrea Arcangeli
using kprobes as workaround for lack of free_page_and_swap_cache export (343.04 KB, application/octet-stream)
2010-03-25 11:32 UTC, Andrea Arcangeli
use kprobes as workaround of kallsyms exports (2.41 KB, patch)
2010-03-25 11:35 UTC, Andrea Arcangeli
in rhel5 free_page_and_swap_cache must be called to release the last reference (2.21 KB, application/octet-stream)
2010-03-25 14:44 UTC, Andrea Arcangeli
ksm.ko to test (343.04 KB, application/octet-stream)
2010-03-25 14:45 UTC, Andrea Arcangeli
ksm.ko with microoptimization to trylock only if we replaced the page (343.59 KB, application/octet-stream)
2010-03-25 14:52 UTC, Andrea Arcangeli
rhel5 fix for ksm orphaned swapcache (2.30 KB, patch)
2010-03-25 14:53 UTC, Andrea Arcangeli
ksm fix for rhel5 (2.81 KB, patch)
2010-03-25 16:10 UTC, Andrea Arcangeli
ksm.ko to test for rhel5 (343.64 KB, application/octet-stream)
2010-03-25 16:11 UTC, Andrea Arcangeli


Links
System ID: Red Hat Product Errata RHSA-2011:0028
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: kvm security and bug fix update
Last Updated: 2011-01-13 11:03:39 UTC

Description Dan Kenigsberg 2010-03-21 16:05:49 UTC
Description of problem:
On some occasions, after running multiple qemu-kvm processes that use swap and
then destroying all VMs, Linux still reports some of the memory as not "free",
since it is accounted for as SwapCached. That memory is also counted as non-free Cache, even though the processes whose memory was swapped out are long dead.

# free
             total       used       free     shared    buffers     cached
Mem:      32934104    7681960   25252144          0      20096      97576
-/+ buffers/cache:    7564288   25369816
Swap:     20479992    7412272   13067720

# grep Swap /proc/meminfo
SwapCached:    7407508 kB
SwapTotal:    20479992 kB
SwapFree:     13067720 kB

and no real memory consumption by processes
# ps -e -o rss|awk '{s=s+$1} END {print s}'
72420
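
The comparison above (SwapCached from /proc/meminfo against the summed RSS of all processes) can be scripted. The following is a minimal sketch and not part of the original report: the function names and the default 1 GiB threshold are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch only: function names and the threshold are illustrative assumptions.

# Print the SwapCached value (in kB) from a meminfo-style file.
swapcached_kb() {
    awk '$1 == "SwapCached:" { print $2 }' "${1:-/proc/meminfo}"
}

# Sum RSS values (kB) read from stdin, one per line, skipping the "RSS" header.
sum_rss_kb() {
    awk '$1 ~ /^[0-9]+$/ { s += $1 } END { printf "%.0f\n", s }'
}

# Succeed when SwapCached exceeds the total RSS by more than a threshold.
swapcache_looks_orphaned() {
    # $1 = SwapCached kB, $2 = total RSS kB, $3 = threshold kB (default 1 GiB)
    [ $(( $1 - $2 )) -gt "${3:-1048576}" ]
}
```

A possible invocation on a live host would be `swapcache_looks_orphaned "$(swapcached_kb)" "$(ps -e -o rss | sum_rss_kb)" && echo "swap cache looks orphaned"`. With the figures in this description (7407508 kB SwapCached, 72420 kB RSS) the check succeeds.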

Version-Release number of selected component (if applicable):
2.6.18-183.el5

How reproducible:
no simple stand-alone reproducer yet.

Steps to Reproduce:
1. start many qemu-kvm processes on the host, each running Windows guest.
   ksm lets us over-commit memory without swapping.
2. let the VMs run, grow in memory usage, and swap out.
3. migrate the VMs away from the host.
  
Actual results:
Host reports a lot (7 GB) of RAM used by SwapCache and Swap.

Expected results:
No process is running, no memory should be used.

Additional info:

Comment 1 Dor Laor 2010-03-22 08:18:20 UTC
It would also be helpful to reproduce it with a simpler scenario, for example without ksm, in order to get to the root cause. Dan, can you try it?

Comment 2 Rik van Riel 2010-03-22 14:47:34 UTC
The memory will be reclaimed as the system needs it.  This is mostly a cosmetic bug.

Comment 3 Dor Laor 2010-03-22 15:00:25 UTC
The problem is that management tools track these reports and rely on them. When the RAM/swap is reported as occupied, management will pick another server to run the VMs.

Comment 4 Itamar Heim 2010-03-22 15:35:21 UTC
reporting how much free memory a host has is not cosmetic - we make decisions based on the reported numbers

Comment 5 Rik van Riel 2010-03-22 15:50:07 UTC
When a process exits, all of the memory will be handled by zap_pte_range().  That code definitely looks as if it does the right thing...

                if (pte_present(ptent)) {
...
                    tlb_remove_page(tlb, page);
                         ---> free_page_and_swap_cache(page);
...
                }

               if (!pte_file(ptent))
                        free_swap_and_cache(pte_to_swp_entry(ptent));

Either way, swap entries (and swapcache) get freed. The only way to prevent that would be if KVM (or KSM) either had an extra reference count on the page or held the page lock.

I'll look through the KVM & KSM code for such a problem.

Comment 6 Rik van Riel 2010-03-22 15:52:30 UTC
Btw, it is possible for the VM to temporarily "leak" a little bit of memory this way - pages that are currently under swap IO will be locked and not immediately freeable.

However, this should always be a small number of pages, since we only have so much IO in flight at a time.  It should not be an entire 7GB process.

Comment 7 Rik van Riel 2010-03-22 16:03:47 UTC
I can see a potential issue in KSM.  The function replace_page() will merge memory pages, but it will not free the old swap slot and swap cache entry.

Dan or Itamar, can you reproduce this problem without KSM?

Comment 8 Andrea Arcangeli 2010-03-22 17:38:27 UTC
Created attachment 401830 [details]
remove old page from swap cache if it was the last mapping

Comment 9 Andrea Arcangeli 2010-03-22 17:40:44 UTC
I agree with Rik on comment #7. Leaving orphaned swap entries has always been an issue, one that was left unfixed a long time ago. In more recent times we have always tried not to leave orphaned swap entries, and the way to achieve that is to call free_page_and_swap_cache instead of put_page. That takes care of the job by itself. So the hard work was finding the source of the leak, which is what Rik did in comment #7; the fix is actually trivial.

That will work unless the orphaned swap cache is left from ksm pages, and not the merged anonymous pages.

Comment 10 Andrea Arcangeli 2010-03-22 17:57:03 UTC
Created attachment 401837 [details]
remove old page from swap cache if it was the last mapping rhel5 version

401830 was for rhel6.

Comment 12 Itamar Heim 2010-03-23 16:12:09 UTC
(In reply to comment #7)
> I can see a potential issue in KSM.  The function replace_page() will merge
> memory pages, but it will not free the old swap slot and swap cache entry.
> 
> Dan or Itamar, can you reproduce this problem without KSM?    

Dan disabled ksm on the host and I ran the load again.
After migrating guests off the machine, swap was freed.
(There is still some leftover, but it seems the issue doesn't happen without ksm.)

             total       used       free     shared    buffers     cached
Mem:      32934104     301948   32632156          0      18344     102464
-/+ buffers/cache:     181140   32752964
Swap:     20479992      47568   20432424

Comment 13 Andrea Arcangeli 2010-03-23 19:27:58 UTC
Created attachment 402124 [details]
ksm.ko for rhel5 with attachment #401837 [details] applied

this is a build of ksm.ko for rhel5 x86_64 generic kernel with attachment #401837 [details] applied.

To install you need to overwrite the ksm.ko module in /opt and then run "rmmod ksm && insmod ksm".

Let me know if it doesn't load: it's not a proper brew build, I built it on my system.

Comment 14 Dan Kenigsberg 2010-03-24 14:58:56 UTC
(In reply to comment #13)

> Let me know if it doesn't load

I get

# insmod ksm.ko 
insmod: error inserting 'ksm.ko': -1 Invalid module format

# uname -r
2.6.18-183.el5

is it because of

# strings ksm.ko |grep verm
vermagic=2.6.18.4 SMP mod_unload gcc-4.4
# strings /lib/modules/2.6.18-190.el5/extra/kmod-kvm/ksm.ko |grep verm
vermagic=2.6.18-190.el5 SMP mod_unload gcc-4.1

?
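
The mismatch shown above can be checked before attempting to load a module. The sketch below is an assumption-laden illustration, not something from the original thread: the function name is hypothetical, and it compares only the first token (the kernel release) of the vermagic string. The module loader's own vermagic check also covers the remaining tokens, so this is a quick first diagnostic at best.

```shell
#!/bin/sh
# Sketch only: vermagic_matches is a hypothetical helper name.

# Succeed when the first word of a vermagic string equals a kernel release.
vermagic_matches() {
    # $1 = vermagic string, e.g. from: modinfo -F vermagic ksm.ko
    # $2 = kernel release, e.g. from: uname -r
    [ "${1%% *}" = "$2" ]
}
```

For example, `vermagic_matches "$(modinfo -F vermagic ksm.ko)" "$(uname -r)" || echo "built for a different kernel"`. With the values in this comment, the test build reports 2.6.18.4 while the host runs 2.6.18-183.el5, so the check fails.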

Comment 15 Andrea Arcangeli 2010-03-24 19:47:21 UTC
Created attachment 402400 [details]
rhel5 has ksm as module and the function is not exported, hence some more trick is needed

Comment 16 Andrea Arcangeli 2010-03-24 19:48:49 UTC
Created attachment 402401 [details]
rebuild (loads with modprobe --force-vermagic) but I can rebuild also on proper host if not enough

Comment 17 Dan Kenigsberg 2010-03-25 07:23:57 UTC
Still having linkage problems, now with kallsyms_lookup_name.

ksm: no version magic, tainting kernel.
ksm: Unknown symbol kallsyms_lookup_name

Comment 18 Andrea Arcangeli 2010-03-25 11:32:04 UTC
Created attachment 402520 [details]
using kprobes as workaround for lack of free_page_and_swap_cache export

Comment 19 Andrea Arcangeli 2010-03-25 11:35:57 UTC
Created attachment 402521 [details]
use kprobes as workaround of kallsyms exports

You should verify that "free_page_and_swap_cache as put_page" is not shown in `dmesg` after loading the module...
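
The dmesg check requested here can be expressed as a small helper. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the message text is taken verbatim from this comment rather than from the patch itself.

```shell
#!/bin/sh
# Sketch only: ksm_msg_absent is a hypothetical helper name.

# Succeed when the message quoted in this comment is absent from the
# kernel log text supplied on stdin.
ksm_msg_absent() {
    ! grep -q 'free_page_and_swap_cache as put_page'
}
```

After loading the module, `dmesg | ksm_msg_absent && echo "message not present"` would report whether the message appeared.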

Comment 20 Dan Kenigsberg 2010-03-25 13:22:38 UTC
ksm: no version magic, tainting kernel.
ksm loaded

but we still see

# free -m
             total       used       free     shared    buffers     cached
Mem:          7958       1477       6481          0          6        122
-/+ buffers/cache:       1348       6610
Swap:        19999       1264      18735

with only 34M of process-accounted memory

# ps -e -o rss | awk '{s=s+$1} END {print s}'
34380

and plenty of SwapCache

# grep Swap /proc/meminfo 
SwapCached:    1285872 kB
SwapTotal:    20479992 kB
SwapFree:     19185412 kB

Comment 21 Andrea Arcangeli 2010-03-25 14:44:51 UTC
Created attachment 402574 [details]
in rhel5 free_page_and_swap_cache must be called to release the last reference

Comment 22 Andrea Arcangeli 2010-03-25 14:45:47 UTC
Created attachment 402575 [details]
ksm.ko to test

Comment 23 Andrea Arcangeli 2010-03-25 14:52:12 UTC
Created attachment 402582 [details]
ksm.ko with microoptimization to trylock only if we replaced the page

Comment 24 Andrea Arcangeli 2010-03-25 14:53:10 UTC
Created attachment 402583 [details]
rhel5 fix for ksm orphaned swapcache

Comment 25 Andrea Arcangeli 2010-03-25 16:10:17 UTC
Created attachment 402602 [details]
ksm fix for rhel5

Comment 26 Andrea Arcangeli 2010-03-25 16:11:28 UTC
Created attachment 402603 [details]
ksm.ko to test for rhel5

Comment 27 Dan Kenigsberg 2010-03-25 16:50:38 UTC
I've got good results for what I believe was
https://bugzilla.redhat.com/attachment.cgi?id=402575
# md5sum /lib/modules/2.6.18-190.el5/extra/kmod-kvm/ksm.ko
da562c6f0a7962c2f925ed3051e64548  /lib/modules/2.6.18-190.el5/extra/kmod-kvm/ksm.ko

# grep Swap /proc/meminfo 
SwapCached:     104480 kB
SwapTotal:    20479992 kB
SwapFree:     20358784 kB

# free -m
             total       used       free     shared    buffers     cached
Mem:          7958        355       7602          0         14        154
-/+ buffers/cache:        186       7771
Swap:        19999        118      19881

no time to doublecheck, all versions...
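
Since several ksm.ko builds were attached to this bug, recording a digest as done in this comment is a reasonable way to tie a result to a specific build. The following is a minimal sketch of that check; the function name is an assumption, and which attachment the digest above belongs to is not confirmed in this thread.

```shell
#!/bin/sh
# Sketch only: md5_matches is a hypothetical helper name.

# Succeed when the MD5 digest of a file equals the expected hex string.
md5_matches() {
    # $1 = file path, $2 = expected digest (32 hex characters)
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}
```

For example, `md5_matches /lib/modules/2.6.18-190.el5/extra/kmod-kvm/ksm.ko da562c6f0a7962c2f925ed3051e64548` would confirm that the installed module is the build whose digest is quoted above.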

Comment 28 Rik van Riel 2010-03-25 17:02:51 UTC
Changing component to KVM since that is where the fix needs to be applied in RHEL 5.

Comment 29 Dan Kenigsberg 2010-03-29 06:52:33 UTC
With the module of comment 26 I'm finally seeing better results when all qemus are gone:

# grep Swap /proc/meminfo 
SwapCached:      43312 kB
SwapTotal:    20479992 kB
SwapFree:     20412704 kB

# free -m
             total       used       free     shared    buffers     cached
Mem:          7958        287       7670          0          3        146
-/+ buffers/cache:        137       7820
Swap:        19999         65      19934

Comment 38 Miya Chen 2010-04-08 10:16:34 UTC
this bug cannot be verified because of bug 580410

Comment 41 Keqin Hong 2010-04-20 10:14:54 UTC
Summary:
Verified on kvm-83-169.el5, passed.

Test environment and steps were the same as in https://bugzilla.redhat.com/show_bug.cgi?id=581764#c6, except that the KVM version used here was kvm-83-169.el5.

Results:
1. host memory info when two 64-bit Windows guests (each with 8G physical mem) were running
# free -m
             total       used       free     shared    buffers     cached
Mem:          7718       1458       6260          0          7        442
-/+ buffers/cache:       1008       6710
Swap:         9727       1918       7809

# grep Swap /proc/meminfo 
SwapCached:     242316 kB
SwapTotal:     9961464 kB
SwapFree:      7997988 kB

2. host memory info after shutting down both guests
# free -m
             total       used       free     shared    buffers     cached
Mem:          7718        640       7078          0         22        526
-/+ buffers/cache:         90       7628
Swap:         9727         80       9647

# grep Swap /proc/meminfo 
SwapCached:       7596 kB
SwapTotal:     9961464 kB
SwapFree:      9878720 kB
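
The before/after comparison used in this verification can be reduced to one number. The sketch below assumes the two `grep Swap /proc/meminfo` outputs were saved to files; the function name and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch only: swapcached_released_kb is a hypothetical helper name.

# Print how many kB of SwapCached were released between two snapshots.
swapcached_released_kb() {
    # $1 = meminfo-style file captured while the guests were running
    # $2 = meminfo-style file captured after the guests were shut down
    before=$(awk '$1 == "SwapCached:" { print $2 }' "$1")
    after=$(awk '$1 == "SwapCached:" { print $2 }' "$2")
    echo $(( before - after ))
}
```

With the snapshots saved as, say, running.txt and stopped.txt, `swapcached_released_kb running.txt stopped.txt` prints the difference in kB. What counts as an acceptable remainder is a judgment call that this report does not define.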

Comment 43 Keqin Hong 2010-11-05 10:24:10 UTC
Verified again on kvm-83-206.el5, PASS.

1. host memory info when two win7-64 guests (each with 8G physical mem) were running
# free -m
             total       used       free     shared    buffers     cached
Mem:          7718       5002       2715          0          6          5
-/+ buffers/cache:       4990       2727
Swap:         9727       2036       7691
# grep Swap /proc/meminfo 
SwapCached:     706204 kB
SwapTotal:     9961464 kB
SwapFree:      7901276 kB

2. host memory info after shutting down both guests
# free -m
             total       used       free     shared    buffers     cached
Mem:          7718        101       7616          0          7         15
-/+ buffers/cache:         79       7639
Swap:         9727         67       9660
# grep Swap /proc/meminfo 
SwapCached:       6384 kB
SwapTotal:     9961464 kB
SwapFree:      9892688 kB

Comment 46 errata-xmlrpc 2011-01-13 23:34:25 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0028.html

