Description of problem:
On some occasions, after running multiple qemu-kvm processes, using swap, and destroying all VMs, Linux still reports some of the memory as not "free", since it is accounted for as SwapCached. That memory is also accounted for as a non-free cache area (though the processes whose memory was swapped out are long dead).

# free
             total       used       free     shared    buffers     cached
Mem:      32934104    7681960   25252144          0      20096      97576
-/+ buffers/cache:    7564288   25369816
Swap:     20479992    7412272   13067720

# grep Swap /proc/meminfo
SwapCached:      7407508 kB
SwapTotal:      20479992 kB
SwapFree:       13067720 kB

and no real memory consumption by processes:

# ps -e -o rss | awk '{s=s+$1} END {print s}'
72420

Version-Release number of selected component (if applicable):
2.6.18-183.el5

How reproducible:
No simple stand-alone reproducer yet.

Steps to Reproduce:
1. Start many qemu-kvm processes on the host, each running a Windows guest. KSM lets us over-commit memory without swapping.
2. Let the VMs run, grow in memory usage, and swap out.
3. Migrate the VMs away from the host.

Actual results:
The host reports a lot (7 GB) of RAM used by SwapCache and Swap.

Expected results:
With no processes running, no memory should be used.

Additional info:
It would also be helpful to reproduce this in a simpler scenario, e.g. without KSM, in order to get to the root cause. Dan, can you try it?
The memory will be reclaimed as the system needs it. This is mostly a cosmetic bug.
The problem is that management tools track these reports and rely on them. When the RAM/swap is reported as occupied, management will pick another server to run the VMs.
Reporting how much free memory a host has is not cosmetic - we make decisions based on the reported numbers.
When a process exits, all of its memory is handled by zap_pte_range(). That code definitely looks as if it does the right thing:

    if (pte_present(ptent)) {
        ...
        tlb_remove_page(tlb, page);
            ---> free_page_and_swap_cache(page);
        ...
    }
    if (!pte_file(ptent))
        free_swap_and_cache(pte_to_swp_entry(ptent));

Either way, swap entries (and swapcache) get freed. The only way to prevent that would be if KVM (or KSM) held either an extra reference count on the page, or the page lock. I'll look through the KVM & KSM code for such a problem.
Btw, it is possible for the VM to temporarily "leak" a little bit of memory this way - pages that are currently under swap IO will be locked and not immediately freeable. However, this should always be a small number of pages, since we only have so much IO in flight at a time. It should not be an entire 7GB process.
I can see a potential issue in KSM. The function replace_page() will merge memory pages, but it will not free the old swap slot and swap cache entry. Dan or Itamar, can you reproduce this problem without KSM?
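For context, the spot in question is the tail of replace_page(), which (with mainline-style naming; a rough sketch, not an exact quote of any particular tree) currently ends like this:

    /* pte now points at the shared KSM page; unmap and drop the old page */
    page_remove_rmap(oldpage);
    put_page(oldpage);

If oldpage had been swapped out and read back, put_page() leaves its swap slot and swapcache entry behind even once the last mapping is gone.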
Created attachment 401830 [details] remove old page from swap cache if it was the last mapping
I agree with Rik in comment #7. Leaving orphaned swap entries has always been an issue, one that a long time ago was simply left unfixed. In more recent times we have always tried not to leave orphaned swap entries, and the function that achieves this is free_page_and_swap_cache, called instead of put_page; that takes care of the job by itself. So the hard work was finding the source of the leak, which is what Rik did in comment #7; the fix itself is trivial. That will work unless the orphaned swap cache is left by the KSM pages themselves, rather than by the merged anonymous pages.
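To make the idea concrete, here is a minimal sketch (this is not attachment 401830; names follow mainline ksm.c, it assumes the flow where try_to_merge_one_page() still holds the old page's lock at this point, and for that reason it uses try_to_free_swap(), which needs the lock, rather than free_page_and_swap_cache(), whose internal trylock would fail on an already-locked page):

    /* tail of replace_page(), after the pte has been switched to the KSM page */
    page_remove_rmap(oldpage);
    if (!page_mapped(oldpage))
        try_to_free_swap(oldpage);  /* drop the orphaned swap slot + swapcache entry */
    put_page(oldpage);

Where the old page is not already locked (the RHEL5 module case further down), dropping the last reference via free_page_and_swap_cache() instead of put_page() achieves the same thing.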
Created attachment 401837 [details] remove old page from swap cache if it was the last mapping (rhel5 version)

Attachment 401830 was for rhel6.
(In reply to comment #7)
> I can see a potential issue in KSM. The function replace_page() will merge
> memory pages, but it will not free the old swap slot and swap cache entry.
>
> Dan or Itamar, can you reproduce this problem without KSM?

Dan disabled KSM on the host and I ran the load again. After migrating the guests off the machine, swap was freed (there is still some leftover, but it seems the issue doesn't happen without KSM).

             total       used       free     shared    buffers     cached
Mem:      32934104     301948   32632156          0      18344     102464
-/+ buffers/cache:     181140   32752964
Swap:     20479992      47568   20432424
Created attachment 402124 [details] ksm.ko for rhel5 with attachment #401837 [details] applied

This is a build of ksm.ko for the rhel5 x86_64 generic kernel with attachment #401837 [details] applied. To install, overwrite the ksm.ko module in /opt and then run "rmmod ksm && insmod ksm.ko". Let me know if it doesn't load; it's not a proper brew build, I built it on my system.
(In reply to comment #13)
> Let me know if it doesn't load

I get

# insmod ksm.ko
insmod: error inserting 'ksm.ko': -1 Invalid module format
# uname -r
2.6.18-183.el5

Is it because of

# strings ksm.ko | grep verm
vermagic=2.6.18.4 SMP mod_unload gcc-4.4
# strings /lib/modules/2.6.18-190.el5/extra/kmod-kvm/ksm.ko | grep verm
vermagic=2.6.18-190.el5 SMP mod_unload gcc-4.1

?
Created attachment 402400 [details] rhel5 has ksm as a module and the function is not exported, hence an extra trick is needed
Created attachment 402401 [details] rebuild (loads with modprobe --force-vermagic), but I can also rebuild it on a proper host if this isn't enough
Still having linkage problems, now with kallsyms_lookup_name:

ksm: no version magic, tainting kernel.
ksm: Unknown symbol kallsyms_lookup_name
Created attachment 402520 [details] using kprobes as workaround for lack of free_page_and_swap_cache export
Created attachment 402521 [details] use kprobes as workaround for the missing kallsyms exports

Please verify that "free_page_and_swap_cache as put_page" is not shown in `dmesg` after loading the module...
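For anyone curious how the trick works, here is a minimal illustrative sketch (not the actual attachment; it assumes struct kprobe on this kernel has the .symbol_name member, which mainline only gained around 2.6.19, and the function name ksm_resolve_unexported is made up): register a dummy kprobe on the unexported symbol, read back the resolved address, unregister, and call the function through a pointer, falling back to put_page() if the lookup fails.

    #include <linux/kernel.h>
    #include <linux/kprobes.h>
    #include <linux/mm.h>

    /* pointer through which the module calls the unexported function */
    static void (*ksm_free_page_and_swap_cache)(struct page *page);

    static void ksm_resolve_unexported(void)
    {
        struct kprobe kp = {
            .symbol_name = "free_page_and_swap_cache",
        };

        if (register_kprobe(&kp) < 0) {
            /* lookup failed: degrade to a plain reference drop */
            printk(KERN_INFO "ksm: free_page_and_swap_cache as put_page\n");
            ksm_free_page_and_swap_cache = put_page;
            return;
        }
        /* kp.addr now holds the resolved address of the symbol */
        ksm_free_page_and_swap_cache = (void (*)(struct page *))kp.addr;
        unregister_kprobe(&kp);
    }

Presumably the dmesg line mentioned above comes from a fallback path of this kind: if it shows up, the module is degrading to plain put_page() and the fix is not active.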
ksm: no version magic, tainting kernel.

ksm loaded, but we still see

# free -m
             total       used       free     shared    buffers     cached
Mem:          7958       1477       6481          0          6        122
-/+ buffers/cache:       1348       6610
Swap:        19999       1264      18735

with only 34M of process-accounted memory

# ps -e -o rss | awk '{s=s+$1} END {print s}'
34380

and plenty of SwapCache

# grep Swap /proc/meminfo
SwapCached:      1285872 kB
SwapTotal:      20479992 kB
SwapFree:       19185412 kB
Created attachment 402574 [details] in rhel5 free_page_and_swap_cache must be called to release the last reference
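The reason is the 2.6.18 implementation: the swap entry is only released by remove_exclusive_swap_page(), and that only succeeds while the reference being dropped is the last one apart from the swapcache's own. Quoted roughly from memory of mm/swap_state.c in the 2.6.18 sources (treat it as a sketch and check the actual tree):

    /* If we are the only user, try to free up the swap cache. */
    static inline void free_swap_cache(struct page *page)
    {
        /* only if we can trylock and we are effectively the last user */
        if (PageSwapCache(page) && !TestSetPageLocked(page)) {
            remove_exclusive_swap_page(page);
            unlock_page(page);
        }
    }

    /* free the page, also dropping any swap cache if we were the last user */
    void free_page_and_swap_cache(struct page *page)
    {
        free_swap_cache(page);
        page_cache_release(page);
    }

So in the RHEL5 module, free_page_and_swap_cache() has to replace the final put_page() of the old page; calling it while other references are still held would make remove_exclusive_swap_page() bail out and leave the swap slot behind.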
Created attachment 402575 [details] ksm.ko to test
Created attachment 402582 [details] ksm.ko with a micro-optimization to trylock only if we replaced the page
Created attachment 402583 [details] rhel5 fix for ksm orphaned swapcache
Created attachment 402602 [details] ksm fix for rhel5
Created attachment 402603 [details] ksm.ko to test for rhel5
I've got good results for what I believe was https://bugzilla.redhat.com/attachment.cgi?id=402575

md5sum /lib/modules/2.6.18-190.el5/extra/kmod-kvm/ksm.ko
da562c6f0a7962c2f925ed3051e64548  /lib/modules/2.6.18-190.el5/extra/kmod-kvm/ksm.ko

# grep Swap /proc/meminfo
SwapCached:       104480 kB
SwapTotal:      20479992 kB
SwapFree:       20358784 kB

# free -m
             total       used       free     shared    buffers     cached
Mem:          7958        355       7602          0         14        154
-/+ buffers/cache:        186       7771
Swap:        19999        118      19881

I have not had time to double-check all the versions, though...
Changing component to KVM since that is where the fix needs to be applied in RHEL 5.
With the module of comment 26 I'm finally seeing better results when all qemus are gone:

# grep Swap /proc/meminfo
SwapCached:        43312 kB
SwapTotal:      20479992 kB
SwapFree:       20412704 kB

# free -m
             total       used       free     shared    buffers     cached
Mem:          7958        287       7670          0          3        146
-/+ buffers/cache:        137       7820
Swap:        19999         65      19934
this bug cannot be verified because of bug 580410
Summary: Verified on kvm-83-169.el5, passed.

Test environment and steps were the same as https://bugzilla.redhat.com/show_bug.cgi?id=581764#c6, except that the KVM version used here was kvm-83-169.el5.

Results:

1. Host memory info while two 64-bit Windows guests (each with 8G of physical memory) were running:

# free -m
             total       used       free     shared    buffers     cached
Mem:          7718       1458       6260          0          7        442
-/+ buffers/cache:       1008       6710
Swap:         9727       1918       7809

# grep Swap /proc/meminfo
SwapCached:       242316 kB
SwapTotal:       9961464 kB
SwapFree:        7997988 kB

2. Host memory info after shutting down both guests:

# free -m
             total       used       free     shared    buffers     cached
Mem:          7718        640       7078          0         22        526
-/+ buffers/cache:         90       7628
Swap:         9727         80       9647

# grep Swap /proc/meminfo
SwapCached:         7596 kB
SwapTotal:       9961464 kB
SwapFree:        9878720 kB
Verified again on kvm-83-206.el5, PASS.

1. Host memory info while two 64-bit Windows 7 guests (each with 8G of physical memory) were running:

# free -m
             total       used       free     shared    buffers     cached
Mem:          7718       5002       2715          0          6          5
-/+ buffers/cache:       4990       2727
Swap:         9727       2036       7691

# grep Swap /proc/meminfo
SwapCached:       706204 kB
SwapTotal:       9961464 kB
SwapFree:        7901276 kB

2. Host memory info after shutting down both guests:

# free -m
             total       used       free     shared    buffers     cached
Mem:          7718        101       7616          0          7         15
-/+ buffers/cache:         79       7639
Swap:         9727         67       9660

# grep Swap /proc/meminfo
SwapCached:         6384 kB
SwapTotal:       9961464 kB
SwapFree:        9892688 kB
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0028.html