I upgraded my machines from F8 to F10 and all of a sudden they got very low on memory. There are no processes consuming it, so it must be the kernel that has allocated something. I get this on a lightly loaded F10 machine:

[root@thor ~]# free
             total       used       free     shared    buffers     cached
Mem:        507288     341436     165852          0       6732      79176
-/+ buffers/cache:     255528     251760
Swap:       524280          0     524280

But on a F8 machine the usage is _very_ different:

[drzeus@loki]$ free
             total       used       free     shared    buffers     cached
Mem:        514264     346560     167704          0      98428     196028
-/+ buffers/cache:      52104     462160
Swap:       819192          0     819192

I didn't notice anything obvious in /proc/slabinfo. Any other place I should look?
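For anyone double-checking the figures: the "-/+ buffers/cache" row is derived from the Mem: row, so the meaningful comparison between the two kernels is used memory minus reclaimable buffers and cache. A quick sanity check of the F10 numbers above (just arithmetic, nothing machine-specific):

```shell
# "free" computes the "-/+ buffers/cache" used column as: used - buffers - cached.
# Plugging in the F10 numbers quoted above (all values in KiB):
used=341436; buffers=6732; cached=79176
echo $((used - buffers - cached))   # prints 255528, matching the -/+ row
```

By the same arithmetic the F8 machine has only ~52 MiB truly in use, which is what makes the ~200 MiB gap stand out.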
There are two lines that stand out a bit, although they cannot explain all of the memory loss:

scsi_data_buffer  259967 260100     24  170  1 : tunables    0    0    0 : slabdata   1530   1530      0
kmalloc-16        150752 151552     16  256  1 : tunables    0    0    0 : slabdata    592    592      0

Perhaps this rings a bell.
Any ideas? Losing 200 MB of RAM has a rather detrimental effect on performance, particularly on the machines with just 256 MB of total memory.
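As a rough check that those two caches cannot account for the loss: the memory a slab cache pins is approximately num_slabs × pagesperslab × page size. A back-of-the-envelope sketch using the fields from the lines quoted above (assumes 4 KiB pages):

```shell
# /proc/slabinfo columns: name active_objs num_objs objsize objperslab pagesperslab
#                         ... : slabdata active_slabs num_slabs sharedavail
line="scsi_data_buffer 259967 260100 24 170 1 : tunables 0 0 0 : slabdata 1530 1530 0"
num_slabs=$(echo "$line" | awk '{print $(NF-1)}')   # 1530 slabs
pages_per_slab=$(echo "$line" | awk '{print $6}')   # 1 page per slab
echo "$((num_slabs * pages_per_slab * 4)) KiB"      # prints "6120 KiB", i.e. ~6 MiB
```

The same calculation for kmalloc-16 gives 592 pages, about 2.3 MiB, so together the two caches pin well under 10 MiB of the missing ~200 MB.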
You can track who allocates the memory by booting with kernel option "slub_debug=U".
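A sketch of how that workflow might look (paths per Documentation/vm/slub.txt; the cache name here is just an example, substitute the caches you care about):

```shell
# Add to the kernel command line (e.g. in grub.conf):
#   slub_debug=U        # enable user (allocation/free call-site) tracking
#
# After rebooting, each cache under /sys/kernel/slab/ exposes its call sites:
cat /sys/kernel/slab/kmalloc-16/alloc_calls   # who allocated the live objects
cat /sys/kernel/slab/kmalloc-16/free_calls    # who freed objects
```

Note that slub_debug adds per-object overhead, so memory figures will shift somewhat while it is enabled.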
This seems to have occurred when upgrading to 2.6.27. I had a look at a F9 machine and I get this after a fresh boot with 2.6.27.12-78.2.8.fc9.x86_64:

[root@builder ~]# free
             total       used       free     shared    buffers     cached
Mem:        508016     259508     248508          0       4224      35628
-/+ buffers/cache:     219656     288360
Swap:       524280          0     524280

Same machine, same userspace, booted with 2.6.25.9-76.fc9.x86_64:

[root@builder ~]# free
             total       used       free     shared    buffers     cached
Mem:        509612      92920     416692          0       4144      34500
-/+ buffers/cache:      54276     455336
Swap:       524280          0     524280

[root@builder ~]# uname -a

I'd like those 170 MiB back. Pretty please. :) I'm doing some more comparisons to see if I can determine where the memory goes.
I've compared /proc/meminfo and /proc/slabinfo, and they are almost identical; the differences don't explain the big difference in memory usage. The 2.6.25 kernel has a very large numa_policy entry:

numa_policy       311440 311440     24  170  1 : tunables    0    0    0 : slabdata   1832   1832      0

2.6.27:

numa_policy           59     60    136   30  1 : tunables    0    0    0 : slabdata      2      2      0

The only other big difference is the fasync_cache entry, which only appears on .27:

fasync_cache      311440 311440     24  170  1 : tunables    0    0    0 : slabdata   1832   1832      0

I'll see what slub_debug can give me.
I'm not sure how slub_debug is supposed to work. I had a go with slabinfo, but there is nothing it reports that comes even close to those missing 170 MB. So either I'm doing it wrong, or the missing memory is not in the slab.
I would have expected that such a serious regression would have resulted in more attention... Anyway, I've done some more testing to pinpoint where this bug was introduced: - The bug is upstream as far as I can tell. I compiled a vanilla 2.6.27.19 and it used as much memory as Fedora's. - The bug was introduced in 2.6.27. The memory usage is normal in kernel-2.6.26.6-79.fc9.x86_64, but high in kernel-2.6.27.4-19.fc9.x86_64.
Since this was reproducible in a vanilla kernel I've opened a bug in the kernel bugzilla: http://bugzilla.kernel.org/show_bug.cgi?id=12832
See Documentation/vm/slub.txt for details on how to use slab debugging. Trace the allocations and frees on the slabs you mentioned, and that should tell you if there's extra usage that's leaking.

Also, how exactly are you determining that you are "low" on memory? Are you getting allocation failures somewhere, or lower performance in some subsystem? What you have above shows extra memory usage, but that's not necessarily a bad thing. That could be attributable to more aggressive pagecache use, higher affinity to a slab cache, or something similar. If the usage is in inactive slab objects, or in cleaned pagecache, the memory is easily reclaimable, and is being put to better use than just sitting idle.

So, let's start with a description of the problem you're actually seeing, rather than the assumption that less free memory equates to leaked memory or increased memory usage.
(In reply to comment #9)
> See Documentation/vm/slub.txt for details on how to use slab debugging. Trace
> the allocations and frees on the slabs you mentioned, and that should tell you
> if there's extra usage that's leaking.

I don't think there's any leaking, as the lost memory stays constant over time as far as I can tell. And as there is nothing in slabinfo that's of the magnitude of the lost memory, I don't see how normal slub tracing will tell me anything. Could you elaborate on what I should look for?

> Also, how exactly are you determining that you are "low" on memory? Are you
> getting allocation failures somewhere, or lower performance in some subsystem?
> What you have above shows extra memory usage, but that's not necessarily a bad
> thing. That could be attributable to more aggressive pagecache use, or higher
> affinity to a slab cache, or something similar. If the usage is in inactive
> slab objects, or in cleaned pagecache, the memory is easily reclaimable, and
> is being put to better use than just having it idle.

Well, it's not in the page cache at least, as that would show up under "cached". I don't know if the information "free" uses accounts for inactive slab objects, though.

As to what I'm experiencing, I'm getting a lot less use of the page cache, which gives me reduced performance. And more importantly, I'm getting OOM events for workloads that previously ran just fine.
Btw, have you tried reproducing this on your own systems? I'm seeing it on several (probably all) machines here, so it does seem to be a general problem.
No, of course I've not tried it on systems here; I've only just seen this issue. Although I do have several F-10 systems that carry a relatively high workload a lot of the time, and they don't seem to have this problem, so I'm guessing it's something driver specific.

The fact that you are getting OOM events is somewhat telling. The output of the OOM stack traces would be helpful. That you don't see anything to account for all of the unused or missing memory is somewhat suggestive of a call to get_free_pages without a paired call to free_pages. A sosreport would be good so we could get some idea of which drivers you are using that make direct get_free_pages calls. What we can do is use the sosreport to build a stap script that traces get_free_pages calls and builds a list of allocated pages that don't get freed.
Created attachment 334441 [details] sosreport Here's the sosreport from the system. Andrew Morton suggested checking the bootmem allocations, but unfortunately that didn't reveal anything. Neither did dumping the number of free pages after each initcall.
Is this a paravirt guest you're working with? That might explain why I've not seen this.

A behavioral question: is the free memory lower immediately on boot-up, or does it decrease more quickly with use?

My first thought would be to replace the virtio_blk driver with the ide driver (if it still supports doing that); otherwise, I'd simply back the virtio_blk driver down to the F8 variant and rebuild the kernel. If the memory usage returns to normal, we dig deeper there; if not, we move on to the virtio_net driver, I think.
(In reply to comment #14)
> Is this a paravirt guest you're working with? That might explain why I've not
> seen this.

It's a KVM guest with virtio_blk and virtio_net.

> A behavioral question: is the free memory lower immediately on boot-up, or
> does it decrease more quickly with use?

It's gone directly. I've booted with init=/bin/sh and it's already missing there. I haven't tried booting without an initrd (as that is a bit of a pain).

> My first thought would be to replace the virtio_blk driver with the ide
> driver (if it still supports doing that); otherwise, I'd simply back the
> virtio_blk driver down to the F8 variant and rebuild the kernel. If the
> memory usage returns to normal, we dig deeper there; if not, we move on to
> the virtio_net driver, I think.

I have tried without virtio_blk, but not without virtio_net. I'll do that now and see what happens.
Oh, sorry. Forgot to update :) I tested without virtio_* and it didn't have any effect. Not even virtio_pci got loaded.
Well, that just leaves virtio_ring and virtio, which I assume you can't remove (or can you)? It also begins to suggest that you would be seeing this on bare metal as well as in guest environments. Is that the case?
There were no virtio modules loaded whatsoever. And I have the same problem on real metal, it's just that the virtual instance is easier to poke and prod without any critical services going down. :)
Perhaps I should say I have the same symptoms on real metal. There is always a remote chance that it is another bug there. ;)
Well, I diffed the kernel spec for F-9 based on your comments in comment #7. The changeset is unfortunately pretty large and vague, with nothing standing out in regards to memory consumption. Given that you can reproduce this in a guest, how do you feel about doing a bisect? With the builds between your two markers in comment #7, a binary search shouldn't take long, and I think most of the builds are in koji. Given that we're still in the dark, this is as good as playing guess-and-check with modules, and it will narrow down our suspect patch set.
Way ahead of you. ;) I've been doing a bisect for the last couple of days (it takes two hours per compile, so it isn't exactly fast :/). Unfortunately it didn't really give me something decent. I got stuck on the changeset ec1bb60bb..6712e299. What's left there refuses to boot, so I cannot bisect the last few entries. Also, they're all ftrace patches, which is strange as CONFIG_FTRACE is off in Fedora kernels (and in my bisect kernels of course). The current lead is that the missing 170 MB seem to have been "found" by using /proc/pageflags (with some extra patches). See the kernel bugzilla entry for the full details.
The ftrace stuff might be a good lead. Do you have a list of the patches to the kernel? It's entirely possible that the addition of the patches builds an extra file that isn't sensitive to the config options.
It's just patches to fs/proc to make it dump more info from the page structs. I have made another breakthrough though, and that's that when I disable all tracing (everything under kernel/trace), the missing memory returns. I tried reenabling CONFIG_SYSPROF_TRACER and that made half of the relevant memory disappear again. I'll see if I can find what steals the other half.
I bet that's it. I'm looking at the trace buffer setup routines, and they appear to pre-allocate a huge number of pages when enabled. It's kind of nuts, honestly. (trace_buf_size is the controlling variable, if you want to look.) Fortunately you can tune it on the command line. Try setting trace_buf_size=1 on the kernel command line and see if your missing pages come back.
Unfortunately not. Setting trace_buf_size=1 did not have any noticeable effect on the memory usage. :/
Do you see any notes in the message log regarding allocations during ftrace initialization?
The issue has been found (and explained by Steven Rostedt). The fundamental problem is that the tracing stuff allocates large buffers for each possible cpu. And I mean possible from a kernel POV, not what's really possible to add in the machine. In 2.6.27, it allocates 11 MB of data for each of these possible cpus, so it's a lot of memory.

Now I don't know why this isn't appearing anywhere, but at least it can be correlated to the number of possible cpus. If I check /sys/devices/system/cpu/possible on the different machines, the machines that are fine have "0-1" (two cpus), the real metal with a problem has "0-3", and the virtual with the big problem has "0-63". And 64 times 11 MB is precisely the missing memory.

In 2.6.28 the memory used drops from 11 MB per cpu to 3 MB, and Steven just posted patches to LKML that make the kernel allocate only for online cpus. So the problem is solved long term, but what to do for F10? Can you disable the tracing until you can push a 2.6.29 with Steven's fixes?
(In reply to comment #27) > ...and the > virtual with the big problem has "0-63". And 64 times 11 MB is precisely the > missing memory. Before anyone objects to my funky math, the virtual machine had "0-15", not "0-63". Sorry. :)
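With the corrected figure the arithmetic works out. A quick sketch of the calculation, assuming the 11 MiB-per-possible-cpu buffer size described above (the "0-15" string is the value reported for the KVM guest):

```shell
# Parse a cpu range of the form found in /sys/devices/system/cpu/possible
possible="0-15"                                   # value seen on the KVM guest
ncpus=$(( ${possible#*-} - ${possible%-*} + 1 ))  # 16 possible cpus
echo "$((ncpus * 11)) MiB"                        # prints "176 MiB", close to the ~170 MiB lost
```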
(In reply to comment #24) > Fortunately you can tune it on the command line. Try setting trace_buf_size=1 > on the kernel command line and see if your missing pages come back. For anyone else reading this entry, apparently trace_entries=1 is the correct parameter to get your memory back.
I assume this is the patch series: http://lkml.indiana.edu/hypermail/linux/kernel/0903.1/01730.html I can look into doing a backport to F-10
I've got a build here: http://koji.fedoraproject.org/koji/taskinfo?taskID=1246562 for F-10 with some of the memory consumption backports. Give it a spin if you would and confirm that your memory usage is improved please. Thanks!
Result BuildError: error building package (arch i686), mock exited with status 1; see build.log for more information ;)
grr, sorry, stupid forward reference error. I'll fix it up and resubmit it shortly. Thanks
http://koji.fedoraproject.org/koji/taskinfo?taskID=1248051 New build in progress.
ping?
Sorry, forgot about this after the trace_entries workaround. The task no longer has any RPMS. Will the latest 2.6.29 F10 kernel do?
I'm not sure, I've not looked at the contents of the build, but if the kernel version in F-10 is later than the tag that the above patches got added under, it should be sufficient.
I tried kernel-2.6.29.1-30.fc10.x86_64 and it still eats up a lot of tracing memory. Sprinkling some command line parameters freed up the memory, so it's definitely the same problem.
Created attachment 340424 [details] patch to reduce ftrace memory Before I forget, here's the patch that I backported.
http://koji.fedoraproject.org/koji/taskinfo?taskID=1309987 I'm rebuilding the package for you to test on.
Seems to be working. Memory usage is low, and the command line parameters have no effect. I think this bug is squashed. :) When will the .29 kernels be pushed for f10?
Not sure, that's up to the kernel maintainers. I'll check this in and try to ping them about it.
FWIW, I just checked and the last kernel update that was submitted was 4/14/09, so you just missed the last update, and this will have to wait for the next one.
kernel-2.6.29.1-42.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/kernel-2.6.29.1-42.fc10
kernel-2.6.29.1-42.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-3999
kernel-2.6.29.2-52.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/kernel-2.6.29.2-52.fc10
kernel-2.6.29.2-52.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-4132
kernel-2.6.29.3-60.fc10,hal-0.5.12-15.20081027git.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/kernel-2.6.29.3-60.fc10,hal-0.5.12-15.20081027git.fc10
kernel-2.6.29.3-60.fc10, hal-0.5.12-15.20081027git.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel hal'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-4818
kernel-2.6.29.4-75.fc10,hal-0.5.12-15.20081027git.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/kernel-2.6.29.4-75.fc10,hal-0.5.12-15.20081027git.fc10
kernel-2.6.29.4-75.fc10, hal-0.5.12-15.20081027git.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel hal'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-5527
kernel-2.6.29.5-84.fc10,hal-0.5.12-15.20081027git.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/kernel-2.6.29.5-84.fc10,hal-0.5.12-15.20081027git.fc10
kernel-2.6.29.5-84.fc10, hal-0.5.12-15.20081027git.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel hal'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-6717
kernel-2.6.29.6-93.fc10,hal-0.5.12-15.20081027git.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/kernel-2.6.29.6-93.fc10,hal-0.5.12-15.20081027git.fc10
kernel-2.6.29.6-93.fc10, hal-0.5.12-15.20081027git.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel hal'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-7573
kernel-2.6.29.6-97.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/kernel-2.6.29.6-97.fc10
kernel-2.6.29.6-97.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-8697
kernel-2.6.29.6-99.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/kernel-2.6.29.6-99.fc10
When are these kernels actually going to become updates? They seem to get stuck in updates-testing.
kernel-2.6.29.6-99.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-8870
This message is a reminder that Fedora 10 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 10. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 10 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.