Created attachment 432499 /proc/meminfo output right before rebooting

Description of problem:
After running the system for 22 days, approximately 1.5GB of memory had slowly vanished.

Version-Release number of selected component (if applicable):
kernel-2.6.33.5-124.fc13.x86_64 (seems to apply to all recent versions)

How reproducible:
Always

Steps to Reproduce:
1. Boot and wait.

Additional info:
Here is the best I can give you. After 22 days, my system was using far more memory to carry the same process load than it had immediately after booting. So I shut down X, killed every process one by one, and manually unloaded all unloadable modules. This still showed 1.5GB used, not counting buffers or cache. I captured /proc/meminfo, /proc/slabinfo, 'lsmod' and 'ps'.

After rebooting, I repeated the experiment, killing all processes etc., and recaptured the files. The first set of files is labeled "afterunload" because it was captured after unloading modules; the second set is labeled "afterboot", indicating after the reboot.

This problem is real; I received the following error yesterday:

Jul 15 15:35:44 gandalf kernel: Xorg: page allocation failure. order:0, mode:0x2
Jul 15 15:35:44 gandalf kernel: Pid: 2266, comm: Xorg Not tainted 2.6.33.5-124.fc13.x86_64 #1

The system also becomes slow and unresponsive, and the GNOME memory meter shows full memory utilization. This system is x86_64 with 4GB of RAM installed.

I'm happy to do anything to try to track this down, but I realize it may be impossible. I shall attach the 8 documents now.
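To make the "afterunload" vs "afterboot" comparison repeatable, here is a minimal Python sketch, assuming the snapshots are plain /proc/meminfo text. It computes "used, not counting buffers or cache" as MemTotal - MemFree - Buffers - Cached (the classic free(1)-style figure); the function names are hypothetical helpers, not part of any tool mentioned in this bug.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}.

    Most values are in kB; unit-less counters such as HugePages_Total
    are kept as plain integers.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def used_excluding_cache_kb(info):
    """Memory in use, not counting buffers or page cache (kB)."""
    return info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]


def unexplained_growth_kb(after_boot_text, after_unload_text):
    """Extra non-cache memory in the 'afterunload' snapshot compared with
    the 'afterboot' snapshot, both taken with all processes killed."""
    return (used_excluding_cache_kb(parse_meminfo(after_unload_text))
            - used_excluding_cache_kb(parse_meminfo(after_boot_text)))
```

On a live system, `used_excluding_cache_kb(parse_meminfo(open("/proc/meminfo").read()))` gives the current figure; with both attachments saved to disk, `unexplained_growth_kb` gives the amount that disappeared.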
Created attachment 432500 /proc/meminfo output right after rebooting
Created attachment 432502 /proc/slabinfo before rebooting
Created attachment 432503 /proc/slabinfo after rebooting
Created attachment 432504 lsmod output before reboot
Created attachment 432505 lsmod output after reboot
This is almost certainly happening because you're running with no swap space enabled.
I've added swap (swap was disabled because I was unable to install with swap due to Anaconda bugs). I'll let you know in 25 days or so ;-)
Created attachment 433766 slabinfo
Created attachment 433767 meminfo
Created attachment 433768 lsmod
Created attachment 433769 ps
I am having the same issue and have added the files that David mentions.
This memory leak has repeated. I have been running a script from cron that records VM and process statistics every minute (smem -w, free and ps -auwx). Unfortunately, the 900MB leak is "instantaneous": it occurs between two snapshots one minute apart, and the "ps" output before and after shows nothing except that many processes were swapped out to make room for the 900MB "gulp".

I was using the computer at the time and was either:
- clicking around in Rhythmbox, or
- clicking around in Firefox on "images.google.com".

In the Firefox cache directory I can see some files written during the minute the leak occurred, which seem to be Google JavaScript/AJAX-type calls.

I have now set up a once-per-second monitor that will alert me within a second when kernel dynamic memory usage grows over 1GB, so I can tell which process is responsible. I will update this bug when I have more info.
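A per-second monitor like the one described could look roughly like this Python sketch. The reader callable `read_kernel_dynamic_kb` is hypothetical: it could, for example, parse the "kernel dynamic memory" noncache value from smem -w, but that output format is not shown in this bug, so parsing is left out. `largest_jump` finds the biggest single-interval increase in a recorded series, which is how a sudden "gulp" shows up.

```python
import time

ONE_GIB_KB = 1024 * 1024  # 1 GiB expressed in kB


def monitor(read_kernel_dynamic_kb, threshold_kb=ONE_GIB_KB,
            interval_s=1.0, max_samples=None):
    """Poll read_kernel_dynamic_kb() every interval_s seconds.

    Returns (sample_index, value) for the first reading above threshold_kb,
    or None if max_samples readings were taken without crossing it.
    """
    index = 0
    while max_samples is None or index < max_samples:
        value = read_kernel_dynamic_kb()
        if value > threshold_kb:
            return index, value
        index += 1
        time.sleep(interval_s)
    return None


def largest_jump(samples):
    """Return (index, delta) for the biggest increase between consecutive
    samples, or (None, 0) if the series never grows."""
    best_index, best_delta = None, 0
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if delta > best_delta:
            best_index, best_delta = i, delta
    return best_index, best_delta
```

Passing the reader in as a function keeps the polling logic independent of how the number is obtained, so the same loop works with smem, a debugfs file, or recorded test data.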
This is now entirely reproducible on two different machines with Radeon cards, and not reproducible on one machine with NVIDIA; all are running Fedora 13 x86_64. I have set the "component" to xorg ati, but the bug is most likely in radeon.ko, which is in the "kernel" package, so I'll leave it to you to get this right.

Both systems are fully updated as of today:
kernel: 2.6.33.8-149.fc13.x86_64
xorg: xorg-x11-drv-ati-6.13.0-1.fc13.x86_64

1st machine:
PCI:*(0:1:0:0) 1002:95c5:1028:9018 ATI Technologies Inc RV620 LE [Radeon HD 3450] rev 0, Mem @ 0xd0000000/268435456, 0xfdee0000/65536, I/O @ 0x0000de00/256, BIOS @ 0x????????/131072

2nd machine:
PCI:*(0:1:5:0) 1002:9610:1028:02e2 ATI Technologies Inc Radeon HD 3200 Graphics rev 0, Mem @ 0xd0000000/268435456, 0xfeaf0000/65536, 0xfe900000/1048576, I/O @ 0x0000d000/256, BIOS @ 0x????????/131072

Steps to reproduce:
1. Reboot.
2. Log in to GNOME.
2a. Run "while true; do smem -w; sleep 1; done" in a terminal and watch the "kernel dynamic memory" row in the "noncache" column.
3. Open Firefox.
4. Go to "images.google.com".
5. Search for "penico" ;-)
6. Using the scrollbar, scroll to the bottom: BOOM, there go 200 megs. Scroll to the top: BOOM, another 200 megs. Repeat as necessary.

If you close Firefox, or even X, or unload modules or anything else, you never get the memory back until you reboot.
I found a leak in r600_cs.c: p->track is allocated at line 763 but never freed. I'll attach a patch, but I have no idea whether this could be causing my problem or not. Can someone help?
Created attachment 441998 fix apparent memory leak for r600 based radeon
(In reply to comment #16)
> Created attachment 441998
> fix apparent memory leak for r600 based radeon

Can you build a kernel with your fix and try it?
Are you using nomodeset on the command line? If so, why? The fix makes sense, but running nomodeset doesn't make sense for r600 hardware.
Re #17: I'm going to try the latest kernel in updates-testing, kernel-2.6.34.6-47.fc13.x86_64. Examining that code, p->track is not leaking (r600_cs.c is very much updated in this kernel), so if that's my problem, it will be fixed just by that update.

Re #18: I don't have nomodeset on the command line; the command line is:
ro root=/dev/mapper/VolGroup00-root_f13_fs rd_LVM_LV=VolGroup00/root_f13_fs rd_NO_LUKS LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet

I'll let you all know if the new kernel fixes it. Keep your fingers crossed.
Bad news: it still happens in the latest kernel mentioned in comment #19 above. I have some more info, though this is subjective: the problem seems to be with the actual scrolling. The page I have (images.google.com) has a ton of images on it, and when I scroll, the screen "tears", and it is at this point that 300M or so of dynamic memory is consumed (permanently). If there's any way to turn on debugging, strace Xorg, compile the kernel with some flags or anything like that, I'm game.
This Debian bug seems identical: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=591061
Can you try scrolling with the keyboard instead of the mouse? Does it react any differently? I suspect we are missing a cleanup on a signal error path in this case.
It still happens exactly the same way. There is still very noticeable "tearing" in the images as they scroll, and about 1GB of memory is gone in 5 seconds. Looking at the HTML source of the webpage (images.google.com), I noticed it uses some unusual HTML: a "canvas", and also img elements with a "data:blah" URL (i.e. a base64-encoded image in the URL). Should I be looking at the r600_*.c code, the radeon_*.c code, or both? In other words, which *_cs.c should I be looking at for this hardware:
PCI:*(0:1:0:0) 1002:95c5:1028:9018 ATI Technologies Inc RV620 LE [Radeon HD 3450] rev 0, Mem @ 0xd0000000/268435456, 0xfdee0000/65536, I/O @ 0x0000de00/256, BIOS @ 0x????????/131072
If you are using KMS, you need to look at both, but you can ignore the legacy functions in r600_cs.c.
Since KMS keeps coming up: I just verified that the bug does NOT occur if I boot with nomodeset. HTH.
Is there any way to get strace to show better ioctl tracing for the DRM calls?
ioctl tracing won't really help us here. Can you run 'mount -t debugfs none /sys/kernel/debug' and see what the contents of /sys/kernel/debug/dri/0/gem_objects are?
It doesn't seem to show a leak.

Before Firefox is started (clean boot):
125 objects
103731200 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total

After Firefox/X/DRM has leaked 1GB and Firefox is closed again:
196 objects
104062976 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
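Diffing the two gem_objects snapshots makes the "no leak here" conclusion concrete. This Python sketch assumes the file consists of "<number> <label>" pairs exactly as pasted above (whether on one line or one per line); the helper names are hypothetical. On the numbers above, objects grow by 196 - 125 = 71 and object bytes by 104062976 - 103731200 = 331776 bytes (324 KiB), far short of the roughly 1GB that disappeared, so the missing memory is not being held by GEM objects.

```python
import re

# One "<number> <label>" pair, e.g. "125 objects" or "0 gtt total".
# The label stops just before whitespace followed by a digit, or at the end.
PAIR = re.compile(r"(\d+)\s+([A-Za-z][A-Za-z ]*?)(?=\s+\d|\s*$)")


def parse_gem_objects(text):
    """Parse gem_objects summary text into {label: int}."""
    return {label: int(number) for number, label in PAIR.findall(text)}


def diff_stats(before, after):
    """Per-label change from the 'before' snapshot to the 'after' snapshot."""
    return {label: after[label] - before.get(label, 0) for label in after}
```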
I have the same problem with a Radeon HD3200. Scrolling the URI mentioned in the Debian bug report (http://code.google.com/p/chromium/issues/detail?id=8991) or Google Images seems to be the best way to trigger it. I have tested both the 32-bit and 64-bit versions of Fedora 13. The memory leak is much bigger in the 64-bit version, but both versions seem to stop leaking memory after a certain limit is reached: for the 32-bit version, leaking stops here once noncached kernel dynamic memory reaches around 380M; for the 64-bit version, it stops around 790M.

Also, looking at /proc/vmallocinfo, I see a lot of entries like this:
0xffffc9000032d000-0xffffc90000332000 20480 ttm_tt_create+0xfc/0x15b [ttm] pages=4 vmalloc N0=4
These entries only seem to appear after scrolling the mentioned web pages and do not seem to be reclaimed.
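Rather than eyeballing long vmallocinfo dumps, the ttm/drm entries can be tallied per caller. This Python sketch assumes each line follows the layout of the entry quoted above (address range, size in bytes, then "caller+offset/len"), and applies the same '(ttm|drm)' filter as an egrep; `summarize_vmalloc` is a hypothetical helper name. Comparing its output before and after scrolling shows which allocation site is growing.

```python
import re
from collections import defaultdict

# Same filter as: egrep '(ttm|drm)'
TTM_OR_DRM = re.compile(r"(ttm|drm)")


def summarize_vmalloc(text, pattern=TTM_OR_DRM):
    """Return {caller: (entry_count, total_bytes)} for matching lines.

    Assumed layout per line: "<start>-<end> <size> <caller>+<off>/<len> ...".
    """
    totals = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        if not pattern.search(line):
            continue
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # not in the expected layout
        caller = fields[2].split("+", 1)[0]  # drop the "+0xfc/0x15b" offset
        totals[caller][0] += 1
        totals[caller][1] += int(fields[1])
    return {caller: tuple(v) for caller, v in totals.items()}
```

Stripping the "+offset/len" suffix groups all allocations from the same function together, which is the granularity needed to spot something like a steadily growing ttm_tt_create total.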
I also see many of the ttm_tt_create lines, as well as other DRM-related lines. I just leaked about 1.7GB on a different website; this one had a JavaScript-animated, horizontally scrolling "headline". I'm about to attach a text file with the output of:
cat /proc/vmallocinfo | egrep '(ttm|drm)'
Created attachment 446053 contents of /proc/vmallocinfo related to ttm or drm
For the record, the leak does not occur when viewing the sites with Google Chrome, only with Firefox. Not that it's a Firefox bug; it's just an FYI. This bug forces me to reboot every time I accidentally hit a "bad" webpage. Pretty disastrous.
Hi, I'm the reporter at debian.org. I'm also seeing a growing list of ttm/drm related entries in vmallocinfo after triggering this bug. You can find my vmallocinfo at three different times here: http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=61;filename=vmallocinfo.txt;att=1;bug=591061 Regards.
Created attachment 449707 output of /sys/kernel/debug/dri/0/radeon_vram_mm
As requested on IRC, output of /sys/kernel/debug/dri/0/radeon_vram_mm, radeon_gtt_mm and ttm_page_pool.
Created attachment 449708 output of radeon_gtt_mm
Created attachment 449709 ttm_page_pool
Created attachment 449837 diff from vanilla 2.6.35 against Ubuntu 2.6.35 kernel (only relevant part)
The memory leak does not appear to be present in Ubuntu Maverick. I diffed drivers/gpu/drm/radeon/* from the Ubuntu kernel against vanilla v2.6.35 and, after filtering out irrelevant parts, ended up with the attached diff. After applying it to vanilla 2.6.35 (it should also apply to newer versions), I have not seen any memory leak so far. I have no idea whether this patch is correct, but maybe it can point the developers in the right direction.
Yeah! This fixes it here too. I found the complete patch at https://patchwork.kernel.org/patch/95248/
Yeah, it's something in the eviction code that is leaking. I worked out that much today before I ran into another small leak I wanted to fix first. This patch changes the buffer allocation enough that the eviction path is probably not getting hit as much; hopefully tomorrow I can find the actual problem.
http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commitdiff_plain;h=0fbecd400dd0a82d465b3086f209681e8c54cb0f
Does anyone care to try a kernel with that fix on it, without the workaround fix that is in the other kernel?
Hi Dave, I just applied your patch onto linux-2.6.36-rc4 and unfortunately the leak still seems to be there - can anyone else confirm this? I'm seeing the exact same behaviour when scrolling in Firefox, etc. I uploaded my latest vmallocinfo to the report on bugs.debian.org. In case someone finds it useful I can share my debian kpkg.
Today I tried my drm-fixes and couldn't reproduce with the Google link that I was reproducing with fine yesterday on an older kernel. I backed out the most likely patches but can't figure it out. Tomorrow I'll go back to yesterday's kernel and try again; hopefully the chromium page hasn't changed content or something.
I'm always running these testing kernels (drm-radeon-testing, drm-fixes) which contained the fix already and the problem is still there.
I just tried with a patched clean -rc5 just in case (I had previously applied some other patches to -rc4) and the problem is still there.
Created attachment 450370 proposed patch to remove a race condition. Can you try this patch, please?
It seems fixed! Thank you very much for your effort. I'm using -rc5 patched with both your latest attachment and the previous fix (comment #40). Let me know if you would like me to retest without the previous patch if necessary.
confirmed - thanks!
Just to confirm the latest patch is all that's needed. Applied it to the current fc13 kernel 2.6.34.7-56.fc13 and the problem's gone. Thanks! Claimed back about half a gig on my machine.
Created attachment 450906 alternate fix
Can someone test this fix instead of the one I posted earlier?
Just tested attachment 450906 and the memory leak is still gone, but I seem to be getting sporadic screen corruption. I'm falling back to the previous patch to compare.
False alarm, I guess - I'm getting the same corruption with both patches, I just hadn't noticed it before. Is anyone else getting this? I'm seeing it on gkrellm2 specifically, in case anyone else is using it. I'm guessing it could be a different bug though. Anyway, the latest patch seems as good as the previous.
To the poster in comments #50 and #51: is it possible your recompiled modules aren't loading properly? I experience corruption when KMS is not enabled, but the memory leak also does not occur without KMS. Can you verify that the patched ttm.ko is loading?
dmesg | grep ttm
Hi David,

nox:/home/perseguidor# dmesg | grep -i ttm
[ 4.375781] [TTM] Zone kernel: Available graphics memory: 1965756 kiB.
[ 4.375783] [TTM] Initializing pool allocator.

ttm is also shown in lsmod:
nox:/home/perseguidor# lsmod | grep -i ttm
ttm 54479 1 radeon
drm 191010 4 radeon,ttm,drm_kms_helper

The corruption is still there, but it is very sporadic and happens only in very specific apps. Perhaps this belongs in another bug report?
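The lsmod check above can be scripted. This Python sketch assumes the usual lsmod columns (name, size, use count, comma-separated users) as in the output pasted above; `parse_lsmod` and `ttm_in_use_by_radeon` are hypothetical helper names. Note that it only confirms that a ttm module is loaded and used by radeon; it does not prove the loaded ttm.ko is the rebuilt, patched one.

```python
def parse_lsmod(text):
    """Parse lsmod output into {module: (size, use_count, [users])}."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Module":
            continue  # skip blank lines and the "Module Size Used by" header
        users = [u for u in fields[3].split(",") if u] if len(fields) > 3 else []
        modules[fields[0]] = (int(fields[1]), int(fields[2]), users)
    return modules


def ttm_in_use_by_radeon(lsmod_text):
    """True if ttm is loaded and lists radeon among its users."""
    entry = parse_lsmod(lsmod_text).get("ttm")
    return entry is not None and "radeon" in entry[2]
```

Feeding it the two lines above, ttm reports a size of 54479, one user (radeon), and drm lists radeon, ttm and drm_kms_helper as users.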
I believe a fix for this has been incorporated into the vanilla source; see:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1df6a2ebd75067aefbdf07482bf8e3d0584e04ee
Is it possible to get this fix incorporated into a Fedora 13 errata kernel?
This is still not fixed in the latest F14 kernel.
Thanks for identifying the commit, made my life much easier. Committed to F-14 now, sorry for taking so long.
kernel-2.6.35.9-64.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/kernel-2.6.35.9-64.fc14
Please post comments, and karma, about your experience with kernel-2.6.35.9-64.fc14 in bodhi (see link in comment 57).
kernel-2.6.35.9-64.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.