Description of problem:
After a few days, slab allocation reaches 750MB and is still increasing. Most of it is in the bio and biovec-1 caches.

Version-Release number of selected component (if applicable):
At least from 2.6.14-1.1773_FC5 up to kernel-2.6.15-1.1955_FC5.

Additional info:
My system is on logical volumes on top of MD RAID1 arrays, including the swap. I'll attach the list of loaded modules and a dump of both meminfo and slabinfo.
Created attachment 125207 [details] list of loaded modules
Created attachment 125208 [details] /proc/meminfo
Created attachment 125209 [details] /proc/slabinfo
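For anyone who wants to track the growth of the bio and biovec-1 caches over time rather than eyeballing the attachment, here is a minimal sketch in C that parses one line of /proc/slabinfo. It assumes the 2.x slabinfo layout (name, active_objs, num_objs, objsize as the first four columns). The names slab_usage, parse_slab_line and slab_bytes are made up for this example, and the byte count is only an approximation because it ignores per-slab overhead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: the first four columns of a slabinfo line. */
struct slab_usage {
    char name[64];
    unsigned long long active_objs;
    unsigned long long num_objs;
    unsigned long long objsize;   /* bytes per object */
};

/*
 * Parse one line of /proc/slabinfo (assumed 2.x format).
 * Returns 1 if the line described a cache, 0 for the version
 * header, the '#' column header, or anything unparsable.
 */
int parse_slab_line(const char *line, struct slab_usage *out)
{
    assert(line != NULL && out != NULL);

    if (line[0] == '#' || strncmp(line, "slabinfo -", 10) == 0)
        return 0;

    return sscanf(line, "%63s %llu %llu %llu",
                  out->name, &out->active_objs,
                  &out->num_objs, &out->objsize) == 4;
}

/* Approximate memory held by the cache: allocated objects times object size. */
unsigned long long slab_bytes(const struct slab_usage *u)
{
    return u->num_objs * u->objsize;
}
```

Feeding each line read with fgets() from /proc/slabinfo into parse_slab_line() and logging slab_bytes() for the bio and biovec-1 entries every few minutes should make a steady leak easy to see.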
*** Bug 183017 has been marked as a duplicate of this bug. ***
FYI, to do some testing, I upgraded FC5-test to vanilla kernel 2.6.16-rc4 and the problem remains. I then downgraded to vanilla kernel 2.6.14.7 and the problem disappeared. So perhaps this is a problem with bio in the mainline kernel?
Of course, 2.6.14.7 doesn't work well on FC5 (mainly because of the newer udev). I ran some tests on the 2.6.15-rcX kernels and found that everything is fine up to 2.6.15-rc5; things break in 2.6.15-rc6. Hope this is useful.
This patch:

commit 3795bb0fc52fe2af2749f3ad2185cb9c90871ef8
Author: NeilBrown <neilb>
Date:   Mon Dec 12 02:39:16 2005 -0800

    [PATCH] md: fix a use-after-free bug in raid1

    Who would submit code with a FIXME like that in it !!!!

    Signed-off-by: Neil Brown <neilb>
    Signed-off-by: Andrew Morton <akpm>
    Signed-off-by: Linus Torvalds <torvalds>

causes the problem. I think this should now go to the kernel developers. The patch correctly releases the bio later in the function (before, it was freed and then used), but it adds an if (blah == NULL) check that guards the bio_put ... I'm by no means a kernel expert, but why is the if needed?
The bio_put after the if statement doesn't look bad. You want to do it in the cases where "r1_bio->bios[mirror] = NULL;" was executed previously. I believe the problem may be due to an exit point prior to the bio_put. Here's the piece of code:

        if (test_bit(R1BIO_BarrierRetry, &r1_bio->state)) {
                reschedule_retry(r1_bio);
                /* Don't dec_pending yet, we want to hold
                 * the reference over the retry */
                return 0;
        }

A bio_put (with the test?) may be needed just before the return. I'll give it a try tonight.
Hmm, so why wasn't that done before? I mean, previously bio_put was called and the bio was then used again, so they simply moved bio_put after the last use of the bio (the use-after-free problem). But why did they add the if? What if the value is != NULL? Then bio_put is never called... hence the slab leak. I rebuilt the latest 2.6.15 kernel with just the if statement deleted, and the machine doesn't seem to have a single problem right now. But again, I'm not a kernel expert, so perhaps the if is right and some other cases must be considered.
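To make the concern in the last two comments concrete, here is a toy userspace model in C. It is not the raid1 code: toy_bio, toy_bio_put, end_write_guarded and end_write_unguarded are hypothetical names, and the "slot" pointer merely stands in for r1_bio->bios[mirror]. It only shows how a put that is guarded by a NULL test leaves the reference (and so the object) alive whenever the slot was not cleared first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted object such as a bio. */
struct toy_bio {
    int refs;    /* outstanding references */
    int freed;   /* set to 1 once the last reference is dropped */
};

/* Drop one reference; mark the object freed when the count hits zero. */
void toy_bio_put(struct toy_bio *bio)
{
    assert(bio != NULL && bio->refs > 0);
    if (--bio->refs == 0)
        bio->freed = 1;
}

/* Completion path with the guard: puts only if the slot was cleared. */
void end_write_guarded(struct toy_bio *bio, struct toy_bio **slot)
{
    if (*slot == NULL)
        toy_bio_put(bio);
    /* else: the reference is kept, so the object stays allocated */
}

/* Completion path without the guard: always drops the reference. */
void end_write_unguarded(struct toy_bio *bio, struct toy_bio **slot)
{
    (void)slot;
    toy_bio_put(bio);
}
```

In this model, calling end_write_guarded() while the slot still points at the object leaves refs at 1 and freed at 0, which is the leak pattern described above. Note that dropping the guard is only safe if no other path also releases the same reference; otherwise the unconditional put would turn the leak into a double free.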
Should be fixed in tomorrow's rawhide. (Grab it early from http://people.redhat.com/davej/kernels/Fedora/devel)
*** Bug 183555 has been marked as a duplicate of this bug. ***
It is definitely fixed in 2008, but today's rawhide still has 1996.