Bug 182970 - slab leaking (bio and biovec-1)
Summary: slab leaking (bio and biovec-1)
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
URL:
Whiteboard:
Keywords:
Duplicates: 183017 183555
Depends On:
Blocks:
Reported: 2006-02-24 20:12 UTC by Charles Lopes
Modified: 2015-01-04 22:25 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-03-02 04:12:55 UTC

Attachments
list of loaded modules (4.35 KB, text/plain)
2006-02-24 20:13 UTC, Charles Lopes
/proc/meminfo (676 bytes, text/plain)
2006-02-24 20:14 UTC, Charles Lopes
/proc/slabinfo (28.72 KB, text/plain)
2006-02-24 20:15 UTC, Charles Lopes

Description Charles Lopes 2006-02-24 20:12:20 UTC
Description of problem:
After a few days slab allocation is 750MB and still increasing. Most of it is in
bio and biovec-1

Version-Release number of selected component (if applicable):
at least from 2.6.14-1.1773_FC5 up to kernel-2.6.15-1.1955_FC5

Additional info:
My system is on logical volumes on the top of MD RAID1 arrays including the
swap. I'll attach the list of loaded modules and a dump of both meminfo and
slabinfo.
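To see which caches dominate a slabinfo dump like the one attached, the per-cache object memory can be estimated as num_objs * objsize. The following is a minimal sketch, assuming the slabinfo 2.x line layout (name, active_objs, num_objs, objsize, ...). The helper names parse_slabinfo_line, slab_object_bytes and report_large_slabs are made up for illustration, and the estimate ignores per-slab overhead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of /proc/slabinfo, reduced to the fields needed here. */
struct slab_usage {
    char name[64];
    unsigned long active_objs;
    unsigned long num_objs;
    unsigned long objsize;      /* bytes per object */
};

/*
 * Parse one data line of /proc/slabinfo (assumed format version 2.x).
 * Returns 1 on success, 0 for header/comment lines or malformed input.
 */
int parse_slabinfo_line(const char *line, struct slab_usage *out)
{
    assert(line != NULL && out != NULL);
    if (line[0] == '#' || strncmp(line, "slabinfo", 8) == 0)
        return 0;
    return sscanf(line, "%63s %lu %lu %lu", out->name,
                  &out->active_objs, &out->num_objs, &out->objsize) == 4;
}

/* Approximate memory held by the cache's objects, in bytes. */
unsigned long long slab_object_bytes(const struct slab_usage *u)
{
    return (unsigned long long)u->num_objs * u->objsize;
}

/*
 * Print every cache whose objects take at least min_bytes.
 * Returns the number of caches printed, or -1 if path cannot be opened.
 */
int report_large_slabs(const char *path, unsigned long long min_bytes)
{
    FILE *f = fopen(path, "r");
    char line[512];
    struct slab_usage u;
    int hits = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (parse_slabinfo_line(line, &u) &&
            slab_object_bytes(&u) >= min_bytes) {
            printf("%-24s %llu bytes\n", u.name, slab_object_bytes(&u));
            hits++;
        }
    }
    fclose(f);
    return hits;
}
```

Calling report_large_slabs("/proc/slabinfo", 100ULL * 1024 * 1024) would list caches above roughly 100 MB; on a system showing this bug, bio and biovec-1 would be expected near the top. Reading /proc/slabinfo may require root.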

Comment 1 Charles Lopes 2006-02-24 20:13:30 UTC
Created attachment 125207
list of loaded modules

Comment 2 Charles Lopes 2006-02-24 20:14:20 UTC
Created attachment 125208
/proc/meminfo

Comment 3 Charles Lopes 2006-02-24 20:15:11 UTC
Created attachment 125209
/proc/slabinfo

Comment 4 Dave Jones 2006-02-27 07:21:53 UTC
*** Bug 183017 has been marked as a duplicate of this bug. ***

Comment 5 Matteo Brancaleoni 2006-02-27 20:39:08 UTC
FYI, to do some testing, I upgraded FC5-test to vanilla kernel 2.6.16-rc4 and the
problem remains. Then I downgraded to vanilla kernel 2.6.14.7 and the problem
disappeared.

So perhaps it is a problem with bio in the mainline kernel?

Comment 6 Matteo Brancaleoni 2006-02-28 14:36:03 UTC
Of course 2.6.14.7 doesn't work well on FC5 (mainly because of the newer udev).
I did some tests on 2.6.15-rcX kernels and found that everything is OK up to
2.6.15-rc5; things break in 2.6.15-rc6.
Hope this is useful.

Comment 7 Matteo Brancaleoni 2006-02-28 16:15:29 UTC
This patch:

commit 3795bb0fc52fe2af2749f3ad2185cb9c90871ef8
Author: NeilBrown <neilb@suse.de>
Date:   Mon Dec 12 02:39:16 2005 -0800

    [PATCH] md: fix a use-after-free bug in raid1
    
    Who would submit code with a FIXME like that in it !!!!
    
    Signed-off-by: Neil Brown <neilb@suse.de>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

causes the problem. I think this should now go to the kernel developers.
The patch correctly releases the bio later in the function (before, it was freed and
then used), but adds an if (blah == NULL) guard around the bio_put ...
I'm by no means a kernel expert, but why is the if needed?



Comment 8 Charles Lopes 2006-02-28 17:01:49 UTC
The bio_put after the if statement doesn't look bad. You want to do it in the
cases where the statement "r1_bio->bios[mirror] = NULL;" was executed previously. I
believe the problem may be due to an exit point prior to the bio_put.

Here's the piece of code:

                if (test_bit(R1BIO_BarrierRetry, &r1_bio->state)) {
                        reschedule_retry(r1_bio);
                        /* Don't dec_pending yet, we want to hold
                         * the reference over the retry
                         */
                        return 0;
                }

A bio_put (with the test?) may be needed just before return. I'll give it a try
tonight.


Comment 9 Matteo Brancaleoni 2006-02-28 22:22:16 UTC
Hmm,
so why wasn't that done previously?

I mean, previously bio_put was called and the bio was then used again,
so they simply moved bio_put after the bio's last use (fixing the use-after-free).

But why did they add the if?
What if the value != NULL? Then bio_put is never called... so the slab leaks.

I rebuilt the latest 2.6.15 kernel with just the if statement deleted, and the machine
seems not to have a single problem right now.

But again, I'm not a kernel expert, so perhaps the if is right but some other
cases must be considered.
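The two leak hypotheses raised in the comments above (a put guarded by a condition that is sometimes false, and an exit point before the put) can be illustrated with a small userspace model. This is only a sketch, not the raid1 code: toy_obj, toy_put, toy_end_request and the to_put variable are hypothetical names, and nothing in this thread establishes which path, if either, matches the real bug or its fix.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object such as a bio. */
struct toy_obj {
    int refs;
};

static int live_objs;   /* objects allocated and not yet freed */

/* Allocate an object holding one reference. */
struct toy_obj *toy_get_new(void)
{
    struct toy_obj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->refs = 1;
    live_objs++;
    return o;
}

/* Drop one reference; free the object when the last one goes away. */
void toy_put(struct toy_obj *o)
{
    assert(o->refs > 0);
    if (--o->refs == 0) {
        free(o);
        live_objs--;
    }
}

int toy_live(void)
{
    return live_objs;
}

/*
 * Completion handler with the suspected shape: the put is deferred to
 * the end of the function (avoiding use-after-free), but it only runs
 * when slot_cleared was set and no early return happened.
 */
int toy_end_request(struct toy_obj *o, int slot_cleared, int retry)
{
    struct toy_obj *to_put = NULL;

    if (slot_cleared)
        to_put = o;     /* only this path remembers the reference */

    if (retry)
        return 0;       /* early exit: nothing is released here */

    if (to_put)
        toy_put(to_put);
    return 0;
}
```

In this model, toy_end_request(o, 1, 0) releases the object, while toy_end_request(o, 0, 0) and toy_end_request(o, 1, 1) both leave it allocated, which is the kind of steady growth that would show up as an ever-larger slab cache.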

Comment 10 Dave Jones 2006-03-02 04:12:55 UTC
Should be fixed in tomorrow's rawhide. (Grab it early from
http://people.redhat.com/davej/kernels/Fedora/devel)

Comment 11 Dave Jones 2006-03-02 04:16:25 UTC
*** Bug 183555 has been marked as a duplicate of this bug. ***

Comment 12 Alexandre Oliva 2006-03-02 13:30:34 UTC
It is definitely fixed in build 2008, but today's rawhide still has build 1996.

