Bug 182970 - slab leaking (bio and biovec-1)
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Assigned To: Dave Jones
QA Contact: Brian Brock
Duplicates: 183017 183555
Reported: 2006-02-24 15:12 EST by Charles Lopes
Modified: 2015-01-04 17:25 EST

Doc Type: Bug Fix
Last Closed: 2006-03-01 23:12:55 EST


Attachments
list of loaded modules (4.35 KB, text/plain)
2006-02-24 15:13 EST, Charles Lopes
/proc/meminfo (676 bytes, text/plain)
2006-02-24 15:14 EST, Charles Lopes
/proc/slabinfo (28.72 KB, text/plain)
2006-02-24 15:15 EST, Charles Lopes

Description Charles Lopes 2006-02-24 15:12:20 EST
Description of problem:
After a few days, slab allocation has reached 750 MB and is still increasing. Most of
it is in bio and biovec-1.

Version-Release number of selected component (if applicable):
At least from kernel-2.6.14-1.1773_FC5 up to kernel-2.6.15-1.1955_FC5.

Additional info:
My system is on logical volumes on top of MD RAID1 arrays, including the
swap. I'll attach the list of loaded modules and dumps of both meminfo and
slabinfo.
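
For anyone wanting to reproduce the measurement, here is a minimal sketch (not part of the original report) of estimating per-cache memory from /proc/slabinfo. It assumes the usual row layout of cache name, active objects, total objects, object size; the helper names (parse_slab_line, slab_bytes, report_large_caches) are made up for illustration, and the estimate ignores per-slab overhead.

```c
#include <assert.h>
#include <stdio.h>

/* One row of /proc/slabinfo, reduced to the fields needed here. */
struct slab_stat {
    char name[64];
    unsigned long active_objs;
    unsigned long num_objs;
    unsigned long objsize;      /* bytes per object */
};

/* Parse one row; returns 1 on success, 0 for header, comment or bad lines. */
int parse_slab_line(const char *line, struct slab_stat *out)
{
    assert(line != NULL && out != NULL);
    if (line[0] == '#')
        return 0;
    return sscanf(line, "%63s %lu %lu %lu", out->name,
                  &out->active_objs, &out->num_objs, &out->objsize) == 4;
}

/* Rough footprint of a cache: allocated objects times object size. */
unsigned long long slab_bytes(const struct slab_stat *s)
{
    return (unsigned long long)s->num_objs * s->objsize;
}

/* Print every cache in `path` using at least min_bytes.
 * Returns the number of caches printed, or -1 if the file cannot be opened. */
int report_large_caches(const char *path, unsigned long long min_bytes)
{
    FILE *f = fopen(path, "r");
    char line[512];
    struct slab_stat s;
    int hits = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (parse_slab_line(line, &s) && slab_bytes(&s) >= min_bytes) {
            printf("%-20s %llu bytes\n", s.name, slab_bytes(&s));
            hits++;
        }
    }
    fclose(f);
    return hits;
}
```

Calling report_large_caches("/proc/slabinfo", 100ULL * 1024 * 1024) from a small main() would list caches above roughly 100 MB; on an affected system one would expect bio and biovec-1 to show up there. Reading /proc/slabinfo may require root.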
Comment 1 Charles Lopes 2006-02-24 15:13:30 EST
Created attachment 125207
list of loaded modules
Comment 2 Charles Lopes 2006-02-24 15:14:20 EST
Created attachment 125208
/proc/meminfo
Comment 3 Charles Lopes 2006-02-24 15:15:11 EST
Created attachment 125209
/proc/slabinfo
Comment 4 Dave Jones 2006-02-27 02:21:53 EST
*** Bug 183017 has been marked as a duplicate of this bug. ***
Comment 5 Matteo Brancaleoni 2006-02-27 15:39:08 EST
FYI, to do some testing, I upgraded FC5-test to vanilla kernel 2.6.16-rc4 and the
problem remains. I then downgraded to vanilla kernel 2.6.14.7 and the problem
disappeared.

So perhaps it is a problem with bio in the mainline kernel?
Comment 6 Matteo Brancaleoni 2006-02-28 09:36:03 EST
Of course, 2.6.14.7 doesn't work well on FC5 (mainly because of the newer udev).
I did some tests on 2.6.15-rcX kernels and found that everything is OK up to
2.6.15-rc5; things break in 2.6.15-rc6.
Hope this is useful.
Comment 7 Matteo Brancaleoni 2006-02-28 11:15:29 EST
This patch:

commit 3795bb0fc52fe2af2749f3ad2185cb9c90871ef8
Author: NeilBrown <neilb@suse.de>
Date:   Mon Dec 12 02:39:16 2005 -0800

    [PATCH] md: fix a use-after-free bug in raid1
    
    Who would submit code with a FIXME like that in it !!!!
    
    Signed-off-by: Neil Brown <neilb@suse.de>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Causes the problem. I think this should now go to the kernel developers.
The patch correctly releases the bio later in the function (before, it was freed and
then used), but adds an if (blah == NULL) guarding the bio_put ...
I'm by no means a kernel expert, but why is the if needed?

Comment 8 Charles Lopes 2006-02-28 12:01:49 EST
The bio_put after the if statement doesn't look bad. You want to do it in the
cases where the statement "r1_bio->bios[mirror] = NULL;" was executed previously. I
believe the problem may be due to an exit point prior to the bio_put.

Here's the piece of code:

                if (test_bit(R1BIO_BarrierRetry, &r1_bio->state)) {
                        reschedule_retry(r1_bio);
                        /* Don't dec_pending yet, we want to hold
                         * the reference over the retry
                         */
                        return 0;
                }

A bio_put (with the test?) may be needed just before return. I'll give it a try
tonight.
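
To make the suspected path concrete, here is a toy model of it. This is a sketch only, not the raid1 code: struct toy_bio, toy_bio_put and toy_end_write are invented names, and the two booleans stand in for the R1BIO_BarrierRetry test and for "r1_bio->bios[mirror] = NULL;" having run earlier in the function.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a reference-counted bio; NOT the kernel's struct bio. */
struct toy_bio {
    int refcount;
};

/* Drop one reference; the real bio_put would free the object at zero. */
void toy_bio_put(struct toy_bio *b)
{
    assert(b->refcount > 0);
    b->refcount--;
}

/*
 * Hypothetical completion handler shaped like the snippet above.
 * barrier_retry  : models test_bit(R1BIO_BarrierRetry, ...) being true
 * slot_cleared   : models r1_bio->bios[mirror] having been set to NULL
 * put_before_ret : if true, apply the proposed extra put before the return
 */
int toy_end_write(struct toy_bio *b, bool barrier_retry,
                  bool slot_cleared, bool put_before_ret)
{
    if (barrier_retry) {
        if (put_before_ret && slot_cleared)
            toy_bio_put(b);
        return 0;               /* early exit: the put below is skipped */
    }
    if (slot_cleared)
        toy_bio_put(b);
    return 0;
}
```

In this model, a bio that starts with refcount 1 and goes through the retry branch with its slot already cleared keeps refcount 1 forever unless put_before_ret is set, which is the kind of leak that would accumulate in the bio slab. Whether the real function can reach the return in that state is exactly what needs checking.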
Comment 9 Matteo Brancaleoni 2006-02-28 17:22:16 EST
Mmmh,
so why wasn't that done previously?

I mean, previously bio_put was called and the bio was then used again,
so they simply moved bio_put after the bio's last use (the use-after-free problem).

But why did they add the if?
What if the tested value is != NULL? Then bio_put is never called... hence the slab leak.

I rebuilt the latest 2.6.15 kernel with just the if statement deleted, and the machine
doesn't seem to have a single problem right now.

But again, I'm not a kernel expert, so perhaps the if is right but some other
cases must be considered.
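
One possible reading of the if, offered as an assumption rather than something confirmed in this thread: a slot that is still non-NULL means the bio is being kept for a later retry, so putting it unconditionally would drop a reference that something still relies on. The toy model below (invented names demo_bio, demo_r1bio, demo_complete, demo_has_dangling_slot; not kernel code) compares the guarded and unguarded put.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MIRRORS 2

/* Toy models only; these are not the kernel's bio or r1bio structures. */
struct demo_bio { int refcount; };
struct demo_r1bio { struct demo_bio *bios[DEMO_MIRRORS]; };

/*
 * Hypothetical completion step for one mirror.
 * keep_for_retry : if false, the slot is cleared and the handler must drop
 *                  the reference; if true, the slot keeps the bio for reuse.
 * guard_put      : true  = put only when the slot is NULL (the patched form)
 *                  false = always put (the if statement deleted)
 */
void demo_complete(struct demo_r1bio *r, int mirror,
                   bool keep_for_retry, bool guard_put)
{
    struct demo_bio *bio = r->bios[mirror];

    assert(bio != NULL);
    if (!keep_for_retry)
        r->bios[mirror] = NULL;
    if (!guard_put || r->bios[mirror] == NULL)
        bio->refcount--;
}

/* True if a slot still points at a bio whose last reference is gone. */
bool demo_has_dangling_slot(const struct demo_r1bio *r)
{
    for (int i = 0; i < DEMO_MIRRORS; i++)
        if (r->bios[i] != NULL && r->bios[i]->refcount <= 0)
            return true;
    return false;
}
```

In this model the unguarded version leaves a slot pointing at a bio with no references left whenever keep_for_retry is true. If the real retry path is only taken on devices that reject barrier writes, a machine could run for a long time without the if and never hit it, which would be consistent with the test result above without proving the if is unnecessary.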
Comment 10 Dave Jones 2006-03-01 23:12:55 EST
Should be fixed in tomorrow's rawhide. (Grab it early from
http://people.redhat.com/davej/kernels/Fedora/devel)
Comment 11 Dave Jones 2006-03-01 23:16:25 EST
*** Bug 183555 has been marked as a duplicate of this bug. ***
Comment 12 Alexandre Oliva 2006-03-02 08:30:34 EST
It is definitely fixed in build 2008, but today's rawhide still has build 1996.