Bug 496847 - [Patch] jbd slab cache creation/deletion is racey
[Patch] jbd slab cache creation/deletion is racey
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel (Show other bugs)
5.3
All Linux
urgent Severity high
: rc
: ---
Assigned To: Josef Bacik
Red Hat Kernel QE team
: ZStream
: 542832 (view as bug list)
Depends On:
Blocks: 533192 5.5TechNotes-Updates 553132 645653
  Show dependency treegraph
 
Reported: 2009-04-21 09:04 EDT by Issue Tracker
Modified: 2010-11-22 18:17 EST (History)
14 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-03-30 02:53:48 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Issue Tracker 2009-04-21 09:04:10 EDT
Escalated to Bugzilla from IssueTracker
Comment 1 Issue Tracker 2009-04-21 09:04:12 EDT
The jbd slab cache creation/deletion is racey.  If multiple jbd based filesystems are mounted concurrently, and there are no other jbd based filesystems already mounted.  Then we can race creating the slab caches since jbd_slab[] is not locked.  This is not commonly observed because typically /root is mounted early with a jbd based filesystem making the race impossible.  On our diskless systems /root does not use the jbd but we do have attached storage which does, and which is mounted in parallel.  Basically our setup is similar to what may be found in a NAS style appliance.

The upstream patch that appears to fix it is:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c089d490dfbf53bc0893dc9ef57cf3ee6448314d

Another RHEL5 specific approach to fix the problem is attached. 
This event sent from IssueTracker by dejohnso  [Support Engineering Group]
 issue 286416
Comment 2 Issue Tracker 2009-04-21 09:04:13 EDT
Issue Registered (Severity: 3)
This event sent from IssueTracker by dejohnso  [Support Engineering Group]
 issue 286416
Comment 3 Issue Tracker 2009-04-21 09:04:14 EDT
Date: Wed, 15 Apr 2009 16:37:13 -0700
From: Ben Woodard <bwoodard@llnl.gov>
Reply-To: bwoodard@llnl.gov
Organization: Red Hat Inc.
User-Agent: Thunderbird 2.0.0.21 (X11/20090310)
MIME-Version: 1.0
To: Brian Behlendorf <behlendorf1@llnl.gov>
CC: "Mark A. Grondona" <mgrondona@llnl.gov>
Subject: Re: JBD Race
References: <200904151226.09733.behlendorf1@llnl.gov>
In-Reply-To: <200904151226.09733.behlendorf1@llnl.gov>
X-Enigmail-Version: 0.95.6
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Brian Behlendorf wrote:
> Can you open a issue with redhat for the following RHEL jbd issue. 
I've 
> proposed a minimal impact fix in the follow lustre bugzilla, and Andreas
was 
> nice enough to provide a reference to where this was fixed properly
upstream.  
> Which ever way redhat wants to fix it is fine with me.  This is pretty
much a 
> no brainer but it would be very nice to have it fixed in the next
errata.
> 
> https://bugzilla.lustre.org/show_bug.cgi?id=19184
> 

Done. I sent that up and bounced it off of one of our FS engineers which
happens to be my friend. I don't know what the preferred approach is
going to be. Since it is a one time cost the extra locks wouldn't be
much of a penalty.

The upstream fix is obviously a different way to fix the problem. I kind
of like those fixes for problems where they get rid of the code with the
bug in it.

I don't know if they will go for the small targeted fix with no future
or if they will want the bigger potentially more disruptive upstream
fix. However, since that path gets exercised on each and every boot of a
practically every machine it is fairly easy to qualify and test.

One thing that I wonder about is if the same problem exists in the jbd2
used by ext4.

- -ben


woodard assigned to issue for LLNL (HPC).
Status set to: Waiting on Tech

This event sent from IssueTracker by dejohnso  [Support Engineering Group]
 issue 286416
Comment 4 Issue Tracker 2009-04-21 09:04:16 EDT
Date problem reported: 4-15-09

Problem: jbd slab creation / deletion is racy.  Customer has put together
a patch to fix this based on upstream and would like for us to consider
adding the fix to a future kernel errata. 

SEG,

Can you guys do a sanity check on the attached patch from LLNL?  

What this patch is is a RHEL5-specific approach to fixing some racy
behavior with jbd slab cache creation/deletion.

The upstream approach to fixing the problem, as mentioned in the summary,
is here:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c089d490dfbf53bc0893dc9ef57cf3ee6448314d

If everything looks good, can we get this into a BZ for the kernel guys?

Thanks.




Issue escalated to Support Engineering Group by: kbaxley.
Internal Status set to 'Waiting on SEG'
Summary edited.

This event sent from IssueTracker by dejohnso  [Support Engineering Group]
 issue 286416
Comment 5 Issue Tracker 2009-04-21 09:04:18 EDT
Hi,

I have looked at this patch and do not see any issues as far as it working
as designed however development obviously needs to look into this as we are
changing where the journaling is being allocated in memory.  I will elevate
this to BZ.

Debbie


This event sent from IssueTracker by dejohnso  [Support Engineering Group]
 issue 286416
Comment 6 Phil Knirsch 2009-05-05 07:50:37 EDT
Reassigning to kernel component.

Thanks & regards, Phil
Comment 9 RHEL Product and Program Management 2009-09-25 13:38:54 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 10 Josef Bacik 2009-10-08 15:47:36 EDT
Any response to my question in c#8?
Comment 11 Ben Woodard 2009-10-08 16:04:00 EDT
checking again.
Comment 12 Ben Woodard 2009-10-08 16:12:55 EDT
Turns out we have a reproducer. However, it isn't with RHEL's kernel it includes mounting Lustre FSs therefore the race is real and not theoretical. As for the locking being gratuitous it does fix the problem. So somewhere your analysis is flawed. I can probably get you a back trace of a non RHEL kernel if you would like.
Comment 14 Ben Woodard 2009-10-08 16:21:30 EDT
On the other hand he does say that they really only saw the bug appear during creation. So the locking may be more heavy handed than needed but he said "it seemed right"
Comment 15 Josef Bacik 2009-10-08 22:38:38 EDT
Yes the locking during the creation/deletion is obviously needed, I'm talking about the locking for actually making the allocations, which is what seems gratuitous since its already properly protected.  So if its a matter of "seemed right" then we'd rather not take that part just in case of a performance regression.  If he had some other reason for protecting allocations and free's we'd like to hear what it was, becaues it doesn't seem necessary, just the locking for the actual slab creation/deletion would be what is necessary.
Comment 18 Josef Bacik 2009-12-01 09:08:59 EST
*** Bug 542832 has been marked as a duplicate of this bug. ***
Comment 19 Don Zickus 2009-12-11 14:26:45 EST
in kernel-2.6.18-179.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please update the appropriate value in the Verified field
(cf_verified) to indicate this fix has been successfully
verified. Include a comment with verification details.
Comment 24 Chris Ward 2010-02-11 05:30:51 EST
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.
Comment 30 errata-xmlrpc 2010-03-30 02:53:48 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
Comment 32 Bernd Schubert 2010-06-14 12:53:23 EDT
While I just compare the patch that landed in -164.15.1 compared to the patch in the Lustre bugzilla, I notice that chunks in jbd_slab_alloc() and jbd_slab_free() in the RedHat patch are missing. Was that intentional?

 void * jbd_slab_alloc(size_t size, gfp_t flags)
 {
+       void *ptr;
        int idx;

+       down_read(&jbd_slab_lock);
        idx = JBD_SLAB_INDEX(size);
        BUG_ON(jbd_slab[idx] == NULL);
-       return kmem_cache_alloc(jbd_slab[idx], flags | __GFP_NOFAIL);
+       ptr = kmem_cache_alloc(jbd_slab[idx], flags | __GFP_NOFAIL);
+       up_read(&jbd_slab_lock);
+
+       return ptr;
 }

 void jbd_slab_free(void *ptr,  size_t size)
 {
        int idx;

+       down_read(&jbd_slab_lock);
        idx = JBD_SLAB_INDEX(size);
        BUG_ON(jbd_slab[idx] == NULL);
        kmem_cache_free(jbd_slab[idx], ptr);
+       up_read(&jbd_slab_lock);
 }

 /*


Thanks,
Bernd
Comment 33 Josef Bacik 2010-06-14 13:48:01 EDT
Yes, that locking was too much, we didn't want to adversely affect performance, so it was removed.  It doesn't make it any less safe, as we only want to serialize the creation/deletion of the slabs.
Comment 34 Bernd Schubert 2010-06-18 14:07:25 EDT
Thanks for your help Josef!

Note You need to log in before you can comment on or make changes to this bug.