Escalated to Bugzilla from IssueTracker
The jbd slab cache creation/deletion is racy. If multiple jbd-based filesystems are mounted concurrently, and no other jbd-based filesystem is already mounted, then we can race creating the slab caches, since jbd_slab[] is not locked. This is not commonly observed because /root is typically mounted early with a jbd-based filesystem, making the race impossible. On our diskless systems /root does not use jbd, but we do have attached storage which does, and which is mounted in parallel. Basically, our setup is similar to what may be found in a NAS-style appliance.

The upstream patch that appears to fix it is:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c089d490dfbf53bc0893dc9ef57cf3ee6448314d

Another, RHEL5-specific approach to fixing the problem is attached.

This event sent from IssueTracker by dejohnso [Support Engineering Group] issue 286416
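For illustration, here is a minimal sketch of the unsynchronized check-then-create pattern being described, modeled on the 2.6.18-era fs/jbd/journal.c (the exact RHEL5 source may differ in detail):

    /*
     * Sketch of the racy path (modeled on 2.6.18 fs/jbd/journal.c;
     * the exact RHEL5 code may differ).  Two concurrent mounts can
     * both observe jbd_slab[i] == NULL and both call
     * kmem_cache_create() for the same index.
     */
    static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];

    static int journal_create_jbd_slab(size_t slab_size)
    {
            int i = JBD_SLAB_INDEX(slab_size);

            BUG_ON(i >= JBD_MAX_SLABS);

            if (jbd_slab[i])
                    return 0;       /* already created -- but this check is unlocked */

            /* race window: a second mount can reach here for the same i */
            jbd_slab[i] = kmem_cache_create(jbd_slab_names[i],
                                            slab_size, slab_size, 0, NULL, NULL);
            if (!jbd_slab[i]) {
                    printk(KERN_EMERG "JBD: no memory for jbd_slab cache\n");
                    return -ENOMEM;
            }
            return 0;
    }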
Issue Registered (Severity: 3) This event sent from IssueTracker by dejohnso [Support Engineering Group] issue 286416
Date: Wed, 15 Apr 2009 16:37:13 -0700
From: Ben Woodard <bwoodard>
To: Brian Behlendorf <behlendorf1>
CC: "Mark A. Grondona" <mgrondona>
Subject: Re: JBD Race

Brian Behlendorf wrote:
> Can you open an issue with Red Hat for the following RHEL jbd issue? I've
> proposed a minimal-impact fix in the following Lustre bugzilla, and Andreas was
> nice enough to provide a reference to where this was fixed properly upstream.
> Whichever way Red Hat wants to fix it is fine with me. This is pretty much a
> no-brainer, but it would be very nice to have it fixed in the next errata.
>
> https://bugzilla.lustre.org/show_bug.cgi?id=19184

Done. I sent that up and bounced it off of one of our FS engineers, who happens to be my friend. I don't know what the preferred approach is going to be. Since it is a one-time cost, the extra locks wouldn't be much of a penalty. The upstream fix is obviously a different way to fix the problem; I kind of like fixes that get rid of the code with the bug in it. I don't know if they will go for the small targeted fix with no future, or if they will want the bigger, potentially more disruptive upstream fix. However, since that path gets exercised on every boot of practically every machine, it is fairly easy to qualify and test.

One thing that I wonder about is whether the same problem exists in the jbd2 used by ext4.

-ben

woodard assigned to issue for LLNL (HPC).
Status set to: Waiting on Tech
This event sent from IssueTracker by dejohnso [Support Engineering Group] issue 286416
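For reference, the upstream commit linked above takes the "remove the buggy code" route Ben describes: it drops the per-size jbd slab caches entirely in favor of page-order allocations, roughly along these lines (paraphrased sketch, not the literal patch text):

    /*
     * Rough shape of the upstream replacement (paraphrased, not the
     * literal patch).  Journal buffer sizes are powers of two (the
     * filesystem block size), so plain page allocations fit, and with
     * no jbd_slab[] array there is nothing left to race on.
     */
    static inline void *jbd_alloc(size_t size, gfp_t flags)
    {
            return (void *)__get_free_pages(flags, get_order(size));
    }

    static inline void jbd_free(void *ptr, size_t size)
    {
            free_pages((unsigned long)ptr, get_order(size));
    }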
Date problem reported: 4-15-09

Problem: jbd slab creation/deletion is racy. The customer has put together a patch to fix this based on upstream, and would like us to consider adding the fix to a future kernel errata.

SEG, can you guys do a sanity check on the attached patch from LLNL? This patch is a RHEL5-specific approach to fixing some racy behavior in jbd slab cache creation/deletion. The upstream approach to fixing the problem, as mentioned in the summary, is here:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c089d490dfbf53bc0893dc9ef57cf3ee6448314d

If everything looks good, can we get this into a BZ for the kernel guys? Thanks.

Issue escalated to Support Engineering Group by: kbaxley.
Internal Status set to 'Waiting on SEG'
Summary edited.
This event sent from IssueTracker by dejohnso [Support Engineering Group] issue 286416
Hi, I have looked at this patch and do not see any issues as far as it working as designed; however, development obviously needs to look into this, as we are changing where the journaling memory is being allocated. I will elevate this to BZ. Debbie

This event sent from IssueTracker by dejohnso [Support Engineering Group] issue 286416
Reassigning to kernel component. Thanks & regards, Phil
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Any response to my question in c#8?
checking again.
Turns out we have a reproducer. However, it isn't with RHEL's kernel; it involves mounting Lustre FSs. The race is therefore real and not theoretical. As for the locking being gratuitous: it does fix the problem, so somewhere your analysis is flawed. I can probably get you a backtrace from a non-RHEL kernel if you would like.
https://bugzilla.lustre.org/show_bug.cgi?id=19184
On the other hand, he does say that they really only saw the bug appear during creation. So the locking may be more heavy-handed than needed, but he said "it seemed right".
Yes, the locking during creation/deletion is obviously needed. I'm talking about the locking around actually making the allocations, which is what seems gratuitous, since that path is already properly protected. So if it's a matter of "seemed right", then we'd rather not take that part, just in case of a performance regression. If he had some other reason for protecting the allocations and frees, we'd like to hear what it was, because it doesn't seem necessary; just the locking around the actual slab creation/deletion is what is needed (see the sketch below).
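To make the distinction concrete, here is a minimal sketch of the locking scope being argued for, reusing the jbd_slab_lock rw_semaphore from the attached patch. Only the create path (and, symmetrically, the destroy path) takes the lock; alloc/free stay lock-free because a slab is never destroyed while a mounted journal can still allocate from it:

    static DECLARE_RWSEM(jbd_slab_lock);    /* name from the attached patch */

    static int journal_create_jbd_slab(size_t slab_size)
    {
            int i = JBD_SLAB_INDEX(slab_size);
            int ret = 0;

            BUG_ON(i >= JBD_MAX_SLABS);

            down_write(&jbd_slab_lock);
            if (!jbd_slab[i]) {
                    /* re-check under the lock: only one creator can win */
                    jbd_slab[i] = kmem_cache_create(jbd_slab_names[i],
                                                    slab_size, slab_size,
                                                    0, NULL, NULL);
                    if (!jbd_slab[i]) {
                            printk(KERN_EMERG "JBD: no memory for jbd_slab cache\n");
                            ret = -ENOMEM;
                    }
            }
            up_write(&jbd_slab_lock);
            return ret;
    }

(With only writers left, a plain mutex would serve equally well; the rwsem shape simply matches the attached patch.)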
*** Bug 542832 has been marked as a duplicate of this bug. ***
This fix is in kernel-2.6.18-179.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this release that addresses your request. Please test and report back results here by March 3rd, 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set this bug into NEED_INFO. If you encounter new defects or have additional patches to request for inclusion, please clone this bug for each request and escalate through your support representative.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
Just comparing the patch that landed in -164.15.1 with the patch in the Lustre bugzilla, I notice that the chunks in jbd_slab_alloc() and jbd_slab_free() in the Red Hat patch are missing. Was that intentional?

 void * jbd_slab_alloc(size_t size, gfp_t flags)
 {
+	void *ptr;
 	int idx;

+	down_read(&jbd_slab_lock);
 	idx = JBD_SLAB_INDEX(size);
 	BUG_ON(jbd_slab[idx] == NULL);
-	return kmem_cache_alloc(jbd_slab[idx], flags | __GFP_NOFAIL);
+	ptr = kmem_cache_alloc(jbd_slab[idx], flags | __GFP_NOFAIL);
+	up_read(&jbd_slab_lock);
+
+	return ptr;
 }

 void jbd_slab_free(void *ptr, size_t size)
 {
 	int idx;

+	down_read(&jbd_slab_lock);
 	idx = JBD_SLAB_INDEX(size);
 	BUG_ON(jbd_slab[idx] == NULL);
 	kmem_cache_free(jbd_slab[idx], ptr);
+	up_read(&jbd_slab_lock);
 }

 /*

Thanks,
Bernd
Yes, that locking was too much; we didn't want to adversely affect performance, so it was removed. It doesn't make it any less safe, as we only want to serialize the creation/deletion of the slabs.
Thanks for your help Josef!