Bug 837155 - jbd cannot handle a 512B block size correctly, causing a system crash. [NEEDINFO]
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Assigned To: Red Hat Kernel Manager
Red Hat Kernel QE team
Reported: 2012-07-02 22:44 EDT by xiaowei.hu
Modified: 2014-06-02 09:21 EDT (History)
6 users

Doc Type: Bug Fix
Last Closed: 2014-06-02 09:21:53 EDT
Type: Bug
pm-rhel: needinfo? (xiaowei.hu)

Attachments
make jbd fit for 512B block size. (899 bytes, patch)
2012-07-02 22:44 EDT, xiaowei.hu

Description xiaowei.hu 2012-07-02 22:44:19 EDT
Created attachment 595861
make jbd fit for 512B block size.

Description of problem:
The system crashes randomly when testing ocfs2 with a 512B block size, even across two different mounts.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Only a single node is needed to reproduce this:
1. mkfs.ocfs2 -b 512
2. mount, then dd some files onto the ocfs2 volume, then umount
3. repeat the test with mkfs.ocfs2 -b 1024
Actual results:
I am sure this bug is in jbd.
I installed an OS with ext2 as the root fs so that I could remove the jbd module together with the ocfs2 module on unload, then ran the test:
1. mkfs.ocfs2 -b 512
2. mount, then dd, then umount
3. /etc/init.d/o2cb unload (this also removes jbd)
4. repeat the test with mkfs.ocfs2 -b 1024
With jbd unloaded and reloaded between the two runs, the crash does not occur, which points to state kept inside the jbd module.

Expected results:

Additional info:
I hunted down the root cause:
1. JBD creates a new slab on the first mount of ocfs2 with a 512B block size. The slab's name is chosen by the index 512 >> 11, which is 0, so without this patch the slab is named jbd_1k even though its object size is 512 bytes.
2. This slab is not destroyed until the jbd module is removed.
3. The next time we mount an ocfs2 volume with a 1K block size, the index is 1024 >> 11, which is again 0, yielding the same name "jbd_1k" as in step 1. Since a slab with that name already exists, jbd reuses it, but its objects are only 512 bytes. Using them as 1K buffers overruns each allocation, overwriting adjacent memory and corrupting pointers. This is what leads to the crash.

Patch attached.
Comment 1 Ric Wheeler 2012-07-03 14:26:03 EDT
Red Hat does not support OCFS2, but this jbd bug might show up in ext3/4 as well, so it is worth investigating.
Comment 2 Zach Brown 2012-07-03 14:35:26 EDT
The analysis of the bug certainly looks plausible.

For what it's worth, these slab allocations were removed upstream, so it's probably reasonable to put a small fix for this slab-naming bug in RHEL.

commit c089d490dfbf53bc0893dc9ef57cf3ee6448314d
Author: Mingming Cao <cmm@us.ibm.com>
Date:   Tue Oct 16 18:38:25 2007 -0400

    JBD: JBD slab allocation cleanups
    JBD: Replace slab allocations with page allocations
Comment 3 Zach Brown 2012-07-03 15:47:14 EDT
Hmm, and Eric and I realized that ext* has a minimum block size of 1K, which is almost certainly why jbd never correctly supported 512B blocks.
Comment 5 RHEL Product and Program Management 2014-03-07 08:53:18 EST
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.
Comment 6 RHEL Product and Program Management 2014-06-02 09:21:53 EDT
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).
