Bug 837155 - jbd can't process 512B block size correctly, causing a system crash.
Summary: jbd can't process 512B block size correctly, causing a system crash.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.10
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-07-03 02:44 UTC by xiaowei.hu
Modified: 2023-09-14 01:30 UTC
CC: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-02 13:21:53 UTC
Target Upstream Version:
Embargoed:


Attachments
make jbd fit for 512B block size. (899 bytes, patch)
2012-07-03 02:44 UTC, xiaowei.hu

Description xiaowei.hu 2012-07-03 02:44:19 UTC
Created attachment 595861
make jbd fit for 512B block size.

Description of problem:
The system crashes randomly when testing ocfs2 with a 512B block size, even across two different mounts.

Version-Release number of selected component (if applicable):


How reproducible:

Steps to Reproduce:
Only a single node is needed to reproduce this:
1. mkfs.ocfs2 -b 512
2. mount, then dd some files onto the ocfs2 volume, then umount
3. do the test again with mkfs.ocfs2 -b 1024
  
Actual results:
I am sure this bug is in jbd.
I installed an OS with ext2 as the root fs so that I could remove the jbd module
together with the ocfs2 module when unloading, then ran the test:
1. mkfs.ocfs2 -b 512
2. mount, then dd, then umount
3. /etc/init.d/o2cb unload, which also removes jbd
4. do the test again with mkfs.ocfs2 -b 1024
With jbd unloaded and reloaded between the two runs, the crash no longer occurs, which points at state kept inside jbd across mounts.


Expected results:


Additional info:
I hunted down the root cause:
1. JBD creates a new slab cache on the first mount of ocfs2 with a 512B block size. The cache name is picked by an index computed as blocksize >> 11, which for 512 is 0, so without this patch it uses "jbd_1k" as the name but 512 bytes as the slab size.
2. This slab is not destroyed until the jbd module is removed.
3. The next time we mount an ocfs2 volume with a 1K block size, the index is still 1024 >> 11 = 0, yielding the same "jbd_1k" name as in step 1. Since that slab already exists, jbd keeps using it, but its objects are only 512 bytes while jbd now treats them as 1K. The writes overrun the slab objects and clobber neighboring memory and pointers, which leads to the crash.

patch attached.
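
For illustration, here is a minimal sketch of the name/size collision, modeled on the pre-2.6.24 jbd slab helpers (identifiers like JBD_SLAB_INDEX and jbd_slab_names follow that code; this is an approximation of the bug, not the attached patch):

#include <linux/slab.h>

/* Sketch of jbd's buggy size-to-cache mapping (pre-2.6.24 style). */
#define JBD_MAX_SLABS 5
#define JBD_SLAB_INDEX(size)  ((size) >> 11)  /* 512 >> 11 == 0 and 1024 >> 11 == 0 */

static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
static const char *jbd_slab_names[JBD_MAX_SLABS] = {
	"jbd_1k", "jbd_2k", "jbd_4k", NULL, "jbd_8k"
};

static int journal_create_jbd_slab(size_t slab_size)
{
	int i = JBD_SLAB_INDEX(slab_size);

	/*
	 * Both 512 and 1024 map to index 0 ("jbd_1k").  If a 512B mount
	 * created jbd_slab[0] first, this check makes a later 1K mount
	 * silently reuse the 512-byte cache, so every 1K allocation
	 * overruns its object by 512 bytes.
	 */
	if (jbd_slab[i])
		return 0;

	jbd_slab[i] = kmem_cache_create(jbd_slab_names[i], slab_size,
					slab_size, 0, NULL, NULL);
	return jbd_slab[i] ? 0 : -ENOMEM;
}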

Comment 1 Ric Wheeler 2012-07-03 18:26:03 UTC
Red Hat does not support OCFS2, but this jbd bug might show up in ext3/4 as well, so it is worth investigating.

Comment 2 Zach Brown 2012-07-03 18:35:26 UTC
The analysis of the bug certainly looks plausible.

For what it's worth, these slab allocations were removed upstream, so it's probably reasonable to put a little fix for this bad slab naming bug in RHEL.

commit c089d490dfbf53bc0893dc9ef57cf3ee6448314d
Author: Mingming Cao <cmm@us.ibm.com>
Date:   Tue Oct 16 18:38:25 2007 -0400

    JBD: JBD slab allocation cleanups
    
    JBD: Replace slab allocations with page allocations
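
As a rough sketch of what that upstream cleanup looks like (abbreviated from the jbd_alloc()/jbd_free() helpers the commit introduces; the real commit also handles multi-page sizes, so treat this as an outline rather than the exact code):

#include <linux/mm.h>
#include <linux/slab.h>

/*
 * After the cleanup, jbd keeps no per-blocksize slab caches; journal
 * buffers come from kmalloc (sub-page sizes such as 512B) or straight
 * from the page allocator, so there is no shared cache whose name and
 * object size can get out of sync between mounts.
 */
void *jbd_alloc(size_t size, gfp_t flags)
{
	BUG_ON(size & (size - 1));	/* must be a power of two */

	if (size < PAGE_SIZE)
		return kmalloc(size, flags);
	return (void *)__get_free_pages(flags, get_order(size));
}

void jbd_free(void *ptr, size_t size)
{
	if (size < PAGE_SIZE)
		kfree(ptr);
	else
		free_pages((unsigned long)ptr, get_order(size));
}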

Comment 3 Zach Brown 2012-07-03 19:47:14 UTC
Hmm, and Eric and I realized that ext* has a minimum block size of 1k, which is almost certainly why jbd didn't correctly support 512B blocks.

Comment 5 RHEL Program Management 2014-03-07 13:53:18 UTC
This bug/component is not included in scope for RHEL-5.11.0, which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX at the end of the RHEL5.11 development phase (Apr 22, 2014). Please contact your account manager or support representative in case you need to escalate this bug.

Comment 6 RHEL Program Management 2014-06-02 13:21:53 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

Comment 7 Red Hat Bugzilla 2023-09-14 01:30:18 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.

