Bug 837155

Summary: jbd can't process a 512B block size correctly, causing the system to crash.
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.10
Hardware: Unspecified
OS: Unspecified
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Target Release: ---
Reporter: xiaowei.hu <xiaowei.hu>
Assignee: Red Hat Kernel Manager <kernel-mgr>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: dchinner, esandeen, lczerner, rwheeler, xiaowei.hu, zab
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-06-02 13:21:53 UTC

Attachments:
make jbd fit for 512B block size (patch)

Description xiaowei.hu 2012-07-03 02:44:19 UTC
Created attachment 595861 [details]
make jbd fit for 512B block size.

Description of problem:
The system crashes randomly when testing ocfs2 with a 512B block size, even across two different mounts.

Version-Release number of selected component (if applicable):


How reproducible:

Steps to Reproduce:
Only a single node is needed to reproduce this:
1. mkfs.ocfs2 -b 512
2. mount, then dd some files onto the ocfs2 volume, then umount
3. do the test again with mkfs.ocfs2 -b 1024
  
Actual results:
I am sure this bug is in jbd.
I installed an OS with ext2 as the root fs so that I could remove the jbd module
together with the ocfs2 module on unload, then ran the test:
1. mkfs.ocfs2 -b 512
2. mount, then dd, then umount
3. /etc/init.d/o2cb unload (this also removes jbd)
4. do the test again with mkfs.ocfs2 -b 1024


Expected results:


Additional info:
I hunted down the root cause:
1. JBD creates a new slab cache on the first mount of ocfs2 with a 512B block size. Without this patch, the cache name is chosen by computing 512 >> 11, which is 0, so it uses the name "jbd_1k" but a slab size of 512 bytes.
2. This slab cache is not destroyed until the jbd module is removed.
3. The next time we mount an ocfs2 volume with a 1K block size, the name is again computed as 1024 >> 11, which is also 0, yielding the same "jbd_1k" name as in step 1. Since that cache already exists, but with a 512-byte object size, jbd reuses it as if it held 1K objects, overwriting adjacent memory and corrupting pointers; this leads to the crash (see the sketch below).
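
To make the collision concrete, here is a small userspace C program that mirrors the computation described above (the >> 11 shift and the "jbd_1k" name come from this analysis; the other cache names are assumptions for illustration, and this is not the actual jbd code):

    #include <stdio.h>

    /* Mirrors the cache-naming scheme described above: the index is the
     * block size shifted right by 11, so both 512 and 1024 map to index 0
     * and therefore to the same "jbd_1k" cache name. */
    #define JBD_SLAB_INDEX(size) ((size) >> 11)

    static const char *jbd_slab_names[] = { "jbd_1k", "jbd_2k", "jbd_4k" };

    int main(void)
    {
        int sizes[] = { 512, 1024, 2048, 4096 };
        size_t i;

        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
            int idx = JBD_SLAB_INDEX(sizes[i]);
            printf("block size %4d -> index %d -> cache \"%s\"\n",
                   sizes[i], idx, jbd_slab_names[idx]);
        }
        /* 512 and 1024 both resolve to "jbd_1k": the 512B mount creates a
         * 512-byte cache under that name, and a later 1K mount reuses it,
         * overrunning each allocation by 512 bytes. */
        return 0;
    }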

patch attached.
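
The attached patch is not reproduced here. Purely as an illustration of the kind of change the analysis calls for (a hypothetical sketch, not the actual patch), a mapping that gives 512B its own cache name instead of colliding at index 0 could look roughly like this:

    #include <stdio.h>

    /* Hypothetical sketch only: derive the index from the power of two
     * above 512 so that 512, 1024, 2048 and 4096 byte block sizes each
     * get a distinct cache name. */
    static const char *slab_names[] = { "jbd_512", "jbd_1k", "jbd_2k", "jbd_4k" };

    static int slab_index(int size)
    {
        int idx = 0;

        while ((512 << idx) < size)
            idx++;
        return idx;   /* 512 -> 0, 1024 -> 1, 2048 -> 2, 4096 -> 3 */
    }

    int main(void)
    {
        int sizes[] = { 512, 1024, 2048, 4096 };
        size_t i;

        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
            printf("block size %4d -> \"%s\"\n",
                   sizes[i], slab_names[slab_index(sizes[i])]);
        return 0;
    }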

Comment 1 Ric Wheeler 2012-07-03 18:26:03 UTC
Red Hat does not support OCFS2, but this jbd bug might show up in ext3/4 as well, so it is worth investigating.

Comment 2 Zach Brown 2012-07-03 18:35:26 UTC
The analysis of the bug certainly looks plausible.

For what it's worth, these slab allocations were removed upstream, so it's probably reasonable to put a little fix for this bad slab-naming bug in RHEL.

commit c089d490dfbf53bc0893dc9ef57cf3ee6448314d
Author: Mingming Cao <cmm.com>
Date:   Tue Oct 16 18:38:25 2007 -0400

    JBD: JBD slab allocation cleanups
    
    JBD: Replace slab allocations with page allocations
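
For reference, a rough sketch of the shape that upstream cleanup took (an approximation of the page-based approach, not the literal commit): journal buffers are allocated as whole pages rather than from per-block-size slab caches, which sidesteps the cache-naming problem entirely.

    /* Kernel-context sketch, approximating the upstream replacement. */
    #include <linux/gfp.h>
    #include <linux/mm.h>

    static inline void *jbd_alloc(size_t size, gfp_t flags)
    {
            /* One or more whole pages per buffer, no named slab cache. */
            return (void *)__get_free_pages(flags, get_order(size));
    }

    static inline void jbd_free(void *ptr, size_t size)
    {
            free_pages((unsigned long)ptr, get_order(size));
    }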

Comment 3 Zach Brown 2012-07-03 19:47:14 UTC
Hmm, and Eric and I realized that ext* has a minimum block size of 1k, which is almost certainly why jbd didn't correctly support 512B blocks.

Comment 5 RHEL Program Management 2014-03-07 13:53:18 UTC
This bug/component is not included in scope for RHEL-5.11.0, which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of the RHEL5.11 development phase, Apr 22, 2014). Please contact your account manager or support representative in case you need to escalate this bug.

Comment 6 RHEL Program Management 2014-06-02 13:21:53 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

Comment 7 Red Hat Bugzilla 2023-09-14 01:30:18 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days