Bug 468547

Summary: RHEL5.3: Regression in ext3/jbd
Product: Red Hat Enterprise Linux 5 Reporter: Eric Sandeen <esandeen>
Component: kernel Assignee: Eric Sandeen <esandeen>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: high Docs Contact:
Priority: medium    
Version: 5.3 CC: ltroan, mgahagan, noboru.obata.ar, syeghiay, tyasui
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-01-20 20:11:02 UTC
Bug Depends On: 439581    
Bug Blocks:    

Description Eric Sandeen 2008-10-25 18:58:20 UTC
Just reported on the ext4 list; no patch yet, but the fix should be straightforward, I think.  If I read it right, this is likely to happen if the journal is aborted.  kmemcheck makes it obvious here, but in RHEL5 the problem is that we'll be using freed memory, so if that memory gets reused quickly it will lead to problems.
-------
Hi,

This commit:

commit 2d7c820e56ce83b23daee9eb5343730fb309418e
Author: Hidehiro Kawai <hidehiro.kawai.ez>
Date:   Wed Oct 22 14:15:01 2008 -0700

    ext3: add checks for errors from jbd

introduces a regression which was discovered by kmemcheck:

WARNING: kmemcheck: Caught 32-bit read from freed memory (f4f1b804)
00b0f1f4fbffffff404439ef008830f20200000097970000ad4eaddeffffffff
 i i i i f f f f f f f f f f f f f f f f f f f f f f f f f f f f
         ^

Pid: 9550, comm: umount Not tainted (2.6.28-rc1 #58) 945P-A
EIP: 0060:[<c05bdf38>] EFLAGS: 00010246 CPU: 0
EIP is at __journal_abort_soft+0x18/0xa0
EAX: f4f1b800 EBX: f4f1b800 ECX: c0462799 EDX: fffffffb
ESI: fffffffb EDI: f4f1a800 EBP: f145dea8 ESP: c25699c8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: f6c1d704 CR3: 31448000 CR4: 00000650
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff4ff0 DR7: 00000400
 [<c05bdfc8>] journal_abort+0x8/0x10
 [<c0589eb5>] ext3_abort+0xb5/0xc0
 [<c058a300>] ext3_put_super+0x160/0x230
 [<c04ec02a>] generic_shutdown_super+0x5a/0xe0

In particular, this hunk is guilty:

-       journal_destroy(sbi->s_journal);
+       if (journal_destroy(sbi->s_journal) < 0)
+               ext3_abort(sb, __func__, "Couldn't clean up the journal");

because journal_destroy() will free the journal regardless of whether
it returned < 0 or not. And then ext3_abort() makes some calls that
dereference the (freed) journal. These are the line numbers for the
backtrace:

addr2line -e vmlinux -i c05bdf38 c05bdfc8 c0589eb5 c058a300 c04ec02a
fs/jbd/journal.c:1502
fs/jbd/journal.c:1560
fs/ext3/super.c:284
fs/ext3/super.c:397
fs/super.c:307

(as of e013e13bf605b9e6b702adffbe2853cfc60e7806 in Linus's tree).
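
The ordering problem described above can be reproduced in miniature outside the kernel. The following is only a userspace sketch under stated assumptions: every mock_* name is hypothetical, -5 merely stands in for an error such as -EIO, and it is neither the kernel code nor necessarily the fix that ends up being merged. It shows one way to keep the abort path from touching a journal that the destroy call has already freed: save the return value, clear the pointer, and have the abort path check the pointer before dereferencing it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the journal and the per-superblock info. */
struct mock_journal { int j_flags; int j_errno; };
struct mock_sb_info { struct mock_journal *s_journal; int aborted; };

/* Mirrors the behaviour described above: the journal is always freed,
 * whether or not an error is reported to the caller. */
static int mock_journal_destroy(struct mock_journal *j)
{
    int err = j->j_errno ? -5 : 0;   /* -5 stands in for -EIO */
    free(j);
    return err;
}

/* Abort path: only touches the journal if the pointer is still set. */
static void mock_abort(struct mock_sb_info *sbi, const char *msg)
{
    fprintf(stderr, "abort: %s\n", msg);
    sbi->aborted = 1;
    if (sbi->s_journal)
        sbi->s_journal->j_flags |= 1; /* where a journal abort would go */
}

/* Unmount path. The problematic form would be:
 *     if (mock_journal_destroy(sbi->s_journal) < 0)
 *             mock_abort(sbi, ...);   // s_journal still points at freed memory
 * Here the result is saved and the stale pointer cleared first. */
static int mock_put_super(struct mock_sb_info *sbi)
{
    int err;

    assert(sbi != NULL && sbi->s_journal != NULL);
    err = mock_journal_destroy(sbi->s_journal);
    sbi->s_journal = NULL;
    if (err < 0)
        mock_abort(sbi, "Couldn't clean up the journal");
    return err;
}
```

With a journal whose j_errno is non-zero, mock_put_super() returns the error, leaves s_journal NULL, and still records the abort without reading the freed structure.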

I hope this helps.


Vegard

Comment 2 Eric Sandeen 2008-10-27 15:24:23 UTC
The author of the original patch that caused the regression has posted a fix:

http://marc.info/?l=linux-ext4&m=122510792614385&w=2

Comment 5 Noboru OBATA 2008-11-04 02:57:19 UTC
Hi,

Linus has merged the proposed fix into 2.6.28-rc3:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=eff2502801e9a3a34882c6bd720470d65394522e

I would like to know if the bug can be fixed during the 5.3 beta.

Comment 6 Don Zickus 2008-11-04 16:51:28 UTC
in kernel-2.6.18-122.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 8 Larry Troan 2008-11-20 23:03:43 UTC
In snapshot 3: kernel-2.6.18-123.el5

Comment 10 errata-xmlrpc 2009-01-20 20:11:02 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html