Bug 468547 - RHEL5.3: Regression in ext3/jbd
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Eric Sandeen
QA Contact: Martin Jenner
Keywords: Regression
Depends On: 439581
Reported: 2008-10-25 14:58 EDT by Eric Sandeen
Modified: 2009-01-20 15:11 EST

Doc Type: Bug Fix
Last Closed: 2009-01-20 15:11:02 EST
Description Eric Sandeen 2008-10-25 14:58:20 EDT
Just reported on the ext4 list; no patch yet, but it should be straightforward, I think.  If I read it right, this is likely to happen if the journal is aborted.  kmemcheck makes it obvious here, but in RHEL5 the problem is that we'll be using freed memory, so if that memory gets reused quickly it will lead to problems.

This commit:

commit 2d7c820e56ce83b23daee9eb5343730fb309418e
Author: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Date:   Wed Oct 22 14:15:01 2008 -0700

    ext3: add checks for errors from jbd

introduces a regression which was discovered by kmemcheck:

WARNING: kmemcheck: Caught 32-bit read from freed memory (f4f1b804)
 i i i i f f f f f f f f f f f f f f f f f f f f f f f f f f f f

Pid: 9550, comm: umount Not tainted (2.6.28-rc1 #58) 945P-A
EIP: 0060:[<c05bdf38>] EFLAGS: 00010246 CPU: 0
EIP is at __journal_abort_soft+0x18/0xa0
EAX: f4f1b800 EBX: f4f1b800 ECX: c0462799 EDX: fffffffb
ESI: fffffffb EDI: f4f1a800 EBP: f145dea8 ESP: c25699c8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: f6c1d704 CR3: 31448000 CR4: 00000650
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff4ff0 DR7: 00000400
 [<c05bdfc8>] journal_abort+0x8/0x10
 [<c0589eb5>] ext3_abort+0xb5/0xc0
 [<c058a300>] ext3_put_super+0x160/0x230
 [<c04ec02a>] generic_shutdown_super+0x5a/0xe0

In particular, this hunk is guilty:

-       journal_destroy(sbi->s_journal);
+       if (journal_destroy(sbi->s_journal) < 0)
+               ext3_abort(sb, __func__, "Couldn't clean up the journal");

because journal_destroy() will free the journal regardless of whether
it returned < 0 or not. And then ext3_abort() makes some calls that
dereference the (freed) journal. These are the line numbers for the

addr2line -e vmlinux -i c05bdf38 c05bdfc8 c0589eb5 c058a300 c04ec02a

(as of e013e13bf605b9e6b702adffbe2853cfc60e7806 in Linus's tree).

I hope this helps.
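
The failure mode described above (the abort path dereferencing a journal that journal_destroy() has already freed) can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code and not necessarily the fix that was later posted: mock_journal, mock_sb_info, mock_journal_destroy(), mock_abort(), mock_put_super() and mock_make_sb() are hypothetical stand-ins for the jbd/ext3 structures and functions. The sketch shows one possible way to avoid the bug: save the return value, clear the pointer right after the destroy call, and have the abort path touch the journal only if it is still attached.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real jbd/ext3 definitions. */
struct mock_journal {
    int j_errno;    /* error recorded on the journal, 0 if none */
    int j_aborted;  /* set when the journal is aborted */
};

struct mock_sb_info {
    struct mock_journal *s_journal;  /* NULL once the journal is destroyed */
    int abort_reported;              /* number of abort messages emitted */
};

/*
 * Mirrors the behaviour described in the report: the journal is freed
 * whether or not an error is returned.
 */
static int mock_journal_destroy(struct mock_journal *journal)
{
    int err = journal->j_errno;

    free(journal);
    return err;
}

/* Abort path: report the error, touch the journal only if it still exists. */
static void mock_abort(struct mock_sb_info *sbi, const char *msg)
{
    fprintf(stderr, "abort: %s\n", msg);
    sbi->abort_reported++;
    if (sbi->s_journal)
        sbi->s_journal->j_aborted = 1;
}

/* Teardown: save the result, drop the stale pointer, then report. */
static int mock_put_super(struct mock_sb_info *sbi)
{
    int err;

    if (!sbi->s_journal)
        return 0;

    err = mock_journal_destroy(sbi->s_journal);
    sbi->s_journal = NULL;  /* the journal memory is gone from here on */
    if (err < 0)
        mock_abort(sbi, "Couldn't clean up the journal");
    return err;
}

/* Helper for the sketch: build a superblock whose journal carries an error. */
static struct mock_sb_info mock_make_sb(int journal_errno)
{
    struct mock_sb_info sbi = { NULL, 0 };

    sbi.s_journal = malloc(sizeof(*sbi.s_journal));
    if (sbi.s_journal) {
        sbi.s_journal->j_errno = journal_errno;
        sbi.s_journal->j_aborted = 0;
    }
    return sbi;
}
```

Calling mock_put_super() on a superblock built with mock_make_sb(-5) returns -5, leaves s_journal set to NULL, and reports the abort once without reading the freed journal. Without the pointer reset and the NULL check, mock_abort() would write to freed memory, which is the pattern the kmemcheck warning points at.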

Comment 2 Eric Sandeen 2008-10-27 11:24:23 EDT
Author of the original patch which caused the regression has posted a fix:

Comment 5 Noboru OBATA 2008-11-03 21:57:19 EST

Linus has merged the proposed fix into 2.6.28-rc3:

I would like to know if the bug can be fixed during the 5.3 beta.
Comment 6 Don Zickus 2008-11-04 11:51:28 EST
in kernel-2.6.18-122.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 8 Larry Troan 2008-11-20 18:03:43 EST
In snapshot 3: kernel-2.6.18-123.el5
Comment 10 errata-xmlrpc 2009-01-20 15:11:02 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

