Red Hat Bugzilla – Bug 468547
RHEL5.3: Regression in ext3/jbd
Last modified: 2009-01-20 15:11:02 EST
Just reported on the ext4 list; there is no patch yet, but the fix should be straightforward, I think. If I read it right, this is likely to happen when the journal is aborted. kmemcheck makes it obvious here, but in RHEL5 the problem is that we'll be using freed memory, so if that memory gets reused quickly it will lead to problems.
Author: Hidehiro Kawai <email@example.com>
Date: Wed Oct 22 14:15:01 2008 -0700
ext3: add checks for errors from jbd
introduces a regression which was discovered by kmemcheck:
WARNING: kmemcheck: Caught 32-bit read from freed memory (f4f1b804)
i i i i f f f f f f f f f f f f f f f f f f f f f f f f f f f f
Pid: 9550, comm: umount Not tainted (2.6.28-rc1 #58) 945P-A
EIP: 0060:[<c05bdf38>] EFLAGS: 00010246 CPU: 0
EIP is at __journal_abort_soft+0x18/0xa0
EAX: f4f1b800 EBX: f4f1b800 ECX: c0462799 EDX: fffffffb
ESI: fffffffb EDI: f4f1a800 EBP: f145dea8 ESP: c25699c8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: f6c1d704 CR3: 31448000 CR4: 00000650
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff4ff0 DR7: 00000400
In particular, this hunk is guilty:
+ if (journal_destroy(sbi->s_journal) < 0)
+ ext3_abort(sb, __func__, "Couldn't clean up the journal");
because journal_destroy() will free the journal regardless of whether
it returned < 0 or not. And then ext3_abort() makes some calls that
dereference the (freed) journal. These are the line numbers for the
addr2line -e vmlinux -i c05bdf38 c05bdfc8 c0589eb5 c058a300 c04ec02a
(as of e013e13bf605b9e6b702adffbe2853cfc60e7806 in Linus's tree).
I hope this helps.
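For anyone following along, the failure mode described above can be sketched in userspace C. This is a toy model under stated assumptions, not the kernel code and not necessarily the fix that was merged: toy_journal, toy_sb_info, toy_journal_destroy(), toy_abort() and toy_put_super() are hypothetical stand-ins for journal_t, the ext3 superblock info, journal_destroy(), ext3_abort() and the unmount path. It only shows why the return value cannot be acted on while the stale pointer is still attached, and one way to avoid that: clear the pointer before reporting the error.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins (hypothetical, not the kernel's journal_t or ext3_sb_info). */
struct toy_journal {
    int io_error;   /* nonzero if the journal saw an I/O error */
    int aborted;    /* set when the journal is aborted */
};

struct toy_sb_info {
    struct toy_journal *s_journal;
    int fs_aborted;
};

/*
 * Models journal_destroy() as described in the report: the journal is
 * freed unconditionally, and an error is only signalled through the
 * return value.
 */
static int toy_journal_destroy(struct toy_journal *j)
{
    assert(j != NULL);
    int err = j->io_error ? -5 : 0;   /* -5 mirrors -EIO */
    free(j);
    return err;
}

/*
 * Models ext3_abort(): it touches the journal if one is still attached.
 * With a stale s_journal pointer this would be a use-after-free.
 */
static void toy_abort(struct toy_sb_info *sbi, const char *msg)
{
    fprintf(stderr, "abort: %s\n", msg);
    sbi->fs_aborted = 1;
    if (sbi->s_journal)
        sbi->s_journal->aborted = 1;
}

/*
 * Teardown that avoids the stale pointer: save the return value, detach
 * the (already freed) journal, and only then report the error.
 */
static int toy_put_super(struct toy_sb_info *sbi)
{
    int err = toy_journal_destroy(sbi->s_journal);

    sbi->s_journal = NULL;
    if (err < 0)
        toy_abort(sbi, "Couldn't clean up the journal");
    return err;
}
```

Calling toy_abort() directly inside the `if (toy_journal_destroy(...) < 0)` test, as in the guilty hunk, would leave s_journal pointing at freed memory while the abort path dereferences it. Whether clearing the pointer is enough in the real code depends on the abort path checking for a NULL journal, which is assumed here.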
Author of the original patch which caused the regression has posted a fix:
Linus has merged the proposed fix into 2.6.28-rc3:
I would like to know if the bug can be fixed during the 5.3 beta.
You can download this test kernel from http://people.redhat.com/dzickus/el5
In snapshot 3: kernel-2.6.18-123.el5
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.