Bug 66663

Summary: EXT3 bug in kernel-smp-2.4.18-4
Product: [Retired] Red Hat Linux Reporter: David Carter <dpc22>
Component: kernel Assignee: Stephen Tweedie <sct>
Status: CLOSED DUPLICATE QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.3   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2002-07-31 14:13:42 UTC Type: ---

Description David Carter 2002-06-13 09:22:55 UTC
Description of Problem:

Got the following error from EXT3 using the 2.4.18-4 kernel on a dual-CPU
box with 4 disks (arranged as software RAID mirrored pairs).

Jun 10 10:30:41 purple kernel: Assertion failure in journal_write_metadata_buffer() at journal.c:406: "buffer_jdirty(jh2bh(jh_in))"
Jun 10 10:30:41 purple kernel: ------------[ cut here ]------------
Jun 10 10:30:41 purple kernel: kernel BUG at journal.c:406!
Jun 10 10:30:41 purple kernel: invalid operand: 0000
Jun 10 10:30:41 purple kernel: autofs eepro100 ipchains ext3 jbd raid1 aic7xxx sd_mod scsi_mod
Jun 10 10:30:41 purple kernel: CPU:    0
Jun 10 10:30:41 purple kernel: EIP:    0010:[<f885d9b4>]    Not tainted
Jun 10 10:30:41 purple kernel: EFLAGS: 00010282
Jun 10 10:30:41 purple kernel: 
Jun 10 10:30:41 purple kernel: EIP is at journal_write_metadata_buffer [jbd] 0x74 (2.4.18-4smp)
Jun 10 10:30:41 purple kernel: eax: 0000001d   ebx: 00000000   ecx: c02eeec0   edx: 000aa050
Jun 10 10:30:41 purple kernel: esi: 00000000   edi: ee129a40   ebp: f7dc8490   esp: f79afe44
Jun 10 10:30:41 purple kernel: ds: 0018   es: 0018   ss: 0018
Jun 10 10:30:41 purple kernel: Process kjournald (pid: 141, stackpage=f79af000)
Jun 10 10:30:41 purple kernel: Stack: f8862181 00000196 000016dc f772a600 00000000 00000000 f7dc8b20 00000000
Jun 10 10:30:41 purple kernel:        ee129a40 f7dc8490 f885adf4 ee129a40 f7dc8b20 f79afe98 000018e8 00000000
Jun 10 10:30:41 purple kernel:        00000001 00000ff4 d3e9e00c 00000001 ee129a40 f6afd8b0 000018e8 00000001
Jun 10 10:30:41 purple kernel: Call Trace: [<f8862181>] .rodata.str1.1 [jbd] 0x4e1
Jun 10 10:30:42 purple kernel: [<f885adf4>] journal_commit_transaction [jbd] 0x7e4
Jun 10 10:30:42 purple kernel: [<c0119048>] schedule [kernel] 0x348 
Jun 10 10:30:42 purple kernel: [<f885d806>] kjournald [jbd] 0x136 
Jun 10 10:30:42 purple kernel: [<f885d6b0>] commit_timeout [jbd] 0x0 
Jun 10 10:30:42 purple kernel: [<c0107286>] kernel_thread [kernel] 0x26 
Jun 10 10:30:42 purple kernel: [<f885d6d0>] kjournald [jbd] 0x0 
Jun 10 10:30:42 purple kernel: 
Jun 10 10:30:42 purple kernel: 
Jun 10 10:30:42 purple kernel: Code: 0f 0b 5e 5f 8b 7c 24 28 8b 4f 0c 85 c9 74 2e c7 44 24 0c 01

The machine in question was still accepting input at the command line; however,
the shutdown command failed (because of the broken filesystem?) and the machine
required a power cycle. fsck -f reported no errors on any filesystem.

Version-Release number of selected component (if applicable):

kernel-smp-2.4.18-4

How Reproducible:

Not very, I'm afraid. We have three identical machines, all running Red Hat 7.3
for a number of weeks without any problems. The machine in question has been
fine since it was rebooted on Monday. Just thought that I ought to report
a potential problem with EXT3.

I found the following on the ext3 developers list. It's the same assertion
failure but on a different kernel version; I don't know if this is relevant.

  https://listman.redhat.com/pipermail/ext3-users/2002-May/003587.html
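
Since the failure is hard to reproduce, one way to watch for a recurrence is to scan the system logs for the assertion message. The following is a minimal sketch, not part of the original report: it assumes the message format shown in the log above, and the function name find_assertions and the log path in the usage note are hypothetical.

```python
import re

# Pattern modelled on the message format seen in the log above.
# The function, file, line and expression are captured, not hard-coded.
ASSERT_RE = re.compile(
    r'Assertion failure in\s+(\w+)\(\)\s+at\s+([\w.]+):(\d+):\s*"([^"]*)"'
)


def find_assertions(lines):
    """Return (function, file, line, expression) for each assertion failure.

    Pasted logs are often hard-wrapped, so the lines are joined with single
    spaces before matching; the pattern tolerates any whitespace between
    the words of the message.
    """
    text = " ".join(line.strip() for line in lines)
    return [
        (m.group(1), m.group(2), int(m.group(3)), m.group(4))
        for m in ASSERT_RE.finditer(text)
    ]


if __name__ == "__main__":
    import sys

    # Usage (path is an assumption): python scan.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for func, src, lineno, expr in find_assertions(log):
            print(f"{func}() at {src}:{lineno}: {expr}")
```

Run against each machine's log file, this prints one line per assertion failure found and nothing if the log is clean.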

Comment 1 Stephen Tweedie 2002-06-13 17:10:38 UTC
Thanks.  There has been one other possible report of this on 2.4.18-4smp, but I
don't have it tracked down yet.  If this becomes reproducible, would you be
willing to try a debugging kernel to help catch the problem?

Comment 2 David Carter 2002-06-13 19:03:59 UTC
Yes, certainly.

Comment 3 Stephen Tweedie 2002-07-29 15:06:14 UTC
I have found a possible cause for this problem, and it will be in a forthcoming
errata.  If you want, I can give you a copy of the patch to try it out; but if
you are not able to reproduce the problem, it's probably not worth it.

Comment 4 Stephen Tweedie 2002-07-31 14:12:48 UTC
There is a patched kernel on

   http://people.redhat.com/arjanv/testkernels/

if you want to try it --- it's 2.4.18-7.

Comment 5 Stephen Tweedie 2002-07-31 14:51:30 UTC

*** This bug has been marked as a duplicate of 68026 ***