Bug 104526 - oops in journaling code (journal.c:372)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Stephen Tweedie
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 106054
Reported: 2003-09-16 18:35 UTC by Neil Horman
Modified: 2014-01-15 20:57 UTC (History)

Fixed In Version: QU3
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-01-09 00:14:47 UTC
Target Upstream Version:
Embargoed:



Description Neil Horman 2003-09-16 18:35:47 UTC
Description of problem:
Two oopses occurred on an AS box (e.25 kernel) while it was trying to write to
the journal. The oopses report a failing assertion of a locked buffer. Comments
in journal.c at line 372 suggest that the assertion is possibly incorrect. The
oopses can be viewed in issue tracker as number 27638.

Version-Release number of selected component (if applicable):


How reproducible:
Unknown - waiting on customer for details on reproducibility

Steps to Reproduce:
1.
2.
3.
    
Actual results:


Expected results:


Additional info:

Comment 1 Bill Nottingham 2003-09-16 18:37:17 UTC
Please post that oops information here.

Comment 2 Neil Horman 2003-09-16 19:51:17 UTC
Assertion failure in journal_write_metadata_buffer() at journal.c:372:
"buffer_jdirty(jh2bh(jh_in))"
------------[ cut here ]------------
kernel BUG at journal.c:372!
invalid operand: 0000
Kernel 2.4.9-e.25enterprise
CPU:    3
EIP:    0010:[<f8899674>]    Tainted: P
EFLAGS: 00010286
EIP is at journal_write_metadata_buffer [jbd] 0x74
eax: 00000020   ebx: 00000000   ecx: c02f7844   edx: 0001e30e
esi: 00000000   edi: e34eb420   ebp: d0b27160   esp: f3323e08
ds: 0018   es: 0018   ss: 0018
Process kjournald (pid: 38, stackpage=f3323000)
Stack: f889e3e1 00000174 0000013a f3348400 00000000 00000000 f2b37730 00000000
      e34eb420 d0b27160 f8896c73 e34eb420 f2b37730 f3323e58 0000030d 00000000
      00000fd4 e3c5302c 00000003 e34eb420 eb663550 0000030d f6314a00 00000000
Call Trace: [<f889e3e1>] .LC63 [jbd] 0x28b
[<f8896c73>] journal_commit_transaction [jbd] 0x773
[<f8820d8f>] rw_intr [sd_mod] 0x20f
[<c01255d0>] process_timeout [kernel] 0x0
[<c0118c7b>] wake_up_process [kernel] 0xb
[<c0124d41>] __run_timers [kernel] 0xd1
[<c0125384>] run_local_timers [kernel] 0x94
[<c0114288>] smp_apic_timer_interrupt [kernel] 0xb8
[<c0119945>] schedule [kernel] 0x385
[<f88994a6>] kjournald [jbd] 0x146
[<f8899340>] commit_timeout [jbd] 0x0
[<c0105836>] arch_kernel_thread [kernel] 0x26
[<f8899360>] kjournald [jbd] 0x0
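
For triage, the call trace above can be reduced to the frames that belong to the jbd module. The following is a minimal sketch, assuming the frame format shown in this oops (`[<address>] symbol [module] 0xoffset`). The name `parse_frames`, the regular expression, and the shortened `SAMPLE` text are illustrative only and are not part of any kernel tooling.

```python
import re

# Matches oops frames in the format seen in this report, for example:
#   [<f8896c73>] journal_commit_transaction [jbd] 0x773
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\S+)\s+\[(\w+)\]\s+0x([0-9a-f]+)"
)


def parse_frames(text):
    """Return (address, symbol, module, offset) tuples found in oops text."""
    return [
        (int(addr, 16), sym, mod, int(off, 16))
        for addr, sym, mod, off in FRAME_RE.findall(text)
    ]


# A shortened copy of the call trace from this report (hypothetical input).
SAMPLE = """\
Call Trace: [<f889e3e1>] .LC63 [jbd] 0x28b
[<f8896c73>] journal_commit_transaction [jbd] 0x773
[<f8820d8f>] rw_intr [sd_mod] 0x20f
[<c0119945>] schedule [kernel] 0x385
[<f88994a6>] kjournald [jbd] 0x146
"""

frames = parse_frames(SAMPLE)

# Keep only the symbols that live in the jbd module.
jbd_symbols = [sym for _, sym, mod, _ in frames if mod == "jbd"]
print(jbd_symbols)
```

Filtering by module name is a convenience for reading the trace; it does not by itself say which frames were live at the time of the oops.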


Comment 3 Stephen Tweedie 2003-10-02 15:50:32 UTC
A patch in later 2.4 kernels (for transaction.c:612 oopses) introduced a bug
that triggered this specific assert failure rather frequently.  We identified a
flaw in the way that the buffer_jdirty state was being handled, and that is
fixed in upstream kernels.

I'm not certain whether the same flaw could be triggered differently by AS-2.1
ext3 --- certainly, the window is much smaller in that kernel, as the flaw was
reported very frequently when the later-2.4 kernel transaction.c:612 fix was
added, but we have only a few isolated reports of it on AS-2.1.  But there's a
good chance that the later buffer_jdirty fix will fix this on AS-2.1 too.

We have patches back-ported to AS-2.1 to fix both of these issues, and those are
in testing internally.

Comment 4 Kevin Krafthefer 2003-10-02 18:15:20 UTC
When will this update be through QA?

Comment 5 Stephen Tweedie 2003-10-02 20:14:21 UTC
The fully-supported, fully-QAed release is scheduled to be part of the
forthcoming U3 major AS-2.1 update release.

We've got the kernel in testing, though, and we hope that a beta engineering
build will be available as soon as Monday for customers to evaluate.

Comment 6 Kevin Krafthefer 2003-10-06 18:06:55 UTC
Is the beta now available?

Comment 8 Stephen Tweedie 2003-10-08 15:58:32 UTC
An unsupported engineering kernel containing this fix is now available for
testing and evaluation at

http://people.redhat.com/~jbaron/.private/testing/2.4.9-e.27.18.test/


