Description of problem:
Two oopses occurred on an AS box (e.25 kernel) while it was trying to write to
the journal. The failure is an assertion about a locked buffer; comments in
journal.c at line 372 suggest that the assertion is possibly incorrect. The
oopses can be viewed in issue tracker as number 27638.
Version-Release number of selected component (if applicable):
Unknown - waiting on customer for details on reproducibility
Steps to Reproduce:
Please post that oops information here.
Assertion failure in journal_write_metadata_buffer() at journal.c:372:
------------[ cut here ]------------
kernel BUG at journal.c:372!
invalid operand: 0000
EIP: 0010:[<f8899674>] Tainted: P
EIP is at journal_write_metadata_buffer [jbd] 0x74
eax: 00000020 ebx: 00000000 ecx: c02f7844 edx: 0001e30e
esi: 00000000 edi: e34eb420 ebp: d0b27160 esp: f3323e08
ds: 0018 es: 0018 ss: 0018
Process kjournald (pid: 38, stackpage=f3323000)
Stack: f889e3e1 00000174 0000013a f3348400 00000000 00000000 f2b37730 00000000
e34eb420 d0b27160 f8896c73 e34eb420 f2b37730 f3323e58 0000030d 00000000
00000fd4 e3c5302c 00000003 e34eb420 eb663550 0000030d f6314a00 00000000
Call Trace: [<f889e3e1>] .LC63 [jbd] 0x28b
[<f8896c73>] journal_commit_transaction [jbd] 0x773
[<f8820d8f>] rw_intr [sd_mod] 0x20f
[<c01255d0>] process_timeout [kernel] 0x0
[<c0118c7b>] wake_up_process [kernel] 0xb
[<c0124d41>] __run_timers [kernel] 0xd1
[<c0125384>] run_local_timers [kernel] 0x94
[<c0114288>] smp_apic_timer_interrupt [kernel] 0xb8
[<c0119945>] schedule [kernel] 0x385
[<f88994a6>] kjournald [jbd] 0x146
[<f8899340>] commit_timeout [jbd] 0x0
[<c0105836>] arch_kernel_thread [kernel] 0x26
[<f8899360>] kjournald [jbd] 0x0
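When comparing several oopses like the one above, it can help to reduce each call trace to (address, symbol, module, offset) tuples. The following is only a minimal sketch, assuming the 2.4-style frame layout shown in this report; the names FRAME_RE and parse_frames are hypothetical helpers, not part of any kernel tooling.

```python
import re

# Matches one 2.4-style frame, e.g.
#   [<f8896c73>] journal_commit_transaction [jbd] 0x773
# Assumption: every frame carries an address, a symbol, a bracketed
# module name and a hex offset, as in the trace posted in this report.
FRAME_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\S+)\s+\[(\w+)\]\s+0x([0-9a-f]+)")


def parse_frames(oops_text):
    """Return a list of (address, symbol, module, offset) tuples."""
    return [
        (int(addr, 16), symbol, module, int(offset, 16))
        for addr, symbol, module, offset in FRAME_RE.findall(oops_text)
    ]


# A few frames copied from the oops in this report.
SAMPLE = """\
Call Trace: [<f889e3e1>] .LC63 [jbd] 0x28b
[<f8896c73>] journal_commit_transaction [jbd] 0x773
[<f8820d8f>] rw_intr [sd_mod] 0x20f
[<c0119945>] schedule [kernel] 0x385
[<f88994a6>] kjournald [jbd] 0x146
"""

frames = parse_frames(SAMPLE)

# Keep only the frames that belong to the jbd module.
jbd_frames = [frame for frame in frames if frame[2] == "jbd"]

for addr, symbol, module, offset in jbd_frames:
    print(f"{addr:08x} {symbol} [{module}] +0x{offset:x}")
```

Filtering on the module name makes it easier to see whether two reports pass through the same jbd code path, which is the comparison needed when deciding if isolated AS-2.1 reports match the later-2.4 ones.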
A patch in later 2.4 kernels (for transaction.c:612 oopses) introduced a bug
that triggered this specific assert failure rather frequently. We identified a
flaw in the way that the buffer_jdirty state was being handled, and that is
fixed in upstream kernels.
I'm not certain whether the same flaw could be triggered differently by AS-2.1
ext3. Certainly, the window is much smaller in that kernel: the flaw was
reported very frequently once the later-2.4 kernel transaction.c:612 fix was
added, but we have only a few isolated reports of it on AS-2.1. Still, there's a
good chance that the later buffer_jdirty fix will fix this on AS-2.1 too.
We have patches back-ported to AS-2.1 to fix both of these issues, and those are
in testing internally.
When will this update be through QA?
The fully-supported, fully-QAed release is scheduled to be part of the
forthcoming U3 major AS-2.1 update release.
We've got the kernel in testing, though, and we hope that a beta engineering
build will be available as soon as Monday for customers to evaluate.
Is the beta now available?
An unsupported engineering kernel containing this fix is now available for
testing and evaluation at