110533 – Assertion failure in journal_flush() at journal.c:1250: "!journal->j_running_transaction"



Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 2.1
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	2.1
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Stephen Tweedie
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	133089

Reported:	2003-11-20 20:41 UTC by Wendy Cheng
Modified:	2007-11-30 22:06 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-11-17 19:34:17 UTC
Target Upstream Version:
Embargoed:


Description Wendy Cheng 2003-11-20 20:41:24 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

Description of problem:
The do_emergency_sync() routine in sysrq.c performs a read-only remount,
which flushes the journal, without first setting up a barrier against
new filesystem updates. This crashes the system (IT#29619) along the
following path:

Assertion failure in journal_flush_Rsmp_ee97f7e7() at journal.c:1250: "!journal->j_running_transaction"
------------[ cut here ]------------
kernel BUG at journal.c:1250!
invalid operand: 0000
Kernel 2.4.9-e.12enterprise
CPU:    2
EIP:    0010:[<f880682b>]    Tainted: P
EFLAGS: 00010282
EIP is at journal_flush_Rsmp_ee97f7e7 [jbd] 0x10b
eax: 00000021   ebx: 00001965   ecx: c02f61a4   edx: 004c3eb5
esi: f39f6c00   edi: 00000000   ebp: 0008e000   esp: f7539f60
ds: 0018   es: 0018   ss: 0018
Process bdflush (pid: 13, stackpage=f7539000)
Stack: f880a381 000004e2 f39fa800 f39fa800 f39fa8e8 f881a1d1 f39f6c00 f397c400
       f881a58b f39fa800 f397c400 0000c703 f39fa870 f39fa800 00000001 c019831f
       f39fa800 f7539fac 00000000 00000001 f39fa800 00000001 00000002 c019843f
Call Trace: [<f880a381>] .LC63 [jbd] 0x28b
[<f881a1d1>] ext3_mark_recovery_complete [ext3] 0x11
[<f881a58b>] ext3_remount [ext3] 0xab
[<c019831f>] go_sync [kernel] 0xef
[<c019843f>] do_emergency_sync [kernel] 0xaf
[<c014b96f>] bdflush [kernel] 0x8f
[<c0105000>] stext [kernel] 0x0
[<c0105000>] stext [kernel] 0x0
[<c0105836>] kernel_thread [kernel] 0x26
[<c014b8e0>] bdflush [kernel] 0x0


Code: 0f 0b 58 5a 8b 5e 30 85 db 74 34 68 20 9b 80 f8 68 e3 04 00
<0>Kernel panic: not continuing
<0>Rebooting in 120 seconds..
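To make the failure window concrete, here is a toy, single-threaded C model of it. Everything in it is hypothetical: the struct and the model_* functions only borrow jbd's vocabulary (j_running_transaction, a count of open update handles) and are not jbd code. model_flush_unbarriered() commits the running transaction and then runs a callback that stands in for another CPU performing a filesystem update just before the final check. With no barrier in place, that update opens a fresh transaction, so the check that jbd enforces with an assertion fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of a JBD-style journal.
 * Field names echo jbd, but this is not jbd code. */
struct model_journal {
    int running_tid;   /* stands in for j_running_transaction; 0 = none */
    int next_tid;      /* last transaction ID handed out */
    int updates;       /* open handles on the running transaction */
};

/* A filesystem update opens a handle, creating a running transaction if needed. */
static void model_start(struct model_journal *j)
{
    if (j->running_tid == 0)
        j->running_tid = ++j->next_tid;
    j->updates++;
}

static void model_stop(struct model_journal *j)
{
    assert(j->updates > 0);   /* every stop must pair with a start */
    j->updates--;
}

/* Commit the running transaction. Real jbd waits for open handles to
 * finish; this model simply requires that none are open. */
static void model_commit(struct model_journal *j)
{
    assert(j->updates == 0);
    j->running_tid = 0;
}

/* One complete update, e.g. from a process that is still writing to the fs. */
static void racing_update(struct model_journal *j)
{
    model_start(j);
    model_stop(j);
}

/* Flush with no barrier. `between` runs after the commit and before the
 * final check, simulating another CPU being scheduled in that window.
 * Returns false where jbd would trip "!journal->j_running_transaction". */
static bool model_flush_unbarriered(struct model_journal *j,
                                    void (*between)(struct model_journal *))
{
    model_commit(j);
    if (between != NULL)
        between(j);
    return j->running_tid == 0;
}
```

Calling model_flush_unbarriered() with NULL returns true. Passing racing_update returns false and leaves a new transaction running, which is the state the assertion in the oops above reports.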

Version-Release number of selected component (if applicable):
2.4.9-e.12 and later

How reproducible:
Didn't try


Additional info:

Stephen Tweedie is aware of this problem.

Comment 1 Arjan van de Ven 2003-11-20 21:06:18 UTC

Which kernel modules are in use here?

Comment 2 Wendy Cheng 2003-11-20 21:34:44 UTC

drivers/char/sysrq.c and ext3.

Comment 3 Arjan van de Ven 2003-11-20 21:36:27 UTC

ext3 doesn't taint the kernel; something else does.
Is there a sysreport of this machine available?

Comment 4 Wendy Cheng 2003-11-20 22:40:25 UTC

The kernel is tainted, but that is not relevant here (though I have to
admire your sharp eyes). The cause of this crash is obvious.

The journal_flush() initiated by sysrq bypasses the expected fs
barrier logic, which allows new ext3 requests to clobber the journal
control block (in this case, j_running_transaction). According to
"sct", journal_flush() currently expects to be called from only these
three paths:

1. "remount-ro" (checks for writable file descriptors before
   flushing).
2. "unmount" (checks for potential activity on the fs before
   flushing).
3. "LVM" (the VFS lock used to quiesce the fs for an LVM snapshot
   sets a journal barrier before the flush).

The "forced remount-ro" initiated by sysrq does none of these.
Fix the bug, please.
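A journal barrier closes the flush window by holding off new updates until the flush is complete. The sketch below is a toy, single-threaded C model with hypothetical names; it is only loosely patterned on jbd's journal_lock_updates()/journal_unlock_updates() pair, with a barrier count roughly standing in for j_barrier_count. It follows the lock, flush, unlock order that the report attributes to the LVM path. Where real jbd would put a new writer to sleep until the barrier drops, the model simply refuses the update so that it stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of a JBD-style journal with a barrier.
 * Not jbd code; names only echo jbd concepts. */
struct model_journal {
    int running_tid;   /* stands in for j_running_transaction; 0 = none */
    int next_tid;      /* last transaction ID handed out */
    int updates;       /* open handles on the running transaction */
    int barrier;       /* roughly stands in for j_barrier_count */
};

/* Returns 0 if a handle was opened, -1 if the caller would have to wait
 * for the barrier (real jbd sleeps here instead of failing). */
static int model_start(struct model_journal *j)
{
    if (j->barrier > 0)
        return -1;
    if (j->running_tid == 0)
        j->running_tid = ++j->next_tid;
    j->updates++;
    return 0;
}

static void model_stop(struct model_journal *j)
{
    assert(j->updates > 0);   /* every stop must pair with a start */
    j->updates--;
}

/* Raise the barrier. Real jbd then waits for open handles to drain;
 * the model requires that none are open. */
static void model_lock_updates(struct model_journal *j)
{
    j->barrier++;
    assert(j->updates == 0);
}

static void model_unlock_updates(struct model_journal *j)
{
    assert(j->barrier > 0);
    j->barrier--;
}

/* A writer racing with the flush; in the model it is turned away
 * while the barrier is up instead of sleeping. */
static void racing_update(struct model_journal *j)
{
    if (model_start(j) == 0)
        model_stop(j);
}

/* Flush bracketed by the barrier. `between` simulates another CPU running
 * during the flush. Returns true if the "no running transaction" check holds. */
static bool model_flush_barriered(struct model_journal *j,
                                  void (*between)(struct model_journal *))
{
    bool ok;

    model_lock_updates(j);
    j->running_tid = 0;            /* commit the running transaction */
    if (between != NULL)
        between(j);
    ok = (j->running_tid == 0);
    model_unlock_updates(j);
    return ok;
}
```

With the barrier raised, the racing update cannot open a transaction, so the check holds; once the barrier is dropped, the next update starts a new transaction as usual.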

Comment 9 Wendy Cheng 2004-02-13 03:04:31 UTC

Why do we need to do this "sync" when the sysadmin hits the "sysrq"
key? Isn't it overkill?

Comment 10 Stephen Tweedie 2004-02-13 11:30:48 UTC

We don't, in general.  But "sysrq-s" is the documented combination to
force an emergency sync, so in that case we don't have much choice
about it!
