From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

Description of problem:
The do_emergency_sync() within sysrq.c does a remount-readonly journal flushing without setting up a barrier against new fs updates coming in. This ends up crashing the system (IT#29619) in the following route:

Assertion failure in journal_flush_Rsmp_ee97f7e7() at journal.c:1250: "!journal->j_running_transaction"
------------[ cut here ]------------
kernel BUG at journal.c:1250!
invalid operand: 0000
Kernel 2.4.9-e.12enterprise
CPU:    2
EIP:    0010:[<f880682b>]    Tainted: P
EFLAGS: 00010282
EIP is at journal_flush_Rsmp_ee97f7e7 [jbd] 0x10b
eax: 00000021   ebx: 00001965   ecx: c02f61a4   edx: 004c3eb5
esi: f39f6c00   edi: 00000000   ebp: 0008e000   esp: f7539f60
ds: 0018   es: 0018   ss: 0018
Process bdflush (pid: 13, stackpage=f7539000)
Stack: f880a381 000004e2 f39fa800 f39fa800 f39fa8e8 f881a1d1 f39f6c00 f397c400
       f881a58b f39fa800 f397c400 0000c703 f39fa870 f39fa800 00000001 c019831f
       f39fa800 f7539fac 00000000 00000001 f39fa800 00000001 00000002 c019843f
Call Trace: [<f880a381>] .LC63 [jbd] 0x28b
[<f881a1d1>] ext3_mark_recovery_complete [ext3] 0x11
[<f881a58b>] ext3_remount [ext3] 0xab
[<c019831f>] go_sync [kernel] 0xef
[<c019843f>] do_emergency_sync [kernel] 0xaf
[<c014b96f>] bdflush [kernel] 0x8f
[<c0105000>] stext [kernel] 0x0
[<c0105000>] stext [kernel] 0x0
[<c0105836>] kernel_thread [kernel] 0x26
[<c014b8e0>] bdflush [kernel] 0x0

Code: 0f 0b 58 5a 8b 5e 30 85 db 74 34 68 20 9b 80 f8 68 e3 04 00
<0>Kernel panic: not continuing
<0>Rebooting in 120 seconds..

Version-Release number of selected component (if applicable):
e.12 and above

How reproducible:
Didn't try

Additional info:
Stephen Tweedie is aware of this problem.
Which kernel modules are in use here?
drivers/char/sysrq.c and ext3.
ext3 doesn't taint the kernel; something else does. Is there a sysreport of this machine available?
The kernel is tainted but it is not relevant here (though I have to admire your good eyes). This crash is obvious. The journal_flush() initiated by sysrq bypasses the expected fs barrier logic, which allows new ext3-fs requests to clobber the journal control block (in this case, j_running_transaction). According to "sct", the current journal_flush() only expects the following three routes:

1. "remount-ro" (it checks for writable file descriptors before flushing).
2. "unmount" (it checks for potential activity on the fs before flushing).
3. "LVM" (the VFSlock for LVM snapshot quiescing sets a journal barrier before the flush).

The "forced remount-ro" initiated by sysrq doesn't do any of these. Fix the bug, please.
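
To illustrate the race described above, here is a minimal user-space sketch in C. It is a toy model, not the jbd code: the struct and every toy_* function are hypothetical names invented for this illustration, and the "racing writer" is simulated inline instead of running on another CPU. It only shows why a flush that checks "no running transaction" needs a barrier held across it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the journal control block (hypothetical, not struct journal_s). */
struct toy_journal {
    int  barrier_count;        /* > 0 means new updates are held off */
    bool running_transaction;  /* models journal->j_running_transaction */
};

/* A new fs update tries to open a transaction; refused while a barrier is up. */
int toy_start_update(struct toy_journal *j)
{
    if (j->barrier_count > 0)
        return -1;
    j->running_transaction = true;
    return 0;
}

/* Raise / drop the barrier against new updates. */
void toy_lock_updates(struct toy_journal *j)
{
    j->barrier_count++;
}

void toy_unlock_updates(struct toy_journal *j)
{
    assert(j->barrier_count > 0);
    j->barrier_count--;
}

/*
 * Flush: commit the running transaction, then check the invariant that
 * nothing new has started. Returns 0 if the invariant holds and -1 where
 * the real code would hit its assertion. If racing_writer is true, a new
 * update is attempted between the commit and the check.
 */
int toy_flush(struct toy_journal *j, bool racing_writer)
{
    j->running_transaction = false;      /* commit */
    if (racing_writer)
        (void)toy_start_update(j);       /* simulated concurrent update */
    return j->running_transaction ? -1 : 0;
}

/* The same flush, with the barrier held for its whole duration. */
int toy_flush_with_barrier(struct toy_journal *j, bool racing_writer)
{
    int rc;

    toy_lock_updates(j);
    rc = toy_flush(j, racing_writer);
    toy_unlock_updates(j);
    return rc;
}
```

In this model, toy_flush() with a racing writer reports the violated invariant, while toy_flush_with_barrier() does not, because the writer is refused while the barrier is up. Whether the real fix belongs in the sysrq path or in ext3_remount() is a separate question that this sketch does not address.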
Why do we need to do this "sync" when the sysadmin hits the "sysrq" key? Isn't it overkill?
We don't, in general. But "sysrq-s" is the documented combination to force an emergency sync, so in that case we don't have much choice about it!