167343 – Assertion failure in log_do_checkpoint

Summary: Assertion failure in log_do_checkpoint

Keywords:
Status:	CLOSED DUPLICATE of bug 162814
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	4
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-09-01 19:26 UTC by Jeff Welden
Modified:	2007-11-30 22:11 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-09-05 03:53:04 UTC
Type:	---
Embargoed:
Dependent Products:

Description Jeff Welden 2005-09-01 19:26:21 UTC

Running Linux 2.6.9-11.ELsmp on a dual Xeon system with ext3
(formatted with /sbin/mkfs.ext3 -b 2048 -i 2048 -j -O dir_index,sparse_super)
on a Compaq Smart Array 64xx (rev 01) RAID controller.
With the load average holding steady at around 2.0,
the kernel panicked after about 12 hours of load testing:

---BEGIN OOPS---

Aug 28 05:03:24 im01 kernel BUG at fs/jbd/checkpoint.c:361!
Aug 28 05:03:24 im01 invalid operand: 0000 [#1]
Aug 28 05:03:24 im01 SMP
Aug 28 05:03:24 im01 Modules linked in: aoe(U) drbd(U) md5 ipv6 autofs4
i2c_dev i2c_core dm_mod button battery ac uhci_hcd ehci_hcd hw_random
tg3 floppy ext3 jbd cciss sd_mod scsi_mod
Aug 28 05:03:24 im01 CPU:    1
Aug 28 05:03:24 im01 EIP:    0060:[<f8868f87>]    Tainted: GF     VLI
Aug 28 05:03:24 im01 EFLAGS: 00010216   (2.6.9-11.ELsmp)
Aug 28 05:03:24 im01 EIP is at log_do_checkpoint+0x111/0x14b [jbd]
Aug 28 05:03:24 im01 eax: 0000006e   ebx: cc7fa4dc   ecx: d3f46d30  
edx: f886cd4c
Aug 28 05:03:24 im01 esi: c3f52a00   edi: f7d4f980   ebp: 00000000  
esp: d3f46d2c
Aug 28 05:03:24 im01 ds: 007b   es: 007b   ss: 0068
Aug 28 05:03:24 im01 Process imapd (pid: 21571, threadinfo=d3f46000
task=cc444230)
Aug 28 05:03:24 im01 Stack: f886cd4c f886befd f886cd38 00000169 f886ce07
008f3cc4 ef9fc8cc cc7fa4dc
Aug 28 05:03:24 im01 00000000 00000000 ecd29de4 cb0d4424 cd1701e8
cb0d489c 00000001 00000000
Aug 28 05:03:24 im01 000000fe 0000017e 00000001 c0401220 c3bc1d60
00000001 c3bc9d60 c3bc1d60
Aug 28 05:03:24 im01 Call Trace:
Aug 28 05:03:24 im01 [<c011d681>] load_balance_newidle+0x5c/0x74
Aug 28 05:03:24 im01 [<c011caf1>] finish_task_switch+0x30/0x66
Aug 28 05:03:24 im01 [<c02c5604>] schedule+0x844/0x87a
Aug 28 05:03:24 im01 [<c0270a65>] memcpy_toiovec+0x5f/0x88
Aug 28 05:03:24 im01 [<c011dd19>] __wake_up_locked+0x11/0x13
Aug 28 05:03:24 im01 [<c02c4c38>] __down+0xcc/0xdb
Aug 28 05:03:24 im01 [<c011dc6f>] default_wake_function+0x0/0xc
Aug 28 05:03:24 im01 [<f8868b40>] __log_wait_for_space+0xbb/0xe5 [jbd]
Aug 28 05:03:24 im01 [<f886537e>] start_this_handle+0x2e4/0x32a [jbd]
Aug 28 05:03:24 im01 [<c016164a>] do_lookup+0x1f/0x8f
Aug 28 05:03:24 im01 [<c011f6ee>] autoremove_wake_function+0x0/0x2d
Aug 28 05:03:24 im01 [<c011f6ee>] autoremove_wake_function+0x0/0x2d
Aug 28 05:03:24 im01 [<f886547c>] journal_start+0x78/0x9e [jbd]
Aug 28 05:03:24 im01 [<f889ef9a>] ext3_dirty_inode+0x24/0x66 [ext3]
Aug 28 05:03:24 im01 [<c0171b04>] __mark_inode_dirty+0x28/0x176
Aug 28 05:03:24 im01 [<c016c68e>] update_atime+0x6a/0x90
Aug 28 05:03:24 im01 [<c013d396>] generic_file_mmap+0x2a/0x37
Aug 28 05:03:24 im01 [<c014b22d>] do_mmap_pgoff+0x481/0x666
Aug 28 05:03:24 im01 [<c010b557>] sys_mmap2+0x7e/0xaf
Aug 28 05:03:24 im01 [<c02c7377>] syscall_call+0x7/0xb
Aug 28 05:03:24 im01 [<c02c007b>] unix_release_sock+0x15a/0x201
Aug 28 05:03:24 im01 Code: 89 f0 e8 4c fc ff ff 0b 44 24 10 75 29 68 07
ce 86 f8 68 69 01 00 00 68 38 cd 86 f8 68 fd be 86 f8 68 4c cd 86 f8 e8
4f 8a 8b c7 <0f> 0b 69 01 38 cd 86 f8 83 c4 14 39 7e 40 0f 84 09 ff ff
ff 8d
Aug 28 05:03:24 im01 <0>Fatal exception: panic in 5 seconds

---END OOPS---
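
For triage, the frames that belong to the jbd and ext3 modules can be pulled out of a trace like the one above. The following is a minimal Python sketch and not part of the original report: FRAME_RE and parse_frames are hypothetical names, and the pattern assumes the "[<addr>] symbol+0xoff/0xsize [module]" frame layout seen in this oops. The sample text is copied from the trace above.

```python
import re

# Assumed frame layout, e.g.
#   [<f8868b40>] __log_wait_for_space+0xbb/0xe5 [jbd]
# The trailing "[module]" is absent for core kernel symbols.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{8})>\][ \t]+"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return one dict per call-trace frame found in `text`."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "addr": int(m.group("addr"), 16),
            "sym": m.group("sym"),
            "off": int(m.group("off"), 16),
            "size": int(m.group("size"), 16),
            "module": m.group("module"),  # None for core kernel symbols
        })
    return frames


SAMPLE = """\
Aug 28 05:03:24 im01 [<c02c4c38>] __down+0xcc/0xdb
Aug 28 05:03:24 im01 [<f8868b40>] __log_wait_for_space+0xbb/0xe5 [jbd]
Aug 28 05:03:24 im01 [<f886537e>] start_this_handle+0x2e4/0x32a [jbd]
Aug 28 05:03:24 im01 [<f886547c>] journal_start+0x78/0x9e [jbd]
Aug 28 05:03:24 im01 [<f889ef9a>] ext3_dirty_inode+0x24/0x66 [ext3]
Aug 28 05:03:24 im01 [<c0171b04>] __mark_inode_dirty+0x28/0x176
"""

frames = parse_frames(SAMPLE)
# Keep only the filesystem and journaling frames.
fs_frames = [f for f in frames if f["module"] in ("jbd", "ext3")]
for f in fs_frames:
    print(f"{f['module']:5s} {f['sym']}+{f['off']:#x}/{f['size']:#x}")
```

On this sample the filter keeps the path ext3_dirty_inode -> journal_start -> start_this_handle -> __log_wait_for_space, which is the route into the checkpoint code in this oops.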


+++ This bug was initially created as a clone of Bug #123137 +++

Description of problem:
Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361:
"drop_count != 0 || cleanup_ret != 0"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:361!

Version-Release number of selected component (if applicable):
2.6.5-1.327

How reproducible:
Rare

System was a dual Xeon with AMI Megaraid RAID controller.  File
systems are Ext3.

I'll attach the oops output in a second.

-- Additional comment from dac on 2004-05-12 17:01 EST --
Created an attachment (id=100200)
oops output

Oops output from when this happened.
The system load was probably around 3.
Uptime was less than a day (due to an unrelated reboot).


-- Additional comment from davej on 2004-05-13 10:35 EST --
There were quite a few ext3 related changes in later kernels.
I'm not guaranteeing they fix this problem, but it makes more sense to
test -358 if you can.


-- Additional comment from sct on 2004-09-10 11:08 EST --
No information given about later kernels, so closing: please reopen if
you can still reproduce this problem.

-- Additional comment from sfrost on 2004-10-19 15:58 EST --
I've been bitten by this problem under both 2.6.8.1 and 2.6.9 now.  I
don't have an oops from 2.6.9 yet (I'll check once I get home whether
it was logged over the serial console), but here is one from 2.6.8.1:
Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361:
"drop_count != 0 || cleanup_ret != 0"
kernel BUG at fs/jbd/checkpoint.c:361!
invalid operand: 0000 [#1]
Oops:
------------[ cut here ]------------
SMP
Modules linked in: ipt_REDIRECT ipt_REJECT iptable_nat iptable_mangle
iptable_filter ipt_state ipt_pkttype ipt_physdev ipt_multiport
ipt_conntrack ipt_MARK ipt_LOG ip_conntrack ip_tables 8250 serial_core
snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc
snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore ehci_hcd
uhci_hcd usbcore intel_agp agpgart eeprom lm85 i2c_sensor i2c_i801
i2c_dev i2c_core pcspkr
CPU:    1
EIP:    0060:[log_do_checkpoint+364/459]    Not tainted
EFLAGS: 00010286   (2.6.8.1-vs1.9.2kenobi.3)
EIP is at log_do_checkpoint+0x16c/0x1cb
eax: 0000006e   ebx: 00000000   ecx: c036ad04   edx: c036ad04
esi: 00000000   edi: 00000001   ebp: c932d83c   esp: e3a9fd0c
ds: 007b   es: 007b   ss: 0068
Process sendmail (pid: 9628, threadinfo=e3a9e000 task=e10c3770)
Stack: c03323c0 c031be9d c03301f7 00000169 c0335200 00294867 c1a87180
00000000
       00000000 e498574c c0476120 00000000 00000003 c180c0a0 c180cd60
c015a341
       dedcbf5c dedcbf5c dedcbf5c f314ae3c dedcbf5c c01a60ac f700f4e0
f314ae3c
Call Trace:
 [wake_up_buffer+23/83] wake_up_buffer+0x17/0x53
 [do_get_write_access+645/1583] do_get_write_access+0x285/0x62f
 [wake_up_buffer+23/83] wake_up_buffer+0x17/0x53
 [find_busiest_group+234/806] find_busiest_group+0xea/0x326
 [ext3_do_update_inode+517/1094] ext3_do_update_inode+0x205/0x446
 [radix_tree_delete+325/398] radix_tree_delete+0x145/0x18e
 [__log_wait_for_space+199/218] __log_wait_for_space+0xc7/0xda
 [start_this_handle+290/954] start_this_handle+0x122/0x3ba
 [find_get_pages+55/90] find_get_pages+0x37/0x5a
 [pagevec_lookup+46/56] pagevec_lookup+0x2e/0x38
 [truncate_inode_pages+289/696] truncate_inode_pages+0x121/0x2b8
 [journal_start+171/210] journal_start+0xab/0xd2
 [locks_delete_lock+139/221] locks_delete_lock+0x8b/0xdd
 [start_transaction+35/88] start_transaction+0x23/0x58
 [locks_remove_posix+239/268] locks_remove_posix+0xef/0x10c
 [ext3_delete_inode+0/230] ext3_delete_inode+0x0/0xe6
 [ext3_delete_inode+39/230] ext3_delete_inode+0x27/0xe6
 [ext3_delete_inode+0/230] ext3_delete_inode+0x0/0xe6
 [generic_delete_inode+147/316] generic_delete_inode+0x93/0x13c
 [iput+98/124] iput+0x62/0x7c
 [dput+231/403] dput+0xe7/0x193
 [__fput+179/260] __fput+0xb3/0x104
 [filp_close+89/134] filp_close+0x59/0x86
 [sys_close+94/113] sys_close+0x5e/0x71
 [syscall_call+7/11] syscall_call+0x7/0xb
Code: 0f 0b 69 01 f7 01 33 c0 eb b8 8d 44 24 1c 8d 54 24 24 89 44
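
The Code: bytes above appear to encode the failing assertion site. The following is a minimal Python sketch, assuming the i386 BUG() layout of this kernel era: a ud2 opcode (0f 0b) followed by a little-endian 16-bit line number and a 32-bit pointer to the file name string. decode_bug_site is a hypothetical helper written for illustration, not a kernel or library function.

```python
import struct


def decode_bug_site(code_bytes):
    """Decode (line, file_name_pointer) from bytes starting at a ud2.

    Assumption: the bytes follow the i386 BUG() layout described in the
    lead-in (ud2, 16-bit line number, 32-bit file name pointer).
    """
    raw = bytes.fromhex(code_bytes)
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("bytes do not start with ud2 (0f 0b)")
    # "<HI": little-endian, unsigned 16-bit then unsigned 32-bit, no padding.
    line, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line, file_ptr


# First eight bytes of the Code: line in the 2.6.8.1 oops above.
line, file_ptr = decode_bug_site("0f 0b 69 01 f7 01 33 c0")
print(line, hex(file_ptr))  # prints: 361 0xc03301f7
```

Under that assumption the decoded line number, 361, agrees with fs/jbd/checkpoint.c:361 in the assertion message, and the pointer 0xc03301f7 matches the third word on the Stack: line of the same oops.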


-- Additional comment from sfrost on 2004-10-19 20:11 EST --
Alright, just happened again, that's twice in one day...

Comment 1 Dave Jones 2005-09-05 03:53:04 UTC


*** This bug has been marked as a duplicate of 162814 ***
