Bug 707807

Summary: Assertion failure in __journal_drop_transaction() at fs/jbd/checkpoint.c:708: "transaction->t_sync_datalist == NULL"
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.7
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: unspecified
Reporter: Weiping Pan <wpan>
Assignee: Carlos Maiolino <cmaiolin>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: cmaiolin, esandeen, lczerner, ruyang, rwheeler, wpan
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-11-28 14:32:50 UTC

Description Weiping Pan 2011-05-26 02:45:57 UTC
Description of problem:
I saw a kernel panic while booting RHEL 5.7.

Assertion failure in __journal_drop_transaction() at
fs/jbd/checkpoint.c:708: "transaction->t_sync_datalist == NULL"


Version-Release number of selected component (if applicable):
Linux version 2.6.18-262.el5
gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)

How reproducible:
once

Additional info:
Assertion failure in __journal_drop_transaction() at fs/jbd/checkpoint.c:708: "transaction->t_sync_datalist == NULL"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:708!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /class/net/bond0/bonding/slaves
Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi ac parport_pc lp parport floppy sg netxen_nic pcspkr tg3 tpm_tis i2c_i801 tpm serio_raw tpm_bios i3000_edac edac_mc ide_cd i2c_core cdrom dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    1
EIP:    0060:[<f88556ef>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-262.el5 #1)
EIP is at __journal_drop_transaction+0xc7/0x293 [jbd]
eax: 00000078   ebx: c1919e00   ecx: 00000092   edx: 00000000
esi: c1919e00   edi: 00000001   ebp: 00000001   esp: f7dd4ee4
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 406, ti=f7dd4000 task=f7f28550 task.ti=f7dd4000)
Stack: f8859402 f8858665 f88593cb 000002c4 f885944e f5459e40 f885591c f74bf6b8
       f688a4e4 f88559d6 f7dd4f30 f688a4e4 f688a4e4 f5459e40 f7dd4f30 00000006
       eb590c40 f885613b ec60d540 00000000 c1919e01 eb58e0c0 f7dd4fc0 c1919e00
Call Trace:
 [<f885591c>] __journal_remove_checkpoint+0x61/0x82 [jbd]
 [<f88559d6>] journal_clean_one_cp_list+0x56/0xeb [jbd]
 [<f885613b>] __journal_clean_checkpoint_list+0x29/0x6e [jbd]
 [<f88541f6>] journal_commit_transaction+0x212/0xf3c [jbd]
 [<c042dfe9>] lock_timer_base+0x15/0x2f
 [<c042e068>] try_to_del_timer_sync+0x65/0x6c
 [<f8857cd6>] kjournald+0xa1/0x1c2 [jbd]
 [<c0436c93>] autoremove_wake_function+0x0/0x2d
 [<f8857c35>] kjournald+0x0/0x1c2 [jbd]
 [<c0436bce>] kthread+0xc0/0xee
 [<c0436b0e>] kthread+0x0/0xee
 [<c0405c87>] kernel_thread_helper+0x7/0x10
 =======================
Code: 85 f8 83 c4 14 eb fe 83 7a 20 00 74 2b 68 4e 94 85 f8 68 c4 02 00 00 68 cb 93 85 f8 68 65 86 85 f8 68 02 94 85 f8 e8 67 08 bd c7 <0f> 0b c4 02 cb 93 85 f8 83 c4 14 eb fe 83 7a 24 00 74 2b 68 73
EIP: [<f88556ef>] __journal_drop_transaction+0xc7/0x293 [jbd] SS:ESP 0068:f7dd4ee4
 <0>Kernel panic - not syncing: Fatal exception

Comment 1 Carlos Maiolino 2013-06-24 16:55:36 UTC
Hi Weiping, 

This looks like a bug that was fixed in the 2.6.24 kernel by commit d4beaf4ab. The symptoms are similar, but I can't confirm it is the same bug without a reproducer.
Were you able to hit this again, or was it a one-time event? Can you reproduce it on newer EL5 releases? Was there any problem with this system, such as a power loss, that caused an unclean unmount of the filesystem and therefore required a log replay during boot?

Comment 3 Weiping Pan 2014-06-17 00:43:52 UTC
Hi, Carlos,

I saw this bug only once.
I agree to close it.

Thanks,
Weiping Pan