Bug 381221 - Assertion failure in journal_start() at fs/jbd/transaction.c:274: 'handle->h_transaction->t_journal == journal'
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.5
Hardware: All Linux
Priority: high
Severity: high
Assigned To: Josef Bacik
QA Contact: Martin Jenner
Duplicates: 461871
Blocks: 439194
Reported: 2007-11-13 17:16 EST by Issue Tracker
Modified: 2012-07-03 03:26 EDT
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Last Closed: 2008-07-24 15:20:44 EDT

Attachments
patch to fix the problem. (2.68 KB, patch)
2007-11-15 16:18 EST, Josef Bacik

Description Issue Tracker 2007-11-13 17:16:20 EST
Escalated to Bugzilla from IssueTracker
Comment 1 Issue Tracker 2007-11-13 17:16:21 EST
Over two days, eight nodes have crashed with the above assertion failure. All are relatively recent reinstalls running SLC4 with only 1 GB of memory (so we assume tight memory pressure), using ext3 with quotas enabled on 2 file systems.

While we try to find out more about what actually happened and, if possible, reproduce this on "vanilla" RHEL4, could you check whether you have seen something similar (and, even better, whether you have a fix)?

According to Google, both CentOS 4 and various other upstream kernels have been hit by this in the past. At least one message, concerning 2.6.15.1, indicates that ext3 could be doing something wrong under memory pressure:
http://lkml.org/lkml/2006/2/1/328

Our tracebacks all look like

Nov  5 09:50:10 Assertion failure in journal_start() at fs/jbd/transaction.c:274: "handle->h_transaction->t_journal == journal"
Nov  5 09:50:10 ------------[ cut here ]------------
Nov  5 09:50:10 kernel BUG at fs/jbd/transaction.c:274!
Nov  5 09:50:10 invalid operand: 0000 [#1]
Nov  5 09:50:10 SMP
Nov  5 09:50:10 Modules linked in: e7xxx_edac edac_mc libafs(U) autofs4 i2c_dev i2c_core sunrpc md5 ipv6 dm_mirror dm_mod button battery ac uhci_hcd hw_random e100 mii ext3 jbd ata_piix libata sd_mod scsi_mod
Nov  5 09:50:10 CPU:    1
Nov  5 09:50:10 EIP:    0060:[<f882f449>]    Tainted: PF     VLI
Nov  5 09:50:10 EFLAGS: 00010216   (2.6.9-55.0.6.EL.cernsmp)
Nov  5 09:50:10 EIP is at journal_start+0x45/0x9e [jbd]
Nov  5 09:50:10 eax: 00000073   ebx: cf532294   ecx: f1bcbacc   edx: f88361ca
Nov  5 09:50:10 esi: f7f55c00   edi: f1bcb000   ebp: c0330f18   esp: f1bcbac8
Nov  5 09:50:10 ds: 007b   es: 007b   ss: 0068
Nov  5 09:50:10 Process fsprobe (pid: 8989, threadinfo=f1bcb000 task=f70103b0)
Nov  5 09:50:10 Stack: f88361ca f8835d62 f88361b5 00000112 f8836224 f6bcbbc0 f1bcbb1c 0000006d
Nov  5 09:50:10        f886bd89 f6bcbbc0 f1bcbb1c c0171bf0 f6bcbbc0 c0171c85 f3760de8 f3760df0
Nov  5 09:50:10        00000000 c017200a 00000080 00000080 00000080 c1b4fdf0 d1afb9a0 00000000
Nov  5 09:50:10 Call Trace:
Nov  5 09:50:10  [<f886bd89>] ext3_dquot_drop+0x14/0x3b [ext3]
Nov  5 09:50:10  [<c0171bf0>] clear_inode+0xb4/0x102
Nov  5 09:50:10  [<c0171c85>] dispose_list+0x47/0x6d
Nov  5 09:50:10  [<c017200a>] prune_icache+0x193/0x1ec
Nov  5 09:50:10  [<c0172077>] shrink_icache_memory+0x14/0x2b
Nov  5 09:50:10  [<c0149dac>] shrink_slab+0xf8/0x161
Nov  5 09:50:10  [<c014ae19>] try_to_free_pages+0xd5/0x1bb
Nov  5 09:50:10  [<c0144338>] __alloc_pages+0x1bc/0x2a6
Nov  5 09:50:10  [<c015491f>] read_swap_cache_async+0x56/0xa7
Nov  5 09:50:10  [<c014e3bf>] swapin_readahead+0x3b/0x57
Nov  5 09:50:10  [<c014e451>] do_swap_page+0x76/0x2ea
Nov  5 09:50:10  [<c014ed89>] handle_mm_fault+0x116/0x193
Nov  5 09:50:10  [<c014d7f0>] get_user_pages+0x235/0x368
Nov  5 09:50:10  [<c0179c11>] dio_refill_pages+0x7d/0x112
Nov  5 09:50:10  [<c0179cbe>] dio_get_page+0x18/0x4a
Nov  5 09:50:10  [<c017a608>] do_direct_IO+0x5b/0x306
Nov  5 09:50:10  [<c017ab12>] direct_io_worker+0x25f/0x4ee
Nov  5 09:50:10  [<c017b17a>] __blockdev_direct_IO+0x3d9/0x422
Nov  5 09:50:10  [<f8863970>] ext3_direct_io_get_blocks+0x0/0xaa [ext3]
Nov  5 09:50:10  [<f8864616>] ext3_direct_IO+0xef/0x1a5 [ext3]
Nov  5 09:50:10  [<f8863970>] ext3_direct_io_get_blocks+0x0/0xaa [ext3]
Nov  5 09:50:10  [<c0142e5c>] generic_file_direct_IO+0x3c/0x5c
Nov  5 09:50:10  [<c0142027>] generic_file_direct_write+0x51/0x122
Nov  5 09:50:10  [<c0126934>] current_fs_time+0x44/0x4c
Nov  5 09:50:10  [<c0142935>] __generic_file_aio_write_nolock+0x33c/0x3b7
Nov  5 09:50:10  [<c01429e9>] generic_file_aio_write_nolock+0x39/0x7f
Nov  5 09:50:10  [<c0142bd3>] generic_file_aio_write+0x72/0xc6
Nov  5 09:50:10  [<f8861d9e>] ext3_file_write+0x19/0x8b [ext3]
Nov  5 09:50:10  [<c015b95c>] do_sync_write+0x9e/0xcb
Nov  5 09:50:10  [<c02d4732>] schedule+0x84e/0x8ec
Nov  5 09:50:10  [<c01ae2f6>] selinux_file_permission+0x117/0x120
Nov  5 09:50:10  [<c012052d>] autoremove_wake_function+0x0/0x2d
Nov  5 09:50:10  [<c015ba3f>] vfs_write+0xb6/0xe2
Nov  5 09:50:11  [<c015bb09>] sys_write+0x3c/0x62
Nov  5 09:50:11  [<c02d68bf>] syscall_call+0x7/0xb
Nov  5 09:50:11  [<c02d007b>] unix_accept+0x5e/0xd6
Nov  5 09:50:11 Code: ff 74 7d 85 db 74 34 8b 03 39 30 74 29 68 24 62 83 f8 68 12 01 00 00 68 b5 61 83 f8 68 62 5d 83 f8 68 ca 61 83 f8 e8 a9 34 8f c7 <0f> 0b 12 01 b5 61 83 f8 83 c4 14 ff 43 08 eb 43 89 d0 e8 64 ff
Nov  5 09:50:11  <0>Fatal exception: panic in 5 seconds
Nov  5 09:50:16 Kernel panic - not syncing: Fatal exception
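
One possible reading of the trace above, offered as an assumption and not a confirmed diagnosis: the write path (ext3_direct_IO) may already hold a journal handle for one file system when the page fault triggers memory reclaim, and prune_icache then drops quota (ext3_dquot_drop) for an inode that lives on a different ext3 file system, so journal_start() is asked for a second journal while the task's current handle belongs to the first. The sketch below is a simplified user-space model of that check, not the kernel source. The struct layouts, current_handle, model_journal_start() and model_reclaim_during_write() are all hypothetical names made up for illustration; only the compared fields (h_transaction, t_journal) come from the assertion message.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the jbd structures; NOT the kernel definitions. */
struct journal { const char *name; };
struct transaction { struct journal *t_journal; };
struct handle { struct transaction *h_transaction; int h_ref; };

/* Hypothetical stand-in for the handle the running task already holds. */
static struct handle *current_handle;

enum start_result { START_NEW, START_NESTED, START_WRONG_JOURNAL };

/*
 * Model of the condition in the assertion message: if the task already holds
 * a handle, that handle must belong to the journal being started.
 * The kernel hits BUG on a mismatch; this model reports it as a return value.
 */
static enum start_result model_journal_start(struct journal *journal,
                                             struct handle *fresh,
                                             struct transaction *txn)
{
    assert(journal != NULL);

    if (current_handle != NULL) {
        if (current_handle->h_transaction->t_journal != journal)
            return START_WRONG_JOURNAL;   /* the reported assertion failure */
        current_handle->h_ref++;          /* nested start on the same journal */
        return START_NESTED;
    }

    /* No handle held yet: open a new one on this journal. */
    txn->t_journal = journal;
    fresh->h_transaction = txn;
    fresh->h_ref = 1;
    current_handle = fresh;
    return START_NEW;
}

/*
 * Scenario: a write on file system A opens a handle, then reclaim asks for a
 * handle on behalf of an inode being pruned. If same_filesystem is zero, the
 * pruned inode is assumed to live on a second file system B.
 */
static enum start_result model_reclaim_during_write(int same_filesystem)
{
    struct journal fs_a = { "fs_a" };
    struct journal fs_b = { "fs_b" };
    struct transaction txn_a, txn_b;
    struct handle h_a, h_b;
    struct journal *reclaim_target = same_filesystem ? &fs_a : &fs_b;
    enum start_result result;

    current_handle = NULL;
    model_journal_start(&fs_a, &h_a, &txn_a);                    /* write path */
    result = model_journal_start(reclaim_target, &h_b, &txn_b);  /* reclaim path */
    current_handle = NULL;
    return result;
}
```

In this model, model_reclaim_during_write(0) reports START_WRONG_JOURNAL, which corresponds to the failed assertion, while model_reclaim_during_write(1) is a harmless nested start. If this reading is right, it would be consistent with the crashes appearing on nodes with quotas on 2 file systems and tight memory.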


CentOS: 
http://bugs.centos.org/view.php?id=1167 (2.6.9-22.0.1.ELsmp)
http://bugs.centos.org/view.php?id=2077 (2.6.9-42.0.10.ELsmp)
http://www.centos.org/modules/newbb/viewtopic.php?viewmode=flat&topic_id=8779&forum=27 (2.6.9-55.ELsmp)

This event sent from IssueTracker by bbraswel  [Support Engineering Group]
 issue 137165
Comment 2 Josef Bacik 2007-11-15 16:18:13 EST
Created attachment 260401
patch to fix the problem.

Please have the customer test this patch and verify it works for them.
Comment 12 RHEL Product and Program Management 2008-03-26 14:39:10 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 13 Vivek Goyal 2008-03-27 19:23:09 EDT
Committed in 68.27.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 16 errata-xmlrpc 2008-07-24 15:20:44 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html
Comment 18 Josef Bacik 2008-09-11 21:28:00 EDT
*** Bug 461871 has been marked as a duplicate of this bug. ***
