Bug 381221
| Summary: | Assertion failure in journal_start() at fs/jbd/transaction.c:274: 'handle->h_transaction->t_journal == journal' | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Issue Tracker <tao> |
| Component: | kernel | Assignee: | Josef Bacik <jbacik> |
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.5 | CC: | esandeen, jbaron, tao, wmealing |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | RHSA-2008-0665 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2008-07-24 19:20:44 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 439194 | | |
| Attachments: | | | |
Description
Issue Tracker
2007-11-13 22:16:20 UTC
We have seen a number of nodes (8) crash on the above assertion failure, over two days. These are all relatively recent reinstalls with SLC4 and only 1GB of memory (so assume tight memory pressure), ext3, with quotas on 2 file systems.

While we on our side try to find out more on what actually happened and eventually reproduce this on "vanilla" RHEL4, could you perhaps already check whether you have seen something similar (and even better, whether you have a fix)?

According to Google, both Centos4 and various other upstream kernels have been hit by this in the past, with at least one message indicating that Ext3 could be doing something wrong when under memory pressure, this was with 2.6.15.1:
http://lkml.org/lkml/2006/2/1/328

Our tracebacks all look like

Nov 5 09:50:10 Assertion failure in journal_start() at fs/jbd/transaction.c:274: "handle->h_transaction->t_journal == journal"
Nov 5 09:50:10 ------------[ cut here ]------------
Nov 5 09:50:10 kernel BUG at fs/jbd/transaction.c:274!
Nov 5 09:50:10 invalid operand: 0000 [#1]
Nov 5 09:50:10 SMP
Nov 5 09:50:10 Modules linked in: e7xxx_edac edac_mc libafs(U) autofs4 i2c_dev i2c_core sunrpc md5 ipv6 dm_mirror dm_mod button battery ac uhci_hcd hw_random e100 mii ext3 jbd ata_piix libata sd_mod scsi_mod
Nov 5 09:50:10 CPU: 1
Nov 5 09:50:10 EIP: 0060:[<f882f449>] Tainted: PF VLI
Nov 5 09:50:10 EFLAGS: 00010216 (2.6.9-55.0.6.EL.cernsmp)
Nov 5 09:50:10 EIP is at journal_start+0x45/0x9e [jbd]
Nov 5 09:50:10 eax: 00000073 ebx: cf532294 ecx: f1bcbacc edx: f88361ca
Nov 5 09:50:10 esi: f7f55c00 edi: f1bcb000 ebp: c0330f18 esp: f1bcbac8
Nov 5 09:50:10 ds: 007b es: 007b ss: 0068
Nov 5 09:50:10 Process fsprobe (pid: 8989, threadinfo=f1bcb000 task=f70103b0)
Nov 5 09:50:10 Stack: f88361ca f8835d62 f88361b5 00000112 f8836224 f6bcbbc0 f1bcbb1c 0000006d
Nov 5 09:50:10 f886bd89 f6bcbbc0 f1bcbb1c c0171bf0 f6bcbbc0 c0171c85 f3760de8 f3760df0
Nov 5 09:50:10 00000000 c017200a 00000080 00000080 00000080 c1b4fdf0 d1afb9a0 00000000
Nov 5 09:50:10 Call Trace:
Nov 5 09:50:10 [<f886bd89>] ext3_dquot_drop+0x14/0x3b [ext3]
Nov 5 09:50:10 [<c0171bf0>] clear_inode+0xb4/0x102
Nov 5 09:50:10 [<c0171c85>] dispose_list+0x47/0x6d
Nov 5 09:50:10 [<c017200a>] prune_icache+0x193/0x1ec
Nov 5 09:50:10 [<c0172077>] shrink_icache_memory+0x14/0x2b
Nov 5 09:50:10 [<c0149dac>] shrink_slab+0xf8/0x161
Nov 5 09:50:10 [<c014ae19>] try_to_free_pages+0xd5/0x1bb
Nov 5 09:50:10 [<c0144338>] __alloc_pages+0x1bc/0x2a6
Nov 5 09:50:10 [<c015491f>] read_swap_cache_async+0x56/0xa7
Nov 5 09:50:10 [<c014e3bf>] swapin_readahead+0x3b/0x57
Nov 5 09:50:10 [<c014e451>] do_swap_page+0x76/0x2ea
Nov 5 09:50:10 [<c014ed89>] handle_mm_fault+0x116/0x193
Nov 5 09:50:10 [<c014d7f0>] get_user_pages+0x235/0x368
Nov 5 09:50:10 [<c0179c11>] dio_refill_pages+0x7d/0x112
Nov 5 09:50:10 [<c0179cbe>] dio_get_page+0x18/0x4a
Nov 5 09:50:10 [<c017a608>] do_direct_IO+0x5b/0x306
Nov 5 09:50:10 [<c017ab12>] direct_io_worker+0x25f/0x4ee
Nov 5 09:50:10 [<c017b17a>] __blockdev_direct_IO+0x3d9/0x422
Nov 5 09:50:10 [<f8863970>] ext3_direct_io_get_blocks+0x0/0xaa [ext3]
Nov 5 09:50:10 [<f8864616>] ext3_direct_IO+0xef/0x1a5 [ext3]
Nov 5 09:50:10 [<f8863970>] ext3_direct_io_get_blocks+0x0/0xaa [ext3]
Nov 5 09:50:10 [<c0142e5c>] generic_file_direct_IO+0x3c/0x5c
Nov 5 09:50:10 [<c0142027>] generic_file_direct_write+0x51/0x122
Nov 5 09:50:10 [<c0126934>] current_fs_time+0x44/0x4c
Nov 5 09:50:10 [<c0142935>] __generic_file_aio_write_nolock+0x33c/0x3b7
Nov 5 09:50:10 [<c01429e9>] generic_file_aio_write_nolock+0x39/0x7f
Nov 5 09:50:10 [<c0142bd3>] generic_file_aio_write+0x72/0xc6
Nov 5 09:50:10 [<f8861d9e>] ext3_file_write+0x19/0x8b [ext3]
Nov 5 09:50:10 [<c015b95c>] do_sync_write+0x9e/0xcb
Nov 5 09:50:10 [<c02d4732>] schedule+0x84e/0x8ec
Nov 5 09:50:10 [<c01ae2f6>] selinux_file_permission+0x117/0x120
Nov 5 09:50:10 [<c012052d>] autoremove_wake_function+0x0/0x2d
Nov 5 09:50:10 [<c015ba3f>] vfs_write+0xb6/0xe2
Nov 5 09:50:11 [<c015bb09>] sys_write+0x3c/0x62
Nov 5 09:50:11 [<c02d68bf>] syscall_call+0x7/0xb
Nov 5 09:50:11 [<c02d007b>] unix_accept+0x5e/0xd6
Nov 5 09:50:11 Code: ff 74 7d 85 db 74 34 8b 03 39 30 74 29 68 24 62 83 f8 68 12 01 00 00 68 b5 61 83 f8 68 62 5d 83 f8 68 ca 61 83 f8 e8 a9 34 8f c7 <0f> 0b 12 01 b5 61 83 f8 83 c4 14 ff 43 08 eb 43 89 d0 e8 64 ff
Nov 5 09:50:11 <0>Fatal exception: panic in 5 seconds
Nov 5 09:50:16 Kernel panic - not syncing: Fatal exception

CentOS:
http://bugs.centos.org/view.php?id=1167 (2.6.9-22.0.1.ELsmp)
http://bugs.centos.org/view.php?id=2077 (2.6.9-42.0.10.ELsmp)
http://www.centos.org/modules/newbb/viewtopic.php?viewmode=flat&topic_id=8779&forum=27 (2.6.9-55.ELsmp)

This event sent from IssueTracker by bbraswel [Support Engineering Group] issue 137165

Created attachment 260401 [details]
patch to fix the problem.
Please have the customer test this patch and verify it works for them.
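
For readers following the call trace in the description: journal_start() is reached from memory reclaim while an ext3 write is already in progress. ext3_file_write leads into direct I/O, a page allocation enters try_to_free_pages, and prune_icache ends in ext3_dquot_drop, which calls journal_start(). The sketch below is a toy Python model of why that can trip the assertion when the reclaimed inode lives on a different file system than the one being written (the report mentions quotas on 2 file systems). It is not the kernel code and says nothing about what the attached patch does. It assumes, without confirmation from this report, that each task tracks one active handle and that a nested journal_start() reuses it after checking the journal. Task, journal_info, h_ref and write_with_reclaim are illustrative names; only h_transaction and t_journal come from the assertion message.

```python
class Journal:
    """One journal per mounted file system (toy stand-in)."""
    def __init__(self, name):
        self.name = name


class Transaction:
    def __init__(self, journal):
        self.t_journal = journal


class Handle:
    def __init__(self, transaction):
        self.h_transaction = transaction
        self.h_ref = 1  # nesting depth (illustrative)


class Task:
    """Stand-in for a process; remembers at most one active handle (assumption)."""
    def __init__(self):
        self.journal_info = None


def journal_start(task, journal):
    """Start or nest a handle, mirroring the check named in the assertion."""
    handle = task.journal_info
    if handle is not None:
        # Nested call: the existing handle must belong to the same journal.
        if handle.h_transaction.t_journal is not journal:
            raise AssertionError("handle->h_transaction->t_journal == journal")
        handle.h_ref += 1
        return handle
    handle = Handle(Transaction(journal))
    task.journal_info = handle
    return handle


def journal_stop(task):
    """Drop one nesting level; forget the handle when the last level is gone."""
    handle = task.journal_info
    handle.h_ref -= 1
    if handle.h_ref == 0:
        task.journal_info = None


def write_with_reclaim(task, write_journal, reclaim_journal):
    """Hold a handle for a write, then let 'reclaim' start one for a quota drop."""
    journal_start(task, write_journal)        # write path takes a handle
    try:
        journal_start(task, reclaim_journal)  # reclaim re-enters journal_start
        journal_stop(task)
    finally:
        journal_stop(task)


if __name__ == "__main__":
    fs_a, fs_b = Journal("fs_a"), Journal("fs_b")
    write_with_reclaim(Task(), fs_a, fs_a)      # same file system: nests cleanly
    try:
        write_with_reclaim(Task(), fs_a, fs_b)  # different file system
    except AssertionError as exc:
        print("assertion tripped:", exc)
```

In this model the same-journal case simply bumps the nesting count, while the cross-journal case fails exactly at the comparison quoted in the oops.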
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Committed in 68.27.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2008-0665.html

*** Bug 461871 has been marked as a duplicate of this bug. ***