Bug 237815 - Assertion failure in journal_commit_transaction() at fs/jbd/commit.c:793: "jh->b_next_transaction == ((void *)0)"
Status: CLOSED DUPLICATE of bug 158363
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: x86_64 Linux
Priority: medium
Severity: high
Assigned To: Eric Sandeen
QA Contact: Martin Jenner
Reported: 2007-04-25 11:13 EDT by Matthew Coffey
Modified: 2007-11-16 20:14 EST
Doc Type: Bug Fix
Last Closed: 2007-05-31 14:14:35 EDT
Attachments: None

Description Matthew Coffey 2007-04-25 11:13:41 EDT
Description of problem:
Kernel panic with the message:

Assertion failure in journal_commit_transaction() at fs/jbd/commit.c:793:
"jh->b_next_transaction == ((void *)0)"
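
The quoted condition reads `((void *)0)` rather than `NULL`, which is what the C preprocessor produces when an assertion macro forwards its argument through a wrapper before stringifying it. The following is a minimal user-space sketch of that effect, not the jbd source: `SKETCH_NULL`, `SKETCH_ASSERT`, `SKETCH_ASSERT_JH`, `struct sketch_journal_head`, and `sketch_check` are all hypothetical names, and the sketch formats the message into a buffer instead of halting the kernel.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for NULL; assumption: the kernel defines NULL as ((void *)0). */
#define SKETCH_NULL ((void *)0)

/* Inner macro: evaluates expr; on failure, formats a message that contains
 * the stringified expression (#expr). Evaluates to 0 when the check passes. */
#define SKETCH_ASSERT(buf, len, expr)                                      \
    ((expr) ? 0                                                            \
            : snprintf((buf), (len),                                       \
                       "Assertion failure in %s() at %s:%d: \"%s\"",       \
                       __func__, __FILE__, __LINE__, #expr))

/* Outer wrapper: expr is not an operand of # here, so the preprocessor
 * macro-expands it (SKETCH_NULL becomes ((void *)0)) before passing it on. */
#define SKETCH_ASSERT_JH(buf, len, expr) SKETCH_ASSERT(buf, len, expr)

/* Hypothetical, cut-down journal head with only the field of interest. */
struct sketch_journal_head {
    void *b_next_transaction;
};

/* Returns 0 if the buffer is not queued for a later transaction; otherwise
 * writes the assertion message into buf and returns its (positive) length. */
int sketch_check(struct sketch_journal_head *jh, char *buf, size_t len)
{
    return SKETCH_ASSERT_JH(buf, len, jh->b_next_transaction == SKETCH_NULL);
}
```

Calling `sketch_check()` on a structure whose `b_next_transaction` is non-null fills the buffer with a message of the same shape as the one above, with the macro name already replaced by its expansion.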

The server is an HP DL360g5 running Oracle RAC.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)

How reproducible:
Only one occurrence to date.

Steps to Reproduce:
1. The server had been up for 58 hours, and the load on the system at the time
of the kernel panic was minimal.
  
Actual results:
Full OOPS
Assertion failure in journal_commit_transaction() at fs/jbd/commit.c:793:
"jh->b_next_transaction == ((void *)0)"
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at commit:793
invalid operand: 0000 [1] SMP
CPU 2
Modules linked in: hangcheck_timer md5 ipv6 qioctlmod sunrpc ds yenta_socket
pcmcia_core dm_mirror dm_round_robin dm_multipath dm_mod button battery ac
ohci_hcd hw_random bonding(U) e1000 tg3 floppy ext3 jbd qla2300 qla2xxx
scsi_transport_fc cciss sd_mod scsi_mod
Pid: 569, comm: kjournald Not tainted 2.6.9-42.0.2.ELsmp
RIP: 0010:[<ffffffffa00956bd>] <ffffffffa00956bd>{:jbd:journal_commit_transaction+4073}
RSP: 0018:00000101fd5ffbb8  EFLAGS: 00010212
RAX: 0000000000000075 RBX: 0000000000000000 RCX: ffffffff803e1fe8
RDX: ffffffff803e1fe8 RSI: 0000000000000246 RDI: ffffffff803e1fe0
RBP: 0000010085ec00e0 R08: ffffffff803e1fe8 R09: 0000000000000000
R10: 0000000100000000 R11: ffffffff8011e884 R12: 0000010034434348
R13: 00000100df633580 R14: 00000101fe3e6a00 R15: 0000000000000000
FS:  0000002a970416e0(0000) GS:ffffffff804e5280(0000) knlGS:00000000f7f248e0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a959add44 CR3: 0000000008032000 CR4: 00000000000006e0
Process kjournald (pid: 569, threadinfo 00000101fd5fe000, task 00000101fe34a7f0)
Stack: 00000ed400000000 00000100a2b8d12c 00000000800327e0 00000100a126d660
       00000100a9147818 0000000000001568 0000000000000000 00000101fe34a7f0
       ffffffff80135752 00000101fd5ffc30
Call Trace:<ffffffff80135752>{autoremove_wake_function+0}
<ffffffff8013271e>{try_to_wake_up+876}
       <ffffffff80135752>{autoremove_wake_function+0}
<ffffffff8030a1c1>{thread_return+0}
       <ffffffff8030a219>{thread_return+88} <ffffffff8013fdf3>{del_timer+107}
       <ffffffffa0097914>{:jbd:kjournald+250}
<ffffffff80135752>{autoremove_wake_function+0}
       <ffffffff80135752>{autoremove_wake_function+0}
<ffffffffa0097814>{:jbd:commit_timeout+0}
       <ffffffff80110f47>{child_rip+8} <ffffffffa009781a>{:jbd:kjournald+0}
       <ffffffff80110f3f>{child_rip+0}
Code: 0f 0b d0 a8 09 a0 ff ff ff ff 19 03 4c 89 e7 e8 b5 e3 ff ff
RIP <ffffffffa00956bd>{:jbd:journal_commit_transaction+4073} RSP <00000101fd5ffbb8>
<0>Kernel panic - not syncing: Oops
Uhhuh. NMI received. Dazed and confused, but trying to continue
Uhhuh. NMI received. Dazed and confused, but trying to continue
You probably have a hardware problem with your RAM chips
Uhhuh. NMI received. Dazed and confused, but trying to continue
You probably have a hardware problem with your RAM chips
Uhhuh. NMI received. Dazed and confused, but trying to continue
You probably have a hardware problem with your RAM chips
You probably have a hardware problem with your RAM chips

This could be caused by a memory error, but according to hpasm all of the DIMMs
report OK.
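
The "Code:" line of the oops can be cross-checked against the reported line number. The sketch below assumes the BUG() site layout used by x86_64 kernels of this era: a `ud2` instruction (`0f 0b`), followed by an 8-byte little-endian pointer to the source file name, followed by a 2-byte little-endian line number. That layout is an assumption here, not something stated in this report, and `struct sketch_bug_site` and `sketch_decode_bug` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of an assumed x86_64 BUG() site:
 * ud2 (0f 0b), 8-byte LE file-name pointer, 2-byte LE line number. */
struct sketch_bug_site {
    uint64_t file_ptr; /* address of the file-name string */
    unsigned line;     /* source line recorded by BUG() */
};

/* Decodes the first 12 bytes of an oops "Code:" dump.
 * Returns 0 on success, -1 if the dump is too short or does not
 * start with ud2. */
int sketch_decode_bug(const uint8_t *code, size_t len,
                      struct sketch_bug_site *out)
{
    size_t i;

    if (len < 12 || code[0] != 0x0f || code[1] != 0x0b)
        return -1;

    /* Assemble the little-endian 64-bit pointer from bytes 2..9. */
    out->file_ptr = 0;
    for (i = 0; i < 8; i++)
        out->file_ptr |= (uint64_t)code[2 + i] << (8 * i);

    /* Assemble the little-endian 16-bit line number from bytes 10..11. */
    out->line = (unsigned)code[10] | ((unsigned)code[11] << 8);
    return 0;
}
```

Under that assumption, the bytes `0f 0b d0 a8 09 a0 ff ff ff ff 19 03` decode to a file-name pointer of 0xffffffffa009a8d0, which falls in the same address range as the jbd symbols in the trace, and to line 0x0319, which is 793 and agrees with "Kernel BUG at commit:793".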

Expected results:


Additional info:
This looks very much like bug 161101, except that the line in question has
moved from 790 to 793. The errata cites a new kernel as fixing that bug, but
the description of that kernel makes no mention of the fix.

I tried attaching the sysreport from the server, but it is too big for the
upload form.
Comment 1 Eric Sandeen 2007-05-29 12:51:20 EDT
This looks like a dup of 
Bug 158363: Assert panic in fs/jbd/commit.c:790:journal_commit_transaction()

which is fixed in kernel 2.6.9-42.5 and later, i.e. RHEL4U5.
Comment 2 Eric Sandeen 2007-05-31 14:14:35 EDT
Dup'ing to bug 158363, the fix for which is available in kernels 2.6.9-50.EL and
beyond (i.e., RHEL4U5).

If the problem persists with that update, please re-open this bug.

Thanks,
-Eric


*** This bug has been marked as a duplicate of 158363 ***