Bug 461249

Summary: Assertion failure in do_get_write_access() at fs/jbd/transaction.c:624: "jh->b_transaction == journal->j_committing_transaction"
Product: Red Hat Enterprise Linux 4 Reporter: Bryn M. Reeves <bmr>
Component: kernelAssignee: Josef Bacik <jbacik>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Martin Jenner <mjenner>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.6CC: jbacik, tao
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-02-16 18:14:37 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
crash output from comment #2 none

Description Bryn M. Reeves 2008-09-05 10:21:08 UTC
Description of problem:
Assertion failure in do_get_write_access() at fs/jbd/transaction.c:624: "jh->b_transaction == journal->j_committing_transaction"
   ----------- [cut here ] --------- [please bite here ] ---------
   Kernel BUG at transaction:624
   invalid operand: 0000 [1] SMP
   CPU 0
   Modules linked in: nfsd exportfs lockd nfs_acl md5 ipv6 parport_pc lp parport netconsole netdump autofs4 i2c_dev i2c_core vmmemctl(U) sunrpc ds yenta_socket pcmcia_core vmhgfs(U) button battery ac
   shpchp e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
   Pid: 25983, comm: updatedb Not tainted 2.6.9-67.ELsmp
   RIP: 0010:[<ffffffffa009542c>] <ffffffffa009542c>{:jbd:do_get_write_access+767}
   RSP: 0018:0000010058de3d28  EFLAGS: 00010212
   RAX: 0000000000000084 RBX: 0000000000000000 RCX: ffffffff803e95e8
   RDX: ffffffff803e95e8 RSI: 0000000000000246 RDI: ffffffff803e95e0
   RBP: 0000010069b9d5b0 R08: ffffffff803e95e8 R09: 0000000000000000
   R10: 0000000100000000 R11: 000001007b981380 R12: 0000010064088500
   R13: 0000010076d84680 R14: 000001005af47ee0 R15: 0000000000000000
   FS:  0000002a95562b00(0000) GS:ffffffff804f2d80(0000) knlGS:00000000e78c1ba0
   CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
   CR2: 0000000000637068 CR3: 0000000000101000 CR4: 00000000000006e0
   Process updatedb (pid: 25983, threadinfo 0000010058de2000, task 0000010059921030)
   Stack: 000001006409dea0 00000000640f3be0 0000000000000000 00000000640f3ce8
          0000010037e3f600 00000000640f3fa8 0000000000000000 0000000000000246
          0000000000000400 0000000000500111
   Call Trace:<ffffffff8017f26c>{__getblk+43} <ffffffffa009588e>{:jbd:journal_get_write_access+38}
          <ffffffffa00ab9c6>{:ext3:ext3_reserve_inode_write+63}
          <ffffffffa00aba51>{:ext3:ext3_mark_inode_dirty+47}
          <ffffffffa00944c6>{:jbd:journal_start+223} <ffffffffa00ad7d2>{:ext3:ext3_dirty_inode+99}
          <ffffffff8019a4e0>{__mark_inode_dirty+40} <ffffffff8018d028>{filldir64+0}
          <ffffffff80193932>{update_atime+147} <ffffffff8018cd7c>{vfs_readdir+176}
          <ffffffff8018d158>{sys_getdents64+118} <ffffffff80179d00>{sys_fchdir+111}
          <ffffffff8011026a>{system_call+126}
  
   Code: 0f 0b 96 c3 09 a0 ff ff ff ff 70 02 8b 45 0c 83 f8 05 0f 85
   RIP <ffffffffa009542c>{:jbd:do_get_write_access+767} RSP <0000010058de3d28>

Version-Release number of selected component (if applicable):
2.6.9-67.EL

How reproducible:
Unknown - only seen once to date.

Steps to Reproduce:
1. n/a
  
Actual results:
Above assertion failure & BUG

Expected results:
No BUG

Additional info:
There have been numerous similar bugs over the years:

    http://tinyurl.com/5fbknc [ bugzilla.redhat.com ]

https://bugzilla.redhat.com/show_bug.cgi?id=185451 is probably the closest match; we're hitting the same assertion failure in do_get_write_access(). This was closed ERRATA with a backport from the following git commit:

http://marc.info/?l=git-commits-head&m=111208670104089&w=2

Comment 2 Bryn M. Reeves 2008-09-05 12:05:31 UTC
Offending code:

xffffffffa00953e4 <do_get_write_access+695>:   callq  0xffffffff80138dbd <printk>
 622                 J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
0xffffffffa00953e9 <do_get_write_access+700>:   ud2a
0xffffffffa00953f5 <do_get_write_access+712>:   mov    0x20(%rsp),%rax                                  # load journal into %eax. %ebp has &jh
0xffffffffa00953fa <do_get_write_access+717>:   mov    0x58(%rax),%rax                                  # load journal->j_committing_transaction into %eax
0xffffffffa00953fe <do_get_write_access+721>:   cmp    %rax,0x20(%rbp)                                  # compare j_committing_transaction jh->b_transaction
0xffffffffa0095402 <do_get_write_access+725>:   je     0xffffffffa0095438 <do_get_write_access+779>-----+
0xffffffffa0095404 <do_get_write_access+727>:   mov    $0xffffffffa009c56a,%r8                          | &"jh->b_transaction == journal->j_committing_transaction"
0xffffffffa009540b <do_get_write_access+734>:   mov    $0x270,%ecx                                      |
0xffffffffa0095410 <do_get_write_access+739>:   mov    $0xffffffffa009c354,%rdx                         | &"fs/jbd/transaction.c"
0xffffffffa0095417 <do_get_write_access+746>:   mov    $0xffffffffa009beb0,%rsi                         | &"do_get_write_access"
0xffffffffa009541e <do_get_write_access+753>:   mov    $0xffffffffa009c369,%rdi                         | &"<0>Assertion failure in %s() at %s:%d: "%s"
0xffffffffa0095425 <do_get_write_access+760>:   xor    %eax,%eax                                        |
0xffffffffa0095427 <do_get_write_access+762>:   callq  0xffffffff80138dbd <printk>                      |
 623                 J_ASSERT_JH(jh, jh->b_transaction ==                                               |
 624                                         journal->j_committing_transaction);                        |
0xffffffffa009542c <do_get_write_access+767>:   ud2a                                                    | BUG()
0xffffffffa0095438 <do_get_write_access+779>:   mov    0xc(%rbp),%eaxi <--------------------------------+
include/linux/jbd.h:#define BJ_Shadow   5       /* Buffer contents being shadowed to the log */
 635                 if (jh->b_jlist == BJ_Shadow) {
0xffffffffa009543b <do_get_write_access+782>:   cmp    $0x5,%eax
0xffffffffa009543e <do_get_write_access+785>:   jne    0xffffffffa00954e4 <do_get_write_access+951>

    [exception RIP: do_get_write_access+767]
    RIP: ffffffffa009542c  RSP: 0000010058de3d28  RFLAGS: 00010212
    RAX: 0000000000000084  RBX: 0000000000000000  RCX: ffffffff803e95e8
    RDX: ffffffff803e95e8  RSI: 0000000000000246  RDI: ffffffff803e95e0
    RBP: 0000010069b9d5b0   R8: ffffffff803e95e8   R9: 0000000000000000
    ^^^  &jh 
    R10: 0000000100000000  R11: 000001007b981380  R12: 0000010064088500
    R13: 0000010076d84680  R14: 000001005af47ee0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

We apparently got here with b_transaction = 0x0x100759485c0 but j_committing_transaction = 0x84. Hmmm... oh no: %eax got clobbered before the BUG.. 

jh:

struct journal_head 0000010069b9d5b0
struct journal_head {
  b_bh = 0x10064088500, 
  b_jcount = 1, 
  b_jlist = 2, 
  b_frozen_data = 0x0, 
  b_committed_data = 0x0, 
  b_transaction = 0x100759485c0, 
  b_next_transaction = 0x0, 
  b_tnext = 0x1006dc3cc90, 
  b_tprev = 0x1006dc3c870, 
  b_cp_transaction = 0x10075948bc0, 
  b_cpnext = 0x1006dc3cc90, 
  b_cpprev = 0x1006dc3c870
}

journal:

This should be in 0x20(%RSP) but looking at that in crash doesn't even appear to be a journal_s:

RSP: 0000010058de3d28

struct journal_s 0x0000010058de3d48
struct journal_s {
  j_flags = 1100449314304, 
  j_errno = 1678720936, 
  j_sb_buffer = 0x0, 
  j_superblock = 0x246, 
  j_format_version = 1024, 
  j_state_lock = {
    lock = 0, 
    magic = 5243153
  }, 
  j_barrier_count = 0, 
  j_barrier = {
    count = {
      counter = 582
    }, 
    sleepers = 0, 
    wait = {
      lock = {
        lock = 1678370312, 
        magic = 256
      }, 
      task_list = {
        next = 0x400, 
        prev = 0xffffffff8017f26c
      }
    }
  }, 
  j_running_transaction = 0x280, 
  j_committing_transaction = 0x1, 
  j_checkpoint_transactions = 0x1005af47ee0, 
  j_wait_transaction_locked = {
[...]
  }
}

Comment 3 Bryn M. Reeves 2008-09-05 12:28:56 UTC
Created attachment 315899 [details]
crash output from comment #2

New bugzilla doesn't like long lines - disassembly from comment #2 as an attachment.