Bug 729779

Summary: kernel: Regression: system hangs with "kernel BUG at fs/jbd/transaction.c:1155!" under fs load
Product: [Fedora] Fedora Reporter: Ben Webb <ben>
Component: kernelAssignee: Eric Sandeen <esandeen>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 15CC: ajschult784, aquini, cebbert, davej, esandeen, eteo, gansalmon, goeran, itamar, jonathan, jwboyer, kernel-maint, kmcmartin, madhu.chinakonda, mark, vomlehn
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-08-17 21:10:26 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ben Webb 2011-08-10 19:27:53 UTC
Description of problem:
With the 2.6.40-4.fc15.x86_64 kernel, after a short period of heavy filesystem access ("scons -j3" to build a large set of C++ sources) the following is logged:

kernel BUG at fs/jbd/transaction.c:1155!

any further access to this (ext3) filesystem hangs, and the system must be power cycled. Thinking it a possible ext3 bug, I remounted the fs as ext4 (and also ran an fsck to repair possible fs corruption). This did not fix the problem, but did change the logged message to:

kernel BUG at fs/jbd2/transaction.c:1083!

(See below for the full logs. In both cases it seems to be getting confused somewhere in the journal code while trying to make a symlink.)

Downgrading the kernel to 2.6.38.8-35.fc15.x86_64 fixes the problem - the system is stable.

Version-Release number of selected component (if applicable):
kernel-2.6.40-4.fc15.x86_64

How reproducible:
Always (on this system).

Steps to Reproduce:
1. Mount an ext3 or ext4 filesystem.
2. Stress the filesystem.
Note, however, that the same kernel on similar (but not identical) hardware we have here works fine, so I don't know how reproducible this will be for you.
  
Actual results:
With an ext3 filesystem, the following is logged:
Aug  9 15:01:03 chime kernel: [10684.466368] ------------[ cut here ]------------
Aug  9 15:01:03 chime kernel: [10684.466386] kernel BUG at fs/jbd/transaction.c:1155!
Aug  9 15:01:03 chime kernel: [10684.466392] invalid opcode: 0000 [#1] SMP
Aug  9 15:01:03 chime kernel: [10684.466399] CPU 7
Aug  9 15:01:03 chime kernel: [10684.466402] Modules linked in: fuse nfs fscache nfsd nfs_acl auth_rpcgss lockd sunrpc bnep bluetooth rfkill nf_conntrack_ftp ipt_LOG xt_state xt_limit xt_multiport p4_clockmod freq_table speedstep_lib iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle nvidia(P) ppdev dcdbas snd_hda_codec_hdmi microcode joydev iTCO_wdt snd_hda_codec_idt iTCO_vendor_support serio_raw i2c_i801 tg3 snd_hda_intel snd_hda_codec parport_pc parport snd_hwdep snd_seq snd_seq_device snd_pcm ns83820 i5000_edac edac_core i5k_amb snd_timer snd shpchp soundcore snd_page_alloc firewire_ohci firewire_core crc_itu_t nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core mxm_wmi wmi video [last unloaded: mperf]
Aug  9 15:01:03 chime kernel: [10684.466510]
Aug  9 15:01:03 chime kernel: [10684.466517] Pid: 5435, comm: scons Tainted: P            2.6.40-4.fc15.x86_64 #1 Dell Inc.                 Precision WorkStation 490    /0F9382
Aug  9 15:01:03 chime kernel: [10684.466528] RIP: 0010:[<ffffffff811ca8cc>]  [<ffffffff811ca8cc>] journal_dirty_metadata+0x4e/0xd3
Aug  9 15:01:03 chime kernel: [10684.466557] RSP: 0018:ffff88005fcc5d08  EFLAGS: 00010246
Aug  9 15:01:03 chime kernel: [10684.466562] RAX: 0000000000000000 RBX: ffff880109f31af0 RCX: ffff880126671f00
Aug  9 15:01:03 chime kernel: [10684.466567] RDX: ffff8801342b1340 RSI: ffff8801342b1340 RDI: ffff8801342b1340
Aug  9 15:01:03 chime kernel: [10684.466572] RBP: ffff88005fcc5d38 R08: 0000000000000000 R09: ffff8801373cdcc0
Aug  9 15:01:03 chime kernel: [10684.466578] R10: ffff88009eaa68f0 R11: 0000000000000000 R12: ffff88013501f800
Aug  9 15:01:03 chime kernel: [10684.466583] R13: ffff8801373cdcc0 R14: ffff8801342b1340 R15: ffff880126671f00
Aug  9 15:01:03 chime kernel: [10684.466589] FS:  00007fc58f1a1720(0000) GS:ffff88013fdc0000(0000) knlGS:0000000000000000
Aug  9 15:01:03 chime kernel: [10684.466595] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug  9 15:01:03 chime kernel: [10684.466600] CR2: 0000000007eac198 CR3: 00000000b1bce000 CR4: 00000000000006e0
Aug  9 15:01:03 chime kernel: [10684.466605] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug  9 15:01:03 chime kernel: [10684.466611] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug  9 15:01:03 chime kernel: [10684.466617] Process scons (pid: 5435, threadinfo ffff88005fcc4000, task ffff880134eb2e60)
Aug  9 15:01:03 chime kernel: [10684.466621] Stack:
Aug  9 15:01:03 chime kernel: [10684.466624]  0000000000000000 ffffffff8161ead0 ffff8801373cdcc0 ffff88013501ac00
Aug  9 15:01:03 chime kernel: [10684.466636]  ffff8801373cdcc0 ffff88010eba63a0 ffff88005fcc5d68 ffffffff81194292
Aug  9 15:01:03 chime kernel: [10684.466646]  0000a1ff0000a000 ffff8801342b1340 0000000000000000 ffff88010eba63e8
Aug  9 15:01:03 chime kernel: [10684.466656] Call Trace:
Aug  9 15:01:03 chime kernel: [10684.466668]  [<ffffffff81194292>] __ext3_journal_dirty_metadata+0x25/0x53
Aug  9 15:01:03 chime kernel: [10684.466678]  [<ffffffff8118d90e>] ext3_orphan_add+0xfd/0x172
Aug  9 15:01:03 chime kernel: [10684.466687]  [<ffffffff8118e549>] ext3_symlink+0x141/0x284
Aug  9 15:01:03 chime kernel: [10684.466697]  [<ffffffff811314eb>] vfs_symlink+0x54/0x74
Aug  9 15:01:03 chime kernel: [10684.466706]  [<ffffffff81132f93>] sys_symlinkat+0x96/0xef
Aug  9 15:01:03 chime kernel: [10684.466716]  [<ffffffff8112a7cd>] ? sys_newstat+0x2a/0x33
Aug  9 15:01:03 chime kernel: [10684.466725]  [<ffffffff81133002>] sys_symlink+0x16/0x18
Aug  9 15:01:03 chime kernel: [10684.466735]  [<ffffffff814bd7c2>] system_call_fastpath+0x16/0x1b
Aug  9 15:01:03 chime kernel: [10684.466740] Code: f1 ff ff 85 c0 4d 8b 27 0f 85 94 00 00 00 4c 89 f7 e8 ce ed ff ff 83 7b 10 00 75 17 c7 43 10 01 00 00 00 41 8b 45 08 85 c0 7f 02 <0f> 0b ff c8 41 89 45 08 4c 39 7b 28 75 0f 83 7b 0c 02 75 09 4d
Aug  9 15:01:03 chime kernel: [10684.466816] RIP  [<ffffffff811ca8cc>] journal_dirty_metadata+0x4e/0xd3
Aug  9 15:01:03 chime kernel: [10684.466825]  RSP <ffff88005fcc5d08>
Aug  9 15:01:03 chime kernel: [10684.466831] ---[ end trace 0d2045d06963529e ]---

With an ext4 filesystem:
Aug  9 16:50:18 chime kernel: [  653.266641] ------------[ cut here ]------------
Aug  9 16:50:18 chime kernel: [  653.266656] kernel BUG at fs/jbd2/transaction.c:1083!
Aug  9 16:50:18 chime kernel: [  653.266661] invalid opcode: 0000 [#1] SMP
Aug  9 16:50:18 chime kernel: [  653.266667] CPU 4
Aug  9 16:50:18 chime kernel: [  653.266671] Modules linked in: fuse nfs fscache nfsd nfs_acl auth_rpcgss lockd sunrpc bnep bluetooth rfkill nf_conntrack_ftp ipt_LOG xt_state xt_limit xt_multiport p4_clockmod freq_table speedstep_lib iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle nvidia(P) snd_hda_codec_hdmi ppdev dcdbas microcode serio_raw snd_hda_codec_idt i2c_i801 snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm joydev iTCO_wdt iTCO_vendor_support parport_pc parport ns83820 tg3 snd_timer snd i5000_edac edac_core i5k_amb soundcore snd_page_alloc shpchp firewire_ohci firewire_core crc_itu_t nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core mxm_wmi wmi video [last unloaded: mperf]
Aug  9 16:50:18 chime kernel: [  653.266782]
Aug  9 16:50:18 chime kernel: [  653.266788] Pid: 2342, comm: scons Tainted: P            2.6.40-4.fc15.x86_64 #1 Dell Inc.                 Precision WorkStation 490    /0F9382
Aug  9 16:50:18 chime kernel: [  653.266797] RIP: 0010:[<ffffffff811d0ff6>]  [<ffffffff811d0ff6>] jbd2_journal_dirty_metadata+0x4e/0xd4
Aug  9 16:50:18 chime kernel: [  653.266821] RSP: 0018:ffff880037b83cd8  EFLAGS: 00010246
Aug  9 16:50:18 chime kernel: [  653.266824] RAX: 0000000000000000 RBX: ffff8800a94581c0 RCX: 0000000000000000
Aug  9 16:50:18 chime kernel: [  653.266828] RDX: ffff880127cad0d8 RSI: ffff8801371331a0 RDI: ffff8801371331a0
Aug  9 16:50:18 chime kernel: [  653.266831] RBP: ffff880037b83d08 R08: ffff8801371331a0 R09: 0000000000000000
Aug  9 16:50:18 chime kernel: [  653.266834] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880135094800
Aug  9 16:50:18 chime kernel: [  653.266837] R13: ffff880127cad0d8 R14: ffff8801371331a0 R15: ffff880115e56d00
Aug  9 16:50:18 chime kernel: [  653.266843] FS:  00007f6d2f532700(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
Aug  9 16:50:18 chime kernel: [  653.266847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug  9 16:50:18 chime kernel: [  653.266850] CR2: 00007f6d2400e990 CR3: 00000001381ca000 CR4: 00000000000006e0
Aug  9 16:50:18 chime kernel: [  653.266854] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug  9 16:50:18 chime kernel: [  653.266858] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug  9 16:50:18 chime kernel: [  653.266862] Process scons (pid: 2342, threadinfo ffff880037b82000, task ffff88012bf81730)
Aug  9 16:50:18 chime kernel: [  653.266864] Stack:
Aug  9 16:50:18 chime kernel: [  653.266867]  0000000000000000 0000000000000000 ffff880127cad0d8 ffff8801371331a0
Aug  9 16:50:18 chime kernel: [  653.266875]  00000000000007eb ffffffff816206e0 ffff880037b83d58 ffffffff811bdcaa
Aug  9 16:50:18 chime kernel: [  653.266882]  ffff880037b83d58 ffffffff8119d9e4 ffff880037b83d58 ffff8800a965fae0
Aug  9 16:50:18 chime kernel: [  653.266889] Call Trace:
Aug  9 16:50:18 chime kernel: [  653.266905]  [<ffffffff811bdcaa>] __ext4_handle_dirty_metadata+0xf1/0xfa
Aug  9 16:50:18 chime kernel: [  653.266913]  [<ffffffff8119d9e4>] ? ext4_reserve_inode_write+0x41/0x83
Aug  9 16:50:18 chime kernel: [  653.266919]  [<ffffffff811a5959>] ext4_orphan_add+0x101/0x1af
Aug  9 16:50:18 chime kernel: [  653.266924]  [<ffffffff811a6750>] ext4_symlink+0x187/0x2f7
Aug  9 16:50:18 chime kernel: [  653.266934]  [<ffffffff811314eb>] vfs_symlink+0x54/0x74
Aug  9 16:50:18 chime kernel: [  653.266939]  [<ffffffff81132f93>] sys_symlinkat+0x96/0xef
Aug  9 16:50:18 chime kernel: [  653.266950]  [<ffffffff8112a7cd>] ? sys_newstat+0x2a/0x33
Aug  9 16:50:18 chime kernel: [  653.266956]  [<ffffffff81133002>] sys_symlink+0x16/0x18
Aug  9 16:50:18 chime kernel: [  653.266965]  [<ffffffff814bd7c2>] system_call_fastpath+0x16/0x1b
Aug  9 16:50:18 chime kernel: [  653.266968] Code: f1 ff ff 85 c0 4d 8b 27 0f 85 94 00 00 00 4c 89 f7 e8 d0 eb ff ff 83 7b 10 00 75 17 c7 43 10 01 00 00 00 41 8b 45 08 85 c0 7f 02 <0f> 0b ff c8 41 89 45 08 4c 39 7b 28 75 0f 83 7b 0c 01 75 09 4d
Aug  9 16:50:18 chime kernel: [  653.267011] RIP  [<ffffffff811d0ff6>] jbd2_journal_dirty_metadata+0x4e/0xd4
Aug  9 16:50:18 chime kernel: [  653.267011]  RSP <ffff880037b83cd8>
Aug  9 16:50:18 chime kernel: [  653.267079] ---[ end trace 6712718013b4aaff ]---

Expected results:
Normal operation.

Comment 1 Chuck Ebbert 2011-08-11 01:07:48 UTC
*** Bug 729327 has been marked as a duplicate of this bug. ***

Comment 2 Eric Sandeen 2011-08-11 04:34:36 UTC
I'm on it, but you can probably work around this by disabling selinux for now, if that's acceptable to you.  Will have patches in a bit, just need to discuss the fix a little first...

Thanks,
-Eric

Comment 5 Chuck Ebbert 2011-08-16 14:43:40 UTC
Patches went in 3.0.2.

Fedora kernel maintainers already get the kernel-maint email, no need to copy them separately.

Comment 6 Chuck Ebbert 2011-09-18 22:46:22 UTC
*** Bug 739357 has been marked as a duplicate of this bug. ***