Red Hat Bugzilla – Bug 680105
[ext4/xfstests] kernel BUG at fs/jbd2/transaction.c:1027!
Last modified: 2011-05-19 08:43:47 EDT
Description of problem:
When running xfstests in beaker, I'm seeing a kernel panic on the ppc64 platform with an ext4 filesystem. I didn't manage to reproduce the problem manually outside beaker, but beaker was at least able to provide the call trace (see Additional info). This is most probably a regression.

Version-Release number of selected component (if applicable):
2.6.32-117.el6.ppc64

How reproducible:
Not sure; fairly regular in beaker.

Steps to Reproduce:
1. Clone job J:56079 in beaker
2. Watch the results for ext4

Actual results:
Kernel panic due to 'kernel BUG at fs/jbd2/transaction.c:1027!'

Expected results:
No panic.

Additional info:
The last test that beaker noticed was test no. 233, so the problem should arise from one of the tests 234-248 (most probably 234).

Related beaker jobs/recipes:
https://beaker.engineering.redhat.com/recipes/109908
https://beaker.engineering.redhat.com/recipes/112445

The call traces are the same (but from different machines):

------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:1027!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: ext3 jbd ext2 sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt dm_mod [last unloaded: scsi_wait_scan]
NIP: d000000002260d00 LR: d0000000023d6dbc CTR: d000000002260c70
REGS: c0000000a973f3f0 TRAP: 0700 Not tainted (2.6.32-117.el6.ppc64)
MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 24008482 XER: 20000000
TASK = c0000000a9b84da0[19290] 'setquota' THREAD: c0000000a973c000 CPU: 3
GPR00: 0000000000000001 c0000000a973f670 d00000000227de30 c0000000ad5960a0
GPR04: c000000013f80ae0 0000000000000000 c000000013f80ae0 0000000000000001
GPR08: c0000000afff1f00 c0000000a81df080 0000000000000000 0000000000000000
GPR12: d0000000023ecb80 c000000000fa2c80 0000000000000000 0000000000000000
GPR16: d0000000023efef0 c000000013f80ae0 c0000000ad89f4d0 0000000000000008
GPR20: 0000000000000018 c0000000a973f820 c00000004aee1180 c0000000ad89f418
GPR24: 0000000000000018 0000000000000004 0000000000000000 c000000074a70ac0
GPR28: c0000000ad5960a0 c0000000a8163b00 d000000002406688 c000000013f80ae0
NIP [d000000002260d00] .jbd2_journal_dirty_metadata+0x90/0x1c0 [jbd2]
LR [d0000000023d6dbc] .__ext4_handle_dirty_metadata+0xac/0x170 [ext4]
Call Trace:
[c0000000a973f670] [c0000000a973f710] 0xc0000000a973f710 (unreliable)
[c0000000a973f710] [d0000000023d6dbc] .__ext4_handle_dirty_metadata+0xac/0x170 [ext4]
[c0000000a973f7b0] [d0000000023c556c] .ext4_quota_write+0x18c/0x300 [ext4]
[c0000000a973f8c0] [c00000000023280c] .v2_write_file_info+0x13c/0x1a0
[c0000000a973f990] [c00000000022d4bc] .dquot_commit+0x22c/0x250
[c0000000a973fa30] [d0000000023ca8dc] .ext4_write_dquot+0x6c/0xc0 [ext4]
[c0000000a973fac0] [c00000000022ff60] .dqput+0x100/0x390
[c0000000a973fb90] [c000000000231390] .vfs_set_dqblk+0x240/0x430
[c0000000a973fc40] [c0000000002358d0] .do_quotactl+0x450/0x6a0
[c0000000a973fd70] [c000000000235dbc] .SyS_quotactl+0x29c/0x4d0
[c0000000a973fe30] [c000000000008564] syscall_exit+0x0/0x40
Instruction dump:
796a57e3 40c200c8 801b0010 2f800000 409e002c 38000001 901b0010 e97c000a
380bffff 7c005b78 54000ffe 7c0007b4 <0b000000> 396bffff 917c0008 e81b0028
Kernel panic - not syncing: Fatal exception
Call Trace:
[c0000000a973efd0] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable)
[c0000000a973f080] [c0000000005a335c] .panic+0x80/0x1b4
[c0000000a973f110] [c00000000002fbcc] .die+0x21c/0x2a0
[c0000000a973f1c0] [c000000000030000] ._exception+0x110/0x220
[c0000000a973f380] [c000000000004b9c] program_check_common+0x11c/0x180
--- Exception: 700 at .jbd2_journal_dirty_metadata+0x90/0x1c0 [jbd2]
    LR = .__ext4_handle_dirty_metadata+0xac/0x170 [ext4]
[c0000000a973f670] [c0000000a973f710] 0xc0000000a973f710 (unreliable)
[c0000000a973f710] [d0000000023d6dbc] .__ext4_handle_dirty_metadata+0xac/0x170 [ext4]
[c0000000a973f7b0] [d0000000023c556c] .ext4_quota_write+0x18c/0x300 [ext4]
[c0000000a973f8c0] [c00000000023280c] .v2_write_file_info+0x13c/0x1a0
[c0000000a973f990] [c00000000022d4bc] .dquot_commit+0x22c/0x250
[c0000000a973fa30] [d0000000023ca8dc] .ext4_write_dquot+0x6c/0xc0 [ext4]
[c0000000a973fac0] [c00000000022ff60] .dqput+0x100/0x390
[c0000000a973fb90] [c000000000231390] .vfs_set_dqblk+0x240/0x430
[c0000000a973fc40] [c0000000002358d0] .do_quotactl+0x450/0x6a0
[c0000000a973fd70] [c000000000235dbc] .SyS_quotactl+0x29c/0x4d0
[c0000000a973fe30] [c000000000008564] syscall_exit+0x0/0x40
I've finally managed to reproduce the problem manually by running 'while true;do ./check 234;done' for about an hour. So test no. 234 is the one that triggers the panic.
1020         if (jh->b_modified == 0) {
1021                 /*
1022                  * This buffer's got modified and becoming part
1023                  * of the transaction. This needs to be done
1024                  * once a transaction -bzzz
1025                  */
1026                 jh->b_modified = 1;
1027                 J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
1028                 handle->h_buffer_credits--;
1029         }

test 234 does quota work...

# FS QA Test No. 234
#
# Stress setquota and setinfo handling.

and:

        /* Number of remaining buffers we are allowed to dirty: */
        int h_buffer_credits;

sounds like perhaps we under-reserved for the quota metadata...
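To make the credit accounting above concrete, here is a minimal userspace sketch of it. It is not kernel code: the names model_handle, model_buffer, model_dirty_metadata and model_dirty_distinct are hypothetical stand-ins, and the function returns -1 at the point where the real jbd2_journal_dirty_metadata() would hit J_ASSERT_JH instead of crashing. It only illustrates the suspected failure mode (fewer credits reserved than buffers dirtied), not a confirmed root cause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a journal handle (not the kernel struct). */
struct model_handle {
    int h_buffer_credits;   /* remaining buffers this handle may dirty */
};

/* Hypothetical stand-in for a journal head attached to a buffer. */
struct model_buffer {
    bool b_modified;        /* already counted in this transaction? */
};

/*
 * Model of the quoted check: the first time a buffer is dirtied in a
 * transaction it consumes one credit; dirtying it again is free.
 * Returns 0 on success, or -1 where the kernel would trip the
 * assertion at transaction.c:1027. Unlike the kernel, the model
 * leaves b_modified unset on failure.
 */
int model_dirty_metadata(struct model_handle *h, struct model_buffer *b)
{
    assert(h != NULL && b != NULL);

    if (!b->b_modified) {
        if (h->h_buffer_credits <= 0)
            return -1;      /* kernel: J_ASSERT_JH fails here */
        b->b_modified = true;
        h->h_buffer_credits--;
    }
    return 0;
}

/*
 * Dirty `count` distinct buffers under a handle that reserved
 * `reserved` credits. Returns how many were dirtied before the
 * credits ran out.
 */
int model_dirty_distinct(int reserved, int count)
{
    struct model_handle h = { .h_buffer_credits = reserved };
    int done = 0;

    for (int i = 0; i < count; i++) {
        struct model_buffer b = { .b_modified = false };

        if (model_dirty_metadata(&h, &b) != 0)
            break;
        done++;
    }
    return done;
}
```

In this model, a handle that reserved 2 credits can dirty two distinct buffers, and the third call fails. That is the situation the assertion reports, and it would fit a quota write path that touches more metadata blocks than were reserved.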
*** Bug 688817 has been marked as a duplicate of this bug. ***
I saw this on x86_64 and i386 too. Please see bug 688817. Changing the platform to ALL.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available on kernel-2.6.32-130.el6
Ran xfstests 234 in a loop for more than 1 hour on the -130 kernel; no issue found. Tested on x86_64, i386 and s390x. Setting it to VERIFIED.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html