Bug 680105

Summary: [ext4/xfstests] kernel BUG at fs/jbd2/transaction.c:1027!
Product: Red Hat Enterprise Linux 6
Reporter: Boris Ranto <branto>
Component: kernel
Assignee: Lukáš Czerner <lczerner>
Status: CLOSED ERRATA
QA Contact: Eryu Guan <eguan>
Severity: high
Priority: unspecified
Version: 6.2
CC: eguan, esandeen, lczerner, rwheeler, syeghiay
Target Milestone: rc
Keywords: Regression
Hardware: All
OS: Unspecified
Fixed In Version: kernel-2.6.32-130.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 12:43:47 UTC

Description Boris Ranto 2011-02-24 11:17:29 UTC
Description of problem:
When running xfstests in Beaker, I'm seeing a kernel panic on the ppc64 platform with an ext4 filesystem. I didn't manage to reproduce the problem manually outside Beaker, but Beaker was at least able to provide the call trace (see Additional info). This is most probably a regression.

Version-Release number of selected component (if applicable):
2.6.32-117.el6.ppc64

How reproducible:
Not sure; it happens fairly regularly in Beaker.

Steps to Reproduce:
1. Clone job J:56079 in beaker
2. Watch the results for ext4
  
Actual results:
Kernel panic due to 'kernel BUG at fs/jbd2/transaction.c:1027!'

Expected results:
No panic.

Additional info:
The last test that Beaker recorded was test no. 233, so the problem should arise in one of tests 234-248 (most probably 234).
Related beaker jobs/recipes:
https://beaker.engineering.redhat.com/recipes/109908
https://beaker.engineering.redhat.com/recipes/112445

The calltraces are the same (but for different machines):
------------[ cut here ]------------ 
kernel BUG at fs/jbd2/transaction.c:1027! 
Oops: Exception in kernel mode, sig: 5 [#1] 
SMP NR_CPUS=1024 NUMA pSeries 
Modules linked in: ext3 jbd ext2 sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt dm_mod [last unloaded: scsi_wait_scan] 
NIP: d000000002260d00 LR: d0000000023d6dbc CTR: d000000002260c70 
REGS: c0000000a973f3f0 TRAP: 0700   Not tainted  (2.6.32-117.el6.ppc64) 
MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 24008482  XER: 20000000 
TASK = c0000000a9b84da0[19290] 'setquota' THREAD: c0000000a973c000 CPU: 3 
GPR00: 0000000000000001 c0000000a973f670 d00000000227de30 c0000000ad5960a0  
GPR04: c000000013f80ae0 0000000000000000 c000000013f80ae0 0000000000000001  
GPR08: c0000000afff1f00 c0000000a81df080 0000000000000000 0000000000000000  
GPR12: d0000000023ecb80 c000000000fa2c80 0000000000000000 0000000000000000  
GPR16: d0000000023efef0 c000000013f80ae0 c0000000ad89f4d0 0000000000000008  
GPR20: 0000000000000018 c0000000a973f820 c00000004aee1180 c0000000ad89f418  
GPR24: 0000000000000018 0000000000000004 0000000000000000 c000000074a70ac0  
GPR28: c0000000ad5960a0 c0000000a8163b00 d000000002406688 c000000013f80ae0  
NIP [d000000002260d00] .jbd2_journal_dirty_metadata+0x90/0x1c0 [jbd2] 
LR [d0000000023d6dbc] .__ext4_handle_dirty_metadata+0xac/0x170 [ext4] 
Call Trace: 
[c0000000a973f670] [c0000000a973f710] 0xc0000000a973f710 (unreliable) 
[c0000000a973f710] [d0000000023d6dbc] .__ext4_handle_dirty_metadata+0xac/0x170 [ext4] 
[c0000000a973f7b0] [d0000000023c556c] .ext4_quota_write+0x18c/0x300 [ext4] 
[c0000000a973f8c0] [c00000000023280c] .v2_write_file_info+0x13c/0x1a0 
[c0000000a973f990] [c00000000022d4bc] .dquot_commit+0x22c/0x250 
[c0000000a973fa30] [d0000000023ca8dc] .ext4_write_dquot+0x6c/0xc0 [ext4] 
[c0000000a973fac0] [c00000000022ff60] .dqput+0x100/0x390 
[c0000000a973fb90] [c000000000231390] .vfs_set_dqblk+0x240/0x430 
[c0000000a973fc40] [c0000000002358d0] .do_quotactl+0x450/0x6a0 
[c0000000a973fd70] [c000000000235dbc] .SyS_quotactl+0x29c/0x4d0 
[c0000000a973fe30] [c000000000008564] syscall_exit+0x0/0x40 
Instruction dump: 
796a57e3 40c200c8 801b0010 2f800000 409e002c 38000001 901b0010 e97c000a  
380bffff 7c005b78 54000ffe 7c0007b4 <0b000000> 396bffff 917c0008 e81b0028  
Kernel panic - not syncing: Fatal exception 
Call Trace: 
[c0000000a973efd0] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable) 
[c0000000a973f080] [c0000000005a335c] .panic+0x80/0x1b4 
[c0000000a973f110] [c00000000002fbcc] .die+0x21c/0x2a0 
[c0000000a973f1c0] [c000000000030000] ._exception+0x110/0x220 
[c0000000a973f380] [c000000000004b9c] program_check_common+0x11c/0x180 
--- Exception: 700 at .jbd2_journal_dirty_metadata+0x90/0x1c0 [jbd2] 
    LR = .__ext4_handle_dirty_metadata+0xac/0x170 [ext4] 
[c0000000a973f670] [c0000000a973f710] 0xc0000000a973f710 (unreliable) 
[c0000000a973f710] [d0000000023d6dbc] .__ext4_handle_dirty_metadata+0xac/0x170 [ext4] 
[c0000000a973f7b0] [d0000000023c556c] .ext4_quota_write+0x18c/0x300 [ext4] 
[c0000000a973f8c0] [c00000000023280c] .v2_write_file_info+0x13c/0x1a0 
[c0000000a973f990] [c00000000022d4bc] .dquot_commit+0x22c/0x250 
[c0000000a973fa30] [d0000000023ca8dc] .ext4_write_dquot+0x6c/0xc0 [ext4] 
[c0000000a973fac0] [c00000000022ff60] .dqput+0x100/0x390 
[c0000000a973fb90] [c000000000231390] .vfs_set_dqblk+0x240/0x430 
[c0000000a973fc40] [c0000000002358d0] .do_quotactl+0x450/0x6a0 
[c0000000a973fd70] [c000000000235dbc] .SyS_quotactl+0x29c/0x4d0 
[c0000000a973fe30] [c000000000008564] syscall_exit+0x0/0x40

Comment 3 Boris Ranto 2011-02-24 11:47:38 UTC
I've finally managed to reproduce the problem manually by running 'while true;do ./check 234;done' for about an hour. Therefore test no. 234 causes the panic.

Comment 4 Eric Sandeen 2011-02-24 15:12:32 UTC
1020         if (jh->b_modified == 0) {
1021                 /*
1022                  * This buffer's got modified and becoming part
1023                  * of the transaction. This needs to be done
1024                  * once a transaction -bzzz
1025                  */
1026                 jh->b_modified = 1;
1027                 J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
1028                 handle->h_buffer_credits--;
1029         }

test 234 does quota work...

# FS QA Test No. 234
#
# Stress setquota and setinfo handling.


and:

        /* Number of remaining buffers we are allowed to dirty: */
        int                     h_buffer_credits;

sounds like perhaps we under-reserved for the quota metadata...

Comment 5 Eryu Guan 2011-03-18 06:25:36 UTC
*** Bug 688817 has been marked as a duplicate of this bug. ***

Comment 6 Eryu Guan 2011-03-18 06:29:06 UTC
I saw this on x86_64 and i386 too. Please see bug 688817. Changing the platform to All.

Comment 9 RHEL Program Management 2011-03-29 09:59:40 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 12 Aristeu Rozanski 2011-04-07 13:52:51 UTC
Patch(es) available on kernel-2.6.32-130.el6

Comment 15 Eryu Guan 2011-04-11 07:29:43 UTC
Ran xfstests 234 in a loop for more than 1 hour on the -130 kernel; no issues found.
Tested on x86_64, i386 and s390x.

Set it to VERIFIED.

Comment 16 errata-xmlrpc 2011-05-19 12:43:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html