Bug 680105 - [ext4/xfstests] kernel BUG at fs/jbd2/transaction.c:1027!
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: All Unspecified
Priority: unspecified Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Lukáš Czerner
QA Contact: Eryu Guan
Keywords: Regression
Duplicates: 688817
Depends On:
Blocks:
Reported: 2011-02-24 06:17 EST by Boris Ranto
Modified: 2011-05-19 08:43 EDT

See Also:
Fixed In Version: kernel-2.6.32-130.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-05-19 08:43:47 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2011:0542
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update
Last Updated: 2011-05-19 07:58:07 EDT

Description Boris Ranto 2011-02-24 06:17:29 EST
Description of problem:
When running xfstests in beaker I'm seeing a kernel panic on the ppc64 platform with an ext4 filesystem. I didn't manage to reproduce the problem manually outside beaker, but beaker was at least able to provide the call trace (see Additional info). This is most probably a regression.

Version-Release number of selected component (if applicable):
2.6.32-117.el6.ppc64

How reproducible:
Not sure; fairly regularly in beaker.

Steps to Reproduce:
1. Clone job J:56079 in beaker
2. Watch the results for ext4
  
Actual results:
Kernel panic due to 'kernel BUG at fs/jbd2/transaction.c:1027!'

Expected results:
No panic.

Additional info:
The last test that beaker noticed was test no. 233, so the problem should arise from one of tests 234-248 (most probably 234).
Related beaker jobs/recipes:
https://beaker.engineering.redhat.com/recipes/109908
https://beaker.engineering.redhat.com/recipes/112445

The calltraces are the same (but for different machines):
------------[ cut here ]------------ 
kernel BUG at fs/jbd2/transaction.c:1027! 
Oops: Exception in kernel mode, sig: 5 [#1] 
SMP NR_CPUS=1024 NUMA pSeries 
Modules linked in: ext3 jbd ext2 sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt dm_mod [last unloaded: scsi_wait_scan] 
NIP: d000000002260d00 LR: d0000000023d6dbc CTR: d000000002260c70 
REGS: c0000000a973f3f0 TRAP: 0700   Not tainted  (2.6.32-117.el6.ppc64) 
MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 24008482  XER: 20000000 
TASK = c0000000a9b84da0[19290] 'setquota' THREAD: c0000000a973c000 CPU: 3 
GPR00: 0000000000000001 c0000000a973f670 d00000000227de30 c0000000ad5960a0  
GPR04: c000000013f80ae0 0000000000000000 c000000013f80ae0 0000000000000001  
GPR08: c0000000afff1f00 c0000000a81df080 0000000000000000 0000000000000000  
GPR12: d0000000023ecb80 c000000000fa2c80 0000000000000000 0000000000000000  
GPR16: d0000000023efef0 c000000013f80ae0 c0000000ad89f4d0 0000000000000008  
GPR20: 0000000000000018 c0000000a973f820 c00000004aee1180 c0000000ad89f418  
GPR24: 0000000000000018 0000000000000004 0000000000000000 c000000074a70ac0  
GPR28: c0000000ad5960a0 c0000000a8163b00 d000000002406688 c000000013f80ae0  
NIP [d000000002260d00] .jbd2_journal_dirty_metadata+0x90/0x1c0 [jbd2] 
LR [d0000000023d6dbc] .__ext4_handle_dirty_metadata+0xac/0x170 [ext4] 
Call Trace: 
[c0000000a973f670] [c0000000a973f710] 0xc0000000a973f710 (unreliable) 
[c0000000a973f710] [d0000000023d6dbc] .__ext4_handle_dirty_metadata+0xac/0x170 [ext4] 
[c0000000a973f7b0] [d0000000023c556c] .ext4_quota_write+0x18c/0x300 [ext4] 
[c0000000a973f8c0] [c00000000023280c] .v2_write_file_info+0x13c/0x1a0 
[c0000000a973f990] [c00000000022d4bc] .dquot_commit+0x22c/0x250 
[c0000000a973fa30] [d0000000023ca8dc] .ext4_write_dquot+0x6c/0xc0 [ext4] 
[c0000000a973fac0] [c00000000022ff60] .dqput+0x100/0x390 
[c0000000a973fb90] [c000000000231390] .vfs_set_dqblk+0x240/0x430 
[c0000000a973fc40] [c0000000002358d0] .do_quotactl+0x450/0x6a0 
[c0000000a973fd70] [c000000000235dbc] .SyS_quotactl+0x29c/0x4d0 
[c0000000a973fe30] [c000000000008564] syscall_exit+0x0/0x40 
Instruction dump: 
796a57e3 40c200c8 801b0010 2f800000 409e002c 38000001 901b0010 e97c000a  
380bffff 7c005b78 54000ffe 7c0007b4 <0b000000> 396bffff 917c0008 e81b0028  
Kernel panic - not syncing: Fatal exception 
Call Trace: 
[c0000000a973efd0] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable) 
[c0000000a973f080] [c0000000005a335c] .panic+0x80/0x1b4 
[c0000000a973f110] [c00000000002fbcc] .die+0x21c/0x2a0 
[c0000000a973f1c0] [c000000000030000] ._exception+0x110/0x220 
[c0000000a973f380] [c000000000004b9c] program_check_common+0x11c/0x180 
--- Exception: 700 at .jbd2_journal_dirty_metadata+0x90/0x1c0 [jbd2] 
    LR = .__ext4_handle_dirty_metadata+0xac/0x170 [ext4] 
[c0000000a973f670] [c0000000a973f710] 0xc0000000a973f710 (unreliable) 
[c0000000a973f710] [d0000000023d6dbc] .__ext4_handle_dirty_metadata+0xac/0x170 [ext4] 
[c0000000a973f7b0] [d0000000023c556c] .ext4_quota_write+0x18c/0x300 [ext4] 
[c0000000a973f8c0] [c00000000023280c] .v2_write_file_info+0x13c/0x1a0 
[c0000000a973f990] [c00000000022d4bc] .dquot_commit+0x22c/0x250 
[c0000000a973fa30] [d0000000023ca8dc] .ext4_write_dquot+0x6c/0xc0 [ext4] 
[c0000000a973fac0] [c00000000022ff60] .dqput+0x100/0x390 
[c0000000a973fb90] [c000000000231390] .vfs_set_dqblk+0x240/0x430 
[c0000000a973fc40] [c0000000002358d0] .do_quotactl+0x450/0x6a0 
[c0000000a973fd70] [c000000000235dbc] .SyS_quotactl+0x29c/0x4d0 
[c0000000a973fe30] [c000000000008564] syscall_exit+0x0/0x40
Comment 3 Boris Ranto 2011-02-24 06:47:38 EST
I've finally managed to reproduce the problem manually by running 'while true;do ./check 234;done' for about an hour. Therefore, test no. 234 causes the panic.
Comment 4 Eric Sandeen 2011-02-24 10:12:32 EST
1020         if (jh->b_modified == 0) {
1021                 /*
1022                  * This buffer's got modified and becoming part
1023                  * of the transaction. This needs to be done
1024                  * once a transaction -bzzz
1025                  */
1026                 jh->b_modified = 1;
1027                 J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
1028                 handle->h_buffer_credits--;
1029         }

test 234 does quota work...

# FS QA Test No. 234
#
# Stress setquota and setinfo handling.


and:

        /* Number of remaining buffers we are allowed to dirty: */
        int                     h_buffer_credits;

sounds like perhaps we under-reserved for the quota metadata...
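
To illustrate the credit accounting quoted above, here is a minimal user-space sketch in C. It is only a simplified model and not kernel code: the names model_handle, model_journal_head, model_dirty_metadata and model_count_dirtied are hypothetical, and the failed check is reported through a return value instead of a BUG. It assumes, as suggested above, that the handle was started with fewer credits than the number of distinct buffers it later dirties.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the jbd2 handle and journal head. */
struct model_handle {
    int h_buffer_credits;   /* number of remaining buffers we may dirty */
};

struct model_journal_head {
    int b_modified;         /* already counted against this handle? */
};

/*
 * Mirrors the quoted snippet: the first time a buffer is dirtied it costs
 * one credit. Returns false at the point where the real code would trip
 * J_ASSERT_JH(jh, handle->h_buffer_credits > 0).
 */
bool model_dirty_metadata(struct model_handle *handle,
                          struct model_journal_head *jh)
{
    if (jh->b_modified == 0) {
        jh->b_modified = 1;
        if (handle->h_buffer_credits <= 0)
            return false;   /* under-reserved: the kernel would BUG here */
        handle->h_buffer_credits--;
    }
    return true;            /* re-dirtying a counted buffer is free */
}

/*
 * Dirties nbuffers distinct buffers with a handle that reserved `credits`
 * credits and returns how many succeeded before the check failed.
 */
int model_count_dirtied(int credits, int nbuffers)
{
    struct model_handle handle = { .h_buffer_credits = credits };
    int dirtied = 0;

    assert(credits >= 0 && nbuffers >= 0);
    for (int i = 0; i < nbuffers; i++) {
        struct model_journal_head jh = { .b_modified = 0 };

        if (!model_dirty_metadata(&handle, &jh))
            break;
        dirtied++;
    }
    return dirtied;
}
```

In this model, a handle that reserved 2 credits but touches 3 distinct buffers fails on the third one, which would correspond to the assertion at line 1027 if the quota write path really dirties more buffers than were reserved.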
Comment 5 Eryu Guan 2011-03-18 02:25:36 EDT
*** Bug 688817 has been marked as a duplicate of this bug. ***
Comment 6 Eryu Guan 2011-03-18 02:29:06 EDT
I saw this on x86_64 and i386 too. Please see bug 688817. Changing platform to All.
Comment 9 RHEL Product and Program Management 2011-03-29 05:59:40 EDT
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.
Comment 12 Aristeu Rozanski 2011-04-07 09:52:51 EDT
Patch(es) available on kernel-2.6.32-130.el6
Comment 15 Eryu Guan 2011-04-11 03:29:43 EDT
Ran xfstests 234 in a loop for more than 1 hour on the -130 kernel, no issue found.
Tested on x86_64, i386 and s390x.

Set it to VERIFIED.
Comment 16 errata-xmlrpc 2011-05-19 08:43:47 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
