Bug 85379 - Panic caused by bug in ext3 transaction handling.
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Larry Woodman
QA Contact: Brian Brock
Reported: 2003-02-28 14:43 EST by Heather Conway
Modified: 2007-11-30 17:06 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-10-19 15:25:18 EDT


Description Heather Conway 2003-02-28 14:43:58 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.2-2smp i686)

Description of problem:
The panic is due to a failed J_ASSERT assertion at line 227 of
fs/jbd/transaction.c, in function journal_start():
	J_ASSERT(handle->h_transaction->t_journal == journal);
The gist of the problem is that while a new ext3 inode was being created
(transaction #1, inode #1), an attempt to allocate kernel heap memory (for
the new in-memory inode) from the inode kmem cache triggered an attempt to
expand that cache with new slab pages.  The expansion attempt, finding no free
pages, initiated memory reap/shrink/prune actions, which caused some other ext3
inode to be deleted as a result of pruning the dcache.  The attempt to
delete the second inode initiated the creation of a second transaction before
the first one had completed.  It is suspected that ext3 is not designed to
handle nested transactions within the same process.
Since this all occurs within the context of a single "cp" process, the
J_ASSERT fails when it compares the journal of the first transaction's handle
with the journal passed in for the second.
A possible fix is to change the call to kmem_cache_alloc() in the inode_alloc()
macro called from get_empty_inode() in fs/inode.c to use GFP_NOFS instead of
GFP_KERNEL.  Without the __GFP_FS flag in the GFP flags passed to
shrink_dcache_memory(), that function returns without trying to prune the
dcache, and so without freeing (and deleting) inodes.
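The proposed fix relies on the reclaim path honouring the caller's allocation
flags.  A minimal user-space sketch of that gate follows, assuming the check in
shrink_dcache_memory() behaves as described in this report.  The MODEL_* flag
values and the model_shrink_dcache_memory() name are illustrative only and are
not the kernel's definitions.

```c
#include <assert.h>

/* Illustrative flag bits; NOT the kernel's actual values. */
#define MODEL_GFP_WAIT 0x01u  /* caller may sleep */
#define MODEL_GFP_IO   0x02u  /* caller may start block I/O */
#define MODEL_GFP_FS   0x04u  /* caller may re-enter filesystem code */

#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)

/*
 * Model of the gate described in the report: without the FS bit the
 * dcache shrinker returns immediately, so no dentry is pruned and no
 * inode can be deleted from inside the allocation.
 * Returns the number of dentries it would try to prune.
 */
static unsigned int model_shrink_dcache_memory(unsigned int priority,
                                               unsigned int gfp_mask,
                                               unsigned int nr_unused)
{
    assert(priority > 0);            /* avoid dividing by zero */
    if (!(gfp_mask & MODEL_GFP_FS))
        return 0;                    /* filesystem re-entry not allowed */
    return nr_unused / priority;     /* would be handed to prune_dcache() */
}
```

Called with MODEL_GFP_NOFS the model prunes nothing; called with
MODEL_GFP_KERNEL it returns a non-zero prune count, which is the path that
leads into ext3_delete_inode in the stack reported here.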
The kernel call stack is below:
journal_start
start_transaction
ext3_delete_inode
iput_free
dentry_iput
dput
prune_dcache
shrink_dcache_memory
do_try_to_free_pages
_wrapped_alloc_pages
__alloc_pages
_alloc_pages
__get_free_pages
kmem_cache_grow
kmem_cache_alloc
get_empty_inode
ext3_new_inode
ext3_create
dentry_open
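
Read from the bottom up, the stack shows journal_start being entered a second
time in the same task.  The user-space sketch below models that re-entry.  It
is an illustration under stated assumptions, not kernel code: every model_*
name is hypothetical, a single global stands in for the per-task record of the
running handle, and the function returns -1 where the real J_ASSERT would
panic.  It also assumes, from the quoted assertion and the i_dev observation in
comment 4, that the two inodes belong to different journals; whether nesting on
the same journal is safe in the real kernel is not established here.

```c
#include <assert.h>
#include <stddef.h>

struct model_journal     { int id; };
struct model_transaction { struct model_journal *t_journal; };
struct model_handle {
    struct model_transaction *h_transaction;
    int h_ref;                        /* nesting depth on the same journal */
};

/* One handle per task; a single global stands in for the "cp" process. */
static struct model_handle *model_current_handle;

/*
 * Start (or nest) a transaction on `journal`.
 * Returns 0 on success, -1 where the J_ASSERT quoted above would fire.
 */
static int model_journal_start(struct model_journal *journal,
                               struct model_handle *fresh,
                               struct model_transaction *txn)
{
    assert(journal && fresh && txn);

    if (model_current_handle) {
        /* Same comparison as the failing assertion. */
        if (model_current_handle->h_transaction->t_journal != journal)
            return -1;
        model_current_handle->h_ref++;   /* nested use of the same journal */
        return 0;
    }
    txn->t_journal = journal;
    fresh->h_transaction = txn;
    fresh->h_ref = 1;
    model_current_handle = fresh;
    return 0;
}

/*
 * Model of the reported sequence: creating an inode on `create_fs` opens a
 * handle, then the allocation reclaims memory and deletes an inode that
 * lives on `victim_fs`, which starts a transaction again in the same task.
 */
static int model_create_with_reclaim(struct model_journal *create_fs,
                                     struct model_journal *victim_fs)
{
    struct model_handle h1, h2;
    struct model_transaction t1, t2;
    int ret;

    model_current_handle = NULL;
    if (model_journal_start(create_fs, &h1, &t1) != 0)
        return -1;
    ret = model_journal_start(victim_fs, &h2, &t2);  /* delete-inode path */
    model_current_handle = NULL;                     /* end of the model run */
    return ret;
}
```

In this model, passing the same journal for both arguments succeeds, while
passing two different journals returns -1, mirroring the case where the pruned
inode sits on another filesystem.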

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:

1. Run heavy I/O against ext3 filesystems.

Additional info:
Comment 1 Arjan van de Ven 2003-02-28 14:48:42 EST
What modules are in use?
Comment 2 Stephen Tweedie 2003-02-28 15:50:33 EST
Which kernel version exactly was this, and can you please post the full OOPS?
Comment 3 Heather Conway 2003-02-28 17:04:13 EST
The kernel being used is 2.4.9-e.12smp with the debugger enabled.  (Ed, Jimmy, 
please correct me if that is incorrect.)

The kdb lsmod listing of modules loaded at the time of the panic is as follows:
----------------------------------------------------
iscsi
autofs
eepro100
appletalk
ipx
ipchains
emcppn	(EMC PowerPath module)
emcpmpc	(EMC PowerPath module)
emcpmp	(EMC PowerPath module)
sg
emcp		(EMC PowerPath module)
usb-ohci
usbcore
ext3
jbd
qla2300
aic7xxx
sd_mod
scsi_mod
Comment 4 Heather Conway 2003-02-28 17:05:59 EST
Ed is actually doing the debugging on this system and has been in kdb.  Is 
there a way to grab the oops data from kdb?  If so, what is it?

Also, Ed just noticed that the i_dev field of the second inode (the one being
deleted at the top of the call stack) indicates that it belongs to a file
object on one of our (EMC PowerPath) managed disks.  This is the first
indication he has seen that PowerPath may be involved in the problem.
Comment 6 RHEL Product and Program Management 2007-10-19 15:25:18 EDT
This bug is filed against RHEL2.1, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products.  Since
this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your
support representative.  You may be asked to provide detailed
information on how this bug is affecting you.
