From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.2-2smp i686)

Description of problem:
The panic is due to a failed J_ASSERT assertion at line 227 of fs/jbd/transaction.c, in function journal_start():

    J_ASSERT(handle->h_transaction->t_journal == journal);

The gist of the problem is that while a new ext3 inode was being created (transaction #1, inode #1), an attempt to allocate kernel heap memory for the new in-memory inode from the inode kmem cache triggered an attempt to grow that cache by more slab pages. The expansion attempt, finding no free pages, initiated memory reap/shrink/prune actions, which caused some other ext3 inode to be deleted as a result of trying to prune the dcache. The attempt to delete the second inode started a second transaction before the first one had completed.

It is suspected that ext3 is not designed to handle embedded transactions within the same process. Since this all occurs within the context of a single "cp" process, the J_ASSERT fails when it tries to assert equivalence between the addresses of the data structures for the first and second transactions.

A possible fix is to change the call to kmem_cache_alloc() in the inode_alloc() macro, called from get_empty_inode() in fs/inode.c, to use GFP flags of GFP_NOFS instead of GFP_KERNEL. Without the __GFP_FS flag in its GFP flags parameter, shrink_dcache_memory() will return without trying to prune the dcache and possibly free (and delete) inodes.
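To make the suspected sequence easier to follow, here is a small user-space sketch of it. This is not kernel code: every `sim_` name and every flag value is hypothetical and chosen only for illustration. It assumes, following the quoted J_ASSERT, that a nested transaction start is rejected when the process already holds a handle on a different journal, and that the dcache shrinker does nothing when the FS bit is missing from the allocation flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits only; these are NOT the kernel's real values. */
#define SIM_GFP_WAIT   0x1u
#define SIM_GFP_IO     0x2u
#define SIM_GFP_FS     0x4u   /* allocator may call back into filesystem code */
#define SIM_GFP_NOFS   (SIM_GFP_WAIT | SIM_GFP_IO)
#define SIM_GFP_KERNEL (SIM_GFP_NOFS | SIM_GFP_FS)

/* Hypothetical stand-in for a journal (one per mounted filesystem). */
struct sim_journal { int id; };

/* Per-process state: the journal of the handle currently open, if any. */
struct sim_task { struct sim_journal *current_journal; };

/* Models the check in journal_start(): a nested start is accepted only
 * when it targets the journal the process already holds a handle on.
 * Returns false where the real J_ASSERT would fire. */
static bool sim_journal_start(struct sim_task *task, struct sim_journal *journal)
{
    if (task->current_journal != NULL)
        return task->current_journal == journal;
    task->current_journal = journal;
    return true;
}

/* Models shrink_dcache_memory(): without the FS bit it returns at once;
 * with it, pruning deletes an inode that lives on `victim`, which in
 * turn starts a transaction on that inode's journal. */
static bool sim_shrink_dcache_memory(struct sim_task *task, unsigned int gfp_mask,
                                     struct sim_journal *victim)
{
    if (!(gfp_mask & SIM_GFP_FS))
        return true;
    return sim_journal_start(task, victim);
}

/* Models inode creation under memory pressure: open a transaction on
 * `journal`, then allocate with `gfp_mask`, which triggers the shrinker. */
static bool sim_create_inode(struct sim_task *task, struct sim_journal *journal,
                             unsigned int gfp_mask, struct sim_journal *victim)
{
    if (!sim_journal_start(task, journal))
        return false;
    return sim_shrink_dcache_memory(task, gfp_mask, victim);
}
```

In this model, calling sim_create_inode() with SIM_GFP_KERNEL and a victim inode on a different journal returns false (the point where the assertion would trip), while the same call with SIM_GFP_NOFS returns true because the shrinker never reaches the delete path.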
The kernel call stack is below:

journal_start
start_transaction
ext3_delete_inode
iput_free
dentry_iput
dput
prune_dcache
shrink_dcache_memory
do_try_to_free_pages
_wrapped_alloc_pages
__alloc_pages
_alloc_pages
__get_free_pages
kmem_cache_grow
kmem_cache_alloc
get_empty_inode
ext3_new_inode
ext3_create
dentry_open

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
Run heavy I/O against ext3 filesystems.

Additional info:
What modules are in use?
Which kernel version exactly was this, and can you please post the full OOPS?
The kernel being used is 2.4.9-e.12smp with the debugger enabled. (Ed, Jimmy, please correct me if that is incorrect.)

The kdb lsmod listing of modules loaded at the time of the panic is as follows:
----------------------------------------------------
iscsi
autofs
eepro100
appletalk
ipx
ipchains
emcppn (EMC PowerPath module)
emcpmpc (EMC PowerPath module)
emcpmp (EMC PowerPath module)
sg
emcp (EMC PowerPath module)
usb-ohci
usbcore
ext3
jbd
qla2300
aic7xxx
sd_mod
scsi_mod
Ed is actually doing the debugging on this system and has been in kdb. Is there a way to grab the oops data from kdb? If so, what is it?

Also, Ed just noticed that the i_dev field of the 2nd inode (the one being deleted at the top of the call stack) indicates that it belongs to a file object on one of our (EMC PowerPath) managed disks. This is the first indication he has seen that PowerPath may be involved in the problem.
This bug is filed against RHEL 2.1, which is in maintenance phase. During the maintenance phase, only security errata and select mission-critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.