+++ This bug was initially created as a clone of Bug #171043 +++

Description of problem:

Customers have repeatedly asked for a way to trim dentry/inode cache entries. In the case of bugzilla 171043, the customer runs GFS filesystem backups on a single node that is part of a hugemem-based cluster. At the end of the backup, a large number (over 1M) of in-core inodes hang around on a relatively idle system. The issue is particularly troublesome for GFS since each inode is also associated with a global lock. With two layers of optimization (between VFS and GFS), the end result is that, after each backup, every file access to the filesystem needs to go to that one particular node to negotiate global lock access. The customer sees file latency jumps, which is a headache for (stock trading) financial applications.

A test patch was shipped to the customer that piggybacks the inode trimming logic onto one of the GFS daemons, controlled by a run-time GFS tunable, and it has been working reasonably well so far. It requires the kernel to export two functions:

1. shrink_dcache_sb: this is already offered by the current kernel.
2. shrink_icache_sb: this is new in this patch. It is an exported wrapper function that purges the inode cache associated with one particular filesystem (identified by its in-core super block).

This kernel routine calls __prune_icache(), which is the old prune_icache() code taking one extra parameter (*sb, a pointer to the in-core super block). The old prune_icache() call is now also a wrapper, one that passes NULL for *sb. The body of __prune_icache() is the original prune_icache() with two lines added: *if* *sb is set, it skips any inode that is not associated with the passed-in super block.
In short:

  shrink_icache_sb (new export)            --> __prune_icache(count, sb)
  prune_icache (the original kernel code)  --> __prune_icache(count, NULL)

__prune_icache(count, sb) is the original prune_icache() call with two lines added to it:

  if ((sb) && (inode->i_sb != sb))
          continue;

Within the GFS inode daemon, we trim inodes based on a GFS tunable:

  /* get rid of shared lock for Cidatel issue */
  i_percent = sdp->sd_tune.gt_inoded_purge;
  if (i_percent) {
          if (i_percent > 100)
                  i_percent = 100;
          a_count = atomic_read(&sdp->sd_inode_count);
          i_count = a_count * i_percent / 100;
          (void) shrink_dcache_sb(sdp->sd_vfs);
          shrink_icache_sb(i_count, sdp->sd_vfs);
  }

We haven't finalized the GFS fix yet but would like to have the flexibility to use these dentry/inode trimming functions if needed. Looking back at several of the EXT3 reports, customers have been asking for the very same function, so we expect that exporting these functions is a good thing to do.

Version-Release number of selected component (if applicable):
* Kernel 2.4.21-32.0.1.ELhugemem
* EMC powerpath modules
* GFS-6.0.2.20-2
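For readers who want to see the shape of the change without the kernel tree at hand, here is a rough userspace sketch of the wrapper arrangement and the percentage calculation described above. It is not the patch itself: the struct layouts, the array standing in for the unused-inode list, and every model_* / purge_target name are made up for illustration, and the real __prune_icache() does locking and list handling that is left out here.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct super_block { int id; };

struct inode {
    struct super_block *i_sb;  /* filesystem this inode belongs to */
    int in_use;                /* nonzero: busy, must not be pruned */
    int freed;                 /* set once this model has "pruned" it */
};

/*
 * Model of __prune_icache(count, sb): free up to `count` unused inodes.
 * When `sb` is non-NULL, inodes belonging to other filesystems are skipped.
 * Returns the number of inodes freed.
 */
int model_prune_icache(struct inode *list, size_t n, int count,
                       struct super_block *sb)
{
    int freed = 0;

    for (size_t i = 0; i < n && freed < count; i++) {
        struct inode *inode = &list[i];

        if (inode->freed || inode->in_use)
            continue;
        /* The two lines the description says were added. */
        if ((sb) && (inode->i_sb != sb))
            continue;

        inode->freed = 1;
        freed++;
    }
    return freed;
}

/* Model of the original entry point: no per-filesystem filter. */
int model_prune_icache_all(struct inode *list, size_t n, int count)
{
    return model_prune_icache(list, n, count, NULL);
}

/* Model of the new export: restrict pruning to one filesystem. */
int model_shrink_icache_sb(struct inode *list, size_t n, int count,
                           struct super_block *sb)
{
    return model_prune_icache(list, n, count, sb);
}

/*
 * Purge target as in the GFS daemon snippet: the tunable is a percentage,
 * clamped to 100, and 0 disables trimming. Treating negative values as
 * "disabled" is an assumption of this sketch.
 */
int purge_target(int inode_count, int percent)
{
    if (percent <= 0)
        return 0;
    if (percent > 100)
        percent = 100;
    return inode_count * percent / 100;
}
```

In this model, calling model_shrink_icache_sb() with one filesystem's super block leaves inodes of other filesystems untouched, while model_prune_icache_all() behaves like the unfiltered path. Note that the integer division in purge_target() rounds down, as it does in the daemon snippet.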
Created attachment 121095
The patch itself.
A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.9.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0144.html