Bug 173280 - New icache prune export
Summary: New icache prune export
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Wendy Cheng
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 168424 171043
 
Reported: 2005-11-15 21:31 UTC by Wendy Cheng
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-15 16:51:26 UTC
Target Upstream Version:
Embargoed:


Attachments
The patch itself. (1.99 KB, text/plain)
2005-11-15 21:31 UTC, Wendy Cheng


Links
System: Red Hat Product Errata
ID: RHSA-2006:0144
Private: 0
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Moderate: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 7
Last Updated: 2006-03-15 05:00:00 UTC

Description Wendy Cheng 2005-11-15 21:31:16 UTC
+++ This bug was initially created as a clone of Bug #171043 +++

Description of problem:
Customers have repeatedly asked for a way to trim dentry/inode cache entries.
In the case of bugzilla 171043, the customer runs GFS filesystem backups on a
single node that is part of a hugemem-based cluster. At the end of the backup,
a large number (over 1M) of in-core inodes hang around on a relatively idle
system. The issue is particularly troublesome for GFS, since each inode is also
associated with a global lock. With two layers of optimization (between VFS and
GFS), the end result is that, after each backup, every file access to the
filesystem has to go to that one particular node to negotiate access to the
global lock. The customer sees jumps in file access latency, which is a
headache for their (stock trading) financial applications.
                                                                             
A test patch was shipped to the customer that piggybacks the inode trimming
logic onto one of the GFS daemons, and it has been working reasonably well (so
far), controlled by a run-time GFS tunable. It requires the kernel to export
two functions:

1. shrink_dcache_sb: the current kernel already offers this.
2. shrink_icache_sb: this is new in this patch. It is a wrapper function (and
export) that purges the inode cache associated with one particular filesystem
(identified by its in-core super block). This kernel routine calls
__prune_icache(), which is the old prune_icache() code taking one extra
parameter (*sb, a pointer to the in-core super block). The old prune_icache()
call is now also a wrapper, one that passes *sb as NULL. Apart from the extra
parameter, __prune_icache() differs from the original prune_icache() only by
two added lines: *if* *sb is set, it skips any inode that is not associated
with the passed-in super block. In short:
                                                                              
shrink_icache_sb(new export) --> __prune_icache(count, sb)
prune_icache (the original kernel code) --> __prune_icache(count, NULL)
                                                                              
__prune_icache(count, sb) is the original prune_icache() code with two lines
added to it:

              if ((sb) && (inode->i_sb != sb))
                      continue;
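
To illustrate the wrapper layout described above, here is a minimal userspace
sketch. It is not the attached patch: the structures are stand-ins for the
kernel's, the model_* names and the in_use/freed fields are hypothetical, and
the functions return a count only so the behaviour can be observed (the
kernel routines need not do so).

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel structures (assumption, not the
 * real definitions). */
struct super_block { int id; };

struct inode {
    const struct super_block *i_sb; /* filesystem that owns this inode */
    int in_use;                     /* nonzero: must not be pruned */
    int freed;                      /* set once the model has pruned it */
};

/* Model of __prune_icache(count, sb): prune up to `count` unused inodes.
 * When `sb` is non-NULL, inodes of other filesystems are skipped. */
static int model_prune_icache(struct inode *list, size_t n, int count,
                              const struct super_block *sb)
{
    int freed = 0;
    size_t i;

    for (i = 0; i < n && freed < count; i++) {
        struct inode *inode = &list[i];

        if (inode->freed || inode->in_use)
            continue;
        /* The two lines quoted in this report. */
        if ((sb) && (inode->i_sb != sb))
            continue;
        inode->freed = 1;
        freed++;
    }
    return freed;
}

/* Model of the original entry point: no filesystem filter. */
static int model_prune_all(struct inode *list, size_t n, int count)
{
    return model_prune_icache(list, n, count, NULL);
}

/* Model of the new export: prune inodes of one filesystem only. */
static int model_shrink_icache_sb(struct inode *list, size_t n, int count,
                                  const struct super_block *sb)
{
    return model_prune_icache(list, n, count, sb);
}
```

Passing NULL keeps the old behaviour unchanged for existing callers, which is
why a single shared body with one extra parameter is enough.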

Within the GFS inode daemon, we trim inodes based on a GFS tunable:

              /* get rid of shared lock for Cidatel issue */
              i_percent = sdp->sd_tune.gt_inoded_purge;
              if (i_percent) {
                      if (i_percent > 100) i_percent = 100;
                      a_count = atomic_read(&sdp->sd_inode_count);
                      i_count = a_count * i_percent / 100;
                      (void) shrink_dcache_sb(sdp->sd_vfs);
                      shrink_icache_sb(i_count, sdp->sd_vfs);
              } 
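
The purge count arithmetic in the snippet above can be checked in isolation.
The helper below is a hypothetical sketch (purge_target is not a GFS
function, and the int types are an assumption); unlike the snippet, it widens
the multiplication so that a very large inode count cannot overflow.

```c
/* Hypothetical helper: number of inodes to request from
 * shrink_icache_sb(), given the current inode count and the
 * percentage tunable. */
static int purge_target(int a_count, int i_percent)
{
    if (i_percent <= 0 || a_count <= 0)
        return 0;               /* tunable off, or nothing to purge */
    if (i_percent > 100)
        i_percent = 100;        /* same clamp as in the daemon snippet */
    /* Widen before multiplying; the result is at most a_count. */
    return (int)((long long)a_count * i_percent / 100);
}
```

For example, with 1,000,000 cached inodes and the tunable set to 25, the
daemon would ask for 250,000 inodes to be purged; integer division rounds
down, so 7 inodes at 50 percent yields 3.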

We haven't finalized the GFS fix yet but would like to have the flexibility to
use these dentry/inode trimming functions if needed. Looking back at several of
the EXT3 reports, customers have been asking for the very same function too, so
we expect that exporting these functions is a good thing to do.

Version-Release number of selected component (if applicable):
* Kernel 2.4.21-32.0.1.ELhugemem 
* EMC powerpath modules
* GFS-6.0.2.20-2

Comment 1 Wendy Cheng 2005-11-15 21:31:17 UTC
Created attachment 121095 [details]
The patch itself.

Comment 3 Ernie Petrides 2005-11-18 04:59:21 UTC
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.9.EL).


Comment 6 Red Hat Bugzilla 2006-03-15 16:51:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html


