Bug 173280

Summary: New icache prune export
Product: Red Hat Enterprise Linux 3 Reporter: Wendy Cheng <nobody+wcheng>
Component: kernel Assignee: Wendy Cheng <nobody+wcheng>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0 CC: petrides
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2006-0144 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-03-15 16:51:26 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 168424, 171043    
Attachments:
Description Flags
The patch itself. none

Description Wendy Cheng 2005-11-15 21:31:16 UTC
+++ This bug was initially created as a clone of Bug #171043 +++

Description of problem:
There have been repeated requests from customers for a way to trim
dentry/inode cache entries. In the case of bugzilla 171043, the customer runs
GFS filesystem backups on a single node that is part of a hugemem-based cluster.
At the end of the backup, a large number (over 1M) of in-core inodes linger on
a relatively idle system. The issue is particularly troublesome for GFS since each
inode is also associated with a global lock. With two layers of optimization
(between vfs and GFS), the end result is that, after each backup, every file access
to the filesystem has to go to one particular node to negotiate the global
lock access. The customer sees file latency jump, which is a headache for its
(stock trading) financial applications.
                                                                             
A test patch was shipped to the customer that piggybacks the inode trimming
logic onto one of the GFS daemons, controlled by a run-time GFS tunable, and it
has been working reasonably well (so far). It requires the kernel to export two
functions:

1. shrink_dcache_sb: the current kernel already provides this.
2. shrink_icache_sb: this is new with this patch. It is a wrapper function (and
export) that purges the inode cache associated with one particular filesystem
(identified by its in-core super block). This kernel routine calls
__prune_icache(), which is the old prune_icache() code taking one extra
parameter (*sb, a pointer to the in-core super block). prune_icache() itself
becomes a wrapper that passes NULL for *sb. The only change inside
__prune_icache() is two added lines: *if* *sb is set, it skips any inode that
is not associated with the passed-in super block. In short:
                                                                              
shrink_icache_sb(new export) --> __prune_icache(count, sb)
prune_icache (the original kernel code) --> __prune_icache(count, NULL)
                                                                              
The __prune_icache(count, sb) is the original prune_icache() call but now has
two lines added into it:

              if ((sb) && (inode->i_sb != sb))
                      continue;
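
As a rough illustration of how the two wrappers share one filtered loop, here
is a userspace sketch. It is not the kernel code: the struct layouts, the
in_use and freed fields, the array-based "inode list", and all sketch_* names
are hypothetical stand-ins, and the real functions take only (count, sb).
Only the two-line super block check is taken from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed to show the filter are modeled. */
struct super_block { int id; };

struct inode {
    struct super_block *i_sb;  /* filesystem that owns this inode */
    int in_use;                /* nonzero: busy, must not be pruned */
    int freed;                 /* set once this sketch "prunes" the inode */
};

/* Plays the role of __prune_icache(count, sb): prune up to `count`
 * unused inodes and return how many were pruned. */
static int sketch_prune(struct inode *inodes, size_t n, int count,
                        struct super_block *sb)
{
    int pruned = 0;

    for (size_t i = 0; i < n && pruned < count; i++) {
        struct inode *inode = &inodes[i];

        if (inode->freed || inode->in_use)
            continue;
        /* The two lines added by the patch: when sb is set, skip
         * inodes that belong to any other filesystem. */
        if ((sb) && (inode->i_sb != sb))
            continue;
        inode->freed = 1;
        pruned++;
    }
    return pruned;
}

/* Plays the role of the original prune_icache(): no filter. */
int sketch_prune_icache(struct inode *inodes, size_t n, int count)
{
    return sketch_prune(inodes, n, count, NULL);
}

/* Plays the role of the new shrink_icache_sb() export: one filesystem. */
int sketch_shrink_icache_sb(struct inode *inodes, size_t n, int count,
                            struct super_block *sb)
{
    return sketch_prune(inodes, n, count, sb);
}
```

Passing NULL keeps the existing behaviour for every current caller, which is
why the change to the shared loop can stay at two lines.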

Within GFS inode daemon, we trim the inode based on a GFS tunable:

              /* get rid of shared lock for Cidatel issue */
              i_percent = sdp->sd_tune.gt_inoded_purge;
              if (i_percent) {
                      if (i_percent > 100) i_percent = 100;
                      a_count = atomic_read(&sdp->sd_inode_count);
                      i_count = a_count * i_percent / 100;
                      (void) shrink_dcache_sb(sdp->sd_vfs);
                      shrink_icache_sb(i_count, sdp->sd_vfs);
              } 
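
To make the arithmetic above concrete, the count passed to shrink_icache_sb()
can be isolated in a small helper. This is only a sketch: purge_target() is a
hypothetical name, plain int is assumed for both values, and treating a
non-positive tunable as "disabled" is an assumption that extends the
if (i_percent) test in the snippet.

```c
/* Hypothetical helper: how many inodes to ask the kernel to purge,
 * given the current in-core inode count and the tunable percentage. */
int purge_target(int inode_count, int percent)
{
    if (percent <= 0)
        return 0;          /* tunable unset: purging is disabled */
    if (percent > 100)
        percent = 100;     /* clamp, as the daemon code does */
    /* Multiply before dividing so small percentages are not
     * truncated to zero by integer division. */
    return inode_count * percent / 100;
}
```

For example, with 1,200,000 in-core inodes and the tunable set to 25, the
helper returns 300,000. Because the multiplication happens first, the
product has to fit in the integer type; inode counts in the low millions, as
in this report, stay well within a 32-bit int.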

We haven't finalized the GFS fix yet but would like to have the flexibility to use
these dentry/inode trimming functions if needed. Looking back at several of the
EXT3 reports, customers have been asking for the very same function too, so we
expect that exporting these functions is a good thing to do.

Version-Release number of selected component (if applicable):
* Kernel 2.4.21-32.0.1.ELhugemem 
* EMC powerpath modules
* GFS-6.0.2.20-2

Comment 1 Wendy Cheng 2005-11-15 21:31:17 UTC
Created attachment 121095 [details]
The patch itself.

Comment 3 Ernie Petrides 2005-11-18 04:59:21 UTC
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.9.EL).


Comment 6 Red Hat Bugzilla 2006-03-15 16:51:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html