Red Hat Bugzilla – Attachment 146474 Details for Bug 214239: GFS performance issue - trimming glock
Glock trimming description

Filename:  glock_trimming_desc.txt
MIME Type: text/plain
Creator:   Wendy Cheng
Created:   2007-01-24 23:14:44 UTC
Size:      7.36 KB
Subject: Glock trimming patch

The original base kernel approach walks thru the VFS-layer inode_unused list to find idle GFS inodes (and subsequently removes them). On a GFS server node with one single mount, the VFS inode_unused list is most likely full of GFS inodes, so we will probably have to scan about 1/4 of the total GFS inodes to achieve one round of trimming. This is a very rough (and inaccurate) estimation based on the assumption that Linux inodes are equally distributed across the 4 VFS linked lists during their life cycle - inode_in_use, inode_unused, s_io, and s_dirty.

With the GFS-only approach, we scan the per-mount glock hash table that holds all of the glock pointers. Since the glock count is normally at least twice the VFS inode count, we could (again, a rough and inaccurate estimation) potentially scan 8 times more entries than the base kernel approach (2 vs. 1/4, i.e. a factor of 8). However, as the number of GFS mount points increases, the GFS-only approach starts to show its advantage.

The following are some gory details if you care to read.

The Original Base Kernel Patch
==============================

Other than relying on VM flush daemons and/or application-specific APIs or commands, GFS also flushes its data to storage during glock state transitions - that is, whenever an inode glock is moved from an exclusive state (write) into a less restricted state (e.g. shared state), the memory-cached write data is synced to disk based on a set of criteria. As the disk write operation is generally expensive, a few policies are implemented to keep glocks in their current state as long as possible.

As reported via bugzilla 214239 (and several others), we've found that GFS needs to fine-tune its current retention policy to meet latency-sensitive application requirements. Two particular issues we've found via the profiling data (collected from several customers' run-time environments) are:

* Glocks stay in "exclusive" state so long that we end up with burst-mode flushing activities (and other memory/IO issues) that can subsequently push file access times out of bounds for latency-sensitive applications.
* The system can easily spend half of its CPU cycles in lock hash search calls due to a large accumulation of glocks
  (ref: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=214239#c1).

We have been passing a few VM tuning tips, together with a shorter (tunable) demote_secs, to customers and find they do relieve the problem #1 symptoms greatly. Note that "demote_secs" is the time interval used by the existing glock scan daemon to move locks into less restricted states if unused. This implies that on an idle system all locks will eventually be moved into the "unlocked" state. Unfortunately, "unlocked" does not imply these glocks will be removed from the system. They actually stay there forever until:

1. the inode (file) is explicitly deleted (on-disk deletion), or
2. the VM issues a prune_icache() call due to memory pressure, or
3. the umount command kicks in, or
4. the lock manager issues an LM_CB_DROPLOCKS callback.

When problem #2 first popped up in the RHEL3 time frame, we naturally went thru the above 4 routes to look for a solution. I forget under what conditions the lock manager could issue the DROPLOCKS callback. However, in reality, (3) and (4) share one and the same exported VFS-layer call to do their core job - invalidate_inodes(). This VFS call walks thru the 4 (global VFS) inode lists to find the entries that belong to this particular filesystem. Each entry found is removed. The operation, interestingly, overlaps with (2) (the VM prune_icache call). The difference is that prune_icache() scans only one list (inode_unused) and selectively purges inodes, instead of all of them.

As the in-memory inodes are purged, the GFS logic embedded in the inode deallocation code removes the corresponding glocks accordingly. It is only then that a glock can disappear.

So here came the original base kernel patch. As this is a latency issue, we didn't want to disturb the painstaking glock-retention effort done by GFS's original author(s). We ended up exporting a modified prune_icache() that is allowed to function like the invalidate_inodes() logic *if asked*: it walks thru the inode_unused list and purges a fixed percentage of inodes from that list if the entry belongs to the subject mount point. In short, we created a new call that has the logic needed for glock trimming without massively cutting and pasting the code from the existing prune_icache base kernel call.
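The shape of that per-mount purge can be illustrated with a small, self-contained user-space model. Everything below (struct toy_inode, toy_prune_unused(), the percentage goal) is a hypothetical sketch written for this description, not the actual base kernel or GFS patch code:

/*
 * Self-contained user-space model of the "purge a percentage of idle
 * inodes that belong to one mount" idea described above.  All names
 * here (struct toy_inode, toy_prune_unused, ...) are made up for this
 * illustration and are not the real kernel or GFS symbols.
 */
#include <stdio.h>
#include <stdlib.h>

struct toy_inode {
	int               sb_id;	/* which mount/superblock owns this inode */
	struct toy_inode *next;		/* stand-in for the inode_unused list */
};

/*
 * Walk the unused list and free up to 'percent' % of 'total' entries,
 * but only entries that belong to superblock 'sb_id'; inodes of other
 * filesystems are skipped, mirroring the "if asked" behaviour.
 */
static int toy_prune_unused(struct toy_inode **head, int sb_id,
			    int percent, int total)
{
	int goal = total * percent / 100;
	int freed = 0;
	struct toy_inode **pp = head;

	while (*pp && freed < goal) {
		struct toy_inode *in = *pp;

		if (in->sb_id == sb_id) {
			*pp = in->next;	/* unlink from the unused list */
			free(in);	/* stands in for the real dispose work */
			freed++;
		} else {
			pp = &in->next;	/* leave other mounts alone */
		}
	}
	return freed;
}

int main(void)
{
	struct toy_inode *head = NULL;
	int i;

	/* Mixed list: even ids belong to "mount 0", odd ids to "mount 1". */
	for (i = 0; i < 100; i++) {
		struct toy_inode *in = malloc(sizeof(*in));
		in->sb_id = i % 2;
		in->next = head;
		head = in;
	}

	/* Ask for 20% of the 100 list entries, restricted to mount 0. */
	printf("freed %d inodes for mount 0\n",
	       toy_prune_unused(&head, 0, 20, 100));

	while (head) {		/* clean up the rest of the model list */
		struct toy_inode *n = head->next;
		free(head);
		head = n;
	}
	return 0;
}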
GFS-only Patch
==============

GFS already has a glock scan daemon that wakes up on a tunable interval to do the glock demote work. It scans the glock hash table and examines the entries one by one. If the reference count and several other criteria meet the requirements, it demotes the lock into a less restricted state. Removable glocks are transferred onto a reclaim list and another daemon (reclaimd) eventually purges them from the system. One of the criteria for identifying a removable glock is a "zero" inode reference count. Unfortunately, as long as a glock is tied to a VFS inode, the reference count never goes down unless the VFS inode is purged (and it never is unless the VM thinks it is under memory pressure).

For lock trimming purposes, it took several tries to get the GFS-only patch to work. The following is the logic that seems to work at this moment:

Each VFS inode is tied to a pair of glocks - an iopen glock (LM_TYPE_IOPEN) and an inode glock (LM_TYPE_INODE). The inode glock normally has frequent state transitions, depending on how and when the file is accessed (read, write, delete, etc.), but the iopen glock is mostly in the SHARED state during its life cycle until either:

1. The GFS inode is removed (gfs_inode_destroy), or
2. Some logic (that didn't exist before this patch) kicks off gfs_iopen_go_callback() to explicitly change its state (presumably by the lock manager).

Since these two glocks have been the major contributors to the glock accumulation issue, they are our targeted glocks for trimming. Without disturbing the existing GFS code, we piggy-back the logic onto the gfs_scand daemon that wakes up on a 5-second interval to scan the glock hash table. If an iopen glock is found, we follow the pointer to obtain the inode glock state. If it is in the unlocked state, we demote the iopen glock (from shared to unlocked). This triggers the gfs_try_toss_vnode() logic to prune the associated dentries and subsequently delete the VFS inode. It then follows the very same purging logic as the base kernel approach. If an inode glock is found first (I haven't implemented this yet), we check its lock state. If unlocked, we follow the pointer to find its iopen glock, then subsequently demote it. That then triggers the gfs_try_toss_vnode() logic, which generates the same sequence of clean-up events as described above.
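As a rough illustration of that scan pass, here is another self-contained user-space model; the types, state names, and the toy_trim_pass()/max_trim names below are assumptions standing in for the real gfs_scand and glock hash-table code:

/*
 * User-space model of the piggy-backed scan described above: walk a
 * (flattened) glock table and, for every iopen glock whose paired
 * inode glock is already unlocked, demote the iopen glock so the
 * normal toss/purge path can reclaim the pair.  Purely illustrative;
 * none of these types or names are the real GFS structures.
 */
#include <stdio.h>
#include <stddef.h>

enum toy_glock_type  { TOY_TYPE_INODE, TOY_TYPE_IOPEN };
enum toy_glock_state { TOY_UNLOCKED, TOY_SHARED, TOY_EXCLUSIVE };

struct toy_glock {
	enum toy_glock_type  type;
	enum toy_glock_state state;
	struct toy_glock    *paired;	/* iopen <-> inode glock of one file */
};

/* Stand-in for demoting a glock and kicking off the vnode-toss logic. */
static void toy_demote(struct toy_glock *gl)
{
	gl->state = TOY_UNLOCKED;
	/* the real code would go on to prune dentries and drop the inode */
}

/* One scan pass, capped at 'max_trim' demotes per pass; returns count. */
static int toy_trim_pass(struct toy_glock *table, size_t n, int max_trim)
{
	int trimmed = 0;
	size_t i;

	for (i = 0; i < n && trimmed < max_trim; i++) {
		struct toy_glock *gl = &table[i];

		/* Only iopen glocks whose inode glock is already
		 * unlocked are safe candidates for trimming. */
		if (gl->type == TOY_TYPE_IOPEN &&
		    gl->state == TOY_SHARED &&
		    gl->paired && gl->paired->state == TOY_UNLOCKED) {
			toy_demote(gl);
			trimmed++;
		}
	}
	return trimmed;
}

int main(void)
{
	/* Two files: one idle (inode glock unlocked), one still in use. */
	struct toy_glock g[4] = {
		{ TOY_TYPE_INODE, TOY_UNLOCKED,  NULL },
		{ TOY_TYPE_IOPEN, TOY_SHARED,    NULL },
		{ TOY_TYPE_INODE, TOY_EXCLUSIVE, NULL },
		{ TOY_TYPE_IOPEN, TOY_SHARED,    NULL },
	};

	g[0].paired = &g[1]; g[1].paired = &g[0];
	g[2].paired = &g[3]; g[3].paired = &g[2];

	printf("trimmed %d iopen glocks\n", toy_trim_pass(g, 4, 100));
	return 0;
}

The max_trim cap in the model loosely mirrors the tunable count mentioned in the to-do items below.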
A few to-do items:

1. The current CVS check-in only looks for iopen glocks. We should add the inode-glock case described above to shorten the search process.
2. Have another version of the patch that trims the lock if it has been in the idle (unlocked) state longer than a tunable timeout value. The current CVS check-in is based on a tunable percentage count; the trimming action stops when either the max count is reached or we reach the end of the table.
3. Glocks are now trimmed (and the GFS lock dump shows the correct result) - I'm not sure how the DLM side makes these locks disappear from its hash table (?).

=== End of write-up