Red Hat Bugzilla – Bug 214239
GFS performance issue - trimming glock
Last modified: 2010-01-11 22:23:13 EST
Description of problem:
Support reports performance issues with a group of 15-node clusters -
each consists of 14 FTP servers and 1 rsync server. The clusters are located
in different geographic locations and rsync to each other from time to time.
The GFS mounts serve as FTP storage for an IPTV application. After an rsync
run, system performance sinks - a single "LIST" command can take 2 to 16
minutes. Each cluster serves 30T of files (the largest directory contains
11000 x 2G files) over FTP.
Based on oprofile data, the system appears to suffer from several issues:
1. A known GFS problem with a large number of files within one directory,
as described in:
2. DLM daemons hog CPUs and loop around __find_lock_by_id:
After rsync, the system keeps a large number of dlm locks, which causes
the dlm daemons, particularly dlm_recvd, to loop around
__find_lock_by_id(), consuming (hogging) CPU and memory.
3. Known Linux rsync performance hits that leave a large number of
inode and dentry cache entries hanging around, subsequently causing
memory pressure.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
A few quick notes:
1) Both the gfs and dlm lock search (and/or hash) implementations need to be
re-examined, since they can easily consume ~50% of CPU cycles on a loaded
node, as the oprofile output below shows:
[root@engcluster1 tmp]# ./opreport_module /dlm -l
samples  %        symbol name
200506   47.1299  search_hashchain
152213   35.7784  search_bucket
53249    12.5164  __find_lock_by_id
2934      0.6897  process_asts
1875      0.4407  dlm_hash
1411      0.3317  _release_rsb
2) We may need to manually purge the dentry and inode caches, as we did with
RHEL3's inode_purge tunable. This would cut down lock counts and
subsequently alleviate dlm workloads and reduce cache memory fragmentation;
see the sketch below for one way to do it.
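As a minimal sketch of such a manual purge - assuming a kernel that provides
the /proc/sys/vm/drop_caches interface (mainline 2.6.16 and later; this is
not part of the patch discussed here) - dentries and inodes can be dropped
by hand:
shell> sync
shell> echo 2 > /proc/sys/vm/drop_caches   (drops dentry and inode caches only)
Note this discards clean caches system-wide, so it is a blunt instrument
compared to per-filesystem glock trimming.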
for "df" performance - yes, it is a side effect. You could do
shell> gfs_tool settune <mount_point> statfs_slots 128
to boost "df" performance if it is a concern. The default statfs_slots is
64. Make it bigger would help. See the following bz for details:
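To double-check that the new value took effect - assuming the gfs_tool
gettune subcommand shipped with this release - the current tunables can be
listed:
shell> gfs_tool gettune <mount_point> | grep statfs_slots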
djoo reported that the test RPMs ran well and have kept the FTP "LIST"
command latency within the expected bound (5 seconds). Will work with the
kernel folks to see what we want to do with the inode trimming patch.
Created attachment 146468 [details]
CVS check-in patch
We've decided to go for a GFS-only solution and are still taking input
to finalize the GFS patch (without base kernel changes). Uploaded is a
(working) draft patch checked into CVS on Monday. We would like external
folks to try it out and provide input.
After gfs.ko is loaded and the filesystem mounted, issue the following
command to kick off the trimming logic:
shell> gfs_tool settune <mount_point> glock_purge <percentage>
(e.g. "gfs_tool settune /mnt/gfs1 glock_purge 50")
This tells GFS to trim roughly 50% of the unused glocks every 5 seconds.
The default is 0 percent (no trimming). The operation can be dynamically
turned off by explicitly setting the percentage back to "0".
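To see whether trimming is keeping the lock count down - a hedged example,
assuming gfs_tool's counters subcommand - sample the per-mount statistics
before and after enabling the tunable:
shell> gfs_tool counters /mnt/gfs1
and watch the lock counts over a few 5-second trim intervals.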
Created attachment 146474 [details]
Glock trimming description
This write-up documents the technical implementation of the above patch.
One of the tunables mentioned in the document can be set as:
shell> gfs_tool settune <mount_point> demote_secs <seconds>
(e.g. "gfs_tool settune /mnt/gfs1 demote_secs 200")
This demotes gfs write locks into less restrictive states and subsequently
flushes the cached data to disk. A shorter demote interval helps avoid gfs
accumulating so much cached data that it results in bursts of flushing
activity or prolongs another node's lock access. The default is 300 seconds.
This command can be issued dynamically, but has to be done after mount time;
see the sketch below for reapplying the tunables after each mount.
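Since settune values do not persist across a remount, a minimal sketch of
reapplying both tunables from a script run after the GFS filesystems are
mounted (the mount point and values here are illustrative, not
recommendations):

#!/bin/sh
# reapply-gfs-tunables: settune values are lost at unmount,
# so run this after each mount (e.g. from an init script).
MNT=/mnt/gfs1
gfs_tool settune $MNT glock_purge 50    # trim ~50% of unused glocks every 5s
gfs_tool settune $MNT demote_secs 200   # demote idle write locks after 200s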
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.