Bug 185618
| Field | Value |
|---|---|
| Summary | gfs 6.1 performance issue (directory lock contention?) |
| Product | [Retired] Red Hat Cluster Suite |
| Component | gfs |
| Version | 4 |
| Status | CLOSED WONTFIX |
| Severity | medium |
| Priority | medium |
| Reporter | Issue Tracker <tao> |
| Assignee | Wendy Cheng <nobody+wcheng> |
| QA Contact | GFS Bugs <gfs-bugs> |
| CC | djoo, mkearey, rkenna, rohara, tao |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | All |
| OS | Linux |
| Doc Type | Bug Fix |
| Last Closed | 2007-09-18 16:02:37 UTC |
Description (Issue Tracker, 2006-03-16 13:31:10 UTC)
Going ahead and escalating this to engineering, though I'm not sure what can really be done with such a pessimal GFS case... They apparently have a benchmark that operates on a directory with around 10000 files. When they run this benchmark and then do an `ls` in that directory, they see long delays before the `ls` returns. The problem seems to be slow getdents64() calls, and my guess is that the root cause is contention for the directory lock.

I've been able to reproduce something like what they are seeing on a 2-node GFS cluster. On one machine I run the following shell loop in a directory on a GFS filesystem:

```
# while true; do for ((i=1;i<=10000;i++)); do rm -f file_$i; touch file_$i; done; done
```

Then, on the other machine, timing an `ls` in that directory gives numbers roughly like these:

```
real 0m32.235s
user 0m0.307s
sys  0m1.656s
```

Again, I'm not sure what we can do to tune for this, other than telling them "don't do that" and recommending that they architect things to reduce contention for the directory lock (create more directories and spread the files across them). They also complained that removing files from the directory during this test takes a very long time, but my guess is that this is due to the same problem (directory lock contention), so anything we do to help the first issue will probably help the second.

We will take a look at it, but this sounds very much like the postmark benchmark performance issue. Doing stats of the filesystem requires accessing every resource group in the filesystem to collect the information, which can take a long time. This is a design issue with GFS that we are addressing in GFS2. If this is for a mail server solution, a hierarchical directory structure (to avoid directory contention) and mounting with noatime are two options that other customers have implemented with some success.

Moving off the fix list for U4. This one may not be addressable in the RHEL4 version of GFS.
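The "spread the files into multiple dirs" workaround suggested above can be sketched as a small shell helper. This is a hypothetical illustration: the bucket count and the cksum-based hashing are my own choices for the example, not anything GFS itself provides.

```shell
#!/bin/sh
# Sketch (assumed layout, not a GFS feature): hash each filename into one
# of NBUCKETS subdirectories so no single directory holds all the files.
# Each subdirectory then has its own directory lock, reducing contention.
NBUCKETS=16

bucket_for() {
    # Deterministically map a filename to a bucket number 0..NBUCKETS-1.
    name="$1"
    sum=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    echo $((sum % NBUCKETS))
}

place_file() {
    # Create a file under its hashed subdirectory instead of a flat dir.
    base="$1"
    name="$2"
    dir="$base/dir_$(bucket_for "$name")"
    mkdir -p "$dir"
    touch "$dir/$name"
}
```

With this layout, the reproducer's `rm`/`touch` churn is spread over 16 directory locks instead of hammering one, so a concurrent `ls` of any single subdirectory forces far fewer flushes.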
Need to open another bugzilla to address the particular customer issue described in comment #12. The following is a short description of *this* bugzilla that hopefully helps people understand the issue better:

The problem here is *much* more than lock contention - it hits several design and architecture limitations. We have been hoping GFS2 could address it. For GFS1, it is better to educate people about the ramifications so they can find the proper workaround for their particular setup. Though I'm trying to do something about this, the work is not a short-term project. It is also important to point out that these problems do not exist in GFS alone. They may well challenge other cluster filesystems, with different degrees of severity and/or symptoms.

Users (and support engineers) must understand that GFS is a journaling filesystem. That is, we have to ensure filesystem consistency without fsck if at all possible. This implies each metadata change (transaction) is logged into the journal (file), and there are rules about the "sequence" of these changes. Say, for example, you create two files on the same (SMP) node at the same time. Since their metadata could be written to the journal interleaved, syncing one file to disk normally requires syncing the other file as well. The performance hit grows with the number of files whose metadata is interleaved in the journal.

At the same time, GFS is also a cluster filesystem. That is, it must guarantee cache coherency between different nodes. When an "ls" command (a "read" that requires a shared lock) is issued, if the previous lock holder did some form of "write" (under an exclusive lock), the data needs to be flushed to disk before the shared lock is granted. This ensures other nodes have a consistent view of the data across the cluster.
Now, if you have heavy write activity with *lots of* files within one single directory across the cluster, then when an "ls" is issued, all of those files must be "sync"ed to disk before the directory's shared lock can be granted. Think about the performance hit generated - what else could impact a computer system's performance more heavily than these types of operations? Lock contention, disk I/O, VM memory requirements, together with GFS's and DLM's own overheads.

As we have repeatedly said in the past: separate the directories and cut down the file count within each directory if at all possible. Structure your application and configuration to allow "parallel" processing. GFS allows concurrent access to the filesystem across the cluster - however, that doesn't mean users can use it arbitrarily without understanding "parallel" principles. I do understand people's need for load balancing. However, you must understand the overheads and internals before load-balancing your workload. Actually, I should have said "locality" instead of parallel processing.

While working on another bugzilla, I happened to catch a thread backtrace and realized GFS prefetches two glocks for *every* file within a directory in its readdir system call. It looks to me like an optimization that assumed that whenever "ls" was issued, file accesses would follow. However, on a 10000-file directory such as this bugzilla describes, this could be overkill. Will remove that prefetch logic to see how much we can improve this issue.

OK, I was wrong - it doesn't grab the file locks but the directory lock. So it refreshes the directory lock for each disk read - that makes sense. False alarm! However, GFS does invalidate *all* the pages of this directory file each and every time the directory lock is refreshed. So there may be some room for improvement there...

I was wrong about "I was wrong" in comment #33. It is prefetching the file locks (not the directory lock).
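The "locality" point above can be illustrated with a small sketch: give each cluster node its own working subdirectory, so that subdirectory's lock is only ever taken by one node and never ping-pongs across the cluster. The `node_<name>` convention and the path names are illustrative assumptions for this example, not a GFS mechanism.

```shell
#!/bin/sh
# Sketch (assumed convention): per-node working directories under a shared
# GFS mount. Since only one node writes under its own subdirectory, that
# directory's lock stays cached locally instead of bouncing between nodes.

workdir_for_node() {
    # $1 = shared base directory, $2 = node name (defaults to this host).
    base="$1"
    node="${2:-$(hostname)}"
    printf '%s/node_%s\n' "$base" "$node"
}

setup_node_workdir() {
    # Create the per-node directory and print its path.
    dir=$(workdir_for_node "$@")
    mkdir -p "$dir"
    printf '%s\n' "$dir"
}
```

An application that writes all new files under `$(setup_node_workdir /shared/base)` keeps its directory-lock traffic local; cross-node reads of another node's subdirectory still work, they are just no longer the common case.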
Unfortunately, we can't see much improvement using the test case mentioned in comment #1 (since the files are created via "touch" and have zero length). However, in the general case, turning off this file lock prefetching should help. Will describe the issue in detail before sending it out for team code review.

Another thing I plan to turn off is the readahead call. Note that if the directory lock is ping-ponging between two nodes with write activity, each directory shared lock acquisition will involve page invalidation, so readahead would be useless and could even harm performance. Will continue to see what else I can squeeze out of this readdir code path. All the changes will most likely be gated behind a tunable flag... maybe named "fast path ls" or something.

We would like to close this out as "won't fix". We will put the development effort into RHEL5 and work with upstream to push a "statlite" implementation to alleviate this known cluster filesystem performance problem.