Bug 459738
Summary: GFS2: Multiple writer performance issue.
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.3
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Reporter: Dean Jansa <djansa>
Assignee: Abhijith Das <adas>
QA Contact: Martin Jenner <mjenner>
CC: bstevens, edamato, nstraz, rpeterso, syeghiay
Last Closed: 2009-01-20 20:05:59 UTC

Description
Dean Jansa
2008-08-21 19:13:08 UTC
FWIW -- Running with a single writer and multiple readers doesn't seem to show this wild performance drop when adding readers (using 1M read/write sizes):

1 reader, 1 writer:
  GFS2  0.2 sec (read)  1.2 sec (write)
  GFS   0.3 sec (read)  3.6 sec (write)

2 readers, 1 writer:
  GFS2  1.8 sec (read)  1.8 sec (write)
  GFS   0.9 sec (read)  0.5 sec (write)

3 readers, 1 writer:
  GFS2  3.0 sec (read)  2.5 sec (write)
  GFS   3.6 sec (read)  2.0 sec (write)

The GFS1 runs all seem to show inconsistent results, as seen in the 2 reader, 1 writer case. That is probably down to the test case and the luck of the draw during the runs. I hoped the data would be of some use anyway, so I've included it.

I think I can start to explain some of this now. Looking at the GFS figures too, it starts to make a bit more sense. I think what we are seeing is, in part, a result of the different locking in GFS2 vs. GFS. Bearing in mind that GFS is locking complete syscalls and GFS2 is locking on a per-page basis, I think it's not too surprising that there are more opportunities for GFS2 to drop the lock, and hence for performance to degrade. There is obviously more to it than that, but I do wonder if that is not part of the problem.

Looking at the two node results (opening comment), the GFS2 results for 1M are very similar to the GFS results for 4k. The real question is why the 4k results for GFS2 are so much worse. The min hold time code should be enforcing the same minimum hold time whatever the I/O size.

We could certainly try some changes to the min-hold time code to see what difference it makes, if any. We could increase the min hold time itself. Another idea is to change the point at which we set gl_tchange to after the glock has read in any info it needs from disk.

It also occurs to me that maybe there is a race: before we process a reply from the DLM, it's possible that the demote request arrives first (due to scheduling of the threads), and thus maybe gl_tchange is being checked before it's been updated.
That's my list of things to check for now, anyway. In GFS it doesn't surprise me that as the I/O size changes, the performance in this test changes. I'd expect to see less of that effect with GFS2, so I'm pretty sure that the min-hold time code still has something not quite right about it.

Created attachment 315086: Test patch

So this is a test patch to see if I'm right about the race condition. I think it would also be worth altering the min hold time, to see if that makes a difference above and beyond this patch.
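
To make the suspected race easier to picture, here is a toy model in Python. It is only a sketch of the idea discussed above, not the GFS2 code and not the contents of the test patch: the function names, the 200 ms hold time, and the timestamps are all assumptions made up for illustration. The only name taken from this report is gl_tchange, treated here as "time of the last glock state change" in milliseconds.

```python
# Toy model of a minimum-hold-time check. NOT the GFS2 kernel code.
# MIN_HOLD_MS, demote_delay() and simulate() are hypothetical names,
# and the 200 ms value is an assumption chosen for illustration.

MIN_HOLD_MS = 200  # assumed minimum hold time, in milliseconds


def demote_delay(gl_tchange, now, min_hold=MIN_HOLD_MS):
    """Return how long (ms) a demote request must wait before it is honoured.

    The lock should be kept for at least `min_hold` ms after the last
    state change recorded in `gl_tchange`.
    """
    hold_until = gl_tchange + min_hold
    return max(0, hold_until - now)


def simulate(reply_processed_first):
    """Model one lock grant followed closely by a remote demote request.

    The previous state change happened at t=0 ms, the new grant arrives
    at t=10000 ms, and the demote request arrives at t=10010 ms.
    """
    gl_tchange = 0        # stale timestamp left over from the previous change
    grant_time = 10000
    demote_time = 10010

    if reply_processed_first:
        # The grant reply was handled before the demote request,
        # so the timestamp has been refreshed.
        gl_tchange = grant_time

    # Otherwise the demote request is checked against the stale timestamp.
    return demote_delay(gl_tchange, demote_time)


if __name__ == "__main__":
    print(simulate(reply_processed_first=True))   # 190: lock is held a while
    print(simulate(reply_processed_first=False))  # 0: lock is dropped at once
```

In this model, if the demote request is checked before the timestamp has been refreshed, the computed delay is zero and the lock is given up immediately, so the node gets no useful hold time at all. Whether that is what actually happens in the kernel is exactly what the test patch is meant to find out.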
Results with the test patch build (/kmod-gfs2-1.104-1.1.el5.abhi.4.x86_64.rpm):

1 Node  1M write: GFS2 - 1.4 sec
1 Node  4K write: GFS2 - 2.1 sec
2 Nodes 1M write: GFS2 - 5.6 sec
2 Nodes 4K write: GFS2 - 7.4 sec
3 Nodes 1M write: GFS2 - 6.8 sec
3 Nodes 4K write: GFS2 - 143 sec

in kernel-2.6.18-108.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html