Bug 504188 - GFS1 vs GFS2 performance issue
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Abhijith Das
QA Contact: Cluster QE
Keywords: ZStream
Depends On:
Blocks: 629920 639071 639073
Reported: 2009-06-04 12:45 EDT by Kai Meyer
Modified: 2011-01-13 15:48 EST
CC List: 20 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
In a two node cluster, moving 100 files between two folders using the lock master was nearly instantaneous. However, not using the lock master resulted in considerably worse performance on both GFS1 (Global File System 1) and GFS2 (Global File System 2) file systems. With this update, not using the lock master does not lead to worsened performance on either of the aforementioned file systems.
Story Points: ---
Clone Of:
Clones: 629920
Environment:
Last Closed: 2011-01-13 15:48:51 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
First idea at trying to fix this (1.77 KB, patch)
2010-08-12 09:13 EDT, Steve Whitehouse
Updated patch, including printing the new glock flag in traces (2.18 KB, patch)
2010-08-13 09:49 EDT, Steve Whitehouse
Another test patch (447 bytes, patch)
2010-09-02 09:22 EDT, Steve Whitehouse
combined rhel5 patch (2.09 KB, patch)
2010-09-02 13:04 EDT, Abhijith Das

Description Kai Meyer 2009-06-04 12:45:57 EDT
Description of problem:

In a two node cluster, the lock master of two folders can move 100 files from one folder to the other in less than 1 second. If a server is not the lock master for that folder, it will take that server 3-5 seconds to perform the same task on GFS1, and 30-50 seconds on GFS2.

While I am still somewhat disappointed in the 3-5 seconds on GFS1, I could live with it. This bug is simply to point out the speed degradation from GFS1 to GFS2.


Version-Release number of selected component (if applicable):
Comparison of GFS1 to GFS2
kmod-gfs-0.1.31-3.el5
kernel-2.6.18-128.el5
gfs-utils-0.1.18-1.el5
gfs2-utils-0.1.53-1.el5_3.3


How reproducible:
After formatting the filesystem in either GFS1 or GFS2, I cd into the root directory of the mounted file system (in my case /var/vmail), then on the first node I run:
time rm -fr test1 test2; time mkdir test1 test2 ; time for num in `seq 1 100`; do touch test1/foo.$num; done; time for num in `seq 1 100`; do mv test1/foo.$num test2/bar.$num; done;

On the second node I'll then run:
time for num in `seq 1 100`; do mv test2/bar.$num test1/file.$num; done


Steps to Reproduce:
1. mkfs.gfs[2] -t emailcluster:emailfs -p lock_dlm -j 3 /dev/sdb1
2. Modify /etc/fstab "/dev/sdb1 /var/vmail gfs[2] defaults,noatime 0 0"
3. service gfs[2] start
  
Actual results:
For the non-"lock master" on GFS1, the move takes in the order of 10 times as long as the "lock master".
For the non-"lock master" on GFS2, the move takes in the order of 100 times as long as the "lock master".

Expected results:
For GFS1 and GFS2 to be comparable in this test.

Additional info:
Comment 2 Steve Whitehouse 2010-07-12 08:21:09 EDT
This was reported against the wrong product. Updating to the correct product.
Comment 3 Abhijith Das 2010-07-27 10:06:49 EDT
Update:
I have been able to reproduce this on my shell cluster. gfs2 takes about 30 seconds while gfs1 takes about 4 seconds to complete the test mentioned above. When I changed the .go_min_hold_time from HZ/5 to HZ/100 for gfs2_inode_glops and gfs2_rgrp_glops in glops.c, gfs2 finished the test in about 6-7 seconds.

However, the min-hold-time should ideally expire soon after the glocks are cached (as soon as the files are created) and by the time we move the files on the second node, the glocks should be ready to be released. I'm looking into this further.
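
For concreteness, the experiment described above amounts to a change along these lines in glops.c (a sketch only; it follows the gfs2_glock_operations layout of that era and is not a verbatim quote of the RHEL5 source):

/* Sketch: shrink the minimum glock hold time for inode and rgrp
 * glocks. HZ/5 is 200ms; HZ/100 is 10ms. */
const struct gfs2_glock_operations gfs2_inode_glops = {
        /* ... other operations unchanged ... */
        .go_min_hold_time = HZ / 100,   /* was HZ / 5 */
};

const struct gfs2_glock_operations gfs2_rgrp_glops = {
        /* ... other operations unchanged ... */
        .go_min_hold_time = HZ / 100,   /* was HZ / 5 */
};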
Comment 5 Steve Whitehouse 2010-08-12 09:13:42 EDT
Created attachment 438434 [details]
First idea at trying to fix this

This is a bit subtle, since the information about the intent of a remote node is not available at the demoting node. All the demoting node knows is that two demote requests for the same lock have appeared, one from EX to SH and then one from SH to EX.

The patch introduces a new glock flag called the queued flag. This is set any time a holder is queued on a glock. It is reset when a state change which does not involve the UN state occurs, and only then if there are no queued holders at the time.

If the flag is not set, then the minimum hold time will not be obeyed. Instead the demote will be scheduled immediately. If the flag is set, then the min hold time will be obeyed as usual.

This should, fingers crossed, be enough to solve the problem that we are seeing with multiple demotes, but also enough to ensure that we still don't run into issues when there is lock contention where progress is not made.
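
In code terms, the scheme described above is roughly the following (an illustrative sketch with assumed placement, not the attached patch itself; the flag became GLF_QUEUED upstream):

/* Sketch only. In add_to_queue(), whenever a holder is queued: */
        set_bit(GLF_QUEUED, &gl->gl_flags);

/* In the state change handler: clear the flag on any state change
 * not involving the UN state, and only if no holders are queued: */
        if (new_state != LM_ST_UNLOCKED && list_empty(&gl->gl_holders))
                clear_bit(GLF_QUEUED, &gl->gl_flags);

/* When scheduling work for a remote demote request: honour the
 * minimum hold time only if the flag is set, otherwise demote
 * immediately: */
        delay = test_bit(GLF_QUEUED, &gl->gl_flags) ?
                gl->gl_ops->go_min_hold_time : 0;
        queue_delayed_work(glock_workqueue, &gl->gl_work, delay);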
Comment 7 Abhijith Das 2010-08-12 17:48:28 EDT
(In reply to comment #5)
> Created an attachment (id=438434) [details]
> First idea at trying to fix this

No luck with this patch. Still seeing about 38 seconds for the mv commands on gfs2 on the second node.
Comment 8 Steve Whitehouse 2010-08-13 09:49:59 EDT
Created attachment 438686 [details]
Updated patch, including printing the new glock flag in traces
Comment 12 Steve Whitehouse 2010-09-02 04:54:28 EDT
So far as I could tell from yesterday's results, the patch is doing what it was originally written to do. So the problem is likely to be elsewhere.
Comment 13 Steve Whitehouse 2010-09-02 08:26:07 EDT
Another thought....

Let's assume two nodes, A and B. Node A has a lock cached in EX but is idle. Node B tries a lookup followed by a write.

Node B sends a PR request to the dlm
The dlm sends a callback to node A
Node A drops to PR
The dlm grants node B's request for PR

Node B does the lookup

Node B sends a PR -> EX convert request to the dlm
The dlm sends a single callback to Node A and (if I've read the code correctly) also to Node B, and fails Node B's request due to the blocking PR which Node A is still holding

So I'd like to verify whether Node B does indeed get a callback from the dlm when the convert fails. If so, then I think we might have found the issue. The reason that is a problem is that Node B has only just been granted a lock. If the demote request arrives before the reply to the convert request, this will set the glock workqueue delay according to the min hold time.

The glock workqueue is of the delayed variety. That means that the "work" is executed at some future time depending upon when the timer goes off. If a subsequent request to queue work is received while the timer is pending, it will be ignored. So if a demote request is received by node B, that will prevent any further running of the workqueue for 200ms or thereabouts, assuming that the lookup took a very small amount of time.

So if you still have those traces from yesterday, then I'd like to see whether any demote requests were received on the node which you were doing the writes on.
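
For context, the delayed-workqueue behaviour referred to above is standard kernel semantics: queue_delayed_work() returns false and does nothing if the work item is already pending. A minimal standalone illustration (not gfs2 code; written against the current workqueue API):

#include <linux/workqueue.h>
#include <linux/jiffies.h>

static void demo_fn(struct work_struct *work)
{
        /* runs once the delay expires */
}
static DECLARE_DELAYED_WORK(demo_work, demo_fn);

static void demo(struct workqueue_struct *wq)
{
        /* First request: queued with a 200ms delay (the min hold
         * time case). */
        queue_delayed_work(wq, &demo_work, msecs_to_jiffies(200));

        /* Second request while the timer is pending: returns false
         * and is ignored, so the work still runs only after 200ms,
         * even though this caller wanted it to run immediately. */
        queue_delayed_work(wq, &demo_work, 0);
}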
Comment 14 Abhijith Das 2010-09-02 09:13:10 EDT
west-10 - first node

# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
        dlm_astd-2018  [002] 1885930.700013: gfs2_demote_rq: 253,3 glock 2:66314 demote EX to PR flags:DyIq
        dlm_astd-2018  [002] 1885930.700019: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state EX to XX tgt:EX dmt:PR flags:DyIq - gfs2_glock_cb delay:0
 glock_workqueue-1854  [002] 1885930.715963: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state EX to XX tgt:PR dmt:PR flags:lDpIq - do_xmote delay:0
 glock_workqueue-1854  [002] 1885930.715996: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state EX to XX tgt:PR dmt:PR flags:lDpIq - glock_work_func delay:0
 glock_workqueue-1854  [002] 1885930.716012: gfs2_glock_state_change: 253,3 glock 2:66314 state EX to PR tgt:PR dmt:PR flags:lDpIq
 glock_workqueue-1854  [002] 1885930.716016: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state PR to XX tgt:PR dmt:EX flags:I - glock_work_func delay:0
        dlm_astd-2018  [002] 1885930.740405: gfs2_demote_rq: 253,3 glock 2:66314 demote PR to NL flags:DI
        dlm_astd-2018  [002] 1885930.740406: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state PR to XX tgt:PR dmt:NL flags:DI - gfs2_glock_cb delay:0
 glock_workqueue-1854  [002] 1885930.740448: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state PR to XX tgt:NL dmt:NL flags:lDpI - do_xmote delay:0
 glock_workqueue-1854  [002] 1885930.740457: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state PR to XX tgt:NL dmt:NL flags:lDpI - glock_work_func delay:0
 glock_workqueue-1854  [002] 1885930.740468: gfs2_glock_state_change: 253,3 glock 2:66314 state PR to NL tgt:NL dmt:NL flags:lDpI
 glock_workqueue-1854  [002] 1885930.740470: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state NL to XX tgt:NL dmt:EX flags:I - glock_work_func delay:0
 glock_workqueue-1854  [002] 1885934.676075: gfs2_glock_state_change: 253,3 glock 2:66213 state NL to EX tgt:EX dmt:EX flags:lIq
 glock_workqueue-1854  [002] 1885934.676428: gfs2_promote: 253,3 glock 2:66213 promote first EX
 glock_workqueue-1854  [002] 1885934.676439: gfs2_glock_abhi_debug: 253,3 glock 2:66213 state EX to XX tgt:EX dmt:EX flags:Iq - glock_work_func delay:0
 glock_workqueue-1854  [002] 1885934.676540: gfs2_glock_state_change: 253,3 glock 2:66213 state EX to NL tgt:NL dmt:NL flags:lDpIq
 glock_workqueue-1854  [002] 1885934.676543: gfs2_glock_abhi_debug: 253,3 glock 2:66213 state NL to XX tgt:NL dmt:EX flags:Iq - glock_work_func delay:0
            grep-11934 [000] 1885937.435063: gfs2_promote: 253,3 glock 2:22 promote first PR
            grep-11934 [000] 1885937.435071: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            grep-11934 [000] 1885937.435081: gfs2_promote: 253,3 glock 2:22 promote first PR
            grep-11934 [000] 1885937.435082: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
     gfs2_quotad-11923 [001] 1885944.683912: gfs2_glock_abhi_debug: 253,3 glock 2:66213 state NL to XX tgt:EX dmt:EX flags:lIq - do_xmote delay:0
     gfs2_quotad-11923 [001] 1885944.684409: gfs2_demote_rq: 253,3 glock 2:66213 demote EX to NL flags:DIq
     gfs2_quotad-11923 [001] 1885944.684411: gfs2_glock_queue: 253,3 glock 2:66213 dequeue EX
     gfs2_quotad-11923 [001] 1885944.684412: gfs2_glock_abhi_debug: 253,3 glock 2:66213 state EX to XX tgt:EX dmt:NL flags:DIq - gfs2_glock_dq delay:0
 glock_workqueue-1851  [001] 1885944.684458: gfs2_glock_abhi_debug: 253,3 glock 2:66213 state EX to XX tgt:NL dmt:NL flags:lDpIq - do_xmote delay:0
 glock_workqueue-1851  [001] 1885944.684471: gfs2_glock_abhi_debug: 253,3 glock 2:66213 state EX to XX tgt:NL dmt:NL flags:lDpIq - glock_work_func delay:0
            find-11933 [003] 1885947.442094: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-11933 [003] 1885947.442102: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-11933 [003] 1885947.442111: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-11933 [003] 1885947.442112: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-11933 [001] 1885947.444452: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-11933 [001] 1885947.444460: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-11933 [001] 1885947.444464: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-11933 [001] 1885947.444465: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-11933 [001] 1885947.444470: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-11933 [001] 1885947.444471: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-11933 [001] 1885947.444521: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-11933 [001] 1885947.444522: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-11933 [001] 1885947.444523: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-11933 [001] 1885947.444524: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-11933 [001] 1885947.444533: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-11933 [001] 1885947.444534: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-11933 [001] 1885947.474184: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-11933 [001] 1885947.474191: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-11933 [001] 1885947.474199: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-11933 [001] 1885947.474200: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
Comment 15 Abhijith Das 2010-09-02 09:13:56 EDT
west-11

# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
    open_n_write-29159 [002] 1630943.457578: gfs2_promote: 253,3 glock 2:22 promote first PR
    open_n_write-29159 [002] 1630943.457584: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
    open_n_write-29159 [002] 1630943.457591: gfs2_promote: 253,3 glock 2:22 promote first PR
    open_n_write-29159 [002] 1630943.457629: gfs2_glock_abhi_debug: 253,3 glock 5:66314 state NL to XX tgt:PR dmt:EX flags:lq - do_xmote delay:0
 glock_workqueue-19106 [002] 1630943.457995: gfs2_glock_state_change: 253,3 glock 5:66314 state NL to PR tgt:PR dmt:EX flags:lIq
 glock_workqueue-19106 [002] 1630943.457997: gfs2_promote: 253,3 glock 5:66314 promote other PR
 glock_workqueue-19106 [002] 1630943.458000: gfs2_glock_abhi_debug: 253,3 glock 5:66314 state PR to XX tgt:PR dmt:EX flags:Iq - glock_work_func delay:0
    open_n_write-29159 [002] 1630943.458005: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
    open_n_write-29159 [002] 1630943.458007: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state NL to XX tgt:PR dmt:EX flags:lq - do_xmote delay:0
 glock_workqueue-19106 [002] 1630943.497774: gfs2_glock_state_change: 253,3 glock 2:66314 state NL to PR tgt:PR dmt:EX flags:lIq
 glock_workqueue-19106 [002] 1630943.497984: gfs2_promote: 253,3 glock 2:66314 promote first PR
 glock_workqueue-19106 [002] 1630943.497987: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state PR to XX tgt:PR dmt:EX flags:Iq - glock_work_func delay:0
    open_n_write-29159 [002] 1630943.497990: gfs2_glock_queue: 253,3 glock 2:66314 dequeue PR
    open_n_write-29159 [002] 1630943.497994: gfs2_promote: 253,3 glock 2:66314 promote first PR
    open_n_write-29159 [002] 1630943.498177: gfs2_glock_queue: 253,3 glock 2:66314 dequeue PR
    open_n_write-29159 [002] 1630943.498313: gfs2_promote: 253,3 glock 2:66314 promote first PR
    open_n_write-29159 [002] 1630943.498314: gfs2_glock_queue: 253,3 glock 2:66314 dequeue PR
    open_n_write-29159 [002] 1630943.498319: gfs2_promote: 253,3 glock 2:66314 promote first PR
    open_n_write-29159 [002] 1630943.498320: gfs2_glock_queue: 253,3 glock 2:66314 dequeue PR
    open_n_write-29159 [002] 1630943.498332: gfs2_promote: 253,3 glock 2:66314 promote first PR
    open_n_write-29159 [002] 1630943.498334: gfs2_glock_queue: 253,3 glock 2:66314 dequeue PR
    open_n_write-29159 [002] 1630943.498338: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state PR to XX tgt:EX dmt:EX flags:lIq - do_xmote delay:0
        dlm_astd-2092  [002] 1630943.498552: gfs2_demote_rq: 253,3 glock 2:66314 demote PR to NL flags:ldrIq
        dlm_astd-2092  [002] 1630943.498553: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state PR to XX tgt:EX dmt:NL flags:ldrIq - gfs2_glock_cb delay:200
 glock_workqueue-19106 [002] 1630943.498556: gfs2_glock_state_change: 253,3 glock 2:66314 state PR to PR tgt:EX dmt:NL flags:ldIq
 glock_workqueue-19106 [002] 1630943.498572: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state PR to XX tgt:EX dmt:NL flags:ldIq - do_xmote delay:0
 glock_workqueue-19106 [002] 1630943.498583: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state PR to XX tgt:EX dmt:NL flags:ldIq - glock_work_func delay:200
 glock_workqueue-19106 [002] 1630943.697269: gfs2_glock_state_change: 253,3 glock 2:66314 state PR to NL tgt:EX dmt:NL flags:ldIq
 glock_workqueue-19106 [002] 1630943.697271: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state NL to XX tgt:EX dmt:NL flags:ldIq - do_xmote delay:0
 glock_workqueue-19106 [002] 1630943.697279: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state NL to XX tgt:EX dmt:NL flags:lIq - glock_work_func delay:0
 glock_workqueue-19106 [002] 1630943.697485: gfs2_glock_state_change: 253,3 glock 2:66314 state NL to EX tgt:EX dmt:NL flags:lIq
 glock_workqueue-19106 [002] 1630943.697740: gfs2_promote: 253,3 glock 2:66314 promote first EX
 glock_workqueue-19106 [002] 1630943.697743: gfs2_glock_abhi_debug: 253,3 glock 2:66314 state EX to XX tgt:EX dmt:NL flags:Iq - glock_work_func delay:0
    open_n_write-29159 [002] 1630943.697751: gfs2_promote: 253,3 glock 1:2 promote other PR
    open_n_write-29159 [002] 1630943.697771: gfs2_glock_queue: 253,3 glock 1:2 dequeue PR
    open_n_write-29159 [002] 1630943.697773: gfs2_glock_queue: 253,3 glock 2:66314 dequeue EX
 glock_workqueue-19106 [002] 1630948.158332: gfs2_glock_state_change: 253,3 glock 2:66213 state NL to EX tgt:EX dmt:EX flags:lIq
 glock_workqueue-19106 [002] 1630948.158578: gfs2_promote: 253,3 glock 2:66213 promote first EX
 glock_workqueue-19106 [002] 1630948.158590: gfs2_glock_abhi_debug: 253,3 glock 2:66213 state EX to XX tgt:EX dmt:EX flags:Iq - glock_work_func delay:0
 glock_workqueue-19106 [002] 1630948.158732: gfs2_glock_state_change: 253,3 glock 2:66213 state EX to NL tgt:NL dmt:NL flags:lDpIq
 glock_workqueue-19106 [002] 1630948.158736: gfs2_glock_abhi_debug: 253,3 glock 2:66213 state NL to XX tgt:NL dmt:EX flags:Iq - glock_work_func delay:0
            find-29161 [002] 1630950.204468: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-29161 [002] 1630950.204476: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-29161 [002] 1630950.204488: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-29161 [002] 1630950.204490: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-29161 [000] 1630950.208075: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-29161 [000] 1630950.208084: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-29161 [000] 1630950.208091: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-29161 [000] 1630950.208093: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-29161 [000] 1630950.208102: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-29161 [000] 1630950.208104: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-29161 [000] 1630950.208176: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-29161 [000] 1630950.208178: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-29161 [000] 1630950.208181: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-29161 [000] 1630950.208183: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-29161 [000] 1630950.208200: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-29161 [000] 1630950.208202: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-29161 [000] 1630950.267308: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-29161 [000] 1630950.267317: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            find-29161 [000] 1630950.267330: gfs2_promote: 253,3 glock 2:22 promote first PR
            find-29161 [000] 1630950.267331: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
     gfs2_quotad-29146 [003] 1630954.860305: gfs2_glock_abhi_debug: 253,3 glock 2:66213 state NL to XX tgt:EX dmt:EX flags:lIq - do_xmote delay:0
     gfs2_quotad-29146 [003] 1630954.860929: gfs2_demote_rq: 253,3 glock 2:66213 demote EX to NL flags:DIq
     gfs2_quotad-29146 [003] 1630954.860931: gfs2_glock_queue: 253,3 glock 2:66213 dequeue EX
     gfs2_quotad-29146 [003] 1630954.860932: gfs2_glock_abhi_debug: 253,3 glock 2:66213 state EX to XX tgt:EX dmt:NL flags:DIq - gfs2_glock_dq delay:0
 glock_workqueue-19107 [003] 1630954.861002: gfs2_glock_abhi_debug: 253,3 glock 2:66213 state EX to XX tgt:NL dmt:NL flags:lDpIq - do_xmote delay:0
 glock_workqueue-19107 [003] 1630954.861034: gfs2_glock_abhi_debug: 253,3 glock 2:66213 state EX to XX tgt:NL dmt:NL flags:lDpIq - glock_work_func delay:0
            grep-29162 [001] 1630956.907173: gfs2_promote: 253,3 glock 2:22 promote first PR
            grep-29162 [001] 1630956.907182: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
            grep-29162 [001] 1630956.907190: gfs2_promote: 253,3 glock 2:22 promote first PR
            grep-29162 [001] 1630956.907192: gfs2_glock_queue: 253,3 glock 2:22 dequeue PR
Comment 16 Steve Whitehouse 2010-09-02 09:22:36 EDT
Created attachment 442615 [details]
Another test patch

Patch to see if my hunch is correct
Comment 17 Abhijith Das 2010-09-02 13:04:45 EDT
Created attachment 442677 [details]
combined rhel5 patch

This is the combined rhel5 patch for the two upstream patches above.
With both these patches in place, gfs2 performs much better than before with the 'move 100 files from one dir to another' test:

GFS1
----
real    0m0.917s
user    0m0.036s
sys     0m0.261s

GFS2
----
real    0m2.511s
user    0m0.016s
sys     0m0.183s
Comment 18 Abhijith Das 2010-09-02 16:12:17 EDT
http://porkchop.devel.redhat.com/brewroot/scratch/adas/task_2730019/

This is a rhel5 kernel build with the patch above. It's in my scratch space... don't know how long it'll be there before it's wiped away by brew housekeeping. However, I've saved off the rpms onto my work machine. In case you can't find the builds at the link above, ask me.
Comment 19 Steve Whitehouse 2010-09-03 05:35:37 EDT
Patch for GFS2 has been pushed into the -nmw git tree.
Patch for dlm has been sent to dct.
Comment 20 RHEL Product and Program Management 2010-09-03 05:59:25 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 21 David Teigland 2010-09-03 10:59:14 EDT
I'm trying to understand the dlm patch in comment 16.

send_blocking_asts() sends basts to locks on the grantqueue, and I don't see that the lkb arg would ever be on the grantqueue.  So I don't think the patch has any effect on send_blocking_asts().

send_blocking_asts_all() sends basts to locks on both grantqueue and convertqueue, and I do expect that the lkb arg would be on the convertqueue, so the patch does make sense in that case.  Is that what you've observed?  But send_blocking_asts_all() is only used when gfs uses the DLM_LKF_NOQUEUEBAST flag.  Is gfs using that flag here?
Comment 22 David Teigland 2010-09-03 11:01:36 EDT
Ah, I hadn't seen the email with the description in it:

From: Steven Whitehouse <swhiteho@redhat.com>                                   
To: David Teigland <teigland@redhat.com>                                        
Cc: cluster-devel@redhat.com, linux-kernel@vger.kernel.org                      
Date: Fri, 03 Sep 2010 10:40:16 +0100                                           
Message-ID: <1283506816.2500.21.camel@localhost>                                


When converting a lock, an lkb is in the granted state and also being used
to request a new state. In the case that the conversion was a "try 1cb"
type which has failed, and if the new state was incompatible with the old
state, a callback was being generated to the requesting node. This is
incorrect as callbacks should only be sent to all the other nodes holding
blocking locks. The requesting node should receive the normal (failed)
response to its "try 1cb" conversion request only.

This was discovered while debugging a performance problem on GFS2, however
this fix also speeds up GFS as well. In the GFS2 case the performance gain
is over 10x for cases of write activity to an inode whose glock is cached
on another, idle (wrt that glock) node.
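
In outline, the fix corresponding to that description skips the requesting lkb when walking a queue to send blocking callbacks, along these lines (a sketch of the idea only, not a verbatim copy of attachment 442615):

static void send_bast_queue(struct dlm_rsb *r, struct list_head *head,
                            struct dlm_lkb *lkb)
{
        struct dlm_lkb *gr;

        list_for_each_entry(gr, head, lkb_statequeue) {
                /* A converting lkb is still on the queues being
                 * walked; don't send it a blocking callback about
                 * its own failed "try 1cb" request -- it gets the
                 * failed conversion reply instead. */
                if (gr == lkb)
                        continue;
                if (gr->lkb_bastfn && modes_require_bast(gr, lkb))
                        queue_bast(r, gr, lkb->lkb_rqmode);
        }
}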
Comment 23 Steve Whitehouse 2010-09-08 05:02:57 EDT
Barry, since you are looking into performance on GFS2, I thought you might be interested in having a heads up on this particular bug. It only affects performance in cluster operation; it won't make any difference on a single node.
Comment 25 Abhijith Das 2010-09-09 14:38:38 EDT
posted -part2.patch to rhkernel as well
http://post-office.corp.redhat.com/archives/rhkernel-list/2010-September/msg00464.html
Comment 28 Jarod Wilson 2010-09-15 09:59:42 EDT
in kernel-2.6.18-221.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Comment 40 Bill C. Riemers 2010-11-04 16:09:25 EDT
Sorry, I was logged into the wrong instance.
Comment 41 Martin Prpič 2010-11-11 08:56:21 EST
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
In a two node cluster, the lock master of two folders can move 100 files from one folder to the other in less than 1 second. If a server is not the lock master for that folder, it would take that server 3-5 seconds to perform the same task on GFS1 (Global File System 1), and 30-50 seconds on GFS2 (Global File System 2). With this update, the aforementioned task takes less than 1 second on GFS1 and about 3 seconds on GFS2.
Comment 42 Martin Prpič 2010-11-19 09:45:41 EST
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-In a two node cluster, the lock master of two folders can move 100 files from one folder to the other in less than 1 second. If a server is not the lock master for that folder, it would take that server 3-5 seconds to perform the same task on GFS1 (Global File System 1), and 30-50 seconds on GFS2 (Global File System 2). With this update, the aforementioned task takes less than 1 second on GFS1 and about 3 seconds on GFS2.+In a two node cluster, moving 100 files between two folders using the lock master was nearly instantaneous. However, not using the lock master resulted in a considerably worse performance on both, GFS1 (Global File System 1) and GFS2 (Global File System 2) file systems. With this update, not using the lock master does not lead to worsened performance on either of the aforementioned file systems.
Comment 43 Nate Straz 2010-12-14 11:55:26 EST
I added a new test case to our regression run for this bz.  Running on a RHEL5.5 cluster and a RHEL5.6 cluster showed a huge improvement.

RHEL5.5:

Mounting gfs /dev/timetrials/lmrename on all nodes with opts '': buzz-01 buzz-02 buzz-03 buzz-04 buzz-05 
Creating 1000 files for each node to move from buzz-01
Running mv on buzz-01
Running mv on buzz-02
Move from lock master took 1.12 seconds
Move from other node took 8.03 seconds
buzz-02 took 7.17x longer than buzz-01 (master)

Mounting gfs2 /dev/timetrials/lmrename on all nodes with opts '': buzz-01 buzz-02 buzz-03 buzz-04 buzz-05 
Creating 1000 files for each node to move from buzz-01
Running mv on buzz-01
Running mv on buzz-02
Move from lock master took 1.13 seconds
Move from other node took 221.37 seconds
buzz-02 took 195.90x longer than buzz-01 (master)


RHEL5.6:

Mounting gfs /dev/timetrials/lmrename on all nodes with opts '': dash-01 dash-02 dash-03 
Creating 1000 files for each node to move from dash-01
Running mv on dash-01
Running mv on dash-02
Move from lock master took 3.19 seconds
Move from other node took 9.58 seconds
dash-02 took 3.00x longer than dash-01 (master)

Mounting gfs2 /dev/timetrials/lmrename on all nodes with opts '': dash-01 dash-02 dash-03 
Creating 1000 files for each node to move from dash-01
Running mv on dash-01
Running mv on dash-02
Move from lock master took 3.15 seconds
Move from other node took 31.30 seconds
dash-02 took 9.94x longer than dash-01 (master)
Comment 45 errata-xmlrpc 2011-01-13 15:48:51 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html
