Bug 977544
| Summary: | gluster volume quota limit-usage now takes 40 seconds per command execution | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ben England <bengland> |
| Component: | glusterd | Assignee: | Krutika Dhananjay <kdhananj> |
| Status: | CLOSED ERRATA | QA Contact: | Saurabh <saujain> |
| Severity: | high | Docs Contact: | |
| Priority: | low | ||
| Version: | 2.1 | CC: | bengland, kdhananj, kparthas, mzywusko, rhs-bugs, saujain, spandura, vagarwal, vbellur |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | Flags: | vagarwal: needinfo+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | glusterfs-3.4.0.36rhs-1 | Doc Type: | Bug Fix |
| Doc Text: | Previously, the logic that the quota limit-usage command uses to update the quota configuration file had high latency. With this update, that logic is improved and takes much less time. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2013-11-27 15:25:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Ben England
2013-06-24 20:48:58 UTC
Could you please let me know what your expected time to complete one 'quota limit-usage' transaction is, if 2 seconds is a long time? Sayan asked if we could support 60000 quotas like NetApp; that's what got me thinking about this. So IMHO the quota command should just be marking the directory as having a quota, not doing all the processing to calculate the space used by the directory at that time. Just marking the directory should not take more than 1/4 sec, I would guess (enough time to set extended attributes on the directory). A background process could calculate how much space is currently in use in the directory.

Tested this issue on build "glusterfs 3.4.0.35rhs built on Oct 15 2013 14:06:04".
On a distribute-replicate volume with 1000 top-level directories, setting limit-usage on each directory takes about 40 seconds.
root@rhs-client11 [Oct-17-2013-13:09:15] >time gluster v quota vol_dis_rep limit-usage /user1 10GB
volume quota : success
real 0m37.559s
user 0m0.097s
sys 0m0.016s
root@rhs-client11 [Oct-17-2013-13:10:08] >time gluster v quota vol_dis_rep limit-usage /user2 10GB
volume quota : success
real 0m37.716s
user 0m0.099s
sys 0m0.014s
root@rhs-client11 [Oct-17-2013-13:12:07] >time gluster v quota vol_dis_rep limit-usage /user3 10GB
volume quota : success
real 0m37.862s
user 0m0.098s
sys 0m0.017s
root@rhs-client12 [Oct-17-2013-12:53:52] >gluster v info vol_dis_rep
Volume Name: vol_dis_rep
Type: Distributed-Replicate
Volume ID: 7f8013d4-dd04-47ee-8e7d-f096ac2a1597
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/bricks/brick1
Brick2: rhs-client12:/rhs/bricks/brick2
Brick3: rhs-client13:/rhs/bricks/brick3
Brick4: rhs-client14:/rhs/bricks/brick4
Brick5: rhs-client11:/rhs/bricks/brick5
Brick6: rhs-client12:/rhs/bricks/brick6
Brick7: rhs-client13:/rhs/bricks/brick7
Brick8: rhs-client14:/rhs/bricks/brick8
Brick9: rhs-client11:/rhs/bricks/brick9
Brick10: rhs-client12:/rhs/bricks/brick10
Brick11: rhs-client13:/rhs/bricks/brick11
Brick12: rhs-client14:/rhs/bricks/brick12
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.quota: on
===========================================
Meminfo of the machine:-
===========================================
root@rhs-client11 [Oct-17-2013-13:13:17] >free -tg
total used free shared buffers cached
Mem: 15 13 1 0 0 11
-/+ buffers/cache: 2 13
Swap: 7 0 7
Total: 23 13 9
ROOT CAUSE OF THE PROBLEM DESCRIBED IN COMMENT #5:
-------------------------------------------------
When limit-usage is invoked for a particular path, glusterd needs to store the gfid of the path in quota.conf. To eliminate duplicate entries in quota.conf (for example, when a quota limit is set a second time on the same directory, we don't want glusterd to enter two copies of the gfid in quota.conf), glusterd reads one gfid at a time from quota.conf and compares it with the gfid of the path on which the limit is being set in the current operation. Therefore:
the number of reads = number of entries in quota.conf
the number of comparisons <= number of entries in quota.conf
This means that the greater the number of entries in quota.conf, the slower the operation gets.

*** Bug 1021089 has been marked as a duplicate of this bug. ***

Ben, I ran a test related to this bug on glusterfs 3.4.0.36rhs. The test sets a quota limit on 64000 directories of one volume in a four-node cluster. The 64000 directories are created in the root of the volume. To set the limits, I ran a loop in batches of 10000 directories, executing the loop 7 times, with the last loop covering 4000 directories.
For each 10000-directory loop, I captured the time it took and am sharing it with you:
for 1-10000: real 34m37.672s user 21m35.960s sys 4m29.646s
for 10001-20000: real 35m43.942s user 21m33.960s sys 4m35.986s
for 20001-30000: real 36m47.739s user 21m33.709s sys 4m38.902s
for 30001-40000: real 37m30.599s user 21m28.254s sys 4m35.139s
for 40001-50000: real 38m35.286s user 21m30.369s sys 4m39.452s
for 50001-60000: real 39m41.455s user 21m31.950s sys 4m40.954s
for 60001-64000: real 16m8.127s user 8m36.569s sys 1m52.466s

I also collected the time taken to list the limits, using the command "time gluster volume quota $volname list":
listing 10000 takes: real 0m23.917s user 0m1.418s sys 0m1.386s
listing 30000 takes: real 1m40.601s user 0m4.975s sys 0m4.862s
listing 40000 takes: real 2m15.123s user 0m5.017s sys 0m4.659s
listing 50000 takes: real 2m52.655s user 0m6.249s sys 0m5.633s
listing 60000 takes: real 3m32.462s user 0m8.202s sys 0m6.909s
listing 64000 takes: real 3m36.193s user 0m9.551s sys 0m8.665s

I would like clarification from you on whether this test suffices for verifying the fix of the original issue, or suggestions on how else to verify it.
Note: There was no data in the directories.

Saurabh, thanks for pinging me, I had forgotten to check back on this bz.
So here's the worst case for establishing quotas:
for 50001-60000
real 39m41.455s
user 21m31.950s
sys 4m40.954s
That's 2381 seconds for 10000 quotas, or 4 quotas/sec, an 8x improvement. I think that's pretty good. This is a very extreme case; typically people do not constantly adjust quotas, so it would be a one-time cost for the volume.
Is the gluster volume info command fixed so it doesn't output the quotas right in there? Is there a different command to output the quotas?
A separate test would be to create a bunch of files and delete them with 60000 quotas. See https://docspace.corp.redhat.com/docs/DOC-156688 for ideas on how to do that. I did it with 400 quotas or something like that.
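The throughput figure in the comment above can be checked with a short calculation. The sketch below converts the `real` time of the 50001-60000 batch to seconds and derives the rate. The helper name `parse_time` is hypothetical, and the 2-second baseline used for the speedup is an assumption taken from the figure mentioned in the bug description, not something the comment states explicitly.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as '39m41.455s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Slowest full batch reported in the test (directories 50001-60000).
batch_seconds = parse_time("39m41.455s")
quotas_in_batch = 10000

rate = quotas_in_batch / batch_seconds
seconds_per_quota = batch_seconds / quotas_in_batch

# Assumption: the "8x" is measured against the 2 seconds per command
# mentioned in the original description.
baseline_seconds = 2.0
speedup = baseline_seconds / seconds_per_quota

print(f"{batch_seconds:.0f} s for {quotas_in_batch} quotas")
print(f"{rate:.1f} quotas/sec, {seconds_per_quota:.3f} s per quota")
print(f"about {speedup:.1f}x faster than {baseline_seconds:.0f} s per quota")
```

Under that assumption the numbers line up with the comment: roughly 4 quotas per second and a speedup of a little over 8x.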
(In reply to Ben England from comment #13)
> Saurabh, thanks for pinging me, I had forgotten to check back on this bz.
>
> So here's the worst case for establishing quotas:
>
> for 50001-60000
> real 39m41.455s
> user 21m31.950s
> sys 4m40.954s
>
> that's 2381 seconds for 10000 quotas or 4 quotas/sec, 8x improvement, I
> think that's pretty good. This is a very extreme case, typically people
> do not constantly adjust quotas so it would be a one-time cost for the
> volume.
>
> Is the gluster volume info command fixed so it doesn't output the quotas
> right in there?

Saurabh >> Yes, this is fixed.

> Is there a different command to output the quotas?

Saurabh >> To get information about the directories that have a limit set, use "gluster volume quota $volname list" or "gluster volume quota $volname list <path>".

> A separate test would be to create a bunch of files and delete them with
> 60000 quotas. see https://docspace.corp.redhat.com/docs/DOC-156688 for
> ideas on how to do that. I did it with 400 quotas or something like that.

Moving it to verified based on comment 12 and comment 13.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-1769.html |
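The root-cause comment in this report describes a duplicate check that reads quota.conf one gfid at a time on every limit-usage call. The sketch below illustrates why that pattern slows down as entries accumulate. It is not glusterd's code (glusterd is written in C): the in-memory list stands in for quota.conf, and the function names `add_gfid_linear` and `add_gfid_indexed` are hypothetical. The set-based variant is only one possible way to avoid rescanning and is not claimed to be the fix that shipped in glusterfs-3.4.0.36rhs.

```python
import uuid


def add_gfid_linear(conf_entries, gfid):
    """Append gfid unless already present, comparing one entry at a time.

    Returns the number of comparisons made, which grows with the
    number of entries already stored.
    """
    comparisons = 0
    for existing in conf_entries:
        comparisons += 1
        if existing == gfid:
            return comparisons  # duplicate found, nothing appended
    conf_entries.append(gfid)
    return comparisons


def add_gfid_indexed(conf_entries, index, gfid):
    """Same duplicate check using a set, so lookup cost stays flat."""
    if gfid in index:
        return False
    index.add(gfid)
    conf_entries.append(gfid)
    return True


# 1000 distinct 16-byte gfids, generated deterministically for the demo.
gfids = [uuid.UUID(int=i).bytes for i in range(1000)]

entries = []
total = 0
for gfid in gfids:
    total += add_gfid_linear(entries, gfid)

# Each new entry is compared against everything stored before it.
print(total)  # 0 + 1 + ... + 999 = 499500

# Setting a limit a second time on the same directory adds no duplicate.
add_gfid_linear(entries, gfids[0])
print(len(entries))  # 1000

indexed_entries, index = [], set()
for gfid in gfids:
    add_gfid_indexed(indexed_entries, index, gfid)
print(add_gfid_indexed(indexed_entries, index, gfids[0]))  # False
```

With the linear scan, the total work for N distinct directories grows roughly as N squared, which matches the observation that each additional limit-usage call gets slower as quota.conf grows.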