Bug 1273348
Summary: | [Tier]: lookup from client takes too long {~7m for 18k files} | ||||||
---|---|---|---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rahul Hinduja <rhinduja> | ||||
Component: | tier | Assignee: | Dan Lambright <dlambrig> | ||||
Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | rhgs-3.1 | CC: | asrivast, jbyers, nchilaka, rhs-bugs, sankarshan, sarumuga, storage-qa-internal | ||||
Target Milestone: | --- | Keywords: | ZStream | ||||
Target Release: | RHGS 3.1.2 | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | glusterfs-3.7.5-7 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | 1273333 | Environment: | |||||
Last Closed: | 2016-03-01 05:43:06 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 1273333 | ||||||
Bug Blocks: | 1260783, 1260923 | ||||||
Attachments: |
Comment 3
Rahul Hinduja
2015-11-02 12:38:54 UTC
Can someone tell me where to find the "crefi" utility?

Dan, as mentioned in scrum, we can use the general dd command or the touch command to create a significant number of files. You don't need the crefi utility.

RCA: the overhead of doing a readdir (basically internal lookups) seems to scale with the number of subvolumes. A tiered volume has more subvolumes.

One question: the bug shows that when tiering is disabled the time is 3.11 minutes, and when tiering is enabled the time is 7.41 minutes. I would like to know if the jump in time is due to demoting data between the nodes. Can you repeat the test and, while "find . | xargs stat" is running, issue "gluster vol tier <volname> status" in another window? Check whether the counters are increasing while the find command runs. Then wait until all data has moved to the cold tier and repeat the test. Monitor the counters to be sure they are not increasing and the system is stable. Are the times different or just as bad? I have seen moderate performance degradation while demotion is happening on a 2-node system with a 6x2 hot tier and a 6x2 cold tier.

Updating my observation: 18k empty files created, able to sync to slave (in geo-rep setup).

==================================================
NON-tier volume: 3x2 volume
[root@gfvm3 non-tierd_volume]# echo 3 > /proc/sys/vm/drop_caches
[root@gfvm3 non-tierd_volume]# time ls ..
real 0m10.130s
user 0m0.194s
sys 0m0.345s
[root@gfvm3 non-tierd_volume]#
==================================================
Tier Volume : 3x2 cold tier , 2x2 hot tier
[root@gfvm3 tierd_volume]# echo 3 > /proc/sys/vm/drop_caches
[root@gfvm3 tierd_volume]# time ls ..
real 0m18.290s
user 0m0.206s
sys 0m0.372s
==================================================

(In reply to Saravanakumar from comment #8)
> Updating my observation:
This is observed with the "readdirp to cold tier only" patch (http://review.gluster.org/#/c/12530/).

Created attachment 1097590 [details]
logs find command
Executed the following command with and without tiering:
# time find . | xargs stat
Following is my observation.
WITHOUT TIERING:
real 0m3.126s
user 0m0.324s
sys 0m0.580s
WITH TIERING:
real 0m7.822s
user 0m0.506s
sys 0m0.889s
------------------------
The complete log for all commands executed is attached.
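For anyone reproducing the measurement without crefi, the following is a minimal Python sketch of the two steps discussed in this report: creating a large number of empty files (the equivalent of a touch loop) and then stat-ing every entry under a directory (roughly what "find . | xargs stat" does). The function names `create_empty_files` and `stat_tree` are hypothetical, and the temporary directory only stands in for a Gluster mount point; it does not drop caches or touch a real volume, so its timings are not comparable to the numbers above.

```python
import os
import tempfile
import time


def create_empty_files(directory, count):
    """Create `count` empty files, similar to running `touch` in a loop."""
    for i in range(count):
        # Opening in append mode creates the file without writing any data.
        with open(os.path.join(directory, f"file_{i:05d}"), "a"):
            pass


def stat_tree(root):
    """lstat the root and every entry below it, like `find . | xargs stat`.

    Returns a tuple (entries_statted, elapsed_seconds).
    """
    start = time.perf_counter()
    os.lstat(root)
    statted = 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))
            statted += 1
    return statted, time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: a local temporary directory stands in for the mount point.
    with tempfile.TemporaryDirectory() as mount:
        create_empty_files(mount, 1000)
        entries, seconds = stat_tree(mount)
        print(f"statted {entries} entries in {seconds:.3f}s")
```

To run it against a real client mount, pass the mount path to `stat_tree` instead of the temporary directory, and drop caches beforehand (as in the `echo 3 > /proc/sys/vm/drop_caches` step shown earlier) so that repeated runs are comparable.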
Verified with the build: glusterfs-3.7.5-7.el7rhgs.x86_64
Volume type: Tiered
Number of files: 17077 {actual data files with a total size of 5G}
Time taken in each case {find . | xargs stat}:
Case 1: Default CTR {enabled} and watermarks enabled {mode=cache}
real 1m13.305s
user 0m1.220s
sys 0m2.686s
Case 2: Set watermarks low and hi to 10 and 60 respectively
real 1m16.308s
user 0m1.233s
sys 0m2.769s
Case 3: Set watermarks low and hi to 10 and 20 respectively
real 1m12.880s
user 0m1.204s
sys 0m2.551s
Case 4: Set the watermark to test mode {mode=test}
real 1m36.250s
user 0m1.239s
sys 0m2.743s
Case 5: Disabled CTR
real 1m36.958s
user 0m1.246s
sys 0m2.661s
In all the above cases, the time taken is approximately the same. Moving the bug to verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2016-0193.html
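To put a number on how close the five verification cases are, the `real` values can be converted to seconds and compared. This is a small illustrative sketch, not part of the original verification; the helper name `parse_real` and the case labels are hypothetical, and the durations are copied from the verification comment.

```python
import re


def parse_real(value):
    """Convert a `time` duration such as '1m13.305s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# `real` values copied from the verification comment; labels are shorthand.
cases = {
    "case 1: CTR enabled, mode=cache": "1m13.305s",
    "case 2: watermarks 10/60": "1m16.308s",
    "case 3: watermarks 10/20": "1m12.880s",
    "case 4: mode=test": "1m36.250s",
    "case 5: CTR disabled": "1m36.958s",
}

seconds = {name: parse_real(value) for name, value in cases.items()}
fastest = min(seconds.values())
slowest = max(seconds.values())
spread = slowest / fastest

for name, elapsed in seconds.items():
    print(f"{name}: {elapsed:.3f}s ({elapsed / fastest:.2f}x fastest)")
print(f"slowest/fastest = {spread:.2f}")
```

On these figures the slowest case (CTR disabled) takes about 1.33 times as long as the fastest (watermarks 10/20). Applying the same helper to the earlier find measurements (0m3.126s without tiering, 0m7.822s with tiering) gives a ratio of about 2.5.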