Bug 1598056
Summary: [Perf] mkdirs are regressing 54% on 3 way replicated volume

Product: [Red Hat Storage] Red Hat Gluster Storage
Component: core
Status: CLOSED DEFERRED
Severity: medium
Priority: medium
Version: rhgs-3.4
Hardware: All
OS: Linux
Whiteboard: brick-multiplexing
Keywords: Performance, Regression, Reopened
Reporter: Karan Sandha <ksandha>
Assignee: Mohit Agrawal <moagrawa>
QA Contact: Prasanth <pprakash>
CC: bkunal, guillaume.pavese, jahernan, ksubrahm, moagrawa, mpillai, nchilaka, pprakash, rhinduja, rhs-bugs, sanandpa, sheggodu, storage-qa-internal, vdas
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2020-08-25 15:21:14 UTC
Bug Blocks: 1577750, 1610751, 1651050, 1651508
Created attachment 1457408 [details]
mkdir_340
Following is the condensed information from comment #12 and comment #13. Time is in seconds.

total-times<331>                  | total-times<340>
=========================================================================
entrylk-total-time    5300.63     | entrylk-total-time    6675.02
lookup-total-time     5539.24     | lookup-total-time     6547.7
inodelk-total-time    5646.45     | inodelk-total-time    6791.9
finodelk-total-time   0.003457    | finodelk-total-time   0.003292
mkdir-total-time      23383       | mkdir-total-time      23599
setxattr-total-time   4341.38     | setxattr-total-time   5710.13
stat-total-time                   | stat-total-time       19.09
statfs-total-time     18.845      | statfs-total-time     42.8818

total-calls<331>                  | total-calls<340>
========================================================================
entrylk-total-calls   34560612    | entrylk-total-calls   36000624
lookup-total-calls    18833403    | lookup-total-calls    18583905
inodelk-total-calls   37440300    | inodelk-total-calls   37440162
finodelk-total-calls  48          | finodelk-total-calls  48
mkdir-total-calls     17280072    | mkdir-total-calls     17280072
setxattr-total-calls  17280072    | setxattr-total-calls  17280072
stat-total-calls                  | stat-total-calls      122498
statfs-total-calls    145440      | statfs-total-calls    287064

Note the increase in the total times of entrylk, lookup, inodelk, mkdir (slight, around 220s), stat and statfs.
However:
* the number of entrylk calls increased on the brick (expected, as DHT adds two entrylks per mkdir)
* the number of lookup calls actually _REDUCED_ (but lookup latency on the brick increased in 340 by 1008 seconds)
* the number of inodelk calls remained almost constant (but inodelk latency increased in 340 by 1146 seconds)
* mkdir total calls are exactly the same, but there is a slight latency increase (around 220s)
* the number of setxattr calls remained exactly the same, but latency increased by 1369 seconds
* there were no stats in 331; there are a few in 340, but their latency is negligible
* the number of statfs calls increased by 141624 in 340, but the latency increase is a negligible 20s

Note that I've added up the times without considering replica count, but that should give a rough overview. I'll think more about this data before coming to conclusions.

Due to the patch https://review.gluster.org/#/c/19732 I would expect more lookups, as mkdir would initiate server resolution with RESOLVE_NOT, but I am seeing fewer lookups. This is puzzling.
331 total time                    | 340 total time (bmux off)
=============================================================
entrylk-total-time    5300.63     | entrylk-total-time    7294
lookup-total-time     5539.24     | lookup-total-time     6613.1
inodelk-total-time    5646.45     | inodelk-total-time    7007.72
finodelk-total-time   0.003457    |
mkdir-total-time      23383       | mkdir-total-time      24624.4
setxattr-total-time   4341.38     | setxattr-total-time   4333.94
stat-total-time                   |
statfs-total-time     18.845      |

331 total calls                   | 340 total calls (bmux off)
============================================================
entrylk-total-calls   34560612    | entrylk-total-calls   36000624
lookup-total-calls    18833403    | lookup-total-calls    20836566
inodelk-total-calls   37440300    | inodelk-total-calls   37470204
finodelk-total-calls  48          |
mkdir-total-calls     17280072    | mkdir-total-calls     17280072
setxattr-total-calls  17280072    | setxattr-total-calls  17295093
stat-total-calls                  |

total-times<331>                  | total-times<340> (bmux on)
=========================================================================
entrylk-total-time    5300.63     | entrylk-total-time    6675.02
lookup-total-time     5539.24     | lookup-total-time     6547.7
inodelk-total-time    5646.45     | inodelk-total-time    6791.9
finodelk-total-time   0.003457    | finodelk-total-time   0.003292
mkdir-total-time      23383       | mkdir-total-time      23599
setxattr-total-time   4341.38     | setxattr-total-time   5710.13
stat-total-time                   | stat-total-time       19.09
statfs-total-time     18.845      | statfs-total-time     42.8818

total-calls<331>                  | total-calls<340> (bmux on)
========================================================================
entrylk-total-calls   34560612    | entrylk-total-calls   36000624
lookup-total-calls    18833403    | lookup-total-calls    18583905
inodelk-total-calls   37440300    | inodelk-total-calls   37440162
finodelk-total-calls  48          | finodelk-total-calls  48
mkdir-total-calls     17280072    | mkdir-total-calls     17280072
setxattr-total-calls  17280072    | setxattr-total-calls  17280072
stat-total-calls                  | stat-total-calls      122498
statfs-total-calls    145440      | statfs-total-calls    287064

Created attachment 1471969 [details]
bmux-off-mkdir total time
Time is in seconds. Each line represents the total mkdir latency as seen at protocol/client in a setup with bmux off.
Created attachment 1471970 [details]
bmux-on-mkdir-total-time
Time is in seconds. Each line represents the total mkdir latency as seen at protocol/client in a setup with bmux on.
Created attachment 1472629 [details]
sorted mkdir latencies on brick stack with bmux on
Created attachment 1472630 [details]
sorted mkdir latencies on brick stack with bmux off
Hi,

I have tried to find out why we have a performance difference for mkdir with brick_mux vs non-brick_mux. I have found the main root cause of this issue to be lock contention in io-threads.

Here I have set up a distribute-replicate environment (12x3). In the non-brick-mux scenario, glusterd spawns 12 brick processes; every process spawns 4 poller threads to poll requests from the socket, and every brick process can scale to 16 io-threads at peak. In the brick_mux scenario, glusterd spawns a single brick process to handle all bricks. In this case, by default, the brick process spawns 1 poller thread per brick and still scales to 16 io-threads per brick to handle fop requests; with 12 bricks, up to 192 io-threads (16x12) can be running in the single brick process at peak.

The main differences between brick_mux and non-brick_mux:
1) poller threads (non-brick_mux: 4 poller threads per brick; brick_mux: 1 poller thread per brick)
2) io-threads (non-brick_mux: 16 io-threads per process; brick_mux: 192 io-threads for a single process at peak)

The current io-threads design:
1) We have 4 types of queues based on priority, and a single mutex for enqueue/dequeue and signaling across all queues.

I have changed it as below:
1) Introduce a separate lock for every queue.
2) Use atomic ops to know if a queue has any data.
3) Use a separate lock for signaling purposes.

I have executed the mkdir test case as QE was running it, and found performance improved with the configuration below:
1) 4 io-threads
2) 37 server event-threads

Here I have used 4 io-threads because we have 4 queues, so every queue gets similar priority for handling fops when multiple types of fops are requested (in this test case most fops belong to the same queue). 37 server event-threads are required because 37 + 1 default poller thread per brick (brick_mux) becomes 48 poller threads, similar to the non-brick-mux scenario.
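The per-queue locking change described above can be sketched roughly as below. This is an illustrative sketch only: the names (`iot_queue`, `iot_enqueue`, `iot_dequeue`, `IOT_PRI_MAX`) are hypothetical stand-ins, not the actual GlusterFS io-threads symbols, and the real translator carries more state (thread scaling, stack winding, per-client accounting).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define IOT_PRI_MAX 4 /* mirrors the 4 priority queues mentioned above */

struct iot_task {
    struct iot_task *next;
    void (*fn)(void *);
    void *arg;
};

struct iot_queue {
    pthread_mutex_t lock;   /* point 1: protects this queue's list only */
    struct iot_task *head;
    struct iot_task *tail;
    atomic_long size;       /* point 2: atomic peek, no lock to see "empty" */
};

struct iot_pool {
    struct iot_queue queues[IOT_PRI_MAX];
    pthread_mutex_t sleep_lock; /* point 3: used only for worker wakeup */
    pthread_cond_t sleep_cond;
};

static void iot_pool_init(struct iot_pool *pool) {
    for (int i = 0; i < IOT_PRI_MAX; i++) {
        pthread_mutex_init(&pool->queues[i].lock, NULL);
        pool->queues[i].head = pool->queues[i].tail = NULL;
        atomic_init(&pool->queues[i].size, 0);
    }
    pthread_mutex_init(&pool->sleep_lock, NULL);
    pthread_cond_init(&pool->sleep_cond, NULL);
}

static void iot_enqueue(struct iot_pool *pool, int pri, struct iot_task *task) {
    struct iot_queue *q = &pool->queues[pri];

    pthread_mutex_lock(&q->lock);
    task->next = NULL;
    if (q->tail)
        q->tail->next = task;
    else
        q->head = task;
    q->tail = task;
    atomic_fetch_add(&q->size, 1);
    pthread_mutex_unlock(&q->lock);

    /* Signaling uses its own lock, so producers on other queues never
       contend here with enqueue/dequeue traffic on this queue. */
    pthread_mutex_lock(&pool->sleep_lock);
    pthread_cond_signal(&pool->sleep_cond);
    pthread_mutex_unlock(&pool->sleep_lock);
}

static struct iot_task *iot_dequeue(struct iot_pool *pool) {
    for (int pri = 0; pri < IOT_PRI_MAX; pri++) {
        struct iot_queue *q = &pool->queues[pri];

        /* Atomic check avoids taking the lock of an empty queue at all. */
        if (atomic_load(&q->size) == 0)
            continue;

        pthread_mutex_lock(&q->lock);
        struct iot_task *task = q->head;
        if (task) {
            q->head = task->next;
            if (!q->head)
                q->tail = NULL;
            atomic_fetch_sub(&q->size, 1);
        }
        pthread_mutex_unlock(&q->lock);
        if (task)
            return task;
    }
    return NULL; /* caller would wait on sleep_cond */
}
```

The point of the split is that under the old design every enqueue, dequeue, and wakeup of 192 io-threads serialized on one mutex; here workers busy with one priority class never touch the locks of the other queues, and the empty-queue check costs only an atomic load.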
For now the server.event-threads option does not work perfectly for brick_mux, so I have configured this number manually (we need to correct it).

I think that to test this patch I need to generate a parallel workload of fops across multiple queues; otherwise it is difficult to measure the improvement. In the case of mkdir/rmdir with the smallfile tool, after using the above configuration I have not found a major regression in the current standard build. Below are the numbers I got in my testing with the io-thread granular-locking patch (with brick_mux):

Brick_mux:
1) 4 io-threads: mkdir 109 files/sec, rmdir 108 files/sec
2) 4 io-threads with 15 server threads: mkdir 118 files/sec, rmdir 114 files/sec
3) 4 io-threads with 37 server threads: mkdir 119 files/sec, rmdir 125 files/sec

Baseline (on current 3.4 build, without brick_mux): mkdir 122 files/sec, rmdir 136 files/sec

Current available build (4 io-threads, 37 server threads, brick_mux): mkdir 118 files/sec, rmdir 117 files/sec

Before sharing any patch with QE, I have requested that the perf numbers with the above options be shared on the current build.

Regards,
Mohit Agrawal

As this regression is pronounced with brick mux enabled, I am moving it out of dht.

I am not aware of brick-mux being enabled by default in BU2. About testing, there are some fixes going on related to brick-mux in BU2; because of that we have initiated a performance run with brick-mux enabled.

Sachin, I believe that with the 3.5.0 release this is now not observed. I would like someone to revisit this bug, and close it if it is not the case with 3.5.0 builds. Moving the needinfo.

This will be revisited in the coming build(s) of RHGS 3.5.0. Will keep this bug updated with the progress.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
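For reference, the 4 io-thread / 37 event-thread configuration discussed in Mohit's comment maps onto the standard volume-set options. This is a sketch with "testvol" as a placeholder volume name, and it carries the comment's own caveat that server.event-threads does not yet behave correctly under brick multiplexing:

```shell
# Placeholder volume name "testvol"; the values are the experimental ones
# from the comment above, not a general tuning recommendation.
gluster volume set testvol performance.io-thread-count 4
gluster volume set testvol server.event-threads 37
```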
Created attachment 1457407 [details]
mkdir_331