Bug 1530519
| Summary: | disperse eager-lock degrades performance for file create workloads | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ashish Pandey <aspandey> |
| Component: | disperse | Assignee: | Xavi Hernandez <jahernan> |
| Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.3 | CC: | amukherj, aspandey, asriram, bugs, jahernan, mpillai, nchilaka, pkarampu, rhinduja, rhs-bugs, sheggodu, srmukher, storage-qa-internal, ubansal |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | RHGS 3.4.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.12.2-5 | Doc Type: | Enhancement |
Doc Text: |
Previously, the eager locking option provided good performance for file access, but directory access suffered in some use cases when eager-lock was enabled. To overcome this problem, a new option, 'other-eager-lock', is introduced. This option keeps eager-locking enabled for regular files but disables it for directory accesses.
As a result, use cases where directories are accessed from multiple clients can benefit from disabling eager-locking for directories without losing performance on file accesses.
|
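The new option described in the Doc Text is toggled per volume with the gluster CLI. A minimal sketch, assuming the option is exposed under the `disperse` namespace as in upstream glusterfs, with `myvol` as a placeholder volume name:

```shell
# Keep eager-lock for regular files but disable it for directory
# (non-regular) accesses on a disperse volume:
gluster volume set myvol disperse.other-eager-lock off

# Restore the default behavior when directory contention from
# multiple clients is not a concern:
gluster volume set myvol disperse.other-eager-lock on
```

Disabling only `other-eager-lock` preserves the file-access performance that `disperse.eager-lock` provides, which is the design intent described in this bug.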
| Story Points: | --- | | |
| Clone Of: | 1502610 | Environment: | |
| Last Closed: | 2018-09-04 06:40:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1502610 | | |
| Bug Blocks: | 1502455, 1503137, 1512460 | | |
Comment 14
Nag Pavan Chilakam
2018-03-26 12:37:28 UTC
The behavior with other-eager-lock disabled is expected. When it's enabled there shouldn't be any functional difference compared to the same version without the patch.

If I understand correctly, to execute this test you are running an 'ls' in an infinite loop from 4 clients and then checking the time taken by one of them (or by a fifth 'ls' executed manually). In this case each 'ls' can block the directory for up to 1 second, causing all other 'ls' invocations to wait. Considering this, the time doesn't seem unexpected to me. Did this test take less time before the patch?

Xavi,

There were only 15 entries in the directory. He opened 4 consoles and mounted the same volume on 4 different mount points. Then he executed the "ls" command in that directory at the same time from all the clients (pressed enter using the broadcast feature of the terminal).

It was not an infinite loop on any client.

I am not sure about the regression, and whether the previous release without this patch was taking the same time or not. I think in any case it is too much time to list only 15 entries. However, disabling other-eager-lock is giving good performance.

Nag,

If you can also provide the numbers without this patch in the previous release, that would be great.

If 'ls' is not executed in a loop, then anything beyond 4/5 seconds seems bad, but it should be the same time it took before the patch, so there shouldn't be any regression. Maybe self-heal is being triggered for some reason and is competing with regular clients. I'll need to check this. In that case we should execute the same steps with shd disabled. Profile info and other logs will also be helpful.

(In reply to Ashish Pandey from comment #16)
> Xavi,
>
> There were only 15 entries in a directory.
> He opened 4 console and mounted same volume on 4 different mount point.
> Then he executed the "ls" command in that directory at the same time
> (pressed enter and using broadcast feature of the terminal) from all the
> clients.
>
> It was not an infinite loop on any client.
>
> I am not sure about the regression and if the previous release without this
> patch was taking the same time or not.
>
> I think in any case it is too much time to list 15 entries only.
> However, disabling other-eager-lock is giving good performance.
>
> Nag,
> If you can also provide the number without this patch in previous release
> then that would be great.

I checked in 3.3.1-async, i.e. 3.8.4-54-3, and when triggered from 2 clients simultaneously (didn't have 4 clients), it took less than 1 sec there.

As discussed above and over emails, I have raised a new bug for the parallel lookup perf degradation (1577750 - severe drop in response time of simultaneous lookups with other-eager-lock enabled).

Moving this to verified on 3.12.2-9, as the different options are available in the form of other-eager-lock.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
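The reproduction step discussed in the thread (simultaneous `ls` of a 15-entry directory from several clients) can be sketched generically. This is a hypothetical stand-in, not the QA harness: local directories replace the four FUSE mounts of the disperse volume, and the timings it produces on a local filesystem say nothing about gluster; it only illustrates the measurement method.

```python
import os
import subprocess
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

def timed_ls(path):
    """Run one 'ls' and return (elapsed seconds, number of entries listed)."""
    start = time.monotonic()
    out = subprocess.run(["ls", path], capture_output=True, text=True, check=True)
    return time.monotonic() - start, len(out.stdout.split())

def reproduce(path, clients=4):
    """Fire 'clients' concurrent 'ls' calls at the same directory, mimicking
    the broadcast-terminal step from the report (each path would be a
    separate mount point on a real setup)."""
    with ThreadPoolExecutor(max_workers=clients) as pool:
        return list(pool.map(timed_ls, [path] * clients))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for i in range(15):  # 15 entries, as in the report
            open(os.path.join(d, f"f{i}"), "w").close()
        for elapsed, entries in reproduce(d):
            print(f"{entries} entries listed in {elapsed:.3f}s")
```

On the real volume, runs well beyond the 4/5-second ballpark mentioned above (without a loop) would indicate the contention this bug tracks; comparing the timings with `disperse.other-eager-lock` on and off is the interesting part.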