Bug 1743595
| Summary: | Increased performance for Samba vfs_glusterfs when using pthreadpool | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Guenther Deschner <gdeschner> |
| Component: | samba | Assignee: | Guenther Deschner <gdeschner> |
| Status: | CLOSED ERRATA | QA Contact: | Vivek Das <vdas> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.4 | CC: | amanzane, amukherj, anoopcs, bkunal, dkochuka, mduasope, olim, pgurusid, puebele, rcyriac, rhs-smb, skandark, skourdi, vdas |
| Target Milestone: | --- | Keywords: | Performance |
| Target Release: | RHGS 3.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | samba-4.9.8-108.el7rhgs | Doc Type: | Enhancement |
| Doc Text: | Asynchronous I/O operations were impeded by a bottleneck in the workflow at the point of notification of successful completion. The bottleneck has been removed and asynchronous I/O operations now perform better. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-30 12:18:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1696809 | | |
| Attachments: | | | |
Description
Guenther Deschner 2019-08-20 09:25:52 UTC
Created attachment 1607030 [details]
pthreadpool and profiling patch
RCA: Observation from the tests run at the customer site: throughput on a FUSE-mounted export of a Gluster volume is better than on a VFS (gfapi) export, but only when AIO is enabled.

RCA: The AIO (asynchronous I/O) implementation in the VFS gluster module differs from the AIO implementation in the VFS default module. In VFS gluster, we use the async APIs provided by libgfapi, and the callback of each async API executes in another thread. Since Samba is largely single-threaded, executing the async callback in another thread was causing crashes. We therefore had to register an additional event to notify the async callback, to work around executing the async callback in another thread. But this reduced performance, because it delays notification of AIO completion (two event-loop iterations are required to mark a request complete). The solution was to change the AIO implementation to use a pthread pool rather than the async APIs provided by libgfapi.

Hello Team,

The customer installed the hotfix and provided an update that implementation of the hotfix has been successful.

-----------

This week we have been installing the hotfix as delivered for the performance issue when using VFS (removing the need for the FUSE mount / SMB export). Implementation of that update has been successful. The only issue encountered was that on cvltgelgln01 we had the test fix installed, which made installation of the hotfix impossible (it reported nothing to do). Rolling back the test fix resolved this, so after that, that node was also updated successfully. We have removed all additional logging settings as well as all additionally created shares and volumes, so this system is now production ready.

I've performed a number of performance tests yesterday and today; see the attached results. You can see that on both the local and the CTDB IP address we see the expected improvement; this test was even better than the ones performed with the test fix. Hence I confirm the hotfix works for the issue at hand. The only exception seems to be writes when using the hostname used for round-robin DNS; I consider this a result of more load from Commvault on the system at the time of the test (morning vs. afternoon for all other tests), and it is still a slight improvement over the earlier FUSE mount tests.

Since we now really need the storage in our backup environment, we will be moving production backup load to these systems in the next days/weeks, so unfortunately we won't be able to perform any tests on the system that involve installing additional software packages or interfering with the production status of the system. So we won't be able to reproduce the results and logging via the script delivered.

-----------

Customer has a query:

A final question: since we need to install this hotfix on top of a specific version of Gluster, and a newer version is or might be available when adding new nodes to the cluster, we have the idea that we could downgrade Gluster after the initial install (since we use gdeploy for this, it will come with the most recent version, to our knowledge). Is this indeed a proper way to add nodes (until a permanent fix is available and we have upgraded to a version with that fix)?

Regards,
Sameer

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3253
|
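As a side note, the pattern described in the RCA can be sketched in miniature: the main (event-loop) thread dispatches a blocking read to a worker pool, and the worker signals completion back to the main thread in a single hop, so the callback runs safely in the main thread without the extra notification round-trip. This is a hedged illustration only, not Samba's actual vfs_glusterfs or pthreadpool code; the names `pooled_pread` and the `completions` queue are stand-ins for Samba's pthreadpool job-completion notification.

```python
import os
import queue
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Completion queue standing in for the event loop's notification fd.
completions = queue.Queue()

def pooled_pread(pool, fd, length, offset):
    """Dispatch a blocking pread to the worker pool. The worker pushes
    the finished job straight onto the completion queue, so the main
    thread is woken in one hop (no second event-loop round-trip)."""
    def job():
        data = os.pread(fd, length, offset)   # blocks in the worker, not the main thread
        completions.put((offset, data))       # single completion notification
    pool.submit(job)

# Demo: write a scratch file, then read it back through the pool.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello from the thread pool")
    path = f.name

fd = os.open(path, os.O_RDONLY)
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled_pread(pool, fd, 64, 0)
    # Main thread blocks here, just as an event loop would poll its fd.
    done_off, data = completions.get()

# The "callback" (handling the result) now runs in the main thread.
print(data.decode())  # -> hello from the thread pool
os.close(fd)
os.unlink(path)
```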