Bug 1006172
| Summary: | Dist-geo-rep: Performance degradation between earlier versions to .33rhs | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Amar Tumballi <amarts> |
| Component: | geo-replication | Assignee: | Amar Tumballi <amarts> |
| Status: | CLOSED ERRATA | QA Contact: | Neependra Khare <nkhare> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 2.1 | CC: | aavati, csaba, dshaks, grajaiya, kparthas, rhs-bugs, shaines, vagarwal, vbellur, vkoppad, vraman |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.4.0.34rhs | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-11-27 15:37:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Amar Tumballi
2013-09-10 07:15:12 UTC
>>>> On 08/28/2013 10:30 AM, Venky Shankar wrote:
>>>>> On Wed, Aug 28, 2013 at 09:43:50AM +0530, Neependra Khare wrote:
>>>>>> On 08/28/2013 02:05 AM, Amar Tumballi wrote:
>>>>> [snip]
>>>>>
>>>>>> As you would see, the WAN throughput drops over time with build 21.
>>>>>> This may happen for two reasons I can think of:
>>>>>> 1. Slow reads from the master server - this does not seem to be the problem
>>>>>> 2. Slow processing of changelog files
>>>>> Well, changelog processing is not O(1). The more entries there are,
>>>>> the more entry and data operations on the slave.
>>>>>
>>>>> The changelog processing logic has not changed drastically between these
>>>>> builds. Let us take some sample runs with the current workload and
>>>>> check how much time it takes to process a changelog.
>>>>>
>>>> I have been working with Venky to collect more stats with another run.
>>>> With the latest run the result is still the same. Build 20 is
>>>> performing better than build 21.
>>>> Venky suspects the problem is with AFR.
>>>>
>>>> In my opinion there are two ways we can proceed:
>>>> 1. Take a run without replication.
>>>> 2. Create a custom build with the latest patch-sets, excluding the
>>>> patches added in build 21, and take the run with AFR.
>>>>
>>>> Any other suggestions are welcome.
>>>>
>>> Avati,
>>>
>>> It is just 1 patch which went in between these two builds. Can you think
>>> of a reason? I am thinking that because we removed the '(need_unwind)' check in
>>> afr_writev, we may be taking more time to unwind the write calls now.
>>>
>>> Regards,
>>> Amar
>>>
>>>> Regards,
>>>> Neependra
>>>>
>>>>>> Let me know if you want me to run any specific tests.
>>>>>>
>>>>>> Regards,
>>>>>> Neependra
>>>>>
>>>>> Thanks,
>>>>> -venky
>>
>> Amar,
>> It was removed only in truncate and ftruncate; writev is untouched.
>>
>> Avati
>
> And it turns out rsync --inplace (the way geo-rep uses rsync now) calls
> ftruncate() even if the size matches.
> That explains the perf drop. How severe is this? Are we planning to fix
> this by GA?
>
> Avati

And here's a fix - http://review.gluster.org/5737. It will be great to get this patch tested first, to confirm that it indeed fixes the perf regression (as the analysis is based only on theory). Amar, can you please provide Neependra with a build so that we can verify this?

Avati

Fixed in version, please.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html