Bug 2015551 - Perf: Entry self-heal does xattrops unnecessarily in many cases
Summary: Perf: Entry self-heal does xattrops unnecessarily in many cases
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Karthik U S
QA Contact: Vivek Das
URL:
Whiteboard:
Depends On: 2073919
Blocks:
Reported: 2021-10-19 13:33 UTC by Karthik U S
Modified: 2022-10-12 09:44 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-12 09:44:54 UTC
Embargoed:


Description Karthik U S 2021-10-19 13:33:20 UTC
Description of issue:
Clone of upstream issue: https://github.com/gluster/glusterfs/issues/2626

While healing a name, AFR performs an xattrop to mark new-entry pending heals for files that are created as part of the heal. To do this:

- A lookup is done on the sink brick to check whether the gfid exists.
- If it is not present, new-entry marking is done sequentially on each of the source bricks before the file is created.

In most cases, by the time AFR detects that it needs to do new-entry marking, it has already done a named lookup, so it already holds the xattrs needed to make the decision without the extra lookup. If all source bricks contain the file with pending marking on the sink brick, there is no need for new-entry marking. This saves 3 sequential network calls: 1 LOOKUP and 2 XATTROPs.
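
The decision described above can be sketched roughly as follows. This is only an illustrative model, not the actual AFR code or the upstream patch: the `LookupReply` type, the `needs_new_entry_marking` function, and the way pending counters are represented are all assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LookupReply:
    """Hypothetical summary of one brick's reply to the named lookup."""
    has_entry: bool         # the name exists on this brick
    pending: Sequence[int]  # pending counter this brick holds against each brick index


def needs_new_entry_marking(replies: Sequence[LookupReply],
                            sources: Sequence[int],
                            sink: int) -> bool:
    """Return True if the new-entry marking (extra LOOKUP + XATTROPs) is still needed.

    Marking is skipped only when every source brick already has the file
    and already blames the sink brick for it (non-zero pending counter).
    """
    for src in sources:
        reply = replies[src]
        if not reply.has_entry or reply.pending[sink] == 0:
            return True
    return False


# Assumed 3-brick replica: bricks 0 and 1 are sources, brick 2 is the sink.
replies = [
    LookupReply(True, [0, 0, 1]),
    LookupReply(True, [0, 0, 2]),
    LookupReply(False, [0, 0, 0]),
]
skip_possible = not needs_new_entry_marking(replies, sources=[0, 1], sink=2)
```

In this sketch the decision uses only data already returned by the named lookup, which is the point of the proposed optimization.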

Comment 1 Karthik U S 2021-10-19 13:36:41 UTC
Upstream patch: https://github.com/gluster/glusterfs/pull/2627

Comment 2 SATHEESARAN 2021-10-20 11:54:35 UTC
Hi Karthik,

This patch provides a substantial reduction in network calls,
but is it really worthwhile for our customers?

Because, I see this statement in the patch:
<snip>
 On my setup for full heal of a directory with 100000 entries with this fix it takes 1.5 minutes as opposed to 2 minutes.
</snip>

So for 100000 entries we see a 0.5-minute improvement, which is not a great improvement, but still a good one.
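
For reference, the figures quoted from the patch can be put in relative and per-entry terms. The snippet below is plain arithmetic on the two quoted timings (2 minutes and 1.5 minutes for 100000 entries); the variable names are illustrative only, and the numbers describe that one setup, not a general benchmark.

```python
entries = 100_000
before_s = 2 * 60    # full heal without the fix, in seconds
after_s = 1.5 * 60   # full heal with the fix, in seconds

saved_s = before_s - after_s               # total time saved
reduction_pct = saved_s / before_s * 100   # relative reduction in heal time
saved_per_entry_us = saved_s * 1e6 / entries  # average saving per entry, microseconds

print(f"saved {saved_s:.0f} s, {reduction_pct:.0f}% faster, "
      f"{saved_per_entry_us:.0f} us per entry")
```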

Are there strong factors that justify accommodating this patch in RHGS 3.5.7?

Comment 3 Karthik U S 2021-10-28 09:00:20 UTC
Hi Sas,

The improvement that this patch brings depends on the workload and on how long the outage of the node(s) lasts. The longer the outage lasts during an entry-transaction-heavy workload, the greater the improvement from this patch.

Regards,
Karthik

