2015551 – Perf: Entry self-heal does xattrops unnecessarily in many cases

Summary: Perf: Entry self-heal does xattrops unnecessarily in many cases

Keywords:
Status:	CLOSED DEFERRED
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	replicate
Sub Component:
Version:	rhgs-3.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Karthik U S
QA Contact:	Vivek Das
Docs Contact:
URL:
Whiteboard:
Depends On:	2073919
Blocks:

Reported:	2021-10-19 13:33 UTC by Karthik U S
Modified:	2022-10-12 09:44 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-10-12 09:44:54 UTC
Embargoed:
Dependent Products:

Description Karthik U S 2021-10-19 13:33:20 UTC

Description of issue:
Clone of upstream issue: https://github.com/gluster/glusterfs/issues/2626

While healing a name, AFR performs an xattrop to mark new-entry pending heals for files that are created as part of the heal. To do this:

- A lookup is done on the sink brick to check whether the gfid exists.
- If it does not, new-entry marking is done sequentially on each of the source bricks before the file is created.

In most cases, by the time AFR detects that it needs to do new-entry marking, it has already done a named lookup, so it already holds the xattrs needed to make the decision without the extra lookup. If all source bricks contain the file with pending marking for the sink brick, there is no need for new-entry marking. This would save three sequential network calls: one LOOKUP and two XATTROPs.
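The decision described above can be sketched roughly as follows. This is only an illustrative model, not the actual AFR code: `LookupReply`, its fields, and `needs_new_entry_marking` are hypothetical names, and the pending xattrs are simplified to one integer counter per (source, sink) pair.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LookupReply:
    """Hypothetical stand-in for one brick's named-lookup reply."""
    exists: bool  # whether the file is present on this brick
    # Simplified pending counters held by this brick, keyed by the index
    # of the brick they are pending for (assumption of this sketch).
    pending: Dict[int, int] = field(default_factory=dict)


def needs_new_entry_marking(replies: List[LookupReply],
                            sources: List[int],
                            sink: int) -> bool:
    """Return True if new-entry marking is still required.

    Marking is skipped only when every source brick already has the
    file and already carries pending marking for the sink brick.
    """
    if not sources:
        # Choice made for this sketch: stay conservative with no sources.
        return True
    for src in sources:
        reply = replies[src]
        if not reply.exists or reply.pending.get(sink, 0) == 0:
            return True
    return False


# Replica-3 example: bricks 0 and 1 are sources, brick 2 is the sink.
replies = [
    LookupReply(exists=True, pending={2: 1}),
    LookupReply(exists=True, pending={2: 1}),
    LookupReply(exists=False),
]
skip_marking = not needs_new_entry_marking(replies, sources=[0, 1], sink=2)
```

In this example both sources already hold pending marking for the sink, so the extra LOOKUP and the two XATTROPs would be avoided; if either source lacked the marking, the function falls back to requiring it.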

Comment 1 Karthik U S 2021-10-19 13:36:41 UTC

Upstream patch: https://github.com/gluster/glusterfs/pull/2627

Comment 2 SATHEESARAN 2021-10-20 11:54:35 UTC

Hi Karthik,

This patch provides a substantial reduction in network calls,
but is it really worthwhile for our customers?

Because, I see this statement in the patch:
<snip>
 On my setup for full heal of a directory with 100000 entries with this fix it takes 1.5 minutes as opposed to 2 minutes.
</snip>

So for 100000 entries, we see a 0.5 minute improvement, which is not a great improvement but is still a good one.
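For reference, the figures quoted from the patch can be put in relative terms with a small calculation. This is only arithmetic on the two quoted timings; the function name `heal_savings` is made up for illustration, and the per-entry figure assumes the saving is spread evenly across entries.

```python
def heal_savings(entries: int, before_min: float, after_min: float):
    """Return (seconds saved, fractional reduction, ms saved per entry)."""
    before_s = before_min * 60
    after_s = after_min * 60
    saved_s = before_s - after_s
    fraction = saved_s / before_s
    per_entry_ms = saved_s * 1000 / entries
    return saved_s, fraction, per_entry_ms


# Figures quoted from the patch: 100000 entries, 2 minutes vs 1.5 minutes.
saved_s, fraction, per_entry_ms = heal_savings(100_000, 2.0, 1.5)
```

With these inputs the saving is 30 seconds, a 25% reduction in heal time, or about 0.3 ms per entry.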

Do we have strong reasons to accommodate this patch in RHGS 3.5.7?

Comment 3 Karthik U S 2021-10-28 09:00:20 UTC

Hi Sas,

The improvement that this patch brings depends on the workload and on how long the outage of the node(s) lasts. The longer the outage during an entry-transaction-heavy workload, the greater the improvement with this patch.

Regards,
Karthik