Bug 1994593 - Granular entry self-heal is taking more time than full entry self heal for creation and deletion workloads
Summary: Granular entry self-heal is taking more time than full entry self heal for creation and deletion workloads
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 6
Assignee: Karthik U S
QA Contact: Pranav Prakash
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-17 13:37 UTC by Karthik U S
Modified: 2022-01-27 14:26 UTC
CC: 9 users

Fixed In Version: glusterfs-6.0-60
Doc Type: Bug Fix
Doc Text:
Previously, granular entry self-heal took more time than full entry self-heal when a large number of entry heals were pending due to creation- and deletion-heavy workloads. With this update, the extra lookup used to delete the stale index is removed from the granular entry self-heal code path, which improves heal performance for creation- and deletion-heavy workloads when granular entry self-heal is enabled.
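For reference, granular entry self-heal is toggled per volume; a minimal sketch of checking and switching it (hedged, assuming the standard cluster.granular-entry-heal option and the gluster heal subcommand names):
# Check whether granular entry self-heal is currently enabled
gluster volume get <VOLNAME> cluster.granular-entry-heal
# Enable (or disable) granular entry self-heal for the volume
gluster volume heal <VOLNAME> granular-entry-heal enable
gluster volume heal <VOLNAME> granular-entry-heal disable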
Clone Of:
Environment:
Last Closed: 2022-01-27 14:26:32 UTC
Embargoed:




Links
Red Hat Product Errata RHBA-2022:0315 (last updated 2022-01-27 14:26:56 UTC)

Description Karthik U S 2021-08-17 13:37:17 UTC
Description of the issue:
Clone of upstream issue: https://github.com/gluster/glusterfs/issues/2611

The number of lookups in granular entry self-heal is very high compared to full entry self-heal. Here are the numbers for the following workload:

Once a replica 3 volume (r3) is created and mounted:
gluster volume profile r3 start
pushd /mnt/r3
mkdir d
cd d
# Kill one brick process so the files created next will need entry self-heal
kill -9 $(gluster volume status | grep Brick | awk '{print $NF}' | head -1)
# Create 100000 files while that brick is down
for i in {1..100000}; do touch $i; done
gluster volume profile r3 info incremental
# Bring the killed brick back online so self-heal can start
gluster volume start r3 force
popd
Once the heal completes, take one more profile info incremental.
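A minimal sketch of that final step (hedged; assumes the standard heal-info and profile subcommands, and the output file name here is only an illustration):
# Confirm no entries are pending heal on any brick
gluster volume heal r3 info summary
# Capture the post-heal profile sample for comparison
gluster volume profile r3 info incremental > post-heal-profile.txt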

(Columns: file, %-latency, avg-latency, min-latency, max-latency, no. of calls, fop)

base-full-heal.txt:      52.72  38828.24  ns  20130.00  ns  31109878.00  ns  300024   LOOKUP
base-full-heal.txt:      65.47  41799.10  ns  10173.00  ns  2497157.00   ns  400022   LOOKUP
base-full-heal.txt:      66.68  70149.87  ns  34635.00  ns  1554337.00   ns  200017   LOOKUP

base-granular-heal.txt:  72.06  57450.77  ns  20245.00  ns  23499908.00  ns  800010   LOOKUP
base-granular-heal.txt:  79.99  57926.16  ns  12702.00  ns  12933708.00  ns  900008   LOOKUP
base-granular-heal.txt:  82.11  69301.03  ns  27820.00  ns  12533029.00  ns  700006   LOOKUP

This happens because there is a check for a stale index before triggering the actual heal, which issues extra lookups. These lookups go through the AFR xlator, which may in turn attempt metadata heal and so on, so the number of lookups increases even further.
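A rough way to compare the aggregate LOOKUP counts between the two runs (hedged; assumes the profile outputs were saved as base-full-heal.txt and base-granular-heal.txt in the line format shown above, where the call count is the second-to-last field):
for f in base-full-heal.txt base-granular-heal.txt; do
    # Sum the call counts of every LOOKUP line in each profile capture
    awk -v file="$f" '$NF == "LOOKUP" {sum += $(NF-1)} END {print file, "total LOOKUP calls:", sum}' "$f"
done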

Comment 1 Karthik U S 2021-08-17 13:40:38 UTC
Upstream patch: https://github.com/gluster/glusterfs/pull/2612

Comment 19 errata-xmlrpc 2022-01-27 14:26:32 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (glusterfs bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0315

