Bug 1463980 - RFE: Need to optimize on time taken by heal info to display o/p when large number of entries exist
Status: ASSIGNED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: disperse
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Ashish Pandey
QA Contact: nchilaka
Keywords: FutureFeature, ZStream
Depends On:
Blocks:
 
Reported: 2017-06-22 04:24 EDT by nchilaka
Modified: 2017-09-28 13:14 EDT
CC List: 2 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description nchilaka 2017-06-22 04:24:48 EDT
Description of problem:
=======================
On a simple 1x(4+2) EC volume with about 60k files pending heal, the heal info output takes quite some time to display the total number of entries.
The problem is that, to report the total number of entries, we have to wait for the xattrop index scan on every brick to complete.
I feel there is room for optimization here.
Below are some rough approaches; take them as hints toward a solution rather than finished designs:
1) Tag and hardlink the files requiring heal into a separate directory, so that we do not have to scan the xattrops every time to build the heal-info list (yes, there will be corner cases, and we may arrive at a better approach on discussion).
2) Record the heal-info list via a background mechanism every few minutes and display that output when the user requests it. The data may be stale rather than realtime, but we can warn the user that the output is a few minutes old; it would still give the admin an approximate count of files requiring heal.
3) When we know a brick is down, the xattrops on all the up bricks are marked for the files modified during that window, so heal info will list all of those files. If we capture the set of bricks that were down at that time, we do not have to scan every brick's xattrops.
Eg: In the case below, b1 and b2 were down while I/O was going on and completed.
That means all the up bricks have the same pending-heal list.
We do not have to scan each of them to produce the same output, which saves time.
I understand this is the happy case and we need to consider many other scenarios, such as bricks going down in a cyclic fashion or per-brick lists that differ, but we can still work on such an optimization.
[root@dhcp35-45 ~]# gluster v heal ecv info|grep ntries
Number of entries: -
Number of entries: -
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943
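The hardlink-index idea in approach 1 could be sketched roughly as below. This is only an illustration of the concept, not gluster code: `BRICK`, the `.pending-heal` directory, and both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of approach 1: keep a flat directory of hardlinks to
# entries that need heal, so counting them is a cheap directory listing
# instead of a full xattrop scan. All paths and names here are made up.
BRICK=${BRICK:-/bricks/ecv-b1}
PENDING="$BRICK/.pending-heal"     # hypothetical per-brick index directory

mark_for_heal() {
    # Called whenever a write leaves an entry dirty; the link is named by
    # inode number so re-marking the same file does not double-count it.
    mkdir -p "$PENDING"
    ln -f "$1" "$PENDING/$(stat -c %i "$1")"
}

heal_count() {
    # The expensive scan becomes a simple count of index entries.
    [ -d "$PENDING" ] || { echo 0; return; }
    ls "$PENDING" | wc -l
}
```

The corner cases mentioned above (renames, deletes, entries healed out of band) would all need the index kept in sync, which is exactly where the real design work lies.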
With no I/O running, it took me 10 minutes to get the output below:
[root@dhcp35-45 ~]# time gluster v heal ecv info|grep ntries
Number of entries: -
Number of entries: -
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943

real	10m8.552s
user	2m35.365s
sys	3m17.962s
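Approach 2 (a periodically refreshed cache with a staleness warning) might look like the sketch below. The cache path, refresh interval, and warning text are all assumptions for illustration, not existing gluster behaviour.

```shell
#!/bin/sh
# Hypothetical sketch of approach 2: refresh heal info in the background
# (e.g. from cron) and serve the cached copy instantly, warning the admin
# when the numbers are a few minutes old.
VOL=${VOL:-ecv}
CACHE=${CACHE:-/var/run/gluster/heal-info-$VOL.cache}
MAX_AGE=${MAX_AGE:-300}        # seconds before the cache counts as stale

refresh() {
    # The only expensive step; run it every few minutes, not per request.
    gluster v heal "$VOL" info > "$CACHE.tmp" && mv "$CACHE.tmp" "$CACHE"
}

show() {
    if [ ! -f "$CACHE" ]; then
        echo "no cached heal info yet; run refresh first" >&2
        return 1
    fi
    age=$(( $(date +%s) - $(stat -c %Y "$CACHE") ))
    if [ "$age" -gt "$MAX_AGE" ]; then
        echo "WARNING: heal info is ${age}s old; counts are approximate" >&2
    fi
    cat "$CACHE"
}
```

With this scheme the 10-minute scan above would happen in the background, and the admin's request would return in the time it takes to `cat` a file.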



We can surely optimize this, of course subject to other priorities and bugs of higher importance.

Version-Release number of selected component (if applicable):
======
3.8.4-28
Comment 2 nchilaka 2017-06-22 04:25:59 EDT
Marking for 3.3.0-beyond, as this is an RFE and cannot be prioritized for this release.
