Bug 1463980

Summary: RFE: Need to optimize the time taken by heal info to display output when a large number of entries exist
Product: Red Hat Storage
Component: disperse
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Target Milestone: ---
Target Release: ---
Reporter: Nag Pavan Chilakam <nchilaka>
Assignee: Ashish Pandey <aspandey>
QA Contact: Nag Pavan Chilakam <nchilaka>
Docs Contact:
CC: anrobins, jahernan, rhs-bugs, storage-qa-internal
Keywords: FutureFeature, ZStream
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1605066 (view as bug list)
Environment:
Last Closed: 2019-11-18 15:31:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1605066

Description Nag Pavan Chilakam 2017-06-22 08:24:48 UTC
Description of problem:
=======================
On a simple 1x(4+2) EC volume, if there are about 60k files to be healed, heal info takes quite some time to display the total number of entries. The problem is that to know the total, we have to wait for the scan of all the xattrop entries to finish.
I feel there is room for optimization here.
Below are very crude approaches; take them as hints toward working on the problem:
1) Tag and hardlink the files requiring heal into a separate directory, so that we do not have to scan the xattrops every time to fetch the heal-info list (yes, there will be corner cases, and we may arrive at a better approach on discussion).
2) Record the heal-info list by some background mechanism every few minutes and display that output when the user requests it. Yes, the data can be stale rather than real-time, but a warning can tell the user that the output is a few minutes old. It would still give the admin an approximate count of the files requiring heal.
3) When we know a brick is down, the xattrops on all the bricks that stayed up are marked for the files modified during that window, so heal info will list all of those files. If we capture which bricks were down at a given time, we do not have to scan the xattrops on every brick.
E.g., in the case below, b1 and b2 were down while I/O was running and completed.
That means all the up bricks have the same pending-heal list,
so we do not need to scan each brick to get the same output, saving time.
I understand this is the happy-path case and we need to consider many other scenarios, such as bricks going down in a cyclic fashion, where the lists need not be the same. But we can work on such optimizations.
[root@dhcp35-45 ~]# gluster v heal ecv info|grep ntries
Number of entries: -
Number of entries: -
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943
With no I/O running, it took me 10 minutes to get the output below:
[root@dhcp35-45 ~]# time gluster v heal ecv info|grep ntries
Number of entries: -
Number of entries: -
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943

real	10m8.552s
user	2m35.365s
sys	3m17.962s
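The index-directory idea in (1) is what would make a fast count possible: each file needing heal gets a hardlink in a per-brick index, and heal info only has to count directory entries instead of scanning xattrops. A minimal sketch; the `.heal-index` layout and both helper names are assumptions for illustration, not actual gluster internals:

```python
import os


def mark_needs_heal(brick_path, rel_path):
    """Hardlink a damaged file into a per-brick index so it is counted once.

    Hypothetical sketch: the '.heal-index' directory is an assumed layout,
    not a real gluster on-disk structure.
    """
    index_dir = os.path.join(brick_path, ".heal-index")
    os.makedirs(index_dir, exist_ok=True)
    src = os.path.join(brick_path, rel_path)
    # Flatten nested paths so each file maps to exactly one index entry.
    dst = os.path.join(index_dir, rel_path.replace(os.sep, "__"))
    if not os.path.exists(dst):
        os.link(src, dst)


def count_pending_heals(brick_path):
    """Answer 'how many entries need heal?' by counting index entries,
    avoiding a full xattrop scan."""
    index_dir = os.path.join(brick_path, ".heal-index")
    if not os.path.isdir(index_dir):
        return 0
    return sum(1 for _ in os.scandir(index_dir))
```

The count is O(index size) regardless of how large the brick is, which is the point of the proposal: the 10-minute scan above becomes a directory listing.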
We can surely optimize this, of course subject to other priorities and higher-importance bugs.

Version-Release number of selected component (if applicable):
======
3.8.4-28

Comment 2 Nag Pavan Chilakam 2017-06-22 08:25:59 UTC
Marking for 3.3.0-beyond as this is an RFE and we cannot prioritize it for this release.

Comment 9 Ashish Pandey 2020-09-24 11:34:22 UTC
*** Bug 1880109 has been marked as a duplicate of this bug. ***