Bug 1463980 - RFE: Need to optimize on time taken by heal info to display o/p when large number of entries exist
Status: ASSIGNED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: disperse
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Ashish Pandey
QA Contact: nchilaka
Keywords: FutureFeature, ZStream
Depends On:
Blocks:
 
Reported: 2017-06-22 04:24 EDT by nchilaka
Modified: 2017-09-28 13:14 EDT
CC List: 2 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description nchilaka 2017-06-22 04:24:48 EDT
Description of problem:
=======================
On a simple 1x(4+2) EC volume with about 60k files pending heal, the heal info output takes quite some time to display the total number of entries.
The problem is that, to report the total number of entries, we have to wait for the xattrop index scan on every brick to complete.
I feel there is room for optimization here.
Below are some rough approaches; take them as hints toward a solution rather than finished designs:
1) Tag and hardlink the files requiring heal into a separate directory, so that we do not have to scan the xattrops every time to build the heal-info list (yes, there will be corner cases, and we may arrive at a better approach on discussion).
2) Record the heal-info list via a background mechanism every few minutes and display that output when the user requests it. The data may be stale rather than realtime, but we can warn the user that the output is a few minutes old; it would still give the admin an approximate count of files requiring heal.
3) When we know a brick is down, the xattrops on all the up bricks are marked for the files modified during that window, so heal info will list all of those files. If we capture the set of bricks that were down at that time, we do not have to scan every brick's xattrops.
Eg: In the case below, b1 and b2 were down while I/O was going on and completed.
That means all the up bricks have the same pending-heal list.
We do not have to scan each of them to produce the same output, which saves time.
I understand this is the happy case and we need to consider many other scenarios, such as bricks going down in a cyclic fashion or per-brick lists that differ, but we can still work on such an optimization.
[root@dhcp35-45 ~]# gluster v heal ecv info|grep ntries
Number of entries: -
Number of entries: -
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943
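The hardlink-index idea in approach 1 could be sketched roughly as below. This is only an illustration of the concept, not gluster code: `BRICK`, the `.pending-heal` directory, and both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of approach 1: keep a flat directory of hardlinks to
# entries that need heal, so counting them is a cheap directory listing
# instead of a full xattrop scan. All paths and names here are made up.
BRICK=${BRICK:-/bricks/ecv-b1}
PENDING="$BRICK/.pending-heal"     # hypothetical per-brick index directory

mark_for_heal() {
    # Called whenever a write leaves an entry dirty; the link is named by
    # inode number so re-marking the same file does not double-count it.
    mkdir -p "$PENDING"
    ln -f "$1" "$PENDING/$(stat -c %i "$1")"
}

heal_count() {
    # The expensive scan becomes a simple count of index entries.
    [ -d "$PENDING" ] || { echo 0; return; }
    ls "$PENDING" | wc -l
}
```

The corner cases mentioned above (renames, deletes, entries healed out of band) would all need the index kept in sync, which is exactly where the real design work lies.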
With no I/O running, it took me 10 minutes to get the output below:
[root@dhcp35-45 ~]# time gluster v heal ecv info|grep ntries
Number of entries: -
Number of entries: -
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943

real	10m8.552s
user	2m35.365s
sys	3m17.962s
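Approach 2 (a periodically refreshed cache with a staleness warning) might look like the sketch below. The cache path, refresh interval, and warning text are all assumptions for illustration, not existing gluster behaviour.

```shell
#!/bin/sh
# Hypothetical sketch of approach 2: refresh heal info in the background
# (e.g. from cron) and serve the cached copy instantly, warning the admin
# when the numbers are a few minutes old.
VOL=${VOL:-ecv}
CACHE=${CACHE:-/var/run/gluster/heal-info-$VOL.cache}
MAX_AGE=${MAX_AGE:-300}        # seconds before the cache counts as stale

refresh() {
    # The only expensive step; run it every few minutes, not per request.
    gluster v heal "$VOL" info > "$CACHE.tmp" && mv "$CACHE.tmp" "$CACHE"
}

show() {
    if [ ! -f "$CACHE" ]; then
        echo "no cached heal info yet; run refresh first" >&2
        return 1
    fi
    age=$(( $(date +%s) - $(stat -c %Y "$CACHE") ))
    if [ "$age" -gt "$MAX_AGE" ]; then
        echo "WARNING: heal info is ${age}s old; counts are approximate" >&2
    fi
    cat "$CACHE"
}
```

With this scheme the 10-minute scan above would happen in the background, and the admin's request would return in the time it takes to `cat` a file.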



We can surely optimize this, of course subject to other priorities and bugs of higher importance.

Version-Release number of selected component (if applicable):
======
3.8.4-28
Comment 2 nchilaka 2017-06-22 04:25:59 EDT
Marking for 3.3.0-beyond, as this is an RFE and cannot be prioritized for this release.
