1643559 – Heal info summary taking very long time(more than 5 hrs) hence rendering its purpose not useful

Keywords:
Status:	CLOSED DUPLICATE of bug 1721355
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	replicate
Sub Component:
Version:	rhgs-3.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Ravishankar N
QA Contact:	Nag Pavan Chilakam
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-10-26 14:34 UTC by Nag Pavan Chilakam
Modified:	2020-01-10 01:50 UTC
CC List:	9 users
Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-12-03 08:34:01 UTC
Embargoed:
Dependent Products:

Description Nag Pavan Chilakam 2018-10-26 14:34:31 UTC

Description of problem:
=======================
The heal info summary takes far too long to display its output.
On my test bed it took more than 5 hours (I was running an upgrade test).
I understand that the time depends on the number of pending heals, but 5+ hours is simply not acceptable.
More than 500,000 (5 lakh) files were pending heal when the heal info summary was triggered, but by the time the output was displayed only about 50,000 files were still pending.

The summary is supposed to give a clear, concise view of pending heals (approximate numbers would be fine).
As it stands, the summary output is useless for my purpose.

We need to understand how the summary is calculated: if entries are continuously rescanned while they are being healed, that could explain the delay and why the final output takes so long.

[root@dhcp35-140 ~]# for i in $(gluster v list);do echo "#####  vol $i ###############";time gluster v heal $i info|grep ntries;echo "#####################";done
#####  vol arbo ###############
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0

real	0m1.540s
user	0m0.216s
sys	0m0.220s
#####################
#####  vol basevol-10 ###############
Number of entries: 52193
Number of entries: 0
Number of entries: 0

real	301m18.394s
user	4m50.940s
sys	6m20.620s
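
As a rough illustration of the measurement above, the per-brick counts could be totalled and the wait bounded with a time limit. This is only a sketch: the function names `sum_heal_entries` and `pending_heals` and the 600-second default are hypothetical, and it assumes the `Number of entries:` line format shown in the output above plus the coreutils `timeout` utility.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the default limit are assumptions, not part of gluster.

# Read heal-info style output on stdin and print the total of all
# "Number of entries: N" lines. Non-numeric values are skipped.
sum_heal_entries() {
    awk '/^Number of entries:/ && $4 ~ /^[0-9]+$/ { total += $4 } END { print total + 0 }'
}

# Run heal info for one volume, but give up after a time limit (seconds)
# instead of waiting for hours. Returns non-zero on timeout or failure.
pending_heals() {
    local vol="$1" limit="${2:-600}" out
    out=$(timeout "$limit" gluster volume heal "$vol" info) || return 1
    printf '%s\n' "$out" | sum_heal_entries
}
```

For example, `pending_heals basevol-10 300` would print a single total for the volume, or fail after five minutes rather than block for the 301 minutes seen above.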




Version-Release number of selected component (if applicable):
====================
3.4.0 -> 3.4.1 upgrade (command issued on 3.4.1)

How reproducible:
==================
1/1, but it should be fairly reproducible

Steps to Reproduce:
1. Create a 6-node brick-mux setup.
2. Create six 1x3 volumes and one 2x(2+1) arbiter volume.
3. Mount the volumes, one client each.
4. Run I/O on the mounts (mostly untars).
5. Upgrade 1 or 2 nodes at a time, making sure the nodes under maintenance are not part of the same replica pair.
6. Issue heal info summary while the upgrade cycle is in progress (i.e., 4 nodes on 3.4.1 and 2 on 3.4.0).

Actual results:
===============
Heal info summary takes a very long time.

Expected results:
=================
The summary is supposed to give a clear, concise view of pending heals (approximate numbers would be fine).
As it stands, the summary output is useless for my purpose.

We need to understand how the summary is calculated: if entries are continuously rescanned while they are being healed, that could explain the delay and why the final output takes so long.

Comment 2 Atin Mukherjee 2018-11-11 18:28:53 UTC

Ravi/Karthik - Can one of you have a look at this BZ and do the first pass analysis?

Comment 3 Nag Pavan Chilakam 2018-11-28 11:23:57 UTC

Proposing this for 3.4.3, as the performance leaves the customer with a bad experience.

Comment 7 Pranith Kumar K 2019-12-03 08:34:01 UTC

This bug is being fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=1721355. If this issue is seen even after the fix, please feel free to re-open this bug.

*** This bug has been marked as a duplicate of bug 1721355 ***