Description of problem:
=======================
Heal info summary takes a significantly long time to display its output. In my test bed, it took more than 5 hours to display the output (I was doing an upgrade test).

I understand that the time depends on the number of pending heals, but 5+ hours is simply not acceptable. More than 5 lakh (500,000) files were pending heal when heal info summary was triggered. By the time the output was displayed, only about 50K files were still pending heal.

Summary is supposed to give me a lucid and crisp output of pending heals (even approximate numbers would be OK). However, the summary output is in fact rendered useless for my purpose.

We need to understand how the summary is calculated. If we keep scanning entries while they are being healed, that could cause the delay and make the final output take a very long time.

[root@dhcp35-140 ~]# for i in $(gluster v list);do echo "##### vol $i ###############";time gluster v heal $i info|grep ntries;echo "#####################";done
##### vol arbo ###############
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0

real 0m1.540s
user 0m0.216s
sys 0m0.220s
#####################
##### vol basevol-10 ###############
Number of entries: 52193
Number of entries: 0
Number of entries: 0

real 301m18.394s
user 4m50.940s
sys 6m20.620s

Version-Release number of selected component (if applicable):
====================
3.4.0 -> 3.4.1 upgrade (command issued on 3.4.1)

How reproducible:
==================
1/1, but should be fairly reproducible

Steps to Reproduce:
1. Create a 6-node brick-mux setup.
2. Create six 1x3 volumes and one 2x(2+1) arbiter volume.
3. Mount each volume on one client.
4. Pump I/O (mostly untars).
5. Start upgrading 1 or 2 nodes at a time (making sure the nodes under maintenance do not share replica pairs).
6. Issue heal info summary while the upgrade cycle is in progress (i.e., 4 nodes on 3.4.1 and 2 on 3.4.0).

Actual results:
===============
Heal info summary takes a very long time.

Expected results:
=================
Summary should give a lucid and crisp output of pending heals (even approximate numbers would be OK).
Ravi/Karthik, can one of you take a look at this BZ and do a first-pass analysis?
Proposing this for 3.4.3, as the performance leaves the customer with a bad experience.
This bug is being fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=1721355. If this issue is seen even after the fix, please feel free to re-open this bug.

*** This bug has been marked as a duplicate of bug 1721355 ***