Bug 1568758

Summary: Block delete times out for blocks created of very large size
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Sweta Anandpara <sanandpa>
Component: sharding
Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED ERRATA
QA Contact: Sweta Anandpara <sanandpa>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.4
CC: amukherj, kdhananj, pkarampu, prasanna.kalever, rhinduja, rhs-bugs, sanandpa, sasundar, sheggodu, storage-qa-internal, vdas, xiubli
Target Milestone: ---
Keywords: Rebase
Target Release: RHGS 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-6.0-2
Doc Type: Bug Fix
Doc Text:
Deleting a file with a large number of shards timed out because unlink operations occurred on all shards in parallel, which led to contention on the .shard directory. Timeouts resulted in failed deletions and stale shards remaining in the .shard directory. Shard deletion is now a background process that deletes one batch of shards at a time, to control contention on the .shard directory and prevent timeouts. The size of shard deletion batches is controlled with the features.shard-deletion-rate option, which is set to 100 by default.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-10-30 12:19:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1520882    
Bug Blocks: 1503143, 1696807    
Attachments:
Verification logs on rhgs3.5.0 (flags: none)

Description Sweta Anandpara 2018-04-18 08:59:53 UTC
Description of problem:
======================

Had a 6-node cluster, with a 1x3 (replica 3) volume 'ozone' created on node1, node2 and node3. The setup had brick multiplexing enabled, and the volume was configured with the 'gluster-block' group option.
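
For reference, a rough sketch of a comparable setup (brick paths and the exact brick-multiplexing step are assumptions, not taken from the actual environment):

# Enable brick multiplexing cluster-wide (assumed step; option is cluster.brick-multiplex)
gluster volume set all cluster.brick-multiplex on
# Create and start the 1x3 volume across three nodes (brick paths are placeholders)
gluster volume create ozone replica 3 node1:/bricks/b1/ozone node2:/bricks/b1/ozone node3:/bricks/b1/ozone
gluster volume start ozone
# Apply the 'gluster-block' group profile mentioned above
gluster volume set ozone group gluster-block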

There were quite a few (<10) blocks created while verifying bz 1514344 and bz 1545049. Started deleting the blocks one by one, and the 'gluster-block delete' command timed out for block 'ob10'. Unable to see anything amiss in the logs under /var/log/gluster-block or in /var/log/messages, I restarted all the services and tried again. This time the command succeeded for a few blocks and then timed out again for block 'ob9'.

The common factor between the two blocks that timed out is their very large size - 1E and 1P.
The entire system slows down after the command fails, presumably because it internally keeps retrying the delete and is not able to get through. In other words, every 'gluster-block' command issued after the failure takes 2-3 minutes to show its output. Restarting the gluster-block daemon does bring the system back to normal.

I was not able to gather much from the logs; maybe we need better logging there.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.12.2-7
tcmu-runner-1.2.0-18
gluster-block-0.2.1-17


How reproducible:
================
3/3


Steps to Reproduce:
==================
1. Create blocks of sizes 1K, 1M, 1G, 1T, 1P and 1E on a replica 3 volume
2. Execute 'gluster-block delete' on each of the blocks created above (a command sketch follows)
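
A minimal sketch of these steps using the gluster-block CLI (block names, the host list and the exact size strings are illustrative, not from the original run):

# Step 1: create blocks of increasing size on the replica 3 volume 'ozone'
gluster-block create ozone/blk-1G ha 3 node1,node2,node3 1GiB
gluster-block create ozone/blk-1T ha 3 node1,node2,node3 1TiB
gluster-block create ozone/blk-1P ha 3 node1,node2,node3 1PiB
# Step 2: delete each block; per the report, the very large (1P/1E) ones time out
gluster-block delete ozone/blk-1G
gluster-block delete ozone/blk-1T
gluster-block delete ozone/blk-1P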


Actual results:
==============
Block delete succeeds for blocks of sizes 1K, 1M, 1G, 1T, but times out on 1P and 1E.


Expected results:
================
Either block delete should succeed within a reasonable time, or creation of such large blocks should be disallowed if it affects functionality.



Additional info:
===============
Sosreports and gluster-block logs will be copied at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/swetas/<bugnumber>

Comment 3 Pranith Kumar K 2018-04-18 09:08:58 UTC
Sweta,
   Could you disable sharding and redo this test? I suspect this has to do with the sharding xlator taking a long time to delete the individual shards. Krutika is working on doing the unlinks in the background as part of https://bugzilla.redhat.com/show_bug.cgi?id=1520882 for 3.4.0.

Comment 4 Pranith Kumar K 2018-04-18 09:09:59 UTC
(In reply to Pranith Kumar K from comment #3)
> Sweta,
>    Could you disable sharding and redo this test? I am suspecting that this
> has to do with sharding xlator taking lot of time to delete the individual
> shards. Krutika is working on doing unlinks in background as part of
> https://bugzilla.redhat.com/show_bug.cgi?id=1520882 for 3.4.0.

Please note that you need to both create and delete the block volume while sharding is disabled for us to confirm that the delay was introduced because of sharding.
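
A minimal sketch of the suggested check (volume name 'ozone' from the description; the block name, host list and size are placeholders):

# Turn sharding off on the volume before creating the test block
gluster volume set ozone features.shard off
# Create and then delete a large block with sharding disabled throughout
gluster-block create ozone/testblock ha 3 node1,node2,node3 1PiB
gluster-block delete ozone/testblock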

Comment 14 Krutika Dhananjay 2018-10-31 04:48:40 UTC
Note: The fixes to this issue have been merged upstream: https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:ref-1568521

Moving this bug to POST state.
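
For reference, the Doc Text above describes the new features.shard-deletion-rate option (default 100) that controls the background deletion batch size; a minimal sketch of inspecting and tuning it, assuming the volume 'ozone' from the description:

# Check the current shard deletion batch size (defaults to 100)
gluster volume get ozone features.shard-deletion-rate
# Raise the batch size if the .shard cleanup should proceed faster
gluster volume set ozone features.shard-deletion-rate 200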

Comment 18 SATHEESARAN 2018-11-29 12:14:43 UTC
The fix for this issue is already merged, and the other bug, BZ 1520882, is ON_QA.
It would make sense to move this bug to ON_QA as well, since the same fix addresses this issue too.

Why has this bug not been moved to ON_QA?

Comment 19 Krutika Dhananjay 2018-11-29 13:19:57 UTC
(In reply to SATHEESARAN from comment #18)
> The fix for this issue is already merged and the other bug BZ 1520882 is
> ON_QA.
> It would be more relevant to have this bug too on ON_QA, as the fix
> addresses this issue too.
> 
> Why is that, this bug is not moved ON_QA ?

Ok. I don't completely understand the process, but shouldn't this be done only when all 3 acks are in place? Let me know if that is not the case.

-Krutika

Comment 29 Sweta Anandpara 2019-07-02 07:17:40 UTC
Created attachment 1586539 [details]
Verification logs on rhgs3.5.0

Comment 37 errata-xmlrpc 2019-10-30 12:19:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249