Bug 1568758 - Block delete times out for blocks created of very large size
Status: POST
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: sharding
Version: 3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Krutika Dhananjay
QA Contact: Sweta Anandpara
Depends On: 1520882
Blocks: 1503143
Reported: 2018-04-18 04:59 EDT by Sweta Anandpara
Modified: 2018-10-31 00:48 EDT
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: When a block file composed of a large number of shards is deleted, the shard translator synchronously sends unlink operations on all the shards in parallel. This in turn causes the replicate translator to acquire locks on the .shard directory in parallel.
Consequence: After a while, once a huge number of locks has accumulated in the locks translator, the search for a possible matching lock becomes slower, sometimes taking several minutes to complete. This causes timeouts, leading to disconnects and subsequent failure of the file deletion, potentially also leaving stale ("ghost") shards behind under .shard.
Workaround (if any): First, it is important that the customer uses 64MB as the shard block size to reduce, if not eliminate, the possibility of timeouts; the lower the shard-block-size, the higher the chance of timeouts. If the issue is still hit even with 64MB shards, the ghost shards can be identified and deleted from the backend of each brick after figuring out the gfid of the image whose deletion failed (see the sketch after the field list below).
Result: This way, all the space consumed by ghost shards can be reclaimed.
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
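
The Doc Text workaround above boils down to two things: keeping the shard block size at 64MB and, if a deletion still fails, removing the leftover ("ghost") shards for that image from the brick backends. A minimal sketch, assuming a fuse mount at /mnt/ozone and a brick at /bricks/brick1/ozone (both placeholder paths, not taken from this bug; the block-file path under the mount is also a placeholder):

  # keep the shard block size at 64MB to reduce the chance of timeouts
  # (the 'gluster-block' option group is expected to set this already)
  gluster volume set ozone features.shard-block-size 64MB

  # find the gfid of the block image whose deletion failed,
  # using the file's path on a fuse mount of the volume
  getfattr -n glusterfs.gfid.string /mnt/ozone/<path-to-block-file>

  # on each brick, remove the ghost shards left behind for that gfid
  rm -f /bricks/brick1/ozone/.shard/<gfid>.*
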
Description Sweta Anandpara 2018-04-18 04:59:53 EDT
Description of problem:
======================

Had a 6-node cluster, with a 1x3 volume 'ozone' created on node1, node2 and node3. The setup had brick multiplexing (brick-mux) enabled, and the volume was configured with the 'gluster-block' option group.

Quite a few (<10) blocks had been created while verifying bz 1514344 and bz 1545049. Started deleting the blocks one by one, and the 'gluster-block delete' command timed out for block 'ob10'. Unable to see anything amiss in the logs under /var/log/gluster-block or in /var/log/messages, I restarted all the services and tried again. This time the command succeeded for a few blocks and again timed out for block 'ob9'.

The common factor between the two blocks that timed out is their fairly large size - 1E and 1P.
The entire system slows down after the command fails, presumably because it internally keeps retrying the intended operation without getting through. In other words, every 'gluster-block' command issued after the failure takes 2-3 minutes to show its output. Restarting the gluster-block daemon does bring the system back to normal.
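
Restarting the daemon, assuming it runs as the usual gluster-blockd systemd unit (an assumption on my part; the unit name is not stated in this bug):

  # restart the gluster-block daemon to get command response times back to normal
  systemctl restart gluster-blockd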

I was not able to gather much from the logs; maybe we need better logging there.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.12.2-7
tcmu-runner-1.2.0-18
gluster-block-0.2.1-17


How reproducible:
================
3:3


Steps to Reproduce:
==================
1. Create a block of 1K, 1M, 1G, 1T, 1P, 1E on a replica 3 volume
2. Execute 'gluster-block delete' on each of the blocks created above (a hedged sketch follows below)
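
A hedged sketch of these steps on the 'ozone' volume (the block names, host list, ha count and size suffixes below are placeholders, not copied from this bug):

  # create blocks of increasing sizes on the replica 3 volume
  gluster-block create ozone/ob1 ha 3 node1,node2,node3 1GiB
  gluster-block create ozone/ob2 ha 3 node1,node2,node3 1TiB
  gluster-block create ozone/ob3 ha 3 node1,node2,node3 1PiB
  gluster-block create ozone/ob4 ha 3 node1,node2,node3 1EiB

  # delete them one by one; in this bug the 1P and 1E blocks are the ones that time out
  gluster-block delete ozone/ob1
  gluster-block delete ozone/ob2
  gluster-block delete ozone/ob3
  gluster-block delete ozone/ob4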


Actual results:
==============
Block delete succeeds for blocks of sizes 1K, 1M, 1G, 1T, but times out on 1P and 1E.


Expected results:
================
Either block delete should succeed (without timing out), or creation of such large blocks should be disallowed if it affects functionality.



Additional info:
===============
Sosreports and gluster-block logs will be copied at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/swetas/<bugnumber>
Comment 3 Pranith Kumar K 2018-04-18 05:08:58 EDT
Sweta,
   Could you disable sharding and redo this test? I suspect this has to do with the sharding xlator taking a lot of time to delete the individual shards. Krutika is working on doing the unlinks in the background as part of https://bugzilla.redhat.com/show_bug.cgi?id=1520882 for 3.4.0.
Comment 4 Pranith Kumar K 2018-04-18 05:09:59 EDT
(In reply to Pranith Kumar K from comment #3)
> Sweta,
>    Could you disable sharding and redo this test? I suspect this has to do
> with the sharding xlator taking a lot of time to delete the individual
> shards. Krutika is working on doing the unlinks in the background as part of
> https://bugzilla.redhat.com/show_bug.cgi?id=1520882 for 3.4.0.

Please note that you need to both create and delete the block volume while sharding is disabled for us to confirm that the delay was introduced because of sharding.
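
A hedged sketch of that re-test, assuming the standard features.shard volume option and placeholder block name, size and hosts (disabling sharding here is purely a test step, as requested in the comments above):

  # disable sharding before creating the test block
  gluster volume set ozone features.shard off

  # create and then delete the block while sharding is disabled
  gluster-block create ozone/ob-noshard ha 3 node1,node2,node3 1PiB
  gluster-block delete ozone/ob-noshard

  # re-enable sharding once the test is done
  gluster volume set ozone features.shard on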
Comment 14 Krutika Dhananjay 2018-10-31 00:48:40 EDT
Note: The fixes to this issue have been merged upstream: https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:ref-1568521

Moving this bug to POST state.
