Red Hat Bugzilla – Bug 1568758
Block delete times out for blocks created of very large size
Last modified: 2018-10-31 00:48:40 EDT
Description of problem:
======================
Had a 6-node cluster, with a 1x3 volume 'ozone' created on node1, node2 and node3. The setup was brick-mux enabled, and the volume was set to the 'gluster-block' option group. There were quite a few (<10) blocks created while verifying bz 1514344 and bz 1545049.

Started deleting the blocks one by one, and the 'gluster-block delete' command timed out for block 'ob10'. Unable to see anything amiss in the logs under /var/log/gluster-block or in /var/log/messages, I restarted all the services and tried again. This time the command succeeded for a few blocks and then timed out again, for block 'ob9'. Looking for what the two blocks that timed out have in common: both are of fairly large sizes - 1E and 1P.

The entire system slows down after the command fails, I suppose because internally it keeps trying to do what was intended and is not able to get through. In other words, every 'gluster-block' command given after the failure takes 2-3 minutes to show its output. Restarting the gluster-block daemon does get the system back to normal. I was not able to gather much from the logs; maybe we need better logging there.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.12.2-7
tcmu-runner-1.2.0-18
gluster-block-0.2.1-17

How reproducible:
================
3:3

Steps to Reproduce:
==================
1. Create blocks of size 1K, 1M, 1G, 1T, 1P and 1E on a replica 3 volume
2. Execute 'gluster-block delete' on each of the blocks created above (a command sketch follows this report)

Actual results:
==============
Block delete succeeds for the blocks of sizes 1K, 1M, 1G and 1T, but times out for 1P and 1E.

Expected results:
================
Either block delete should succeed immediately, or creation of such large blocks should be disallowed if it affects functionality.

Additional info:
===============
Sosreports and gluster-block logs will be copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/swetas/<bugnumber>
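A minimal command sketch of the reproduction steps, assuming an existing replica 3 volume named 'ozone' with the 'gluster-block' option group applied; the hostnames (node1,node2,node3) and block names (ob1..ob6) are placeholders, and the exact create/delete syntax should be cross-checked against 'gluster-block help' on the installed build:

  # Create blocks of increasing size on the replica 3 volume 'ozone'
  # (host and block names are placeholders for this sketch).
  gluster-block create ozone/ob1 ha 3 node1,node2,node3 1KiB
  gluster-block create ozone/ob2 ha 3 node1,node2,node3 1MiB
  gluster-block create ozone/ob3 ha 3 node1,node2,node3 1GiB
  gluster-block create ozone/ob4 ha 3 node1,node2,node3 1TiB
  gluster-block create ozone/ob5 ha 3 node1,node2,node3 1PiB
  gluster-block create ozone/ob6 ha 3 node1,node2,node3 1EiB

  # Delete them one by one; in this bug, the 1P and 1E deletes time out.
  gluster-block delete ozone/ob1
  gluster-block delete ozone/ob2
  gluster-block delete ozone/ob3
  gluster-block delete ozone/ob4
  gluster-block delete ozone/ob5
  gluster-block delete ozone/ob6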
Sweta,
Could you disable sharding and redo this test? I suspect this has to do with the sharding xlator taking a lot of time to delete the individual shards. Krutika is working on doing the unlinks in the background as part of https://bugzilla.redhat.com/show_bug.cgi?id=1520882 for 3.4.0.
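If that is the case, the delete cost should scale with the number of shard entries behind the block file. A rough way to see this on a test setup (a sketch; the brick path /bricks/brick1/ozone is a placeholder - the real path is shown by 'gluster volume info ozone'):

  # Shard granularity for the volume (the 'gluster-block' group profile
  # typically sets this to 64MB, so a fully-allocated 1P file would map
  # to millions of 64MB shards).
  gluster volume get ozone features.shard-block-size

  # Count the shard pieces present on one brick; .shard is the hidden
  # directory where the shard xlator keeps them (brick path is a placeholder).
  ls /bricks/brick1/ozone/.shard | wc -l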
(In reply to Pranith Kumar K from comment #3)
> Sweta,
> Could you disable sharding and redo this test? I suspect this has to do
> with the sharding xlator taking a lot of time to delete the individual
> shards. Krutika is working on doing the unlinks in the background as part
> of https://bugzilla.redhat.com/show_bug.cgi?id=1520882 for 3.4.0.

Please note that you need to both create and delete the block volume while sharding is disabled for us to confirm that the delay was introduced by sharding.
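A sketch of that retest, assuming a dedicated test volume 'ozone' and a placeholder block name 'obbig'; both the create and the delete happen with sharding off. Disabling sharding on a volume that already holds sharded data is unsafe, so this should only be done on a throwaway test volume:

  # Test volume only: disable sharding before creating the block.
  gluster volume set ozone features.shard off

  # Create and then delete a large block entirely with sharding disabled.
  gluster-block create ozone/obbig ha 3 node1,node2,node3 1PiB
  gluster-block delete ozone/obbig

  # Restore the option afterwards if the volume will be reused for blocks.
  gluster volume set ozone features.shard on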
Note: The fixes to this issue have been merged upstream:
https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:ref-1568521

Moving this bug to POST state.