+++ This bug was initially created as a clone of Bug #1520882 +++

Description of problem:

Ghost shard files are still present on the bricks even after the VM is deleted from RHEV, so the volume consumes more space than it actually should.

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-18.6.el7rhgs.x86_64

How reproducible:
Customer environment

Actual results:
Shard files are still present even after the VM is deleted.

Expected results:
Shard files should be deleted when the VM is deleted.

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-12-05 06:46:06 EST ---

This bug is automatically being proposed for the release of Red Hat Gluster Storage 3 under active development and open for bug fixes, by setting the release flag 'rhgs-3.4.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Abhishek Kumar on 2017-12-05 06:48:08 EST ---

Account Name : ITER Organization / 1222270
Case Number : 01978821
Severity : Sev 2

--- Additional comment from Abhishek Kumar on 2017-12-05 06:49:05 EST ---

$ gluster volume info rhs-virt-1

Volume Name: rhs-virt-1
Type: Replicate
Volume ID: 0c0e96bf-0611-4fc4-a1e3-6f72da38e425
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 4501FS-STO-3601.codac.iter.org:/rhs_brick1
Brick2: 4501FS-STO-3601.codac.iter.org:/rhs_brick2
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: fixed
cluster.server-quorum-type: none
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.quorum-count: 1

$ gluster volume info nfs-virt-1

Volume Name: nfs-virt-1
Type: Replicate
Volume ID: ee2d9a58-c435-4f0d-a8b7-6c846c465a3d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 4501FS-STO-3601.codac.iter.org:/rhs_nfs1
Brick2: 4501FS-STO-3601.codac.iter.org:/rhs_nfs2
Options Reconfigured:
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: fixed
cluster.server-quorum-type: none
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.quorum-count: 1

--- Additional comment from Abhishek Kumar on 2017-12-05 07:05:24 EST ---

Logs are available at the below location on collab:

/cases/01978821
http://collab-shell.usersys.redhat.com/01978821/

--- Additional comment from Abhishek Kumar on 2017-12-05 07:11:29 EST ---

Root cause of the issue:

The presence of 3 orphaned shard sets in the .shard directory reduced the free space available on the volume. Together these 3 shard sets occupy around 630 GB of space.
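For reference, the sketch below shows one way to spot such orphaned shard sets directly on a brick. It relies on the standard sharding layout (shards stored as <base-gfid>.<N> under the hidden .shard directory at the brick root, with every live file hard-linked under .glusterfs/<aa>/<bb>/<gfid>); the brick path is only an example taken from the volume info above, so adjust it for the environment at hand. Run with bash on each brick host.

~~~
# Sketch only: list shard sets in .shard whose base file no longer exists.
# BRICK is an assumed example path; use the brick paths from 'gluster volume info'.
BRICK=/rhs_brick1

ls "$BRICK/.shard" | sed 's/\.[0-9]*$//' | sort -u | while read -r gfid; do
    # every live file has a hard link at .glusterfs/<first 2>/<next 2>/<full gfid>;
    # if that link is gone, the base image was deleted and these shards are orphans
    link="$BRICK/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
    [ -e "$link" ] || echo "orphaned shard set: $gfid"
done
~~~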
Reason why these shards were not deleted after the VM was deleted:

~~~
[2017-11-20 11:52:48.788824] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 1725: UNLINK() /1d7b3d24-88d4-4eba-a0cf-9bd3625acbf6/images/_remove_me_d9a3fbce-1abf-46b4-ab88-ae38a61bb9f9/e47a4bfe-84d8-4f29-af78-7730e7ec1008 => -1 (Transport endpoint is not connected)
~~~

Given this, and the large number of 'Transport endpoint is not connected' errors in the fuse mount log from both replicas, it is clear that the UNLINK operation (which is nothing but the disk deletion) failed midway, before all shards could be cleaned up.

--- Additional comment from Kaushal on 2017-12-05 07:54:32 EST ---

FYI, the customer has since taken down the RHEV+RHGS setup that hit this issue (because of another problem) and is rebuilding it, so no further data can be requested for this issue. I would suggest lowering the severity, as we already have the root cause and the issue does not affect the customer at present.
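Building on the same assumptions as the previous snippet (example brick path, bash), a rough way to quantify how much space each leaked shard set still occupies on a brick (the ~630 GB mentioned above) is:

~~~
# Sketch only: report the on-brick size of every shard set under .shard,
# grouped by base-file GFID.
BRICK=/rhs_brick1

for gfid in $(ls "$BRICK/.shard" | sed 's/\.[0-9]*$//' | sort -u); do
    # sum all shards belonging to this base file; du -c's last line is the total
    total=$(du -ch "$BRICK/.shard/$gfid".* 2>/dev/null | tail -1 | cut -f1)
    echo "$gfid  $total"
done
~~~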
The dependent RHGS bug is in ON_QA. Moving this bug to ON_QA too.
Verified with glusterfs-3.12.2-31.el7rhgs and RHV 4.2.7-1:

1. Created a 2 TB disk on the gluster SD backed by a gluster volume with a 64 MB shard size.
2. Created a filesystem on the disk and populated it with almost 2 TB of data.
3. Deleted the VM image from the RHV storage domain.

Observed that there are no hangs and no issues with the SD or the hosts, and all the shards are deleted. No ghost shards are left behind (see the check sketched below).
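The "no ghost shards left behind" check can be as simple as confirming that nothing remains under .shard on either brick once the image deletion has completed; a minimal sketch, again with assumed brick paths:

~~~
# Sketch only: after the image is removed from RHV, each brick's .shard
# directory should hold no leftover shard files for the deleted image.
for brick in /rhs_brick1 /rhs_brick2; do
    echo "$brick: $(find "$brick/.shard" -maxdepth 1 -type f 2>/dev/null | wc -l) shard files remaining"
done
~~~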