Bug 1522624 - [GSS]shard files present even after deleting vm from the rhev
Summary: [GSS]shard files present even after deleting vm from the rhev
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhhi
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHHI-V 1.5.z Async
Assignee: Sahina Bose
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1520882 1568521
Blocks:
 
Reported: 2017-12-06 06:37 UTC by Sahina Bose
Modified: 2019-05-20 04:54 UTC
CC: 6 users

Fixed In Version: glusterfs-3.12.2-27
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1520882
Environment:
Last Closed: 2019-05-20 04:54:37 UTC
Embargoed:




Links
System: Red Hat Bugzilla
ID: 1568758
Priority: unspecified
Status: CLOSED
Summary: Block delete times out for blocks created of very large size
Last Updated: 2021-02-22 00:41:40 UTC

Internal Links: 1568758

Description Sahina Bose 2017-12-06 06:37:04 UTC
+++ This bug was initially created as a clone of Bug #1520882 +++

Description of problem:

Ghost shard files remain present even after deleting a VM from RHEV, resulting in more space being utilized on the volume than there actually should be.

Version-Release number of selected component (if applicable):

glusterfs-3.8.4-18.6.el7rhgs.x86_64 


How reproducible:

Customer environment


Actual results:

Shard files are still present even after the VM is deleted.

Expected results:

Shard files should be deleted after VM deletion.

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-12-05 06:46:06 EST ---

This bug is automatically being proposed for the release of Red Hat Gluster Storage 3 under active development and open for bug fixes, by setting the release flag 'rhgs-3.4.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Abhishek Kumar on 2017-12-05 06:48:08 EST ---

Account Name : ITER Organization / 1222270
Case Number  : 01978821
Severity     : Sev 2

--- Additional comment from Abhishek Kumar on 2017-12-05 06:49:05 EST ---

$ gluster volume info rhs-virt-1

Volume Name: rhs-virt-1
Type: Replicate
Volume ID: 0c0e96bf-0611-4fc4-a1e3-6f72da38e425
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 4501FS-STO-3601.codac.iter.org:/rhs_brick1
Brick2: 4501FS-STO-3601.codac.iter.org:/rhs_brick2
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: fixed
cluster.server-quorum-type: none
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.quorum-count: 1


$ gluster volume info nfs-virt-1

Volume Name: nfs-virt-1
Type: Replicate
Volume ID: ee2d9a58-c435-4f0d-a8b7-6c846c465a3d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 4501FS-STO-3601.codac.iter.org:/rhs_nfs1
Brick2: 4501FS-STO-3601.codac.iter.org:/rhs_nfs2
Options Reconfigured:
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: fixed
cluster.server-quorum-type: none
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.quorum-count: 1
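
As a side note, the shard size in effect on such a volume can be confirmed with gluster volume get (a quick sketch; the volume name is taken from the case above, and 64 MB is this option's default value):

~~~
# Check the shard block size in effect on the affected volume;
# features.shard is enabled above, and the option defaults to 64MB.
$ gluster volume get rhs-virt-1 features.shard-block-size
~~~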

--- Additional comment from Abhishek Kumar on 2017-12-05 07:05:24 EST ---

Logs are available at the below location on collab:

/cases/01978821

http://collab-shell.usersys.redhat.com/01978821/

--- Additional comment from Abhishek Kumar on 2017-12-05 07:11:29 EST ---

Root cause of the issue:

The presence of 3 orphaned shard sets in the .shard directory reduced the free space available on the volume.

Together, these 3 shard sets consume around 630 GB of space.

Reason these shards were not deleted after the VM was deleted:
~~~
[2017-11-20 11:52:48.788824] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 1725: UNLINK() /1d7b3d24-88d4-4eba-a0cf-9bd3625acbf6/images/_remove_me_d9a3fbce-1abf-46b4-ab88-ae38a61bb9f9/e47a4bfe-84d8-4f29-af78-7730e7ec1008 => -1 (Transport endpoint is not connected)
~~~
Given this and the many 'Transport endpoint is not connected' errors in the fuse mount logs from both replicas, it is clear that the UNLINK operation (which is effectively the disk deletion) failed midway, before all shards could be cleaned up.
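
For reference, a minimal sketch of how such orphaned shard sets can be spotted directly on a brick (assuming direct brick access; the brick path is illustrative, taken from Brick1 above). Shard files under .shard are named <base-gfid>.<index>, and a live base file keeps a GFID hardlink at .glusterfs/<first-2-chars>/<next-2-chars>/<gfid>; a shard set whose base GFID link is missing is orphaned:

~~~
#!/bin/bash
# Sketch: list shard sets on a brick whose base file no longer exists,
# i.e. orphans left behind by an interrupted UNLINK. Brick path is
# illustrative; run on the brick host.
BRICK=/rhs_brick1

# Each shard is named <base-gfid>.<index>; collect the unique base GFIDs.
for gfid in $(ls "$BRICK/.shard" | sed 's/\.[0-9]*$//' | sort -u); do
    # A live base file has a GFID hardlink under .glusterfs/<g1>/<g2>/.
    if [ ! -e "$BRICK/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid" ]; then
        echo "orphaned shard set: $gfid"
        du -ch "$BRICK/.shard/$gfid".* | tail -1   # total space held
    fi
done
~~~

Actual removal of such orphans on a production volume should only be done in consultation with Red Hat support, since deleting shards that still belong to a live base file would corrupt the image.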

--- Additional comment from Kaushal on 2017-12-05 07:54:32 EST ---

FYI, the customer has now taken down the RHEV+RHGS setup that hit this issue (because of another problem) and is rebuilding it, so no more requests for data for this issue are possible.

I would suggest lowering the severity, as we already have a root cause and the issue doesn't affect the customer at present.

Comment 5 SATHEESARAN 2018-11-29 12:10:35 UTC
The dependent RHGS bug is in ON_QA.
Moving this bug to ON_QA too.

Comment 6 SATHEESARAN 2018-12-10 13:11:41 UTC
Verified with glusterfs-3.12.2-31.el7rhgs and RHV 4.2.7-1

1. Created a 2 TB disk on the gluster SD backed by a gluster volume with 64 MB shards
2. Created a filesystem on the disk and populated it with almost 2 TB of data.
3. Deleted the VM image from RHV storage domain.

Observed no hangs and no issues with the SD or hosts; all the shards were deleted.
No ghost shards were left behind.
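
For anyone re-verifying this scenario, a rough brick-side check (a sketch assuming direct brick access; the path is illustrative):

~~~
# Compare the shard count and space used under .shard before and after
# deleting the VM disk; after a clean deletion both should drop back
# to (near) zero. Brick path is illustrative.
BRICK=/rhs_brick1
find "$BRICK/.shard" -type f | wc -l
du -sh "$BRICK/.shard"
~~~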

