Bug 866908
Summary: | [RHEV-RHS] "gluster volume heal <vol_name> info" command outputs entries which are in ".glusterfs" and ".landfill" directory | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | spandura
Component: | glusterfs | Assignee: | Pranith Kumar K <pkarampu>
Status: | CLOSED WONTFIX | QA Contact: | spandura
Severity: | unspecified | Docs Contact: |
Priority: | medium | |
Version: | 2.0 | CC: | grajaiya, rhs-bugs, sdharane, shaines, vbellur, vinaraya
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Known Issue
Doc Text: |
Cause:
Whenever a brick is restarted and self-heal has to remove a directory 'dir', it sets a flag in the rmdir fop indicating that the operation is an 'rm -rf'. Because of that flag, posix moves 'dir' to '.glusterfs/landfill', and the janitor thread removes it asynchronously. The gfid-handles for 'dir' and the files under 'dir' exist until the janitor thread deletes them from landfill; the janitor thread runs every 10 minutes. The self-heal daemon relies on the presence of a gfid-handle to determine that a file exists in the filesystem, so until then stale entries show up in the output of 'gluster volume heal <volname> info'. The issue is transient: after 10 minutes the stale entries no longer appear in the output.
Consequence:
Workaround (if any):
Wait for 10 minutes. The entries will be removed from the landfill directory, and 'gluster volume heal <volname> info' will no longer show these stale entries (a shell sketch of this check follows the metadata fields below).
Result:
|
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2012-12-13 06:08:20 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
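
The workaround noted in the Doc Text above is purely time-based: wait out one janitor cycle. The following is a minimal shell sketch of that check, run on the brick host, assuming the volume name 'r2' and the brick path '/gfs/r2_0' from the transcript in the description below; both are examples and need to be adjusted for a real deployment.

```shell
# Minimal sketch of the time-based workaround, assuming volume 'r2' and brick
# path /gfs/r2_0 from the transcript below; adjust both for your deployment.
VOL=r2
BRICK=/gfs/r2_0

# The janitor thread empties '.glusterfs/landfill' roughly every 10 minutes,
# so this loop should finish within one janitor cycle.
while [ -n "$(ls -A "$BRICK/.glusterfs/landfill" 2>/dev/null)" ]; do
    echo "landfill is not empty yet, waiting..."
    sleep 60
done

# Once landfill is empty, the stale gfid entries should no longer be listed.
gluster volume heal "$VOL" info
```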
Description
spandura
2012-10-16 09:52:17 UTC
Steps to recreate the issue (a scripted sketch of these steps follows at the end of this comment):

1) Create a replicate volume with eager-lock enabled.
2) Create a directory 'dir' and cd into it.
3) Start an infinite dd: 'dd of=a if=/dev/urandom'. Actually, any command which can lead to an index file on both the bricks for a noticeable time will do; when eager-lock is enabled, the chances of this happening increase.
4) Kill one of the bricks.
5) Stop the dd using ctrl+c.
6) rm -rf the directory 'dir'.
7) Run 'gluster volume start <volname> force'.
8) 'gluster volume heal <volname> info' will show the stale file gfid entry on the brick which just came up.

[root@pranithk-laptop r2]# gluster volume heal r2 info
Gathering Heal info on volume r2 has been successful

Brick pranithk-laptop:/gfs/r2_0
Number of entries: 1
<gfid:466cadc3-880f-45a7-a17d-4b4f183bdb3f>

Brick pranithk-laptop:/gfs/r2_1
Number of entries: 0

[root@pranithk-laptop r2]# ls /gfs/r2_0/.glusterfs/landfill/
d785558b-9dfc-4750-9687-6704fe71cbef
[root@pranithk-laptop r2]# ls /gfs/r2_0/.glusterfs/landfill/d785558b-9dfc-4750-9687-6704fe71cbef/
a
[root@pranithk-laptop r2]# ls /gfs/r2_0/.glusterfs/landfill/d785558b-9dfc-4750-9687-6704fe71cbef/a
^C
[root@pranithk-laptop r2]# getfattr -d -m . -e hex /gfs/r2_0/.glusterfs/landfill/d785558b-9dfc-4750-9687-6704fe71cbef/a
getfattr: Removing leading '/' from absolute path names
# file: gfs/r2_0/.glusterfs/landfill/d785558b-9dfc-4750-9687-6704fe71cbef/a
trusted.afr.r2-client-0=0x000000010000000000000000
trusted.afr.r2-client-1=0x000000010000000000000000
trusted.gfid=0x466cadc3880f45a7a17d4b4f183bdb3f

Reason for the issue: When the brick is restarted, self-heal has to remove the directory 'dir', which posix moves to landfill because of the flags set by self-heal in the rmdir fop. The janitor thread runs every 10 minutes; until then the stale entry shows up in the output of 'gluster volume heal <volname> info', because the gfid-handle exists until the janitor thread deletes the stale entries from landfill.
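
The numbered steps above can be strung together into a rough reproduction script. This is only a sketch, not the reporter's exact procedure: it assumes a replica-2 volume named 'r2' already exists with bricks /gfs/r2_0 and /gfs/r2_1 (as in the transcript) and is mounted at the hypothetical mount point /mnt/r2, and it kills the brick with a pkill pattern that should be verified against 'gluster volume status' before use.

```shell
#!/bin/bash
# Rough reproduction sketch for this bug. Volume and brick names follow the
# transcript above; the mount point /mnt/r2 is a hypothetical example.
set -x

VOL=r2
MNT=/mnt/r2

# 1) Enable eager-lock on the existing replicate volume.
gluster volume set "$VOL" cluster.eager-lock on

# 2) Create a directory and work inside it.
mkdir -p "$MNT/dir"
cd "$MNT/dir"

# 3) Start an "infinite" dd so index entries exist on both bricks for a while.
dd of=a if=/dev/urandom &
DD_PID=$!
sleep 30

# 4) Kill one of the bricks (here: the brick process for /gfs/r2_0).
#    The pkill pattern is an assumption; confirm the brick PID with
#    'gluster volume status' before killing in a real run.
pkill -f "glusterfsd.*r2_0"

# 5) Stop the dd.
kill "$DD_PID"

# 6) Remove the directory.
cd "$MNT"
rm -rf "$MNT/dir"

# 7) Bring the killed brick back.
gluster volume start "$VOL" force

# 8) The stale gfid entry shows up on the restarted brick until the janitor
#    thread cleans '.glusterfs/landfill' (roughly 10 minutes).
gluster volume heal "$VOL" info
ls /gfs/r2_0/.glusterfs/landfill/
```

After roughly 10 minutes the janitor thread empties the landfill directory and the stale entry disappears from the heal info output, which is the time-based workaround noted in the Doc Text.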