+++ This bug was initially created as a clone of Bug #1225452 +++

Description of problem:
=======================
Removed one complete subvolume from a (2x3) volume, which triggered a rebalance. Once rebalance status showed completed, copied "/etc/hosts" to the fuse mount and "/etc/hosts.allow" to the nfs mount. The file hosts was written to the remaining subvolume (the one not being removed), but hosts.allow was written to the subvolume being removed. A subsequent ls from the fuse mount created a link file on the remaining subvolume. After a remove-brick commit, the removed subvolume is no longer part of the volume but still holds the actual hosts.allow file, which is therefore lost; the remaining subvolume has only a T file, and accessing it errors with "Structure needs cleaning".

[root@georep1 b1]# cd /rhs/brick1/b1/
[root@georep1 b1]# ls -lrt hosts.allow
---------T. 2 root root 0 May 27 20:43 hosts.allow
[root@georep1 b1]# cd /rhs/brick2/b2/
[root@georep1 b2]# ls -lrt hosts.allow
-rw-r--r--. 2 root root 370 May 27 20:43 hosts.allow

[root@georep1 b2]# gluster v info master

Volume Name: master
Type: Replicate
Volume ID: 7b933011-28da-48d7-90a0-40ac33102aae
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.96:/rhs/brick1/b1
Brick2: 10.70.46.97:/rhs/brick1/b1
Brick3: 10.70.46.93:/rhs/brick1/b1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
[root@georep1 b2]#

[root@georep1 b2]# getfattr -n trusted.gfid -e hex -m . /rhs/brick1/b1/hosts.allow
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b1/hosts.allow
trusted.gfid=0x70ad500ee72645a39fc3b54d88aa2cc7

[root@georep1 b2]# getfattr -n trusted.gfid -e hex -m . /rhs/brick2/b2/hosts.allow
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/b2/hosts.allow
trusted.gfid=0x70ad500ee72645a39fc3b54d88aa2cc7
[root@georep1 b2]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.0-2.el6rhs.x86_64

How reproducible:
=================
Tried only once

Steps Carried:
==============
1. Created master and slave clusters
2. Created master and slave volumes (2x3)
3. Created a shared meta volume and mounted it on all the master nodes
4. Created and started a geo-rep session between the master and slave volumes
5. Mounted the master and slave volumes (fuse and NFS)
6. From the fuse and NFS mounts of the master volume, created a set of dirs and files
7. Once data creation completed, waited for it to sync to the slave
8. Verified using arequal that the data at master and slave matches
9. Removed a complete subvolume using "remove-brick start"
10. Waited for it to complete (monitored using "remove-brick status")
11. Once completed, touched a file "rahul" from fuse and copied /etc/hosts to the fuse mount of master
12. Copied /etc/hosts.allow to the nfs mount of master
13. Did an ls from the fuse mount; it lists all three files: rahul, hosts, hosts.allow
14. Did a "remove-brick commit" to remove the complete subvolume
15. The geo-rep session goes faulty with the traceback:
    OSError: [Errno 117] Structure needs cleaning: '.gfid/70ad500e-e726-45a3-9fc3-b54d88aa2cc7'

Checked the backend: the original file written from NFS is present on the decommissioned bricks, and its T link file is on the remaining subvolume.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-05-27 17:10:21 MVT ---

This bug is automatically being proposed for Red Hat Gluster Storage 3.2.0 by setting the release flag 'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.
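[Editorial note] The ---------T entry seen on the remaining brick is a DHT linkto file: a zero-byte file whose mode carries only the sticky bit (on a real brick it also has a trusted.glusterfs.dht.linkto xattr naming the subvolume that holds the data). A minimal sketch of how such entries can be spotted on a brick backend, using a temporary directory as a stand-in for the brick path (no gluster installation assumed; the path and filename are illustrative):

```shell
# Simulate what a DHT linkto file looks like on a brick backend:
# zero bytes, mode 1000 (shown by ls as ---------T).
brick=$(mktemp -d)            # stand-in for a brick dir such as /rhs/brick1/b1
touch "$brick/hosts.allow"
chmod 1000 "$brick/hosts.allow"

# Candidate linkto files: empty regular files whose mode is exactly 1000.
find "$brick" -type f -perm 1000 -size 0
```

On a real brick, any file this finds after a remove-brick commit is worth checking: if the subvolume its linkto xattr points at has been removed, the data is unreachable, which matches the "Structure needs cleaning" failure above.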
--- Additional comment from Rahul Hinduja on 2015-05-27 17:20:13 MVT ---

sosreports are at: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1225452/

The relevant logs are on the master and client nodes.

Before commit:
==============
[root@georep1 scripts]# gluster v info master

Volume Name: master
Type: Distributed-Replicate
Volume ID: 7b933011-28da-48d7-90a0-40ac33102aae
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.46.96:/rhs/brick1/b1
Brick2: 10.70.46.97:/rhs/brick1/b1
Brick3: 10.70.46.93:/rhs/brick1/b1
Brick4: 10.70.46.96:/rhs/brick2/b2
Brick5: 10.70.46.97:/rhs/brick2/b2
Brick6: 10.70.46.93:/rhs/brick2/b2
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
[root@georep1 scripts]#

After commit:
=============
[root@georep1 ~]# gluster v info master

Volume Name: master
Type: Replicate
Volume ID: 7b933011-28da-48d7-90a0-40ac33102aae
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.96:/rhs/brick1/b1
Brick2: 10.70.46.97:/rhs/brick1/b1
Brick3: 10.70.46.93:/rhs/brick1/b1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
[root@georep1 ~]#
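[Editorial note] When cross-checking backend files against geo-rep tracebacks: the trusted.gfid xattr printed by getfattr (0x70ad500ee72645a39fc3b54d88aa2cc7) is the same identifier as the .gfid/70ad500e-e726-45a3-9fc3-b54d88aa2cc7 path in the OSError, just without the dashes. A small sketch of the mapping, pure bash string slicing with no gluster required:

```shell
# gfid as reported by "getfattr -n trusted.gfid -e hex", minus the leading 0x
gfid_hex=70ad500ee72645a39fc3b54d88aa2cc7

# Re-insert the dashes (8-4-4-4-12) to get the uuid form used in .gfid/ paths
printf '%s-%s-%s-%s-%s\n' \
    "${gfid_hex:0:8}" "${gfid_hex:8:4}" "${gfid_hex:12:4}" \
    "${gfid_hex:16:4}" "${gfid_hex:20:12}"
# prints 70ad500e-e726-45a3-9fc3-b54d88aa2cc7
```

This confirms that the file failing in the geo-rep worker is the same object as the stranded hosts.allow on the decommissioned brick.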
REVIEW: http://review.gluster.org/11061 (server/nfs: Restart nfs server post remove-brick start) posted (#1) for review on master by Susant Palai (spalai)
REVIEW: http://review.gluster.org/11061 (server/nfs: Restart nfs server post remove-brick start) posted (#2) for review on master by Susant Palai (spalai)
*** This bug has been marked as a duplicate of bug 1232378 ***