Bug 983416

Summary: DHT: not able to unlink file (rm -f, unlink) if cached sub-volume is up and hashed sub-volume is down
Product: [Community] GlusterFS Reporter: shishir gowda <sgowda>
Component: distribute Assignee: Susant Kumar Palai <spalai>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: mainline CC: gluster-bugs, nsathyan, racpatel, rhs-bugs, shaines, spalai, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.5.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 970686 Environment:
Last Closed: 2014-04-17 11:43:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Bug Depends On: 970686    
Bug Blocks:    

Comment 1 Anand Avati 2013-07-11 07:39:28 UTC
REVIEW: http://review.gluster.org/5317 (cluster/dht: If linkfile unlink fails with ENOTCONN, do not fail) posted (#1) for review on master by Shishir Gowda (sgowda)

Comment 2 Anand Avati 2013-07-12 00:43:39 UTC
COMMIT: http://review.gluster.org/5317 committed in master by Anand Avati (avati) 
------
commit 60d1949b00fa42e0c5d1f0a763004ca474a4645d
Author: shishir gowda <sgowda>
Date:   Thu Jul 11 13:05:55 2013 +0530

    cluster/dht: If linkfile unlink fails with ENOTCONN, do not fail
    
    Currently, if the linkfile unlink fails with ENOENT, we do not fail.
    We also need to treat failures with ENOTCONN as success: if the cached
    subvol is up, rm of a file should succeed. A stale linkfile will get
    removed later.
    
    Change-Id: I71d136847933351ed9e2c939bda4a69bc96a3cfc
    BUG: 983416
    Signed-off-by: shishir gowda <sgowda>
    Reviewed-on: http://review.gluster.org/5317
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>
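
The rule described in the commit message above can be sketched as follows. This is only an illustration in Python, not the DHT translator code (which is written in C); the helper name `linkfile_unlink_result` and the `(op_ret, op_errno)` tuple convention are assumptions made for this example.

```python
import errno

# Assumption for this sketch: errno values from the linkfile unlink on the
# hashed subvolume that should not fail the overall unlink. ENOENT was
# already tolerated; the commit adds ENOTCONN.
TOLERATED_LINKFILE_ERRNOS = {errno.ENOENT, errno.ENOTCONN}


def linkfile_unlink_result(op_ret, op_errno):
    """Apply the tolerance rule to the result of the linkfile unlink.

    op_ret is 0 on success and -1 on failure (hypothetical convention
    for this sketch). A failure with a tolerated errno is reported as
    success so that the unlink on the cached subvolume can proceed.
    """
    if op_ret == -1 and op_errno in TOLERATED_LINKFILE_ERRNOS:
        return 0, 0
    return op_ret, op_errno


# Hashed subvolume is down: the failure is treated as success.
print(linkfile_unlink_result(-1, errno.ENOTCONN))
# Any other error is still passed through unchanged.
print(linkfile_unlink_result(-1, errno.EACCES))
```

The stale linkfile left behind on the hashed subvolume is not handled by this sketch; per the commit message it is expected to be removed later.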

Comment 5 Susant Kumar Palai 2014-01-29 10:29:33 UTC
Moving the bug to assigned state as it is reproducible.

Steps to reproduce:
1. Mount a DHT volume and create a few files and a directory.
2. Bring one sub-volume down.
3. Unmount the volume and remount it.
4. Try to remove a file whose linkto file is present in the hashed subvol and whose data file is present in the cached subvol. The removal fails with the error "Invalid argument".

Here are the details:

Created the data according to the summary, renamed the files, and killed one of the brick processes.

[root@vm1 mnt]# gluster v status
Status of volume: test1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 192.168.122.240:/brick2/1				N/A	N	7763
Brick 192.168.122.240:/brick2/2				49166	Y	7773
NFS Server on localhost					2049	Y	7782
 
Task Status of Volume test1
------------------------------------------------------------------------------
There are no active volume tasks
 

[root@vm1 mnt]# ll -R /brick2/ 
/brick2/:
total 8
drwxr-xr-x 3 root root 4096 Jan 29 10:23 1
drwxr-xr-x 3 root root 4096 Jan 29 10:23 2

/brick2/1:
total 0
---------T 2 root root 0 Jan 29 10:23 fnew1
---------T 2 root root 0 Jan 29 10:23 fnew10
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew4
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew6
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew7
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew8
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew9

/brick2/2:
total 0
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew1
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew10
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew2
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew3
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew5
---------T 2 root root 0 Jan 29 10:23 fnew7
[root@vm1 mnt]# 


[root@vm1 mnt]# ls
fnew1  fnew10  fnew2  fnew3  fnew5
[root@vm1 mnt]# cd
[root@vm1 ~]# umount /mnt/
[root@vm1 ~]# mount -t glusterfs 192.168.122.240:/test1 /mnt/
[root@vm1 ~]# cd /mnt/
[root@vm1 mnt]# ls
fnew1  fnew10  fnew2  fnew3  fnew5
[root@vm1 mnt]# rm -f fnew1
rm: cannot remove `fnew1': Invalid argument
[root@vm1 mnt]#
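
To read the brick listing above more easily, here is a small Python sketch that picks out the files whose linkto entry sits on the brick that is down (/brick2/1) while the data file sits on the brick that is up (/brick2/2). It relies only on the `---------T` mode shown in the listing as the marker for a linkto file; that is a heuristic for this example, and the function name `classify_brick_listing` is hypothetical.

```python
def classify_brick_listing(listing):
    """Map file name -> 'linkto' or 'data' from `ls -l` style lines."""
    kinds = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip "total N" headers and anything that is not a regular file.
        if len(fields) < 9 or not fields[0].startswith("-"):
            continue
        mode, name = fields[0], fields[-1]
        kinds[name] = "linkto" if mode == "---------T" else "data"
    return kinds


# Listings copied from the transcript above.
BRICK1 = """\
total 0
---------T 2 root root 0 Jan 29 10:23 fnew1
---------T 2 root root 0 Jan 29 10:23 fnew10
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew4
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew6
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew7
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew8
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew9
"""
BRICK2 = """\
total 0
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew1
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew10
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew2
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew3
-rw-r--r-- 2 root root 0 Jan 29 10:23 fnew5
---------T 2 root root 0 Jan 29 10:23 fnew7
"""

brick1 = classify_brick_listing(BRICK1)  # brick that is down
brick2 = classify_brick_listing(BRICK2)  # brick that is up

# Files with the linkto on the down brick and the data on the up brick.
affected = sorted(
    name for name, kind in brick1.items()
    if kind == "linkto" and brick2.get(name) == "data"
)
print(affected)  # ['fnew1', 'fnew10']
```

fnew1 and fnew10 are the files that match the failing case; fnew1 is the one removed in the transcript.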

Comment 6 Anand Avati 2014-01-29 10:57:37 UTC
REVIEW: http://review.gluster.org/6851 (cluster/dht: If hashed_subvol is NULL, do not fail) posted (#1) for review on master by susant palai (spalai)

Comment 7 Anand Avati 2014-02-08 19:29:27 UTC
COMMIT: http://review.gluster.org/6851 committed in master by Anand Avati (avati) 
------
commit 14792bd894e7838efdc8f50a16af5445b448dc2e
Author: Susant Palai <spalai>
Date:   Wed Jan 29 10:47:20 2014 +0000

    cluster/dht: If hashed_subvol is NULL, do not fail
    
    Problem: With the current implementation we are allowing unlink
    of a file if hashed subvol is down and cached subvol is up.
    For the above op to work we should have the info of hashed_subvol.
    But in case we remount the volume, we will have a zeroed layout
    for the disconnected subvol (start=0, stop=0, err=ENOTCONN), which will
    result in hashed_subvol being NULL and the unlink op failing.
    
    Solution: Don't fail if hashed_subvol is NULL. Check cached subvol
    and unlink in cached subvol. The linkto file in the hashed subvol
    can be removed later.
    
    Change-Id: Ic1982c15c8942a1adcb47ed0017d2d5ace5c9241
    BUG: 983416
    Signed-off-by: Susant Palai <spalai>
    Reviewed-on: http://review.gluster.org/6851
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Anand Avati <avati>
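
The problem and solution described in the commit message above can be modelled roughly as follows. This is a simplified Python sketch, not the actual DHT code: `LayoutEntry`, `find_hashed_subvol`, `plan_unlink`, the hash ranges, and the choice to skip entries that carry an error are all assumptions made for illustration.

```python
import errno
from collections import namedtuple

# Hypothetical, simplified stand-in for one entry of a directory layout.
LayoutEntry = namedtuple("LayoutEntry", "subvol start stop err")


def find_hashed_subvol(layout, name_hash):
    """Return the subvolume whose range covers name_hash, or None."""
    for entry in layout:
        if entry.err:
            # Assumption: a zeroed entry for a disconnected subvolume
            # (start=0, stop=0, err=ENOTCONN) never matches.
            continue
        if entry.start <= name_hash <= entry.stop:
            return entry.subvol
    return None


def plan_unlink(hashed_subvol, cached_subvol):
    """Return the subvolumes an unlink would be sent to in this sketch."""
    if cached_subvol is None:
        raise OSError(errno.EINVAL, "no cached subvolume for file")
    targets = []
    # After the fix, a missing hashed subvolume is not an error: the
    # linkto file is simply left for later cleanup.
    if hashed_subvol is not None and hashed_subvol != cached_subvol:
        targets.append(hashed_subvol)
    targets.append(cached_subvol)
    return targets


# Layout as seen after a remount with one brick down (illustrative ranges).
layout = [
    LayoutEntry("brick1", 0, 0, errno.ENOTCONN),
    LayoutEntry("brick2", 0x80000000, 0xFFFFFFFF, 0),
]

hashed = find_hashed_subvol(layout, 0x10)  # hash falls in the missing range
print(hashed)                              # None
print(plan_unlink(hashed, "brick2"))       # ['brick2']
```

With both subvolumes known, `plan_unlink("brick1", "brick2")` returns both targets; the ordering in the returned list is part of the sketch, not a statement about the translator.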

Comment 8 Niels de Vos 2014-04-17 11:43:27 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user