Bug 1161156 - DHT: two problems, first rename fails for a file, second rename failures give different error messages
Summary: DHT: two problems, first rename fails for a file, second rename failures give different error messages
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Shyamsundar
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-11-06 14:23 UTC by Shyamsundar
Modified: 2015-05-14 17:44 UTC
CC: 9 users

Fixed In Version: glusterfs-3.7.0
Clone Of: 1138737
Environment:
Last Closed: 2015-05-14 17:28:21 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Shyamsundar 2014-11-06 14:23:40 UTC
+++ This bug was initially created as a clone of Bug #1138737 +++

Description of problem:
I tried to rename a 5GB file and it failed.
The failure is seen when the available space reported by quota is less than double the file size, i.e. in this case less than 10GB available as per quota.
This is the observation; there may be other reasons why the rename fails.

So there are two problems: first, the rename fails; second, subsequent rename attempts give different error messages.

How reproducible:
always

Steps to Reproduce:
1. create a 6x2 (distributed-replicate) volume and start it
2. enable quota on the volume
3. set a quota limit on "/", say 20GB
4. mount the volume over NFS
5. create a file of 5GB
6. create a directory and fill the volume with data in this directory, up to approximately 18GB
7. try to rename the 5GB file --- it fails (see the command sketch below)
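
For reference, a minimal command sketch of the steps above (the hostnames, brick paths, mount point and exact sizes below are hypothetical/approximate):

# 12 bricks with "replica 2" give a 6x2 distributed-replicate volume
gluster volume create dist-rep replica 2 \
    host1:/bricks/b1 host2:/bricks/b1 host1:/bricks/b2 host2:/bricks/b2 \
    host1:/bricks/b3 host2:/bricks/b3 host1:/bricks/b4 host2:/bricks/b4 \
    host1:/bricks/b5 host2:/bricks/b5 host1:/bricks/b6 host2:/bricks/b6
gluster volume start dist-rep

# enable quota and set a 20GB limit on "/"
gluster volume quota dist-rep enable
gluster volume quota dist-rep limit-usage / 20GB

# mount the volume over NFS (Gluster NFS serves NFSv3)
mount -t nfs -o vers=3 host1:/dist-rep /mnt/dist-rep

# create the 5GB file, then fill the volume to roughly 18GB
mkdir /mnt/dist-rep/dir3
dd if=/dev/zero of=/mnt/dist-rep/dir3/5GBfile bs=1M count=5120
mkdir /mnt/dist-rep/dir1
dd if=/dev/zero of=/mnt/dist-rep/dir1/filler bs=1M count=13312

# attempt the rename of the 5GB file -- this is the step that fails
mv /mnt/dist-rep/dir3/5GBfile /mnt/dist-rep/dir3/5GBfile-rename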


Actual results:
First problem: step 7 fails.

Second problem: subsequent attempts to rename the same file gave different error messages, as can be seen from these logs:

[root@rhsauto002 dir3]# mv 5GBfile-rename 5GBfile
[root@rhsauto002 dir3]# mv 5GBfile 5GBfile-rename
mv: cannot stat ‘5GBfile’: No such file or directory
[root@rhsauto002 dir3]# ls
5GBfile-rename
[root@rhsauto002 dir3]# mv 5GBfile-rename 5GBfile
[root@rhsauto002 dir3]# ls
5GBfile-rename
[root@rhsauto002 dir3]# 
[root@rhsauto002 dir3]# mv 5GBfile-rename 5GBfile
mv: ‘5GBfile-rename’ and ‘5GBfile’ are the same file



Log from one of the bricks around this time:
[2014-09-05 02:43:27.157125] A [quota.c:4200:quota_log_usage] 0-dist-rep-quota: Usage crossed soft limit: 19.6GB used by /
[2014-09-05 02:53:22.111639] I [server-rpc-fops.c:999:server_rename_cbk] 0-dist-rep-server: 749084: RENAME /dir3/5GBfile-rename (00000000-0000-0000-0000-000000000000/5GBfile-rename) -> /dir3/5GBfile (00000000-0000-0000-0000-000000000000/5GBfile) ==> (Disk quota exceeded)
[2014-09-05 02:53:49.873920] I [server-rpc-fops.c:999:server_rename_cbk] 0-dist-rep-server: 749162: RENAME /dir3/5GBfile-rename (00000000-0000-0000-0000-000000000000/5GBfile-rename) -> /dir3/5GBfile (00000000-0000-0000-0000-000000000000/5GBfile) ==> (Disk quota exceeded)

Please note that the brick log says the soft limit was crossed at 19.6GB of usage, whereas the quota list command reports 18.7GB.

[root@nfs1 ~]# gluster volume quota dist-rep list
                  Path                   Hard-limit Soft-limit   Used  Available  Soft-limit exceeded? Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------
/                                         20.0GB       80%      18.7GB   1.3GB             Yes                   No
/dir1                                      5.0GB       80%       5.0GB  0Bytes             Yes                  Yes


Expected results:
First problem: the rename should succeed.

Second problem: the error messages should remain the same.

Additional info:

(In reply to Saurabh from comment #0)
> 
> 
> one of the bricks around this time,
> [2014-09-05 02:43:27.157125] A [quota.c:4200:quota_log_usage]
> 0-dist-rep-quota: Usage crossed soft limit: 19.6GB used by /
> [2014-09-05 02:53:22.111639] I [server-rpc-fops.c:999:server_rename_cbk]
> 0-dist-rep-server: 749084: RENAME /dir3/5GBfile-rename
> (00000000-0000-0000-0000-000000000000/5GBfile-rename) -> /dir3/5GBfile
> (00000000-0000-0000-0000-000000000000/5GBfile) ==> (Disk quota exceeded)
> [2014-09-05 02:53:49.873920] I [server-rpc-fops.c:999:server_rename_cbk]
> 0-dist-rep-server: 749162: RENAME /dir3/5GBfile-rename
> (00000000-0000-0000-0000-000000000000/5GBfile-rename) -> /dir3/5GBfile
> (00000000-0000-0000-0000-000000000000/5GBfile) ==> (Disk quota exceeded)
> 
> Please note that the brick log says soft limit crossed to 19.6GB, whereas
> per quota list command it is 18.7 GB


The usage displayed in the logs, 19.6GB, is the current disk usage of "/". The message indicates that the usage of "/" has crossed its soft limit, viz. 18.67GB. Hope that clarifies.
> 
> [root@nfs1 ~]# gluster volume quota dist-rep list
>                   Path                   Hard-limit Soft-limit   Used 
> Available  Soft-limit exceeded? Hard-limit exceeded?
> -----------------------------------------------------------------------------
> ----------------------------------------------
> /                                         20.0GB       80%      18.7GB  
> 1.3GB             Yes                   No
> /dir1                                      5.0GB       80%       5.0GB 
> 0Bytes             Yes                  Yes
> 

Could you attach the sosreport to the BZ?

--- Additional comment from Raghavendra G on 2014-09-22 05:42:10 EDT ---

The actual bug here is that mv does not report any error even though the rename failed (the source file is still seen to be present after mv). Below is the RCA:

From strace of "mv 5 2":

lstat("2", 0x7fffd2382400)              = -1 ENOENT (No such file or directory)
rename("5", "2")                        = 0

I see the brick failing the rename with EDQUOT, but the rename command itself "seems" to succeed, even though the rename was a failure: ls after mv shows file 5 still existing and file 2 not existing.

[root@unused gfs]# ls
3  5  dir  newdir  sibling
[root@unused gfs]# ls 
3  5  dir  newdir  sibling
[root@unused gfs]# mv 5 2
[root@unused gfs]# ls
3  5  dir  newdir  sibling
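
A quick way to confirm that the failure never reaches the application is to check mv's exit status (and the rename() return value) from the mount point; the file names below follow the transcript above:

mv 5 2; echo "mv exit status: $?"   # prints 0, even though the brick returned EDQUOT
ls                                  # "5" is still present, "2" was never created
strace -e trace=rename mv 5 2       # shows rename("5", "2") = 0 despite the brick-side failure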


The culprit is dht, which is not propagating the error back to the application. In dht_rename_cbk, we have:

        if (op_ret == -1) {
                /* Critical failure: unable to rename the cached file */
                if (src_cached == dst_cached) {
                        gf_msg (this->name, GF_LOG_WARNING, op_errno,
                                DHT_MSG_RENAME_FAILED,
                                "%s: Rename on %s failed, (gfid = %s) ",
                                local->loc.path, prev->this->name,
                                local->loc.inode ?
                                uuid_utoa(local->loc.inode->gfid):"");
                        local->op_ret   = op_ret;
                        local->op_errno = op_errno;
                        goto cleanup;
                }

The above check, if (src_cached == dst_cached), means dht does not record the failure of any rename where the destination doesn't exist, since dst_cached will be NULL in that case.

Comment 1 Anand Avati 2014-11-06 16:08:52 UTC
REVIEW: http://review.gluster.org/9063 (cluster/dht: Fix subvol check, to correctly determine cached file rename) posted (#1) for review on master by Shyamsundar Ranganathan (srangana)

Comment 2 Anand Avati 2014-11-10 15:37:01 UTC
REVIEW: http://review.gluster.org/9063 (cluster/dht: Fix subvol check, to correctly determine cached file rename) posted (#2) for review on master by Shyamsundar Ranganathan (srangana)

Comment 3 Anand Avati 2014-11-12 14:54:13 UTC
REVIEW: http://review.gluster.org/9063 (cluster/dht: Fix subvol check, to correctly determine cached file rename) posted (#3) for review on master by Shyamsundar Ranganathan (srangana)

Comment 4 Anand Avati 2014-11-17 08:25:46 UTC
COMMIT: http://review.gluster.org/9063 committed in master by Vijay Bellur (vbellur) 
------
commit dfc49143841fe84f846346a30dadce797940eebc
Author: Shyam <srangana>
Date:   Thu Nov 6 10:43:37 2014 -0500

    cluster/dht: Fix subvol check, to correctly determine cached file rename
    
    The check to treat rename as a critical failure ignored when the cached
    file is being renamed to new name, as the new name falls on the same
    subvol as the cached file. This is in addition to when the target of the
    rename does not exist.
    
    The current change is simpler, as the rename logic, renames the cached
    file in case the target exists and falls on the same subvol as source
    name, OR the target does not exist and the hash of target falls on the
    same subvol as source cached. These conditions mean we are renaming the
    source, other conditions mean we are renaming the source linkto file
    which we do not want to treat as a critical failure (and we also instruct
    marker that it is an internal FOP and to not account for the same).
    
    Change-Id: I4414e61a0d2b28a429fa747e545ef953e48cfb5b
    BUG: 1161156
    Signed-off-by: Shyam <srangana>
    Reviewed-on: http://review.gluster.org/9063
    Reviewed-by: N Balachandran <nbalacha>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: susant palai <spalai>
    Reviewed-by: venkatesh somyajulu <vsomyaju>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 5 Niels de Vos 2015-05-14 17:28:21 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user


