Bug 1258377

Summary: ACL created on a dht.linkto file on a files that skipped rebalance
Product: [Community] GlusterFS Reporter: Nithya Balachandran <nbalacha>
Component: distributeAssignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 3.7.3CC: asriram, bkunal, bugs, cbuissar, gluster-bugs, mlawrenc, nbalacha, nsathyan, rgowdapp, storage-qa-internal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.7.5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1247563 Environment:
Last Closed: 2015-10-14 10:30:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1234610, 1247563    
Bug Blocks: 1216951    

Description Nithya Balachandran 2015-08-31 08:52:46 UTC
+++ This bug was initially created as a clone of Bug #1247563 +++

+++ This bug was initially created as a clone of Bug #1234610 +++

DESCRIPTION

When rebalancing a file with ACL, if you are unlucky enough so that the file is skipped due to __dht_check_free_space() failure (or maybe other reasons), the ACLs will still be migrated on the destination file, containing the dht.linkto XATTR.

This leads to issues where on the client, the file will be seen twice :

---8<----
[root@rhs30-node1 severalfiles]# echo bla > file
[root@rhs30-node1 severalfiles]# setfacl -m u:cedric:w file
[root@rhs30-node1 severalfiles]# ll file
-rw-rw-r--+ 1 root root 4 Jun 22 21:42 file
[root@rhs30-node1 severalfiles]# mv file renamed
[root@rhs30-node1 severalfiles]# gluster volume rebalance thingluster start
volume rebalance: thingluster: success: Rebalance on thingluster has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 9277c760-2495-4058-894e-12e95e005b90

[root@rhs30-node1 severalfiles]# ls -li | grep renamed
12641300739152549957 -rw-rw-r-T+ 1 root root        4 Jun 22 21:43 renamed
12641300739152549957 -rw-rw-r-T+ 1 root root        4 Jun 22 21:43 renamed
---->8----

looking on the bricks :

on my node 2 :
524419 -rw-rw-r--+ 2 root root 4 Jun 22 21:42 /mnt/bricks/thiny/gluster/severalfiles/renamed

On my node 4 :
524427 -rw-rw-r-T+ 2 root root 4 Jun 22 21:43 /mnt/bricks/thiny/gluster/severalfiles/renamed


We can see that the node 4 inherited from the ACLs, while this usually does not happen : regular dht.linkto files do not have ACLs.


--- Additional comment from Cedric Buissart on 2015-06-22 16:02:04 EDT ---

I suspect that the issue is a side effect of xlators/cluster/dht/src/dht-rebalance.c , function __dht_rebalance_create_dst_file :

static inline int
__dht_rebalance_create_dst_file (xlator_t *to, xlator_t *from, loc_t *loc, struct iatt *stbuf,
                                 dict_t *dict, fd_t **dst_fd, dict_t *xattr)
{
[...]
        /* TODO: move all xattr related operations to fd based operations */
        ret = syncop_listxattr (from, loc, &xattr);
        if (ret < 0) {
                gf_msg (this->name, GF_LOG_WARNING, 0,
                        DHT_MSG_MIGRATE_FILE_FAILED,
                        "Migrate file failed:"
                        "%s: failed to get xattr from %s (%s)",
                        loc->path, from->name, strerror (-ret));
                ret = -1;
        }

        /* create the destination, with required modes/xattr */
        ret = __dht_rebalance_create_dst_file (to, from, loc, &stbuf,
                                               dict, &dst_fd, xattr);
        if (ret)
                goto out;

        ret = __dht_check_free_space (to, from, loc, &stbuf, flag);
        if (ret) {
                goto out;
        }


Where the xattr are copied over (syncop_listxattr(); __dht_rebalance_create_dst_file()) but then we might cancel the migration if __dht_check_free_space() returns a failure ... leading to xattr having been modified on the destination.

or maybe we want to change the behavior or readdir/getdents ?

--- Additional comment from Bipin Kunal on 2015-06-23 03:47:08 EDT ---

(In reply to Cedric Buissart from comment #2)
> I suspect that the issue is a side effect of
> xlators/cluster/dht/src/dht-rebalance.c , function
> __dht_rebalance_create_dst_file :
> 
I think the function which Cedric wanted to point is :
int
dht_migrate_file (xlator_t *this, loc_t *loc, xlator_t *from, xlator_t *to,
                  int flag)
> {
> [...]
>         /* TODO: move all xattr related operations to fd based operations */
>         ret = syncop_listxattr (from, loc, &xattr);
>         if (ret < 0) {
>                 gf_msg (this->name, GF_LOG_WARNING, 0,
>                         DHT_MSG_MIGRATE_FILE_FAILED,
>                         "Migrate file failed:"
>                         "%s: failed to get xattr from %s (%s)",
>                         loc->path, from->name, strerror (-ret));
>                 ret = -1;
>         }
> 
>         /* create the destination, with required modes/xattr */
>         ret = __dht_rebalance_create_dst_file (to, from, loc, &stbuf,
>                                                dict, &dst_fd, xattr);
>         if (ret)
>                 goto out;
> 
>         ret = __dht_check_free_space (to, from, loc, &stbuf, flag);
>         if (ret) {
>                 goto out;
>         }
> 
> 
> Where the xattr are copied over (syncop_listxattr();
> __dht_rebalance_create_dst_file()) but then we might cancel the migration if
> __dht_check_free_space() returns a failure ... leading to xattr having been
> modified on the destination.
> 
> or maybe we want to change the behavior or readdir/getdents ?

I think we should be calling function __dht_check_free_space() at least prior to __dht_rebalance_create_dst_file else we should have proper cleanup activity in the "out" section.
 

Thanks,
Bipin Kunal

--- Additional comment from Nithya Balachandran on 2015-06-23 06:39:02 EDT ---

What Cedric and Bipin have said is correct. I can reproduce the issue and am working on a fix. I will update with progress tomorrow.

--- Additional comment from Cedric Buissart on 2015-06-23 10:00:39 EDT ---

(In reply to Bipin Kunal from comment #3)
> 
> I think we should be calling function __dht_check_free_space() at least
> prior to __dht_rebalance_create_dst_file else we should have proper cleanup
> activity in the "out" section.
>  
> 
> Thanks,
> Bipin Kunal

Yes, I was thinking the same. After a quick try, it seems to work :
I have the "data movement attempted from node [..] with higher disk space to a node" in the logs, but this time, did not lead to dups in the ls output.

However, additionally : should we also look at the readdir behaviour and check why the ACLs on the linkto file leads to showing dups ?

--- Additional comment from Nithya Balachandran on 2015-06-23 13:23:36 EDT ---

Cedric and Bipin, 
The change discussed above should certainly work but I want to look into a few more areas to understand the impact - specifically wrt to the new lookup-optimize option we are introducing in 3.1. I will update the BZ tomorrow again.

--- Additional comment from Nithya Balachandran on 2015-06-24 04:12:41 EDT ---

This happens only if posix acls are set on the file being rebalanced and is easily reproduced using the following steps:

1. Create a distributed volume, say vol1
2. Mount it using the following command:
 mount -t glusterfs -o acl <server>:vol1 /mnt/vol1

3. Create a file on the gluster volume from the mount point and set a posix acl on it
#> cd /mnt/vol1
#> touch file1
#> setfacl -m u:root:rw file1

#>l

total 0
-rw-rw-r--+ 1 root root    0 Jun 24 13:21 file1


4. Write some content into the file and rename the file so it hashes to a different brick, say file2. This will create a linkto file on the new hashed subvol.

5. Start a rebalance. Somehow force that file to be skipped (I used gdb).


Post the rebalance, it is seen that the linkto file on the brick now has additional permissions because of the posix acls that are set on it as part of the xattr copy in  __dht_rebalance_create_dst_file. Listing the file on the mount point will show duplicate entries as these additional permissions cause the IS_DHT_LINKFILE_MODE check to fail. The linkto file is then treated as a data file.


Before rebalance:

./x1:

---------T 2 root root 0 Jun 24 13:10 file2   <-- Valid linkto file

./x3:

-rw-rwxr--+ 2 root root 3214 Jun 24 13:10 file2



After rebalance:

./x1:

-rw-rwxr-T+ 2 root root 3214 Jun 24 13:11 file2  <---Linkto file perms changed

./x3:

-rw-rwxr--+ 2 root root 3214 Jun 24 13:10 file2  <---  actual data file




Impact:

The user sees 2 entries for the file listing on the mount point. In some cases, operations are sent to the wrong copy of the file causing data loss/corruption.



Workaround:

Use 

gluster volume rebalance <volname> start force

to prevent the file being skipped because of the space check.


Once this has occurred, the affected linkto files have to be identified and deleted from the brick (not the mount point).

--- Additional comment from Nithya Balachandran on 2015-06-24 04:29:03 EDT ---

Analysis:

1. The additional permissions set on the linkto file as a result of copying the posix acl xattrs in __dht_rebalance_create_dst_file cause the IS_DHT_LINKFILE_MODE check to fail.

2. If the __dht_check_free_space fails post this, the modified linkto file is not cleaned up.

3. Operations then treat this file as the data file and can cause data corruption/loss. I see the ops sometimes going to the correct file after  initially accessing the wrong file but I am still investigating why.


Possible fixes:
1. Reverse the order of calling __dht_rebalance_create_dst_file and __dht_check_free_space.

This is the simplest way and will fix this specific issue. It however does not address the problem of not cleaning up the dst file in case of failures later in the file migration  sequence.

2. Cleaning up a dst file in case of migration skip/failure.
This might have issues where, if acls are set on a file undergoing migration, it will be set on the target file anyway as part of FOP handling. Subsequent fops may end up being performed on the wrong file. That this actually happens however needs to be confirmed.


3. Modify the IS_DHT_LINKFILE_MODE to ignore the other permissions. This would require extensive study/testing to determine the impact and is not recommended at this point.

--- Additional comment from Anand Avati on 2015-07-28 06:55:45 EDT ---

REVIEW: http://review.gluster.org/11774 (cluster/dht: Check for space before creating migration dst file) posted (#1) for review on master by N Balachandran (nbalacha)

--- Additional comment from Anand Avati on 2015-08-27 03:49:41 EDT ---

REVIEW: http://review.gluster.org/12025 (cluster/dht: Don't set posix acls on linkto files) posted (#1) for review on master by N Balachandran (nbalacha)

--- Additional comment from Anand Avati on 2015-08-27 03:54:16 EDT ---

REVIEW: http://review.gluster.org/12025 (cluster/dht: Don't set posix acls on linkto files) posted (#2) for review on master by N Balachandran (nbalacha)

--- Additional comment from Anand Avati on 2015-08-27 06:01:36 EDT ---

REVIEW: http://review.gluster.org/12025 (cluster/dht: Don't set posix acls on linkto files) posted (#3) for review on master by N Balachandran (nbalacha)

--- Additional comment from Anand Avati on 2015-08-27 06:48:14 EDT ---

REVIEW: http://review.gluster.org/12025 (cluster/dht: Don't set posix acls on linkto files) posted (#4) for review on master by N Balachandran (nbalacha)

--- Additional comment from Anand Avati on 2015-08-27 06:56:53 EDT ---

REVIEW: http://review.gluster.org/12025 (cluster/dht: Don't set posix acls on linkto files) posted (#5) for review on master by N Balachandran (nbalacha)

--- Additional comment from Anand Avati on 2015-08-28 07:10:36 EDT ---

REVIEW: http://review.gluster.org/12025 (cluster/dht: Don't set posix acls on linkto files) posted (#6) for review on master by N Balachandran (nbalacha)

--- Additional comment from Anand Avati on 2015-08-31 03:28:04 EDT ---

COMMIT: http://review.gluster.org/12025 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit b9c730f3960efd454c8363ee39dc144e4c0dc835
Author: Nithya Balachandran <nbalacha>
Date:   Thu Aug 27 13:10:18 2015 +0530

    cluster/dht: Don't set posix acls on linkto files
    
    Posix acls on a linkto file change the file's permission
    bits and cause DHT to treat it as a non-linkto file.This
    happens on the migration failure of a file on which posix
     acls were set.
    
    The fix prevents posix acls from being set on a linkto
    file and copies them across only after a file has
    been successfully migrated.
    
    Change-Id: Iccf7ff6fba49fe05d691d9b83bf76a240848b212
    BUG: 1247563
    Signed-off-by: Nithya Balachandran <nbalacha>
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: http://review.gluster.org/12025
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 1 Anand Avati 2015-08-31 09:40:15 UTC
REVIEW: http://review.gluster.org/12062 (cluster/dht: Don't set posix acls on linkto files) posted (#1) for review on release-3.7 by N Balachandran (nbalacha)

Comment 2 Anand Avati 2015-08-31 09:54:29 UTC
REVIEW: http://review.gluster.org/12062 (cluster/dht: Don't set posix acls on linkto files) posted (#2) for review on release-3.7 by N Balachandran (nbalacha)

Comment 3 Pranith Kumar K 2015-10-14 10:30:48 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-glusterfs-3.7.5, please open a new bug report.

glusterfs-glusterfs-3.7.5 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/gluster-users/2015-October/023968.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 4 Pranith Kumar K 2015-10-14 10:39:03 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.5, please open a new bug report.

glusterfs-3.7.5 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/gluster-users/2015-October/023968.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user