Bug 1053537

Summary: BVT: Rebalance failed for symbolic link files
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Lalatendu Mohanty <lmohanty>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED DEFERRED
QA Contact: shylesh <shmohan>
Severity: high
Docs Contact: Lalatendu Mohanty <lmohanty>
Priority: high
Version: 2.1
CC: mzywusko, nlevinki, nsathyan, sdharane, spalai, vagarwal, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1286142
Environment:
Last Closed: 2015-11-27 11:53:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1286142    
Attachments:
Description Flags
Rebalance logs none

Description Lalatendu Mohanty 2014-01-15 11:44:49 UTC
Description of problem:

The rebalance status command output shows failures. It should not report any failures: if a file is not rebalanced because of a space issue, it should be counted in the skipped list instead.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.56rhs-1389602694.el6.x86_64.rpm

How reproducible:
Intermittent

Steps to Reproduce:

1. Create a distribute volume

2. Create the data (symlinks) on the mount point
   
   Refer "Additional info:" for code that creates the symlinks

3. Add brick
4. Start rebalance
5. Check rebalance status
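
The steps above can be sketched with the gluster CLI roughly as follows. This is only a sketch: the volume name, host names, brick paths and mount point are placeholders and not the BVT values, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Placeholder values (assumptions, not taken from the BVT setup).
VOLNAME=${VOLNAME:-testvol}
HOST1=${HOST1:-node1.example.com}
HOST2=${HOST2:-node2.example.com}
HOST3=${HOST3:-node3.example.com}
MOUNT_POINT=${MOUNT_POINT:-/mnt/testvol}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

reproduce() {
    # 1. Create and start a distribute volume, then mount it
    run gluster volume create "$VOLNAME" "$HOST1:/bricks/b1" "$HOST2:/bricks/b2"
    run gluster volume start "$VOLNAME"
    run mount -t glusterfs "$HOST1:/$VOLNAME" "$MOUNT_POINT"
    # 2. Create the symlinks on $MOUNT_POINT (see "Additional info:")
    # 3. Add a brick
    run gluster volume add-brick "$VOLNAME" "$HOST3:/bricks/b3"
    # 4. Start rebalance
    run gluster volume rebalance "$VOLNAME" start
    # 5. Check rebalance status
    run gluster volume rebalance "$VOLNAME" status
}

reproduce
```

Since the failure is intermittent, steps 2 to 5 may need to be repeated on a fresh volume several times before it shows up.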

Actual results:

                             Node   Rebalanced-files       size    scanned   failures    skipped      status   run time in secs
                        ---------   ----------------   --------   --------   --------   --------   ---------   ----------------
                        localhost                 46      1.2KB        366          0          1   completed               3.00
rhsauto057.lab.eng.blr.redhat.com                 43      1.1KB        382          1          0   completed               3.00
rhsauto022.lab.eng.blr.redhat.com                  0     0Bytes        330          0          0   completed               1.00
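
For automated runs, a check along these lines could flag nodes that report failures. It is a sketch that assumes the column order shown above (node, rebalanced files, size, scanned, failures, ...); the function name failed_nodes is made up for illustration.

```shell
#!/bin/bash
# Read "gluster volume rebalance <vol> status" output on stdin and print
# "<node> <failures>" for every node whose failures column is non-zero.
# Assumption: failures is the 5th whitespace-separated field of a data row.
failed_nodes() {
    awk '$2 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ && $5 + 0 > 0 { print $1, $5 }'
}

# Usage (volume name is a placeholder):
#   gluster volume rebalance testvol status | failed_nodes
```

Header and separator rows are skipped because their second and fifth fields are not numeric.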

Expected results:


Additional info:

Code to create symbolic links:

mkdir -p "$MOUNT_POINT/symlinks"
mkdir -p /symlinks
mkdir -p "$MOUNT_POINT/symlinks-gluster"
mkdir -p "$MOUNT_POINT/symlinks-gluster-dest"
for i in $(seq 1 100); do
    # Script outside the volume, symlinked from the mount point
    echo '#!/bin/bash' > "/symlinks/$i.sh"
    echo 'echo Hello World' >> "/symlinks/$i.sh"
    ln -s "/symlinks/$i.sh" "$MOUNT_POINT/symlinks/$i"
    # Script on the volume, symlinked from another directory on the volume
    echo '#!/bin/bash' > "$MOUNT_POINT/symlinks-gluster/$i.sh"
    echo 'echo Hello World' >> "$MOUNT_POINT/symlinks-gluster/$i.sh"
    ln -s "$MOUNT_POINT/symlinks-gluster/$i.sh" "$MOUNT_POINT/symlinks-gluster-dest/$i"
done

Comment 1 Lalatendu Mohanty 2014-01-15 11:47:36 UTC
Marked this bug as intermittent, as it occurred twice in the BVT runs of the last few days.

Today's Run: https://beaker.engineering.redhat.com/jobs/575364

Previous Run: https://beaker.engineering.redhat.com/jobs/572378

Comment 2 Lalatendu Mohanty 2014-01-15 11:51:17 UTC
Created attachment 850449
Rebalance logs

Comment 3 Lalatendu Mohanty 2014-01-15 11:53:43 UTC
The following error message is seen in the logs:

hosdu-rebalance.log.1:[2014-01-15 18:14:31.263392] E [dht-linkfile.c:287:dht_linkfile_setattr_cbk] 0-hosdu-dht: setattr of uid/gid on /93 :<gfid:00000000-0000-0000-0000-000000000000> failed (No such file or directory)
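
To pull all such failures out of a rebalance log, a filter like the one below may help. It is a sketch that assumes the message format shown above; the function name setattr_failures and the log directory in the usage line are assumptions.

```shell
#!/bin/bash
# Read rebalance log text on stdin and print "<path>: <error>" for every
# failed link-file setattr, based on the message format quoted above.
setattr_failures() {
    grep -F 'dht_linkfile_setattr_cbk' |
        sed -n 's|.*setattr of uid/gid on \(.*\) :<gfid:.*failed (\(.*\))$|\1: \2|p'
}

# Usage (log directory is an assumption; file name is from the message above):
#   setattr_failures < /var/log/glusterfs/hosdu-rebalance.log.1
```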

Comment 5 Lalatendu Mohanty 2014-02-07 09:41:45 UTC
I haven't seen this issue in the BVT runs of the last couple of weeks. Hence lowering the severity.

Comment 6 Vivek Agarwal 2014-02-20 08:36:56 UTC
Adding the 3.0 flag and removing 2.1.z.

Comment 7 vsomyaju 2014-03-26 12:58:06 UTC
From the logs it seems that one of the rebalance processes was not able to perform setattr because the file was not present on the backend.

If it is a replicated volume, ideally a lock should be taken, so I am not sure how it got into that situation.


However, I was able to reproduce it on a non-replicated volume.

1. Initially there are two bricks.
2. Created a file [org_file].
3. Created a symbolic link to org_file [sym_file].
4. Added a new brick.
5. Ran rebalance.

NOTE: All bricks are on different nodes.



Now a rebalance process will run on each of the three nodes.
Let's assume the file needs to be migrated from node-2 to node-3.


             Rebalance-1               Rebalance-2               Rebalance-3


t1:          Lookup(sym_file)
t2:          Create dht-link (T)
             at node-3


t3:                                    Lookup(sym_file)

t4:                                    After some more
                                       operations, delete
                                       the dht-link at
                                       node-3

t5:          Do setattr on the
             dht-link created at
             t2 above.
             NOTE: It will fail,
             because at t4
             Rebalance-2 deleted
             the dht-link file.

t6:                                    Create symbolic link
                                       at node-3
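
The interleaving above can be replayed on a local filesystem to show why t5 ends in "No such file or directory". This is only an analogy under stated assumptions: a plain file in a temporary directory stands in for the dht-link on node-3, chown stands in for the setattr of uid/gid, and nothing here touches GlusterFS itself.

```shell
#!/bin/bash
# Sequential replay of t2, t4, t5 and t6 using local files as stand-ins.
replay_race() {
    local dir link
    dir=$(mktemp -d) || return 1
    link="$dir/sym_file.linkfile"      # hypothetical stand-in for the dht-link

    : > "$link"                        # t2: Rebalance-1 creates the link file
    rm -f "$link"                      # t4: Rebalance-2 deletes the link file
    # t5: Rebalance-1 sets uid/gid on the link file it created at t2
    if chown "$(id -u):$(id -g)" "$link" 2>/dev/null; then
        echo "t5: setattr succeeded"
    else
        echo "t5: setattr failed (link file missing)"
    fi
    ln -s org_file "$dir/sym_file"     # t6: Rebalance-2 creates the symlink
    rm -rf "$dir"
}

replay_race
```

In the real case the two processes run concurrently, so whether t4 lands between t2 and t5 depends on timing, which would be consistent with the failure being intermittent.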

Comment 11 Susant Kumar Palai 2015-11-27 11:53:16 UTC
Cloning this to 3.1. To be fixed in future.