Bug 1028287

Summary: DHT:REBALANCE- statfs failures are seen during rebalance
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: shylesh <shmohan>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED DEFERRED
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: medium
Priority: unspecified
Version: 2.1
CC: spalai, vbellur
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2015-11-27 12:26:51 UTC
Type: Bug
Bug Blocks: 1286200, 1286207

Description shylesh 2013-11-08 06:14:36 UTC
Description of problem:
While migrating files, rebalance hits statfs failures on some files when calculating free space. Each such file is counted as 'skipped' in the rebalance status.

Version-Release number of selected component (if applicable):
3.4.0.39rhs-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Created a 6x2 distributed-replicate volume.
2. Created a deep directory tree, 6 levels deep and 6 wide, using the following script:
------------------------------------------
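# Recursively build a directory tree. At each level, create directories
# 0..n and one 512 KiB file per index, then descend into each directory.
makedir () {
        local depth=$1
        local n=$2
        local i

        # Stop once the requested depth is exhausted.
        if [ "$depth" -eq 0 ]; then
                return
        fi

        for i in $(seq 0 "$n")
        do
                mkdir "$i"
                dd if=/dev/urandom of="file.$i" bs=512k count=1
        done

        depth=$((depth - 1))

        for i in $(seq 0 "$n")
        do
                # pushd/popd restore the working directory after recursing.
                pushd "$i" > /dev/null
                makedir "$depth" "$n"
                popd > /dev/null
        done
}

# Usage: <script> DEPTH WIDTH
makedir "$1" "$2"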
------------------------------------------------------------------------

3. Ran add-brick and started rebalance.
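
Step 3 can be sketched as a small wrapper around the gluster CLI. This is only an illustration: the function name `expand_and_rebalance` and the `RUN` switch are hypothetical, and the report does not say which bricks were added, so the brick arguments are left to the caller. For a replica 2 volume, bricks have to be added in multiples of two.

```shell
# Print the add-brick and rebalance commands for a volume.
# With RUN=1 in the environment, execute them instead of printing.
# Usage: expand_and_rebalance VOLNAME BRICK [BRICK...]
# (function name and RUN switch are hypothetical, not part of gluster)
expand_and_rebalance () {
        local vol=$1
        shift
        local run=echo
        if [ "${RUN:-0}" = 1 ]; then
                run=
        fi
        $run gluster volume add-brick "$vol" "$@"
        $run gluster volume rebalance "$vol" start
}
```

Calling it without `RUN=1` only prints the two commands, which makes it easy to review the brick list before touching the volume.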

Actual results:
Some files are counted in the 'skipped' column of the rebalance status.
The rebalance log shows:

"[2013-11-07 11:36:07.527205] E [dht-rebalance.c:357:__dht_check_free_space] 0-dist-rep-dht: failed to get statfs of /1/5/0/0/6/file.6 on dist-rep-replicate-1 (No such file or directory)"
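
To list every file that hit this failure, the rebalance log can be filtered for the message above. A minimal sketch, assuming each log line matches the format of that excerpt; the function name `list_statfs_failures` is hypothetical, and the log path in the usage note is an assumption that may differ per installation.

```shell
# Print "<path> <subvolume>" for each statfs failure found in the input.
# Reads the named log files, or stdin when no file is given.
# Assumes the message format of the log excerpt quoted above.
list_statfs_failures () {
        sed -n 's/.*failed to get statfs of \(.*\) on \([^ ]*\) (.*/\1 \2/p' "$@"
}
```

For example, `list_statfs_failures /var/log/glusterfs/dist-rep-rebalance.log` (path assumed) would print one line per affected file together with the subvolume on which statfs failed.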


 
[root@rhs-client4 mnt]# getfattr -d -m . -e text /home/dist-rep3//1/5/0/0/6/file.6
getfattr: Removing leading '/' from absolute path names
# file: home/dist-rep3//1/5/0/0/6/file.6
trusted.gfid="��k�E����GC�m�"
trusted.glusterfs.dht.linkto="dist-rep-replicate-2"
trusted.glusterfs.quota.ed8f09e1-46f2-4e0c-9bd5-5bd6c2f38cd1.contri="\000\000\000\000\000\000\000"
trusted.pgfid.ed8f09e1-46f2-4e0c-9bd5-5bd6c2f38cd1="\000\000\000"


from dist-rep-replicate-2
------------------------

[root@rhs-client9 ~]# getfattr -d -m . -e text /home/dist-rep4//1/5/0/0/6/file.6
getfattr: Removing leading '/' from absolute path names
# file: home/dist-rep4//1/5/0/0/6/file.6
trusted.afr.dist-rep-client-4="\000\000\000\000\000\000\000\000\000\000\000"
trusted.afr.dist-rep-client-5="\000\000\000\000\000\000\000\000\000\000\000"
trusted.gfid="��k�E����GC�m�"
trusted.glusterfs.quota.ed8f09e1-46f2-4e0c-9bd5-5bd6c2f38cd1.contri="\000\000\000\000\00\000"
trusted.pgfid.ed8f09e1-46f2-4e0c-9bd5-5bd6c2f38cd1="\000\000\000"

[root@rhs-client4 mnt]# gluster v rebalance dist-rep status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost            15885         7.8GB         59929             0             0      completed          6249.00
      rhs-client9.lab.eng.blr.redhat.com            11332         5.5GB         55572             0             1      completed          6255.00
     rhs-client39.lab.eng.blr.redhat.com            11973         5.8GB         57459             0             0      completed          6246.00
volume rebalance: dist-rep: success: 
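
The nodes that skipped files can be picked out of the status output with a short filter. A sketch only, assuming the eight-column layout shown above (node, rebalanced files, size, scanned, failures, skipped, status, run time); the function name `nodes_with_skipped` is hypothetical.

```shell
# Print "<node> <skipped>" for every node whose skipped count is non-zero.
# Header, separator, and trailer lines are ignored because they do not
# have exactly eight fields with a numeric sixth field.
nodes_with_skipped () {
        awk 'NF == 8 && $6 ~ /^[0-9]+$/ && $6 > 0 { print $1, $6 }' "$@"
}
```

Usage: `gluster v rebalance dist-rep status | nodes_with_skipped`.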



volume info
===========
[root@rhs-client4 mnt]# gluster v info dist-rep
 
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 4d0c8f97-2e0d-4c1d-9628-898df3de12ed
Status: Started
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep0
Brick2: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep1
Brick3: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep2
Brick4: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep3
Brick5: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep4
Brick6: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep5
Brick7: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep6
Brick8: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep7
Brick9: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep8
Brick10: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep9
Brick11: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep10
Brick12: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep11
Brick13: rhs-client4.lab.eng.blr.redhat.com:/home/dist-rep12
Brick14: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep13
Brick15: rhs-client9.lab.eng.blr.redhat.com:/home/dist-rep14
Brick16: rhs-client39.lab.eng.blr.redhat.com:/home/dist-rep15
Options Reconfigured:
features.quota: on


cluster info
------------
RHS nodes
----------
rhs-client9.lab.eng.blr.redhat.com
rhs-client39.lab.eng.blr.redhat.com
rhs-client4.lab.eng.blr.redhat.com

Mounted on 
----------
rhs-client4.lab.eng.blr.redhat.com:/mnt

 
The sosreports are attached.