Bug 765442 (GLUSTER-3710) - duplicate file names on replicated volume
Summary: duplicate file names on replicated volume
Keywords:
Status: CLOSED NEXTRELEASE
Alias: GLUSTER-3710
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.1.7
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-10-11 02:47 UTC by Jim
Modified: 2013-03-21 10:04 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-21 10:04:35 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Description Jim 2011-10-11 02:47:16 UTC
Running several instances of "ls -lah" at the same time causes inconsistent file listings.

--

gluster volume info

Volume Name: REPLICA_T2
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: moodle-tst19:/mnt/brick_t2/brick
Brick2: moodle-tst87:/mnt/brick_t2/brick

df -h
...
...
glusterfs#moodle-tst87:/REPLICA_T2   493G  198M  467G   1% /mnt/REPLICA_T2

Using the following script, 10,000 files were created in /mnt/REPLICA_T2:

makefiles.sh
--
#!/bin/bash
# create $3 files named "$1"<i> for i = 0 .. $3-1
for (( i=0 ; i<$3 ; i++ )); do touch "$1$i" ; done
--

sh makefiles.sh File 10000 10000

[root@brick01-b87 REPLICA_T2]# ls -lah | wc 
  10004   90029  509044

Running several instances of ls at the same time causes this:


[root@brick01-b87 REPLICA_T2]# ls -lah | wc &
[1] 17336
[root@brick01-b87 REPLICA_T2]# ls -lah | wc &
[2] 17338
[root@brick01-b87 REPLICA_T2]# ls -lah | wc &
[3] 17340
[root@brick01-b87 REPLICA_T2]# ls -lah | wc &
[4] 17342
[root@brick01-b87 REPLICA_T2]# ls -lah | wc &
[5] 17344
[root@brick01-b87 REPLICA_T2]# ls -lah | wc &
[6] 17346
   9966   89687  507115
   9857   88706  501557
   9854   88679  501421
   9822   88391  499781
   9845   88598  500937
   9935   89408  505505
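
For convenience, the check above can be wrapped in a small script along these lines (a sketch only: the mount point defaults to the one used in this report, the run count of 6 matches the manual test, and the script name is illustrative):

concurrent_ls_check.sh
--
#!/bin/bash
# Run several concurrent "ls -lah | wc -l" listings against the mount point
# and flag any run whose line count differs from a single baseline run.
MNT=${1:-/mnt/REPLICA_T2}   # assumed mount point; override with the 1st argument
N=${2:-6}                   # number of concurrent listings

baseline=$(ls -lah "$MNT" | wc -l)
echo "baseline: $baseline lines"

for (( i=0 ; i<N ; i++ )); do
    (
        count=$(ls -lah "$MNT" | wc -l)
        if [ "$count" -ne "$baseline" ]; then
            echo "run $i: MISMATCH ($count lines)"
        else
            echo "run $i: ok ($count lines)"
        fi
    ) &
done
wait
--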

Comment 1 Pranith Kumar K 2011-10-11 04:28:43 UTC
(In reply to comment #0)

Hi Jim,
    Thanks for the precise test case. Unfortunately, it works fine on my machine:
root @ /mnt/client
12:55:04 :) $   10003   90020  468994
  10003   90020  468994
  10003   90020  468994
  10003   90020  468994
  10003   90020  468994
  10003   90020  468994
  10003   90020  468994

Could you confirm that both the client and the bricks are running version 3.1.7? Could you also let us know a bit more about your setup, such as the OS and the filesystem types of the backend bricks?

Comment 2 Jim 2011-10-11 05:17:12 UTC
Client and server are both on 3.1.7; I've also tried 3.1.6, as that was the version that listed this problem as fixed.

Both bricks and the client are on the same OS at the same patch level:

[root@brick01-b19 ~]# uname -r
2.6.18-238.5.1.0.1.el5PAE
[root@brick01-b19 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)

Each storage brick has a LUN from an EVA 8000: the B19 brick has a LUN from B19EVA, and the B87 brick has a LUN from B87EVA.

The LUNs are multipathed.

[root@brick01-b19 ~]# multipath -ll
mpatha (3600508b400074a6b0000b000099d0000) dm-2 HP,HSV210
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 0:0:7:1 sdp        8:240 active ready  running
  |- 0:0:6:1 sdo        8:224 active ready  running
  |- 0:0:5:1 sdn        8:208 active ready  running
  |- 0:0:4:1 sdm        8:192 active ready  running
  |- 1:0:7:1 sdh        8:112 active ready  running
  |- 1:0:6:1 sdg        8:96  active ready  running
  |- 1:0:5:1 sdf        8:80  active ready  running
  `- 1:0:4:1 sde        8:64  active ready  running

I have tried using the LUN with LVM, and I've also tried formatting and mounting the LUN directly.

I have tried formatting the LUNs with both ext3 and ext4.

In every case I can repeat the fault easily.

Comment 3 Jim 2011-10-11 20:16:32 UTC
I have done some additional testing: basically the same test in a subdirectory of the gluster filesystem rather than in the root. This appears much more reliable, but it still suffers from the same problem.

[root@brick01-b19 ~]# ls -lah /mnt/REPLICA_T2/test2/ | wc &
[1] 25225
...
...
[root@brick01-b19 ~]# ls -lah /mnt/REPLICA_T2/test2/ | wc &
[19] 25227
[root@brick01-b19 ~]#   10004   90029  499041
  10004   90029  499041
  10004   90029  499041
   9999   89984  498792
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041
  10004   90029  499041

This is the original test case, in the root of the filesystem:

[root@brick01-b19 ~]# ls -lah /mnt/REPLICA_T2/ | wc &
[1] 25298
...
...
[root@brick01-b19 ~]# ls -lah /mnt/REPLICA_T2/ | wc &
[5] 25306
[root@brick01-b19 ~]#    9997   89966  508687
  10020   90173  509851
  10001   90002  508888
   9983   89840  507977
   9996   89957  508634

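The root-versus-subdirectory comparison can be scripted along these lines (a sketch; the paths below are the ones used in this report and are only assumptions for other setups):

--
#!/bin/bash
# Compare concurrent-listing stability in the volume root against a subdirectory:
# run six parallel "ls -lah | wc -l" listings per directory and summarise the
# distinct line counts seen.
for dir in /mnt/REPLICA_T2 /mnt/REPLICA_T2/test2; do
    echo "== $dir =="
    {
        for (( i=0 ; i<6 ; i++ )); do
            ls -lah "$dir" | wc -l &
        done
        wait
    } | sort -n | uniq -c    # identical counts collapse into a single line
done
--
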
Comment 4 Jim 2011-10-15 01:58:26 UTC
I have upgraded my environment to x86_64 EL6 and can still reproduce the same issue.

[root@brick01-b87 ~]# uname -r
2.6.32-100.34.1.el6uek.x86_64
[root@brick01-b87 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.1 (Santiago)
[root@brick01-b87 ~]# cd /mnt/REPLICA_T2
[root@brick01-b87 REPLICA_T2]# ls -lah | wc &
[1] 2818
[root@brick01-b87 REPLICA_T2]# ls -lah | wc &
[2] 2820
[root@brick01-b87 REPLICA_T2]# ls -lah | wc &
[3] 2822
[root@brick01-b87 REPLICA_T2]# ls -lah | wc &
[4] 2824
[root@brick01-b87 REPLICA_T2]# ls -lah | wc &
[5] 2827
[root@brick01-b87 REPLICA_T2]#   10001   90002  508895
  10007   90056  509202
   9998   89975  508741
   9998   89975  508740
  10007   90056  509195
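
Since the mismatch is intermittent, a loop like the following (a sketch; the mount point and the 50-attempt count are assumptions) shows how often the concurrent listings disagree with a baseline over repeated attempts:

--
#!/bin/bash
# Repeat the concurrent-listing check many times and report how many individual
# listings disagreed with the baseline line count.
MNT=/mnt/REPLICA_T2          # assumed mount point
ATTEMPTS=50
PER_ATTEMPT=6

baseline=$(ls -lah "$MNT" | wc -l)
bad=0
for (( a=0 ; a<ATTEMPTS ; a++ )); do
    counts=$( { for (( i=0 ; i<PER_ATTEMPT ; i++ )); do ls -lah "$MNT" | wc -l & done; wait; } )
    for c in $counts; do
        [ "$c" -ne "$baseline" ] && bad=$((bad + 1))
    done
done
echo "$bad of $((ATTEMPTS * PER_ATTEMPT)) listings differed from the baseline of $baseline lines"
--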

Comment 5 Pranith Kumar K 2013-02-22 10:24:42 UTC
Jim,
     We were unable to re-create the bug in our setup. Do you still observe this issue with the 3.3.x releases?

Pranith

Comment 6 Pranith Kumar K 2013-03-14 08:55:45 UTC
Jim,
    Is the mount a FUSE mount or an NFS mount? Do you hang out on gluster IRC? If so, could you let me know your nick?

Pranith.

Comment 7 Pranith Kumar K 2013-03-21 10:04:35 UTC
Jim,
     No similar bugs have been reported against later versions for this issue. Please feel free to re-open it if you see the issue in any future version.
If you want to talk to me about this bug on gluster IRC, please add a comment with your nick and we can discuss it there.

Pranith

