I have something impossible: same filenames are listed multiple times:

# ls -la /mnt/VOLNAME/
...
-rwxrwxr-x 1 root root 3486 Jan 28  2016 check_connections.pl
-rwxr-xr-x 1 root root  153 Dec  7  2014 sigtest.sh
-rwxr-xr-x 1 root root  153 Dec  7  2014 sigtest.sh
-rwxr-xr-x 1 root root 3466 Jan  5  2015 zabbix.pm
-rwxr-xr-x 1 root root 3466 Jan  5  2015 zabbix.pm

There're about 38981 duplicate files like that.

The volume itself is a 3 x 2-replica:

# gluster volume info VOLNAME
Volume Name: VOLNAME
Type: Distributed-Replicate
Volume ID: 41f9096f-0d5f-4ea9-b369-89294cf1be99
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: gfserver1:/srv/BRICK
Brick2: gfserver2:/srv/BRICK
Brick3: gfserver3:/srv/BRICK
Brick4: gfserver4:/srv/BRICK
Brick5: gfserver5:/srv/BRICK
Brick6: gfserver6:/srv/BRICK
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.self-heal-daemon: enable
config.transport: tcp

The "duplicated" file on individual bricks:

[gfserver1]# ls -la /srv/BRICK/zabbix.pm
---------T 2 root root 0 Apr 23  2018 /srv/BRICK/zabbix.pm

[gfserver2]# ls -la /srv/BRICK/zabbix.pm
---------T 2 root root 0 Apr 23  2018 /srv/BRICK/zabbix.pm

[gfserver3]# ls -la /srv/BRICK/zabbix.pm
-rwxr-xr-x 2 root root 3466 Jan  5  2015 /srv/BRICK/zabbix.pm

[gfserver4]# ls -la /srv/BRICK/zabbix.pm
-rwxr-xr-x 2 root root 3466 Jan  5  2015 /srv/BRICK/zabbix.pm

[gfserver5]# ls -la /srv/BRICK/zabbix.pm
-rwxr-xr-x 2 root root 3466 Jan  5  2015 /srv/BRICK/zabbix.pm

[gfserver6]# ls -la /srv/BRICK/zabbix.pm
-rwxr-xr-x. 2 root root 3466 Jan  5  2015 /srv/BRICK/zabbix.pm

Attributes:

[gfserver1]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
# file: srv/BRICK/zabbix.pm
trusted.afr.VOLNAME-client-1=0x000000000000000000000000
trusted.afr.VOLNAME-client-4=0x000000000000000000000000
trusted.gfid=0x422a7ccf018242b58e162a65266326c3
trusted.glusterfs.dht.linkto=0x6678666565642d7265706c69636174652d3100

[gfserver2]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
# file: srv/BRICK/zabbix.pm
trusted.gfid=0x422a7ccf018242b58e162a65266326c3
trusted.gfid2path.3b27d24cad4dceef=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f7a61626269782e706d
trusted.glusterfs.dht.linkto=0x6678666565642d7265706c69636174652d3100

[gfserver3]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
# file: srv/BRICK/zabbix.pm
trusted.afr.VOLNAME-client-2=0x000000000000000000000000
trusted.afr.VOLNAME-client-3=0x000000000000000000000000
trusted.gfid=0x422a7ccf018242b58e162a65266326c3

[gfserver4]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
# file: srv/BRICK/zabbix.pm
trusted.gfid=0x422a7ccf018242b58e162a65266326c3
trusted.gfid2path.3b27d24cad4dceef=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f7a61626269782e706d

[gfserver5]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
# file: srv/BRICK/zabbix.pm
trusted.bit-rot.version=0x03000000000000005c4f813c000bc71b
trusted.gfid=0x422a7ccf018242b58e162a65266326c3

[gfserver6]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
# file: srv/BRICK/zabbix.pm
security.selinux=0x73797374656d5f753a6f626a6563745f723a7661725f743a733000
trusted.bit-rot.version=0x02000000000000005add0ffc000eb66a
trusted.gfid=0x422a7ccf018242b58e162a65266326c3

Not sure why exactly it happened... Maybe because some nodes were suddenly upgraded from centos6's gluster ~3.7 to centos7's 4.1, and some files happened to be on nodes that they're not supposed to be on.

Currently all the nodes are online:

# gluster pool list
UUID                                  Hostname   State
aac9e1a5-018f-4d27-9d77-804f0f1b2f13  gfserver5  Connected
98b22070-b579-4a91-86e3-482cfcc9c8cf  gfserver3  Connected
7a9841a1-c63c-49f2-8d6d-a90ae2ff4e04  gfserver4  Connected
955f5551-8b42-476c-9eaa-feab35b71041  gfserver6  Connected
7343d655-3527-4bcf-9d13-55386ccb5f9c  gfserver1  Connected
f9c79a56-830d-4056-b437-a669a1942626  gfserver2  Connected
45a72ab3-b91e-4076-9cf2-687669647217  localhost  Connected

and have glusterfs-3.12.14-1.el6.x86_64 (CentOS 6) and glusterfs-4.1.7-1.el7.x86_64 (CentOS 7) installed.

Expected result
---------------

This looks like a layout issue, so:

gluster volume rebalance VOLNAME fix-layout start

should fix it, right?

Actual result
-------------

I tried:

gluster volume rebalance VOLNAME fix-layout start
gluster volume rebalance VOLNAME start
gluster volume rebalance VOLNAME start force
gluster volume heal VOLNAME full

Those took 5 to 40 minutes to complete, but the duplicates are still there.
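For anyone reproducing the count: readdir on the mount returns the same name twice, so piping a recursive listing through sort | uniq -d surfaces the duplicated entries. This is only a sketch of how a figure like 38981 can be obtained (it assumes GNU find/coreutils on the client, and /mnt/VOLNAME as the mount point); it is not the exact command used above:

```shell
# count_dup_entries DIR: print directory entries that appear more than once
# in readdir output under DIR. On a healthy filesystem this prints nothing;
# on the affected Gluster mount each duplicated name shows up once here.
count_dup_entries() {
    find "$1" -mindepth 1 -print | sort | uniq -d
}

# Example: count how many duplicated names the volume currently has.
# count_dup_entries /mnt/VOLNAME | wc -l
```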
(In reply to Sergey from comment #0)
> I have something impossible: same filenames are listed multiple times:

Based on the information provided for zabbix.pm, the files are listed twice because 2 separate copies of the files exist on different bricks.

> # ls -la /mnt/VOLNAME/
> ...
> -rwxrwxr-x 1 root root 3486 Jan 28  2016 check_connections.pl
> -rwxr-xr-x 1 root root  153 Dec  7  2014 sigtest.sh
> -rwxr-xr-x 1 root root  153 Dec  7  2014 sigtest.sh
> -rwxr-xr-x 1 root root 3466 Jan  5  2015 zabbix.pm
> -rwxr-xr-x 1 root root 3466 Jan  5  2015 zabbix.pm
>
> There're about 38981 duplicate files like that.
>
> The volume itself is a 3 x 2-replica:
>
> # gluster volume info VOLNAME
> Volume Name: VOLNAME
> Type: Distributed-Replicate
> Volume ID: 41f9096f-0d5f-4ea9-b369-89294cf1be99
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: gfserver1:/srv/BRICK
> Brick2: gfserver2:/srv/BRICK
> Brick3: gfserver3:/srv/BRICK
> Brick4: gfserver4:/srv/BRICK
> Brick5: gfserver5:/srv/BRICK
> Brick6: gfserver6:/srv/BRICK
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
> cluster.self-heal-daemon: enable
> config.transport: tcp
>
> The "duplicated" file on individual bricks:
>
> [gfserver1]# ls -la /srv/BRICK/zabbix.pm
> ---------T 2 root root 0 Apr 23  2018 /srv/BRICK/zabbix.pm
>
> [gfserver2]# ls -la /srv/BRICK/zabbix.pm
> ---------T 2 root root 0 Apr 23  2018 /srv/BRICK/zabbix.pm

These 2 are linkto files and they are pointing to the data files on gfserver3 and gfserver4.

> [gfserver3]# ls -la /srv/BRICK/zabbix.pm
> -rwxr-xr-x 2 root root 3466 Jan  5  2015 /srv/BRICK/zabbix.pm
>
> [gfserver4]# ls -la /srv/BRICK/zabbix.pm
> -rwxr-xr-x 2 root root 3466 Jan  5  2015 /srv/BRICK/zabbix.pm
>
> [gfserver5]# ls -la /srv/BRICK/zabbix.pm
> -rwxr-xr-x 2 root root 3466 Jan  5  2015 /srv/BRICK/zabbix.pm
>
> [gfserver6]# ls -la /srv/BRICK/zabbix.pm
> -rwxr-xr-x. 2 root root 3466 Jan  5  2015 /srv/BRICK/zabbix.pm

These are the problematic files. I do not know why or how they ended up existing on these bricks as well.

> Attributes:
>
> [gfserver1]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
> # file: srv/BRICK/zabbix.pm
> trusted.afr.VOLNAME-client-1=0x000000000000000000000000
> trusted.afr.VOLNAME-client-4=0x000000000000000000000000
> trusted.gfid=0x422a7ccf018242b58e162a65266326c3
> trusted.glusterfs.dht.linkto=0x6678666565642d7265706c69636174652d3100
>
> [gfserver2]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
> # file: srv/BRICK/zabbix.pm
> trusted.gfid=0x422a7ccf018242b58e162a65266326c3
> trusted.gfid2path.3b27d24cad4dceef=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f7a61626269782e706d
> trusted.glusterfs.dht.linkto=0x6678666565642d7265706c69636174652d3100
>
> [gfserver3]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
> # file: srv/BRICK/zabbix.pm
> trusted.afr.VOLNAME-client-2=0x000000000000000000000000
> trusted.afr.VOLNAME-client-3=0x000000000000000000000000
> trusted.gfid=0x422a7ccf018242b58e162a65266326c3
>
> [gfserver4]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
> # file: srv/BRICK/zabbix.pm
> trusted.gfid=0x422a7ccf018242b58e162a65266326c3
> trusted.gfid2path.3b27d24cad4dceef=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f7a61626269782e706d
>
> [gfserver5]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
> # file: srv/BRICK/zabbix.pm
> trusted.bit-rot.version=0x03000000000000005c4f813c000bc71b
> trusted.gfid=0x422a7ccf018242b58e162a65266326c3
>
> [gfserver6]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
> # file: srv/BRICK/zabbix.pm
> security.selinux=0x73797374656d5f753a6f626a6563745f723a7661725f743a733000
> trusted.bit-rot.version=0x02000000000000005add0ffc000eb66a
> trusted.gfid=0x422a7ccf018242b58e162a65266326c3
>
> Not sure why exactly it happened... Maybe because some nodes were suddenly
> upgraded from centos6's gluster ~3.7 to centos7's 4.1, and some files
> happened to be on nodes that they're not supposed to be on.
>
> Currently all the nodes are online:
>
> # gluster pool list
> UUID                                  Hostname   State
> aac9e1a5-018f-4d27-9d77-804f0f1b2f13  gfserver5  Connected
> 98b22070-b579-4a91-86e3-482cfcc9c8cf  gfserver3  Connected
> 7a9841a1-c63c-49f2-8d6d-a90ae2ff4e04  gfserver4  Connected
> 955f5551-8b42-476c-9eaa-feab35b71041  gfserver6  Connected
> 7343d655-3527-4bcf-9d13-55386ccb5f9c  gfserver1  Connected
> f9c79a56-830d-4056-b437-a669a1942626  gfserver2  Connected
> 45a72ab3-b91e-4076-9cf2-687669647217  localhost  Connected
>
> and have glusterfs-3.12.14-1.el6.x86_64 (CentOS 6) and
> glusterfs-4.1.7-1.el7.x86_64 (CentOS 7) installed.
>
> Expected result
> ---------------
>
> This looks like a layout issue, so:
>
> gluster volume rebalance VOLNAME fix-layout start
>
> should fix it, right?

No, fix-layout only changes the layout, and this is not a layout problem. This is a problem with duplicate files on the bricks.

> Actual result
> -------------
>
> I tried:
> gluster volume rebalance VOLNAME fix-layout start
> gluster volume rebalance VOLNAME start
> gluster volume rebalance VOLNAME start force
> gluster volume heal VOLNAME full
> Those took 5 to 40 minutes to complete, but the duplicates are still there.

Can you send the rebalance logs for this volume from all the nodes? How many clients do you have accessing the volume? Are the duplicate files seen only in the root of the volume or in subdirs as well?
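A note that may help whoever picks this up: the values printed by getfattr -e hex above are plain ASCII strings, so they can be decoded to see what the linkto and gfid2path xattrs actually say. A minimal sketch (assumes xxd is installed; the sample value is the trusted.gfid2path xattr from the gfserver2 output above):

```shell
# Decode a hex xattr value as printed by `getfattr -e hex`: strip the 0x
# prefix, turn the hex digits back into bytes, and drop any trailing NUL.
decode_xattr_hex() {
    printf '%s' "${1#0x}" | xxd -r -p | tr -d '\0'
}

# The gfid2path value from gfserver2 decodes to <parent-gfid>/<basename>:
decode_xattr_hex 0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f7a61626269782e706d
# -> 00000000-0000-0000-0000-000000000001/zabbix.pm
```

Decoding trusted.glusterfs.dht.linkto the same way yields the name of the replica subvolume (&lt;volume&gt;-replicate-N) that DHT believes holds the data file, which lets the linkto copies on gfserver1/2 be matched against the data copies.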
Hi Sergey,

Do you have any updates on this?

Thanks,
Mohit Agrawal
This bug has been moved to https://github.com/gluster/glusterfs/issues/907 and will be tracked there from now on. Visit the GitHub issue URL for further details.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days