Bug 1707866
Summary: | Thousands of duplicate files in glusterfs mountpoint directory listing | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Sergey <sergemp>
Component: | core | Assignee: | bugs <bugs>
Status: | CLOSED UPSTREAM | QA Contact: |
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 4.1 | CC: | bugs, moagrawa, pasik
Target Milestone: | --- | Keywords: | Triaged
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-03-12 12:34:10 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Sergey
2019-05-08 15:11:44 UTC
(In reply to Sergey from comment #0)

> I have something impossible: same filenames are listed multiple times:

Based on the information provided for zabbix.pm, the files are listed twice because 2 separate copies of the files exist on different bricks.

> # ls -la /mnt/VOLNAME/
> ...
> -rwxrwxr-x 1 root root 3486 Jan 28 2016 check_connections.pl
> -rwxr-xr-x 1 root root 153 Dec 7 2014 sigtest.sh
> -rwxr-xr-x 1 root root 153 Dec 7 2014 sigtest.sh
> -rwxr-xr-x 1 root root 3466 Jan 5 2015 zabbix.pm
> -rwxr-xr-x 1 root root 3466 Jan 5 2015 zabbix.pm
>
> There're about 38981 duplicate files like that.
>
> The volume itself is a 3 x 2-replica:
>
> # gluster volume info VOLNAME
> Volume Name: VOLNAME
> Type: Distributed-Replicate
> Volume ID: 41f9096f-0d5f-4ea9-b369-89294cf1be99
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: gfserver1:/srv/BRICK
> Brick2: gfserver2:/srv/BRICK
> Brick3: gfserver3:/srv/BRICK
> Brick4: gfserver4:/srv/BRICK
> Brick5: gfserver5:/srv/BRICK
> Brick6: gfserver6:/srv/BRICK
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
> cluster.self-heal-daemon: enable
> config.transport: tcp
>
> The "duplicated" file on individual bricks:
>
> [gfserver1]# ls -la /srv/BRICK/zabbix.pm
> ---------T 2 root root 0 Apr 23 2018 /srv/BRICK/zabbix.pm
>
> [gfserver2]# ls -la /srv/BRICK/zabbix.pm
> ---------T 2 root root 0 Apr 23 2018 /srv/BRICK/zabbix.pm

These 2 are linkto files and they are pointing to the data files on gfserver3 and gfserver4.

> [gfserver3]# ls -la /srv/BRICK/zabbix.pm
> -rwxr-xr-x 2 root root 3466 Jan 5 2015 /srv/BRICK/zabbix.pm
>
> [gfserver4]# ls -la /srv/BRICK/zabbix.pm
> -rwxr-xr-x 2 root root 3466 Jan 5 2015 /srv/BRICK/zabbix.pm
>
> [gfserver5]# ls -la /srv/BRICK/zabbix.pm
> -rwxr-xr-x 2 root root 3466 Jan 5 2015 /srv/BRICK/zabbix.pm
>
> [gfserver6]# ls -la /srv/BRICK/zabbix.pm
> -rwxr-xr-x. 2 root root 3466 Jan 5 2015 /srv/BRICK/zabbix.pm

These are the problematic files. I do not know why or how they ended up existing on these bricks as well.

> Attributes:
>
> [gfserver1]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
> # file: srv/BRICK/zabbix.pm
> trusted.afr.VOLNAME-client-1=0x000000000000000000000000
> trusted.afr.VOLNAME-client-4=0x000000000000000000000000
> trusted.gfid=0x422a7ccf018242b58e162a65266326c3
> trusted.glusterfs.dht.linkto=0x6678666565642d7265706c69636174652d3100
>
> [gfserver2]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
> # file: srv/BRICK/zabbix.pm
> trusted.gfid=0x422a7ccf018242b58e162a65266326c3
> trusted.gfid2path.3b27d24cad4dceef=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f7a61626269782e706d
> trusted.glusterfs.dht.linkto=0x6678666565642d7265706c69636174652d3100
>
> [gfserver3]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
> # file: srv/BRICK/zabbix.pm
> trusted.afr.VOLNAME-client-2=0x000000000000000000000000
> trusted.afr.VOLNAME-client-3=0x000000000000000000000000
> trusted.gfid=0x422a7ccf018242b58e162a65266326c3
>
> [gfserver4]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
> # file: srv/BRICK/zabbix.pm
> trusted.gfid=0x422a7ccf018242b58e162a65266326c3
> trusted.gfid2path.3b27d24cad4dceef=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f7a61626269782e706d
>
> [gfserver5]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
> # file: srv/BRICK/zabbix.pm
> trusted.bit-rot.version=0x03000000000000005c4f813c000bc71b
> trusted.gfid=0x422a7ccf018242b58e162a65266326c3
>
> [gfserver6]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
> # file: srv/BRICK/zabbix.pm
> security.selinux=0x73797374656d5f753a6f626a6563745f723a7661725f743a733000
> trusted.bit-rot.version=0x02000000000000005add0ffc000eb66a
> trusted.gfid=0x422a7ccf018242b58e162a65266326c3
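For reference, the trusted.glusterfs.dht.linkto value shown above is just the name of the DHT subvolume that holds the real data, stored as a hex-encoded string. On one of the bricks carrying the zero-byte entry it can be printed as text, which should yield something like the following (VOLNAME-replicate-1 being the gfserver3/gfserver4 replica pair):

    [gfserver1]# getfattr -n trusted.glusterfs.dht.linkto -e text /srv/BRICK/zabbix.pm
    # file: srv/BRICK/zabbix.pm
    trusted.glusterfs.dht.linkto="VOLNAME-replicate-1"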
> Not sure why exactly it happened... Maybe because some nodes were suddenly
> upgraded from centos6's gluster ~3.7 to centos7's 4.1, and some files
> happened to be on nodes that they're not supposed to be on.
>
> Currently all the nodes are online:
>
> # gluster pool list
> UUID                                  Hostname   State
> aac9e1a5-018f-4d27-9d77-804f0f1b2f13  gfserver5  Connected
> 98b22070-b579-4a91-86e3-482cfcc9c8cf  gfserver3  Connected
> 7a9841a1-c63c-49f2-8d6d-a90ae2ff4e04  gfserver4  Connected
> 955f5551-8b42-476c-9eaa-feab35b71041  gfserver6  Connected
> 7343d655-3527-4bcf-9d13-55386ccb5f9c  gfserver1  Connected
> f9c79a56-830d-4056-b437-a669a1942626  gfserver2  Connected
> 45a72ab3-b91e-4076-9cf2-687669647217  localhost  Connected
>
> and have glusterfs-3.12.14-1.el6.x86_64 (Centos 6) and
> glusterfs-4.1.7-1.el7.x86_64 (Centos 7) installed.
>
> Expected result
> ---------------
>
> This looks like a layout issue, so:
>
> gluster volume rebalance VOLNAME fix-layout start
>
> should fix it, right?

No, fix-layout only changes the layout, and this is not a layout problem. This is a problem with duplicate files on the bricks.

> Actual result
> -------------
>
> I tried:
> gluster volume rebalance VOLNAME fix-layout start
> gluster volume rebalance VOLNAME start
> gluster volume rebalance VOLNAME start force
> gluster volume heal VOLNAME full
> Those took 5 to 40 minutes to complete, but the duplicates are still there.

Can you send the rebalance logs for this volume from all the nodes?
How many clients do you have accessing the volume?
Are the duplicate files seen only in the root of the volume or in subdirs as well?

Hi Sergey,

Do you have any updates on this?

Thanks,
Mohit Agrawal

This bug is moved to https://github.com/gluster/glusterfs/issues/907, and will be tracked there from now on. Visit the GitHub issue URL for further details.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
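Following up on the questions above (in particular whether the duplicates are limited to the volume root): one way to see which bricks hold real data for a given name is to list every copy of it per brick with its mode and size, since DHT linkto pointers are the sticky-bit (mode 1000), zero-byte entries. A rough sketch, assuming GNU find and the /srv/BRICK brick root used in this report:

    # run on each brick; .glusterfs only holds gfid hardlinks, so skip it
    [gfserverN]# find /srv/BRICK -name .glusterfs -prune -o -type f -name zabbix.pm -printf '%m %s %p\n'

A sticky-bit, zero-byte hit (here on gfserver1/gfserver2) is only a pointer; a name whose full-size copies turn up on more than one replica pair (here gfserver3/gfserver4 and gfserver5/gfserver6) is the kind that gets listed twice on the mount. Swapping zabbix.pm for names under a subdirectory answers the root-versus-subdirs question.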