Description of problem:

I have a pretty basic two-node gluster 3.7 setup on CentOS 7, with a volume replicated/mirrored to both servers. One of the gluster servers was down for hardware maintenance, and when it came back up, the healing process started re-syncing files. In the beginning there were some 200 files that had to be synced, and after a while the number got down to 10, but then healing stopped. The last 10 files never get synced no matter what, so the healing/re-sync never ends for these files.

Log entries reveal the actual problem:

[2016-09-21 12:41:43.063209] E [MSGID: 113002] [posix.c:252:posix_lookup] 0-gvol1-posix: buf->ia_gfid is null for /bricks/vol1/brick1/foo [No data available]
[2016-09-21 12:41:43.063266] E [MSGID: 115050] [server-rpc-fops.c:179:server_lookup_cbk] 0-gvol1-server: 1484202: LOOKUP /foo (00000000-0000-0000-0000-000000000001/foo) ==> (No data available) [No data available]

Manually checking the file in question confirms the problem:

# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000

There is no trusted.gfid attribute on the file in question. I have no clear explanation for why this happened, but it could be because I killed the gluster daemons/services at exactly the "wrong" moment while preparing the node for maintenance, just as this file was being created. I'm not sure about that, though. There was no hardlink either: nothing in the /bricks/vol1/brick1/.glusterfs/c1/ca/ directory.

Checking on the other node:

# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol1-client-1=0x000016620000000100000000
trusted.bit-rot.version=0x020000000000000057e00db5000624ed
trusted.gfid=0xc1ca778ed2af4828b981171c0c5bd45e

So there we have the gfid. After I manually set the trusted.gfid attribute on the file on the broken node to this value and launched heal again, gluster was able to heal the file and continue with the next files. Healing has now fully completed, and there are no out-of-sync files anymore.

Pranith Kumar Karampuri on the gluster-users mailing list asked me to create this bugzilla entry.

Version-Release number of selected component (if applicable):
gluster 3.7.15 from the CentOS 7 Storage SIG gluster37 repo.

Steps to Reproduce:
1. See above.

Actual results:
Healing doesn't finish if there are files without a gfid.

Expected results:
Healing continues even if there are files without a gfid.
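For anyone hitting the same thing, the manual workaround described above (copy the trusted.gfid value from the healthy brick to the brick where it is missing, then launch heal again) could be sketched roughly as below. This is only a sketch, not an official procedure: the function names are made up for illustration, the script prints the repair commands instead of running them, and the .glusterfs/<first two hex digits>/<next two>/<uuid> hardlink layout is assumed from the c1/ca directory mentioned in the report.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of gluster).

# Convert a hex trusted.gfid value, as printed by "getfattr -e hex",
# into the dashed UUID form that appears in the gluster logs.
gfid_to_uuid() {
    hex=${1#0x}
    printf '%s\n' "$hex" |
        sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
}

# Where the gfid hardlink for a regular file is expected under a brick
# (assumed layout: .glusterfs/<hex 1-2>/<hex 3-4>/<uuid>).
gfid_link_path() {
    brick=$1
    uuid=$(gfid_to_uuid "$2")
    d1=$(printf '%s' "$uuid" | cut -c1-2)
    d2=$(printf '%s' "$uuid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$uuid"
}

# Print (do NOT execute) the commands for the manual repair:
# set trusted.gfid on the broken brick's copy, then launch heal again.
print_repair_commands() {
    brick=$1; relpath=$2; gfid=$3; volume=$4
    printf 'setfattr -n trusted.gfid -v %s %s/%s\n' "$gfid" "$brick" "$relpath"
    printf 'gluster volume heal %s\n' "$volume"
}

# Values taken from this report.
print_repair_commands /bricks/vol1/brick1 foo \
    0xc1ca778ed2af4828b981171c0c5bd45e gvol1
```

The gfid value should always be read from the good copy with getfattr first. The printed commands are meant to be reviewed and then run as root on the node whose brick is missing the attribute.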
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life. Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.