Bug 1379838 - gluster missing gfid attribute, healing doesn't work
Summary: gluster missing gfid attribute, healing doesn't work
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: selfheal
Version: 3.7.15
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-27 20:08 UTC by Pasi Karkkainen
Modified: 2017-08-14 14:40 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-03-08 11:03:57 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pasi Karkkainen 2016-09-27 20:08:38 UTC
Description of problem:

I have a pretty basic two-node gluster 3.7 setup on CentOS 7, with a volume replicated/mirrored to both servers.

One of the gluster servers was down for hardware maintenance, and later when it got back up, the healing process started, re-syncing files.

In the beginning there were some 200 files that had to be synced, and after a while the number got down to 10, but then healing stopped. The last 10 files don't get synced no matter what.

So the problem is that the healing/re-sync never finishes for these files.


Log entries reveal the actual problem:

[2016-09-21 12:41:43.063209] E [MSGID: 113002] [posix.c:252:posix_lookup] 0-gvol1-posix: buf->ia_gfid is null for /bricks/vol1/brick1/foo [No data available]

[2016-09-21 12:41:43.063266] E [MSGID: 115050] [server-rpc-fops.c:179:server_lookup_cbk] 0-gvol1-server: 1484202: LOOKUP /foo (00000000-0000-0000-0000-000000000001/foo) ==> (No data available) [No data available]

Manually checking the file in question confirms the problem:

# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000

There is no trusted.gfid attribute on the file in question. I have no clear explanation for why this happened, but it could be because I killed the gluster daemons/services at exactly the "wrong" moment while preparing the node for maintenance, just as this file was being created. I'm not sure about that, though.
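
The manual check above could be scripted roughly as follows. This is only a sketch: has_gfid is a hypothetical helper name, and it assumes the input is the hex dump produced by the same getfattr invocation shown above.

```shell
# Hypothetical helper (not part of gluster): reads the output of
#   getfattr -m . -d -e hex <file>
# on stdin and succeeds only if a trusted.gfid line with 32 hex digits is present.
has_gfid() {
    grep -q '^trusted\.gfid=0x[0-9a-fA-F]\{32\}$'
}

# Example use on a brick (root is needed to read the trusted.* namespace):
#   getfattr -m . -d -e hex /bricks/vol1/brick1/foo 2>/dev/null | has_gfid \
#       || echo "missing trusted.gfid"
```

Fed the output from the broken node above, the helper returns a non-zero status, because only security.selinux is listed.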

It seems there was no hardlink either: there is nothing in the /bricks/vol1/brick1/.glusterfs/c1/ca/ directory.


Checking on the other node:

# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol1-client-1=0x000016620000000100000000
trusted.bit-rot.version=0x020000000000000057e00db5000624ed
trusted.gfid=0xc1ca778ed2af4828b981171c0c5bd45e

So there we have the gfid..
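
For reference, the .glusterfs/c1/ca/ directory mentioned earlier follows from this gfid: a brick keeps a hardlink for each regular file under .glusterfs/<first two hex digits>/<next two hex digits>/<gfid as UUID>. A small sketch of that mapping (gfid_to_path is a hypothetical helper name, not a gluster tool):

```shell
# gfid_to_path BRICK HEXGFID
# Prints the expected .glusterfs path for a gfid given as 0x-prefixed hex,
# as shown by "getfattr -e hex".
gfid_to_path() {
    brick=$1
    hex=${2#0x}
    # Reformat the 32 hex digits as a canonical 8-4-4-4-12 UUID.
    uuid=$(printf '%s\n' "$hex" |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')
    # The first two pairs of hex digits name the two directory levels.
    d1=$(printf '%s\n' "$uuid" | cut -c1-2)
    d2=$(printf '%s\n' "$uuid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$uuid"
}

gfid_to_path /bricks/vol1/brick1 0xc1ca778ed2af4828b981171c0c5bd45e
# prints /bricks/vol1/brick1/.glusterfs/c1/ca/c1ca778e-d2af-4828-b981-171c0c5bd45e
```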

After manually setting the trusted.gfid attribute value on the file and launching heal again, gluster was able to heal the file and continue with the next files. Healing has now fully completed, and there are no out-of-sync files anymore.
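
The exact commands used are not recorded in this report. As a sketch of what such a repair could look like, the helper below only prints the commands rather than running them. print_gfid_repair is a hypothetical name, the volume name gvol1 is inferred from the log prefix "0-gvol1-posix", and the gfid value is the one read from the healthy node. Whether the missing .glusterfs hardlink also needs attention is not covered here.

```shell
# print_gfid_repair FILE_ON_BRICK HEXGFID VOLUME
# Dry run: prints (does not execute) a setfattr call that copies the gfid
# from the good replica onto the broken brick file, then a heal trigger.
print_gfid_repair() {
    file=$1
    gfid=$2
    volume=$3
    printf 'setfattr -n trusted.gfid -v %s %s\n' "$gfid" "$file"
    printf 'gluster volume heal %s\n' "$volume"
}

print_gfid_repair /bricks/vol1/brick1/foo 0xc1ca778ed2af4828b981171c0c5bd45e gvol1
```

Review the printed commands and run them as root on the node that is missing the attribute, only after confirming the gfid against the good replica.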


Pranith Kumar Karampuri on the gluster-users mailing list asked me to create this bugzilla entry.


Version-Release number of selected component (if applicable):
gluster 3.7.15 from centos7 storage SIG gluster37 repo.


Steps to Reproduce:
1. See above.

Actual results:
Healing doesn't finish if there are files without a gfid.

Expected results:
Healing continues even if there are files without gfid.

Comment 1 Kaushal 2017-03-08 11:03:57 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

