Bug 1364551 - GlusterFS lost track of 7,800+ file paths preventing self-heal
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Krutika Dhananjay
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1351515
 
Reported: 2016-08-05 17:00 UTC by Peter Portante
Modified: 2017-03-23 05:44 UTC

Fixed In Version: glusterfs-3.8.4-11
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-23 05:44:24 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Peter Portante 2016-08-05 17:00:44 UTC
GlusterFS lost track of the parent directory paths for about 7,800+ files over a three-day period, during which it was unable to track the replication of those files to one member of the cluster. The data was all present on two other nodes, so once we rediscovered the paths to those files, GlusterFS was able to restore proper operations.

We suspect there were (and may well still be) networking problems with one or more nodes, because of frame errors on the NICs. We have also seen evidence of links being dropped and restored at the physical layer.

The self-heal process was only able to list GFIDs; it did not know the file paths. So we spent about 24 hours scouring the volume, using the dates of the GFID files to find their respective paths. Once we were able to "stat" each file via a gluster mount point, all of the files were healed.


Notes taken by Ravi regarding this pbench volume heal issue:
----------------------

August 4th/5th, 2016

Written by Ravi:

List of T-files that need data and metadata heal on pbench-replicate-5:
<gfid:3a1d56d0-9945-450f-858c-84314dff9f6a> 
<gfid:cc6c1b79-92f9-4728-a7b6-d62c5de50084> 
<gfid:cc2dd7f2-532b-4d15-9dbd-0561cffd50bf> 
<gfid:973cb698-d3d0-45a3-874b-94134c4afd5a> 
<gfid:89ae4bf6-941e-430e-b786-964815e1d70f> 

Subvolumes of pbench-replicate-5 are:
pbench-client-15 
pbench-client-16 
pbench-client-17

Heal is not happening because pbench-client-16 does not contain the entry itself.

I have run the find command on the following nodes in a screen session to help identify the file path:

3a1d56d0-9945-450f-858c-84314dff9f6a ==>  gprfs002
cc6c1b79-92f9-4728-a7b6-d62c5de50084 ==>  gprfs012
cc2dd7f2-532b-4d15-9dbd-0561cffd50bf ==>  gprfs001
973cb698-d3d0-45a3-874b-94134c4afd5a ==>  gprfs009 
89ae4bf6-941e-430e-b786-964815e1d70f ==>  gprfs011

Once I have the path, we should be able to do a temporary mount and stat the files so that the entry gets created. The data/metadata heal should then be able to complete. Of the 5 entries, 3 happen to be symbolic links. I'll update once I get the file paths for those entries.
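As a rough illustration of the "temporary mount and stat" step, here is a minimal Python sketch. It assumes the volume is already mounted through the gluster client at a mount point of your choosing; the function name stat_paths and its arguments are hypothetical and not part of any GlusterFS tooling. It only issues an lstat on each recovered path, which is the named lookup the notes above rely on to get the missing entry created.

```python
import os


def stat_paths(mount_point, relative_paths):
    """lstat each path under the mount; return {path: None or error text}.

    `mount_point` is assumed to be a gluster client mount (hypothetical
    location), and `relative_paths` are the paths recovered for the GFIDs.
    """
    results = {}
    for rel in relative_paths:
        full = os.path.join(mount_point, rel.lstrip("/"))
        try:
            os.lstat(full)  # named lookup through the client mount
            results[rel] = None
        except OSError as exc:
            results[rel] = exc.strerror
    return results
```

Any path that maps to an error string still needs attention; the others were at least reachable through the mount, after which heal status can be checked again.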

Notes:
The cached subvols for the 5 T-files are pbench-replicate-4 and pbench-replicate-11. The result of ls -l on the bricks of those cached subvols:

1) ls -l .glusterfs/3a/1d/3a1d56d0-9945-450f-858c-84314dff9f6a ==> symlink file
lrwxrwxrwx 2 17932 17932 78 Aug  1 16:13 .glusterfs/3a/1d/3a1d56d0-9945-450f-858c-84314dff9f6a -> /pbench/archive/fs-version-001/overcloud-controller-0/20160801-1341_cbt.tar.xz

2) ls -l .glusterfs/cc/6c/cc6c1b79-92f9-4728-a7b6-d62c5de50084 ==> symlink file
lrwxrwxrwx 2 17932 17932 97 Aug  2 03:23 .glusterfs/cc/6c/cc6c1b79-92f9-4728-a7b6-d62c5de50084 -> /pbench/archive/fs-version-001/dhcp31-124/fio_sdb-sdc-1-job-iodepth-32_2016-08-01_18:59:09.tar.xz

 
3) ls -l .glusterfs/cc/2d/cc2dd7f2-532b-4d15-9dbd-0561cffd50bf ==> Regular file.

4) ls -l .glusterfs/97/3c/973cb698-d3d0-45a3-874b-94134c4afd5a ==> Regular file.

5) ls -l .glusterfs/89/ae/89ae4bf6-941e-430e-b786-964815e1d70f ==> symlink file
lrwxrwxrwx 2 17932 17932 78 Aug  2 23:42 .glusterfs/89/ae/89ae4bf6-941e-430e-b786-964815e1d70f -> /pbench/archive/fs-version-001/overcloud-controller-0/20160802-2300_cbt.tar.xz
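The .glusterfs paths in the listing above follow a visible pattern: the first two hex characters of the GFID, then the next two, then the full GFID. A small Python sketch of that mapping, plus the symlink/regular-file check done by hand with ls -l, could look like this. The helper names and the brick root argument are hypothetical; this is only a convenience for repeating the check across many GFIDs.

```python
import os
import stat
import uuid


def gfid_backend_path(brick_root, gfid):
    """Return <brick_root>/.glusterfs/<aa>/<bb>/<gfid> for a GFID string."""
    canonical = str(uuid.UUID(gfid))  # validates and lower-cases the GFID
    return os.path.join(
        brick_root, ".glusterfs", canonical[:2], canonical[2:4], canonical
    )


def classify(path):
    """Say whether a GFID entry is a symlink, a regular file, or missing."""
    try:
        mode = os.lstat(path).st_mode  # lstat: do not follow the symlink
    except FileNotFoundError:
        return "missing"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISREG(mode):
        return "regular"
    return "other"
```

For example, classify(gfid_backend_path(brick, "3a1d56d0-9945-450f-858c-84314dff9f6a")) would be run on each brick of the cached subvols, with brick set to that brick's root directory.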

From Peter:

To speed things up, we leveraged the knowledge that the top-level directories would have timestamps in the same time range as the GFID files. So instead of searching the entire volume, in each of the five cases we looked only for the most recent top-level directories and performed the find from there.
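The narrowed search can be sketched in Python roughly as follows. This is not the exact find invocation we ran; it assumes that on the brick the .glusterfs entry for a file shares an inode with the named file (the link count of 2 in the ls -l output above is consistent with that), and the function names and the not_before cutoff are hypothetical.

```python
import os


def recent_top_level_dirs(brick_root, not_before):
    """Top-level brick directories with mtime >= not_before (epoch seconds)."""
    result = []
    for entry in os.scandir(brick_root):
        if entry.name == ".glusterfs":
            continue  # skip the GFID store itself
        if not entry.is_dir(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime >= not_before:
            result.append(entry.path)
    return sorted(result)


def find_paths_for_gfid_entry(gfid_entry, search_roots):
    """Walk search_roots and return names sharing an inode with gfid_entry."""
    target = os.lstat(gfid_entry)
    matches = []
    for root in search_roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Symlinks can show up in either list, so check both.
            for name in filenames + dirnames:
                candidate = os.path.join(dirpath, name)
                try:
                    st = os.lstat(candidate)
                except OSError:
                    continue  # entry vanished or is unreadable
                if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                    matches.append(candidate)
    return matches
```

The cutoff would be chosen from the dates on the GFID files, and the resulting brick-relative path is what then gets stat'ed through a client mount.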

Comment 2 Peter Portante 2016-08-05 17:02:32 UTC
We were running RHGS 3.1.2 with a set of patches from Vijay, and then upgraded to 3.1.3 during the self-heal procedure we carried out.

Comment 3 Pranith Kumar K 2016-08-13 01:13:18 UTC
Hi Peter,
       Based on the information so far, this seems to have happened because of a feature called optimistic changelog for directory operations, in which entries are marked bad only after a failure happens; that is, no pre-operation marking is done. So if the failure happens in such a way that we lose network connectivity to both bricks before the marking is done, we lose track of which directory needs healing. To confirm the theory, we would need sosreports from the machines and sample GFIDs that went into this state. If the theory is confirmed, we will provide a volume set option to turn optimistic changelog off. There is also a way to mount the filesystem with this turned off, which we can use until the patch is merged.

Pranith

Comment 15 errata-xmlrpc 2017-03-23 05:44:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

