Bug 765492 (GLUSTER-3760) - [glusterfs-3.2.5qa2]: gfid differs on different subvolumes
Summary: [glusterfs-3.2.5qa2]: gfid differs on different subvolumes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3760
Product: GlusterFS
Classification: Community
Component: replicate
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-10-28 09:17 UTC by Raghavendra Bhat
Modified: 2012-02-20 15:21 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Raghavendra Bhat 2011-10-28 09:17:33 UTC
The gfid of an entry differs across subvolumes.

Setup: 2x2 distributed-replicate volume with 2 fuse clients and 1 nfs client. One fuse client was untarring the linux kernel tarball and running find <mount point> | xargs stat, the other fuse client was running fileop, and the nfs client was running the fs-perf test.

One brick was brought down and, after a sleep, brought back up while volume set operations were going on.

Some time after the brick was brought up, ls on one of the fuse clients gave:


ls
ls: cannot access fileop_L1_2: Input/output error
ls: cannot access fileop_L1_5: Input/output error
ls: cannot access fileop_L1_6: Input/output error
ls: cannot access fileop_L1_7: Input/output error
ls: cannot access fileop_L1_9: Input/output error
ls: cannot access fileop_L1_11: Input/output error
a.out       fileop_L1_0   fileop_L1_11  fileop_L1_14  fileop_L1_2  fileop_L1_5  fileop_L1_8    in                 okpa
core.25797  fileop_L1_1   fileop_L1_12  fileop_L1_15  fileop_L1_3  fileop_L1_6  fileop_L1_9    kernel_compile.sh  out
dir         fileop_L1_10  fileop_L1_13  fileop_L1_17  fileop_L1_4  fileop_L1_7  glusterfs.git  linux-2.6.31.1     rdd.c



[2011-10-28 05:05:13.466838] I [afr-common.c:982:afr_launch_self_heal] 0-mirror-replicate-1: background  meta-data entry missing-entry self-heal triggered. path: /dir/fileop_L1_13
[2011-10-28 05:05:13.468197] I [afr-self-heal-common.c:1826:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-mirror-replicate-1: Non blocking entrylks failed.
[2011-10-28 05:05:13.469708] W [afr-common.c:1065:afr_conflicting_iattrs] 0-mirror-replicate-1: /dir/fileop_L1_13: gfid differs on subvolume 1 (7b4931cc-39c7-4bc6-a259-7983fa802ca2, 101162b2-485f-4e2d-9af6-d04fffb4acb6)
[2011-10-28 05:05:13.469732] E [afr-self-heal-common.c:1310:afr_sh_common_lookup_cbk] 0-mirror-replicate-1: Conflicting entries for /dir/fileop_L1_13
[2011-10-28 05:05:13.474530] W [afr-common.c:1065:afr_conflicting_iattrs] 0-mirror-replicate-1: /dir/fileop_L1_13: gfid differs on subvolume 1 (7b4931cc-39c7-4bc6-a259-7983fa802ca2, 101162b2-485f-4e2d-9af6-d04fffb4acb6)
[2011-10-28 05:05:13.474561] E [afr-self-heal-common.c:1310:afr_sh_common_lookup_cbk] 0-mirror-replicate-1: Conflicting entries for /dir/fileop_L1_13
[2011-10-28 05:05:13.475275] E [afr-self-heal-common.c:2041:afr_self_heal_completion_cbk] 0-mirror-replicate-1: background  meta-data entry missing-entry self-heal failed on /dir/fileop_L1_13
[2011-10-28 05:05:13.475299] I [dht-layout.c:581:dht_layout_normalize] 0-mirror-dht: found anomalies in /dir/fileop_L1_13. holes=1 overlaps=0
[2011-10-28 05:05:13.475309] I [dht-selfheal.c:576:dht_selfheal_directory] 0-mirror-dht: 1 subvolumes have unrecoverable errors
[2011-10-28 05:05:13.476253] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-mirror-client-2: remote operation failed: Stale NFS file handle
[2011-10-28 05:05:13.483881] W [afr-common.c:1065:afr_conflicting_iattrs] 0-mirror-replicate-1: /dir/fileop_L1_14: gfid differs on subvolume 1 (c9a75a75-fd3f-478b-a5b7-a220a8fa1081, a173bd67-dbfc-489e-a5bd-0add55a0dbe1)
[2011-10-28 05:05:13.483929] W [afr-common.c:1065:afr_conflicting_iattrs] 0-mirror-replicate-1: /dir/fileop_L1_14: gfid differs on subvolume 1 (c9a75a75-fd3f-478b-a5b7-a220a8fa1081, a173bd67-dbfc-489e-a5bd-0add55a0dbe1)
[2011-10-28 05:05:13.483945] W [afr-common.c:826:afr_detect_self_heal_by_iatt] 0-mirror-replicate-1: /dir/fileop_L1_14: gfid different on subvolume
[2011-10-28 05:05:13.483962] I [afr-common.c:982:afr_launch_self_heal] 0-mirror-replicate-1: background  meta-data entry missing-entry self-heal triggered. path: /dir/fileop_L1_14
[2011-10-28 05:05:13.484175] I [afr-self-heal-common.c:1826:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-mirror-replicate-1: Non blocking entrylks failed.
[2011-10-28 05:05:13.485294] W [afr-common.c:1065:afr_conflicting_iattrs] 0-mirror-replicate-1: /dir/fileop_L1_14: gfid differs on subvolume 1 (c9a75a75-fd3f-478b-a5b7-a220a8fa1081, a173bd67-dbfc-489e-a5bd-0add55a0dbe1)
[2011-10-28 05:05:13.485317] E [afr-self-heal-common.c:1310:afr_sh_common_lookup_cbk] 0-mirror-replicate-1: Conflicting entries for /dir/fileop_L1_14
[2011-10-28 05:05:13.487465] W [afr-common.c:1065:afr_conflicting_iattrs] 0-mirror-replicate-1: /dir/fileop_L1_14: gfid differs on subvolume 1 (c9a75a75-fd3f-478b-a5b7-a220a8fa1081, a173bd67-dbfc-489e-a5bd-0add55a0dbe1)
[2011-10-28 05:05:13.487489] E [afr-self-heal-common.c:1310:afr_sh_common_lookup_cbk] 0-mirror-replicate-1: Conflicting entries for /dir/fileop_L1_14
[2011-10-28 05:05:13.488109] E [afr-self-heal-common.c:2041:afr_self_heal_completion_cbk] 0-mirror-replicate-1: background  meta-data entry missing-entry self-heal failed on /dir/fileop_L1_14
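
When a client log contains many of these warnings, it can help to reduce them to one line per affected path. The following is only a sketch: the regular expression is modeled on the "gfid differs on subvolume" messages quoted above, it assumes each log record sits on a single line (wrapped records need to be rejoined first), and the helper name collect_mismatches is made up for illustration.

```python
import re

# Pattern modeled on the afr_conflicting_iattrs warnings in this report.
# Assumption: one complete log record per line.
GFID_MISMATCH = re.compile(
    r"(?P<path>/\S+): gfid differs on subvolume \d+ "
    r"\((?P<a>[0-9a-f-]{36}), (?P<b>[0-9a-f-]{36})\)"
)


def collect_mismatches(lines):
    """Map each path to the set of (gfid, gfid) pairs reported for it."""
    found = {}
    for line in lines:
        match = GFID_MISMATCH.search(line)
        if match:
            pair = (match.group("a"), match.group("b"))
            found.setdefault(match.group("path"), set()).add(pair)
    return found


# Two records copied from the log in this report, plus one unrelated record.
sample = [
    "[2011-10-28 05:05:13.469708] W [afr-common.c:1065:afr_conflicting_iattrs] "
    "0-mirror-replicate-1: /dir/fileop_L1_13: gfid differs on subvolume 1 "
    "(7b4931cc-39c7-4bc6-a259-7983fa802ca2, 101162b2-485f-4e2d-9af6-d04fffb4acb6)",
    "[2011-10-28 05:05:13.483881] W [afr-common.c:1065:afr_conflicting_iattrs] "
    "0-mirror-replicate-1: /dir/fileop_L1_14: gfid differs on subvolume 1 "
    "(c9a75a75-fd3f-478b-a5b7-a220a8fa1081, a173bd67-dbfc-489e-a5bd-0add55a0dbe1)",
    "[2011-10-28 05:05:13.475309] I [dht-selfheal.c:576:dht_selfheal_directory] "
    "0-mirror-dht: 1 subvolumes have unrecoverable errors",
]

mismatches = collect_mismatches(sample)
for path, pairs in sorted(mismatches.items()):
    for first, second in sorted(pairs):
        print(path, first, second)
```

Repeated warnings for the same path collapse into one entry because the pairs are stored in a set.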



getfattr -d -m . -e hex /export/mirror/dir/fileop_L1_13
getfattr: Removing leading '/' from absolute path names
# file: export/mirror/dir/fileop_L1_13
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0x101162b2485f4e2d9af6d04fffb4acb6
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.glusterfs.quota.a395ef87-af01-4871-ae1a-1577fdc43b71.contri=0x0000000002320000
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000002320000
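
The trusted.gfid value printed by getfattr -e hex is the same 16-byte identifier that the client log prints in dashed UUID form. A small sketch of the conversion, which makes it easier to match brick xattrs against the log messages (the function name gfid_from_xattr is hypothetical; the two input values are the ones appearing in the getfattr outputs of this report):

```python
import uuid


def gfid_from_xattr(hex_value):
    """Convert a hex trusted.gfid value (getfattr -e hex) to UUID text."""
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    # uuid.UUID raises ValueError if the string is not exactly 32 hex digits.
    return str(uuid.UUID(hex=digits))


# The two distinct trusted.gfid values seen for /dir/fileop_L1_13.
brick_values = [
    "0x101162b2485f4e2d9af6d04fffb4acb6",
    "0x7b4931cc39c74bc6a2597983fa802ca2",
]

gfids = [gfid_from_xattr(value) for value in brick_values]
for gfid in gfids:
    print(gfid)

# More than one distinct gfid for the same path means the copies disagree.
print("mismatch" if len(set(gfids)) > 1 else "consistent")
```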



getfattr -d -m . -e hex /export/mirror/dir/fileop_L1_13
getfattr: Removing leading '/' from absolute path names
# file: export/mirror/dir/fileop_L1_13
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0x101162b2485f4e2d9af6d04fffb4acb6
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.glusterfs.quota.a395ef87-af01-4871-ae1a-1577fdc43b71.contri=0x0000000002320000
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000002320000



getfattr -d -m . -e hex /export/mirror/dir/fileop_L1_13
getfattr: Removing leading '/' from absolute path names
# file: export/mirror/dir/fileop_L1_13
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0x7b4931cc39c74bc6a2597983fa802ca2
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.a395ef87-af01-4871-ae1a-1577fdc43b71.contri=0x0000000000000000
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000000000000


getfattr -d -m . -e hex /export/mirror/dir/fileop_L1_13
getfattr: Removing leading '/' from absolute path names
# file: export/mirror/dir/fileop_L1_13
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.mirror-client-2=0x000000000000000100000021
trusted.afr.mirror-client-3=0x000000000000000000000000
trusted.gfid=0x101162b2485f4e2d9af6d04fffb4acb6
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.a395ef87-af01-4871-ae1a-1577fdc43b71.contri=0x00000000020f0000
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x00000000020f000
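
The trusted.afr.* values in the last output can be read as three counters. The sketch below assumes the usual AFR changelog layout of three 32-bit big-endian integers (pending data, metadata, and entry operations, in that order); the function name decode_afr_changelog is made up for illustration, and the layout should be checked against the AFR source for the release in use.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a 12-byte trusted.afr.* value into its three pending counters.

    Assumed layout: data, metadata, entry, each a 32-bit big-endian integer.
    """
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the getfattr output above.
client_2 = decode_afr_changelog("0x000000000000000100000021")
client_3 = decode_afr_changelog("0x000000000000000000000000")
print("mirror-client-2:", client_2)
print("mirror-client-3:", client_3)
```

Under that assumption, the directory on this brick records 1 pending metadata operation and 33 pending entry operations against mirror-client-2, and nothing pending against mirror-client-3.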

Comment 1 Pranith Kumar K 2011-10-28 13:50:26 UTC
[2011-10-28 05:05:13.483962] I [afr-common.c:982:afr_launch_self_heal] 0-mirror-replicate-1: background  meta-data entry missing-entry self-heal triggered. path: /dir/fileop_L1_14
[2011-10-28 05:05:13.484175] I [afr-self-heal-common.c:1826:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-mirror-replicate-1: Non blocking entrylks failed.

Johnny,
       The logs are not complete. It looks like another self-heal is in progress; until it completes, I think the files will report gfid mismatches. I think we can improve this situation to avoid the error, but I need the full logs to confirm my assumptions.

Pranith.

Comment 2 Anand Avati 2011-11-04 12:55:57 UTC
CHANGE: http://review.gluster.com/672 (Change-Id: I8a43b5fbe7a90344f490090df853d47b651bc0ff) merged in release-3.2 by Vijay Bellur (vijay)

Comment 3 Anand Avati 2011-11-04 13:54:04 UTC
CHANGE: http://review.gluster.com/673 (*) removed uuid_generate usage in pump and afr) merged in release-3.2 by Vijay Bellur (vijay)

Comment 4 Anand Avati 2011-11-17 02:09:25 UTC
CHANGE: http://review.gluster.com/678 (Change-Id: I7a8bd3b3f9600ced4a945f07447698876933ade0) merged in master by Vijay Bellur (vijay)

Comment 5 Anand Avati 2011-11-18 09:20:37 UTC
CHANGE: http://review.gluster.com/679 (*) removed uuid_generate usage in pump and afr, self-heald) merged in master by Vijay Bellur (vijay)

Comment 6 Anand Avati 2011-11-18 09:21:00 UTC
CHANGE: http://review.gluster.com/680 (Change-Id: I2319258743e478cc3a932d8ff0b2204a97cd4f8e) merged in master by Vijay Bellur (vijay)

