Description of problem: some entries cannot be healed because of an empty gfid

Version-Release number of selected component (if applicable): 3.12.15

# gluster v info services

Volume Name: services
Type: Replicate
Volume ID: 32b6bb97-4d0a-4096-9cfa-4cf0385bed31
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 169.254.0.31:/mnt/bricks/services/brick
Brick2: 169.254.0.49:/mnt/bricks/services/brick
Options Reconfigured:
performance.client-io-threads: off
server.allow-insecure: on
network.ping-timeout: 42
cluster.consistent-metadata: on
cluster.favorite-child-policy: mtime
cluster.server-quorum-type: none
transport.address-family: inet
nfs.disable: on
cluster.server-quorum-ratio: 51%

How reproducible:
Intermittent (see step 3 below).

Steps to Reproduce:
1. Start I/O on one glusterfs client node.
2. Hard-reboot all 3 storage nodes (sn-0 and sn-1 carry bricks; sn-2 is the quorum node).
3. The problem sometimes appears.

Actual results:

Expected results:

Additional info:
1> "/" keeps showing up in the output of "gluster v heal services info"; it seems glustershd cannot finish healing "/" of the services volume. When I check the glustershd log on the sn-0 node, the output below appears repeatedly.
2> There is one entry, fstest_6491509c4500d56f6fc4a621efc970bd___symlink_00_t, that exists only in the "/mnt/bricks/services/brick" directory on the sn-1 node (it does not exist on sn-0), and its xattrs are empty.

[Questions]:
1> I checked the glusterfs heal-related code and did not find much difference between glusterfs 3.12.15 (which we are using) and the latest version, 6.5. Is this a known issue? Do you think it also exists in the latest version?
2> In this case, sn-0's "/" accuses sn-1, and the shd on sn-0 tries to remove this entry from sn-1 but fails. Is this error the cause of the issue?
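As a minimal sketch (hypothetical paths; assumes direct root access to the brick directory on sn-1), the stale entry can be confirmed to be gfid-less before any manual cleanup is attempted. This is only an illustration of the condition described above, not an official repair procedure:

```shell
# Run on the node whose brick holds the stale entry (sn-1 in this report).
BRICK=/mnt/bricks/services/brick     # brick path from "gluster v info"
ENTRY="$BRICK/fstest_6491509c4500d56f6fc4a621efc970bd___symlink_00_t"

# A healthy brick entry always carries a trusted.gfid xattr;
# the stale entry in this report has none at all.
if ! getfattr -n trusted.gfid -e hex "$ENTRY" >/dev/null 2>&1; then
    echo "no trusted.gfid on $ENTRY - candidate for manual removal"
    # Removing the gfid-less file from this brick would let the next heal
    # pass complete, since shd is already trying to expunge it from here.
    # Take a backup first, and only touch the brick heal is expunging from:
    # rm "$ENTRY"
fi
```

The `rm` is deliberately commented out; whether manual removal is safe here is exactly what question 2 above is asking the maintainers.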
[glustershd log on sn-0]:
[2019-09-03 07:10:50.003265] I [MSGID: 108026] [afr-self-heald.c:432:afr_shd_index_heal] 0-services-replicate-0: got entry: 00000000-0000-0000-0000-000000000001 from services-client-0
[2019-09-03 07:10:50.003476] I [MSGID: 108026] [afr-self-heald.c:341:afr_shd_selfheal] 0-services-replicate-0: entry: path /, gfid: 00000000-0000-0000-0000-000000000001
[2019-09-03 07:10:50.006066] I [MSGID: 108026] [afr-self-heal-entry.c:893:afr_selfheal_entry_do] 0-services-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2019-09-03 07:10:50.017819] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-services-replicate-0: expunging file 00000000-0000-0000-0000-000000000001/fstest_6491509c4500d56f6fc4a621efc970bd___symlink_00_t (00000000-0000-0000-0000-000000000000) on services-client-1

[root@SN-0(RCP-1234) /mnt/bricks/services/brick]
# gluster v heal services info
Brick 169.254.0.31:/mnt/bricks/services/brick
/
Status: Connected
Number of entries: 1

Brick 169.254.0.49:/mnt/bricks/services/brick
Status: Connected
Number of entries: 0

[root@SN-0(RCP-1234) /mnt/bricks/services/brick]
# ls -l
total 92
drwxr-xr-x 9 _nokfssysalarmprocessor _nokfssysalarmprocessor 4096 Sep 2 14:04 AlarmFileSystem
drw------- 2 root root 4096 Sep 2 14:03 backup
drwxr-xr-x 3 root root 4096 Sep 2 14:04 CLM
drwxr-xr-x 3 root root 4096 Sep 2 14:03 cmf
drwxr-xr-x 3 root root 4096 Sep 2 14:03 commandcalendar
drwxrwx--- 2 root _nokrcpsysdcif 4096 Sep 2 14:06 commoncollector
drwxr-xr-x 2 root root 4096 Sep 2 14:01 coredumper
drwxr-xr-x 3 root root 4096 Sep 2 14:04 db
drwx------ 5 root root 4096 Sep 2 14:04 EventCorrelationEngine
drwx------ 8 root root 4096 Sep 2 14:15 hypertracer
drwxrwx---+ 2 root root 4096 Sep 2 14:02 LCM
drwxr-xr-x+ 2 root root 4096 Sep 2 14:01 LDAPUserInfo
drwxr-xr-x 4 root root 4096 Sep 2 14:01 lightcm
-rw-r--r-- 2 root root 0 Sep 2 14:01 LMN-0_recover_flag
-rw-r--r-- 2 root root 0 Sep 2 14:04 LMN-1_recover_flag
drwxr-xr-x 2 root root 4096 Sep 2 14:05 lockd
drwxr-xr-x 2 root root 4096 Sep 2 14:04 Log
drwxr-xr-x 3 _nokfssyspm9 _nokfssyspm9 4096 Sep 2 14:04 PM9
drw------- 2 root root 4096 Sep 2 14:03 RCP_Backup
drwxr-xr-x 4 root root 4096 Sep 2 14:04 RCPPTEngine
drwxr-xr-x 2 root root 4096 Sep 2 14:01 TestDBDump

[root@SN-0(RCP-1234) /mnt/bricks/services/brick]
# getfattr -m . -d -e hex .
# file: .
system.posix_acl_access=0x0200000001000700ffffffff04000500ffffffff08000500f103000010000500ffffffff20000500ffffffff
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.services-client-1=0x00000000000000000000010a
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x32b6bb974d0a40969cfa4cf0385bed31

[root@SN-0(RCP-1234) /mnt/bricks/services/brick/.glusterfs/indices/xattrop]
# ls
00000000-0000-0000-0000-000000000001  xattrop-7006a00e-edbc-4e0c-862b-0c58b2974487

/////////////////////////////////////////////////

[root@SN-1(RCP-1234) /root]
# cd /mnt/bricks/services/brick/
[root@SN-1(RCP-1234) /mnt/bricks/services/brick]
# ls -la
total 108
drwxr-xr-x+ 22 root root 4096 Sep 3 14:56 .
drwxr-xr-x 4 root root 4096 Sep 2 14:00 ..
drwxr-xr-x 9 _nokfssysalarmprocessor _nokfssysalarmprocessor 4096 Sep 2 14:04 AlarmFileSystem
drw------- 2 root root 4096 Sep 2 14:03 backup
drwxr-xr-x 3 root root 4096 Sep 2 14:04 CLM
drwxr-xr-x 3 root root 4096 Sep 2 14:03 cmf
drwxr-xr-x 3 root root 4096 Sep 2 14:03 commandcalendar
drwxrwx--- 2 root _nokrcpsysdcif 4096 Sep 2 14:06 commoncollector
drwxr-xr-x 2 root root 4096 Sep 2 14:01 coredumper
drwxr-xr-x 3 root root 4096 Sep 2 14:04 db
drwx------ 5 root root 4096 Sep 2 14:04 EventCorrelationEngine
-rw-r--r-- 1 root root 0 Sep 3 14:33 fstest_6491509c4500d56f6fc4a621efc970bd___symlink_00_t
drw------- 263 root root 4096 Sep 2 14:23 .glusterfs
drwx------ 8 root root 4096 Sep 2 14:15 hypertracer
drwxrwx---+ 2 root root 4096 Sep 2 14:02 LCM
drwxr-xr-x+ 2 root root 4096 Sep 2 14:01 LDAPUserInfo
drwxr-xr-x 4 root root 4096 Sep 2 14:01 lightcm
-rw-r--r-- 2 root root 0 Sep 2 14:01 LMN-0_recover_flag
-rw-r--r-- 2 root root 0 Sep 2 14:04 LMN-1_recover_flag
drwxr-xr-x 2 root root 4096 Sep 2 14:05 lockd
drwxr-xr-x 2 root root 4096 Sep 2 14:04 Log
drwxr-xr-x 3 _nokfssyspm9 _nokfssyspm9 4096 Sep 2 14:04 PM9
drw------- 2 root root 4096 Sep 2 14:03 RCP_Backup
drwxr-xr-x 4 root root 4096 Sep 2 14:04 RCPPTEngine
drwxr-xr-x 2 root root 4096 Sep 2 14:01 TestDBDump

[root@SN-1(RCP-1234) /mnt/bricks/services/brick]
# getfattr -m . -d -e hex fstest_6491509c4500d56f6fc4a621efc970bd___symlink_00_t
(no output: the file has no xattrs at all, not even trusted.gfid)

[root@SN-1(RCP-1234) /mnt/bricks/services/brick]
# stat fstest_6491509c4500d56f6fc4a621efc970bd___symlink_00_t
  File: fstest_6491509c4500d56f6fc4a621efc970bd___symlink_00_t
  Size: 0          Blocks: 0          IO Block: 4096   regular empty file
Device: fd71h/64881d    Inode: 8767    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-09-03 14:33:48.468224170 +0800
Modify: 2019-09-03 14:33:48.468224170 +0800
Change: 2019-09-03 14:33:48.468224170 +0800
 Birth: 2019-09-03 14:33:48.468224170 +0800

[root@SN-1(RCP-1234) /mnt/bricks/services/brick]
# getfattr -m . -d -e hex .
# file: .
system.posix_acl_access=0x0200000001000700ffffffff04000500ffffffff08000500f103000010000500ffffffff20000500ffffffff
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.services-client-0=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x32b6bb974d0a40969cfa4cf0385bed31
[root@SN-1(RCP-1234) /mnt/bricks/services/brick]

In the sn-1 services brick process log, the following errors are printed:
[2019-09-03 07:20:51.018870] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-services-posix: buf->ia_gfid is null for /mnt/bricks/services/brick/fstest_6491509c4500d56f6fc4a621efc970bd___symlink_00_t [No data available]
[2019-09-03 07:20:51.018910] W [MSGID: 115005] [server-resolve.c:70:resolve_gfid_entry_cbk] 0-services-server: 00000000-0000-0000-0000-000000000001/fstest_6491509c4500d56f6fc4a621efc970bd___symlink_00_t: failed to resolve (No data available) [No data available]
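The accusation mentioned in question 2 can be read directly from sn-0's root-directory xattr: trusted.afr.services-client-1=0x00000000000000000000010a. An AFR changelog xattr is three big-endian 32-bit counters (data, metadata, and entry pending operations). A small sketch decoding the values seen on both nodes:

```python
import struct

def decode_afr(hex_value: str):
    """Decode a trusted.afr.* xattr into (data, metadata, entry) pending counts."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    return struct.unpack(">III", raw)  # three network-byte-order uint32s

# sn-0's accusation against services-client-1 (sn-1), from the output above
print(decode_afr("0x00000000000000000000010a"))  # (0, 0, 266): entry-pending

# sn-1's view of services-client-0 (sn-0): no pending operations
print(decode_afr("0x000000000000000000000000"))  # (0, 0, 0)
```

The nonzero third counter on sn-0's "/" means sn-0 holds pending entry operations against sn-1, which is consistent with the glustershd log above: shd on sn-0 repeatedly tries to expunge the stale entry from services-client-1 and never manages to clear the counter.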
Created attachment 1611038 [details] sn-1 glusterfs log
Created attachment 1611039 [details] sn-0 glusterfs log
This bug is moved to https://github.com/gluster/glusterfs/issues/848, and will be tracked there from now on. Visit the GitHub issue URL for further details.