Bug 1554509

Summary: DHT errors in FUSE mount when using 4.0.0 client bits
Product: [Community] GlusterFS
Component: distribute
Version: 4.0
Status: CLOSED EOL
Reporter: Shyamsundar <srangana>
Assignee: bugs <bugs>
CC: bugs, nbalacha, rgowdapp
Keywords: Triaged
Type: Bug
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-06-20 18:24:42 UTC

Description Shyamsundar 2018-03-12 19:37:36 UTC
Description of problem:

Upgraded a 3.12 setup (3 servers + 1 client) to 4.0.0 and noticed the following errors in the client mount logs.

NOTE: I am not sure whether a fresh 4.0.0 setup produces the same errors, but the upgrade case did.

mkdir causes the following message:
I [dict.c:491:dict_get] (-->/usr/lib64/glusterfs/4.0.0/xlator/cluster/distribute.so(+0x21740) [0x7f6361e54740] -->/usr/lib64/glusterfs/4.0.0/xlator/cluster/distribute.so(+0x42d55) [0x7f6361e75d55] -->/lib64/libglusterfs.so.0(dict_get+0x10c) [0x7f63699c987c] ) 0-dict: !this || key=trusted.glusterfs.dht.mds [Invalid argument]

After the upgrade, the client mount shows this error:
I [MSGID: 109005] [dht-selfheal.c:2328:dht_selfheal_directory] 0-patchy-dht: Directory selfheal failed: Unable to form layout for directory /

and this message:
I [MSGID: 109063] [dht-layout.c:693:dht_layout_normalize] 0-patchy-dht: Found anomalies in (null) (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0

Inspecting the brick root on the servers shows a proper, fully formed layout:
# getfattr -d  -e hex -m . /d/brick1     
getfattr: Removing leading '/' from absolute path names
# file: d/brick1
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.patchy-client-1=0x000000000000000000000000
trusted.afr.patchy-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0xb43003c35dde4cf0915ac7d9ac94764e
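To make "fully formed" concrete, the `trusted.glusterfs.dht` value above can be decoded. The sketch below assumes the 16-byte value is four big-endian 32-bit words (a count or commit-hash word, the hash type, and the start and stop of the hash range). That word layout is an assumption about the on-disk format, not something stated in this report, and `decode_dht_layout` is a hypothetical helper.

```python
import struct


def decode_dht_layout(hex_value: str) -> dict:
    """Decode a trusted.glusterfs.dht value as printed by `getfattr -e hex`.

    Assumption: four big-endian uint32 words:
    (count/commit-hash word, hash type, range start, range stop).
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    first, layout_type, start, stop = struct.unpack(">4I", raw)
    return {"first_word": first, "type": layout_type, "start": start, "stop": stop}


layout = decode_dht_layout("0x000000010000000000000000ffffffff")
print(layout)  # {'first_word': 1, 'type': 0, 'start': 0, 'stop': 4294967295}
```

Under that assumption, this brick root's range spans 0x00000000 to 0xffffffff, i.e. the whole 32-bit hash space, so the on-disk layout itself has no hole.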

Version-Release number of selected component (if applicable):
4.0.0

How reproducible:
Always

Steps to Reproduce:
Set up a cluster using Docker containers as detailed here: https://hackmd.io/-yC3Ol68SwaRWr8bzaL8pw#


Actual results:
None of the operations failed; however, the logs contained the above messages, even after remounting on the client.

Comment 1 Raghavendra G 2018-03-12 22:00:13 UTC
(In reply to Shyamsundar from comment #0)
> Description of problem:
> 
> Upgraded a 3.12 3 servers + 1 client setup to a 4.0.0 version and noticed
> the following errors in the client mount logs,
> 
> NOTE: not sure if a fresh 4.0.0 setup also gives the same errors, but the
> upgrade case did.
> 
> 
> Post upgrade, client mount is seeing this error
> I [MSGID: 109005] [dht-selfheal.c:2328:dht_selfheal_directory] 0-patchy-dht:
> Directory selfheal failed: Unable to form layout for directory /
> 
> and, this message
> I [MSGID: 109063] [dht-layout.c:693:dht_layout_normalize] 0-patchy-dht:
> Found anomalies in (null) (gfid = 00000000-0000-0000-0000-000000000000).
> Holes=1 overlaps=0

This is a false positive. These errors are normally seen when a lookup races with mkdir and finds a directory whose layout is _yet_ to be set. IOW, it's a transient condition that resolves once either the mkdir or the lookup completes.
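A simplified sketch of why such a lookup can report `Holes=1 overlaps=0`: walk the per-subvolume hash ranges across the 32-bit space and count gaps and overlaps. This is a hypothetical model, not Gluster's `dht_layout_normalize` code; the function name and the convention that `None` means "no layout set yet on this subvolume" are assumptions for illustration.

```python
from typing import List, Optional, Tuple

HASH_MAX = 0xFFFFFFFF  # DHT hashes names into a 32-bit space

# (start, stop), inclusive; None models a subvolume with no layout yet
Range = Optional[Tuple[int, int]]


def count_anomalies(ranges: List[Range]) -> Tuple[int, int]:
    """Return (holes, overlaps) for a set of per-subvolume ranges."""
    assigned = sorted(r for r in ranges if r is not None)
    holes = overlaps = 0
    expected = 0  # next hash value that should be covered
    for start, stop in assigned:
        if start > expected:
            holes += 1      # gap before this range
        elif start < expected:
            overlaps += 1   # range starts inside an earlier one
        expected = max(expected, stop + 1)
    if expected <= HASH_MAX:
        holes += 1          # tail of the hash space is uncovered
    return holes, overlaps


print(count_anomalies([(0, HASH_MAX)]))  # (0, 0): fully formed layout
print(count_anomalies([None]))           # (1, 0): layout not written yet
```

In this model, a single-subvolume directory looked up before mkdir writes its layout has no assigned range at all, which counts as exactly one hole and no overlaps.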

Comment 2 Shyamsundar 2018-03-13 01:22:40 UTC
(In reply to Raghavendra G from comment #1)
> (In reply to Shyamsundar from comment #0)
> > and, this message
> > I [MSGID: 109063] [dht-layout.c:693:dht_layout_normalize] 0-patchy-dht:
> > Found anomalies in (null) (gfid = 00000000-0000-0000-0000-000000000000).
> > Holes=1 overlaps=0
> 
> This is a false positive of an error. These errors are normally seen when a
> lookup races with mkdir and finds a directory with layout _yet_ to be set.
> IOW, its a transient condition which gets resolved once either mkdir or
> lookup completes.

Note: across multiple mounts and unmounts, this did not go away.

Comment 3 Raghavendra G 2018-03-13 01:32:01 UTC
> Note, on multiple mounts and unmounts, this did not go away.

This error message can be seen during every mkdir (mounting/unmounting has no impact). However, if you see this error during a lookup on an existing directory, it's a cause for concern.

Comment 4 Shyamsundar 2018-03-13 01:42:29 UTC
(In reply to Raghavendra G from comment #3)
> > Note, on multiple mounts and unmounts, this did not go away.
> 
> During every mkdir this error msg can be seen (mounting/umounting has no
> impact). However, if you are seeing this error during lookup on an existing
> directory, its a cause for concern.

Ah, apologies! I meant this for the first message, which I repeat here:

> Post upgrade, client mount is seeing this error
> I [MSGID: 109005] [dht-selfheal.c:2328:dht_selfheal_directory] 0-patchy-dht:
> Directory selfheal failed: Unable to form layout for directory /

Also, the anomalies message, even though it is a false positive, should be suppressed, as it is noise.

Further, I forgot to report one more error that appears in the logs for every mkdir; please consider it part of this bug report as well:

I [dict.c:491:dict_get] (-->/usr/lib64/glusterfs/4.0.0/xlator/cluster/distribute.so(+0x21740) [0x7fce5c1cc740] -->/usr/lib64/glusterfs/4.0.0/xlator/cluster/distribute.so(+0x42d55) [0x7fce5c1edd55] -->/lib64/libglusterfs.so.0(dict_get+0x10c) [0x7fce63d4187c] ) 0-dict: !this || key=trusted.glusterfs.dht.mds [Invalid argument]

Comment 5 Shyamsundar 2018-06-20 18:24:42 UTC
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline gluster repository, request that it be reopened and the Version field be marked appropriately.