1554509 – DHT errors in FUSE mount when using 4.0.0 client bits

Summary: DHT errors in FUSE mount when using 4.0.0 client bits

Status:	CLOSED EOL
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	distribute
Version:	4.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	bugs@gluster.org

Reported:	2018-03-12 19:37 UTC by Shyamsundar
Modified:	2018-06-20 18:24 UTC
CC List:	3 users
Last Closed:	2018-06-20 18:24:42 UTC
Regression:	---
Mount Type:	---
Documentation:	---

Description Shyamsundar 2018-03-12 19:37:36 UTC

Description of problem:

Upgraded a 3.12 setup (3 servers + 1 client) to 4.0.0 and noticed the following errors in the client mount logs.

NOTE: I am not sure whether a fresh 4.0.0 setup produces the same errors, but the upgrade case did.

mkdir causes the following message:
I [dict.c:491:dict_get] (-->/usr/lib64/glusterfs/4.0.0/xlator/cluster/distribute.so(+0x21740) [0x7f6361e54740] -->/usr/lib64/glusterfs/4.0.0/xlator/cluster/distribute.so(+0x42d55) [0x7f6361e75d55] -->/lib64/libglusterfs.so.0(dict_get+0x10c) [0x7f63699c987c] ) 0-dict: !this || key=trusted.glusterfs.dht.mds [Invalid argument]

Post upgrade, the client mount is seeing this error:
I [MSGID: 109005] [dht-selfheal.c:2328:dht_selfheal_directory] 0-patchy-dht: Directory selfheal failed: Unable to form layout for directory /

and this message:
I [MSGID: 109063] [dht-layout.c:693:dht_layout_normalize] 0-patchy-dht: Found anomalies in (null) (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0

Inspection of the brick root on the servers shows a proper, fully formed layout:
# getfattr -d  -e hex -m . /d/brick1     
getfattr: Removing leading '/' from absolute path names
# file: d/brick1
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.patchy-client-1=0x000000000000000000000000
trusted.afr.patchy-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0xb43003c35dde4cf0915ac7d9ac94764e
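The trusted.glusterfs.dht value above can be decoded by hand. The Python sketch below assumes the 16-byte value is four big-endian 32-bit words, the last two being the inclusive start and end of the hash range assigned to this brick; the first two words are returned raw rather than interpreted. decode_dht_layout is a hypothetical helper, not a GlusterFS tool.

```python
import struct

HASH_MAX = 0xFFFFFFFF  # largest value in the 32-bit DHT hash space


def decode_dht_layout(hex_value: str) -> dict:
    """Decode a trusted.glusterfs.dht value as printed by `getfattr -e hex`.

    Assumption: four big-endian 32-bit words; the last two are the inclusive
    start and stop of this brick's hash range. The first two are left raw.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    word0, word1, start, stop = struct.unpack(">4I", raw)
    return {
        "word0": word0,
        "word1": word1,
        "start": start,
        "stop": stop,
        # True when this single range owns the entire hash space
        "covers_all": start == 0 and stop == HASH_MAX,
    }


layout = decode_dht_layout("0x000000010000000000000000ffffffff")
print(f"{layout['start']:#010x}-{layout['stop']:#010x} covers_all={layout['covers_all']}")
```

For this brick it reports the range 0x00000000-0xffffffff, the entire 32-bit hash space, which agrees with the reading that the root layout is fully formed.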

Version-Release number of selected component (if applicable):
4.0.0

How reproducible:
Always

Steps to Reproduce:
Set up a cluster using Docker containers as detailed here: https://hackmd.io/-yC3Ol68SwaRWr8bzaL8pw#


Actual results:
None of the operations failed, but the logs contained the above messages, even after a remount on the client.

Comment 1 Raghavendra G 2018-03-12 22:00:13 UTC

(In reply to Shyamsundar from comment #0)
> Description of problem:
> 
> Upgraded a 3.12 3 servers + 1 client setup to a 4.0.0 version and noticed
> the following errors in the client mount logs,
> 
> NOTE: not sure if a fresh 4.0.0 setup also gives the same errors, but the
> upgrade case did.
> 
> 
> Post upgrade, client mount is seeing this error
> I [MSGID: 109005] [dht-selfheal.c:2328:dht_selfheal_directory] 0-patchy-dht:
> Directory selfheal failed: Unable to form layout for directory /
> 
> and, this message
> I [MSGID: 109063] [dht-layout.c:693:dht_layout_normalize] 0-patchy-dht:
> Found anomalies in (null) (gfid = 00000000-0000-0000-0000-000000000000).
> Holes=1 overlaps=0

This is a false positive. These errors are normally seen when a lookup races with mkdir and finds a directory whose layout is _yet_ to be set. IOW, it's a transient condition that resolves once either the mkdir or the lookup completes.
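The Holes and overlaps counts in the quoted message describe gaps and double coverage in a directory's combined hash layout. The Python sketch below is a simplified model of that bookkeeping, not the GlusterFS implementation: count_anomalies is hypothetical, and it assumes each subvolume contributes one inclusive (start, stop) range over the 32-bit hash space, or None if its layout has not been written yet. It shows how a directory caught before its layout is set yields exactly Holes=1 overlaps=0.

```python
HASH_MAX = 0xFFFFFFFF  # largest value in the 32-bit hash space


def count_anomalies(ranges):
    """Count holes and overlaps in a directory's combined layout.

    Simplified model, not GlusterFS's dht_layout_normalize: each entry is an
    inclusive (start, stop) range from one subvolume, or None when that
    subvolume has no layout set yet.
    """
    assigned = sorted(r for r in ranges if r is not None)
    holes = overlaps = 0
    expected = 0  # next hash value that still needs an owner
    for start, stop in assigned:
        if start > expected:
            holes += 1  # gap between the previous range and this one
        elif start < expected:
            overlaps += 1  # this range starts inside one already seen
        expected = max(expected, stop + 1)
    if expected <= HASH_MAX:
        holes += 1  # tail of the hash space is unowned
    return holes, overlaps


# One subvolume whose layout is not yet written, as in the mkdir/lookup race:
print(count_anomalies([None]))           # (1, 0)
# The same subvolume once the full range is set:
print(count_anomalies([(0, HASH_MAX)]))  # (0, 0)
```

Sorting by start and tracking the next unowned hash value keeps the check to a single pass; errors returned by individual subvolumes are not modeled here.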

Comment 2 Shyamsundar 2018-03-13 01:22:40 UTC

(In reply to Raghavendra G from comment #1)
> (In reply to Shyamsundar from comment #0)
> > and, this message
> > I [MSGID: 109063] [dht-layout.c:693:dht_layout_normalize] 0-patchy-dht:
> > Found anomalies in (null) (gfid = 00000000-0000-0000-0000-000000000000).
> > Holes=1 overlaps=0
> 
> This is a false positive of an error. These errors are normally seen when a
> lookup races with mkdir and finds a directory with layout _yet_ to be set.
> IOW, it's a transient condition which gets resolved once either mkdir or
> lookup completes.

Note: across multiple mounts and unmounts, this did not go away.

Comment 3 Raghavendra G 2018-03-13 01:32:01 UTC

> Note, on multiple mounts and unmounts, this did not go away.

This error message can be seen during every mkdir (mounting/unmounting has no impact). However, if you are seeing this error during a lookup on an existing directory, it's a cause for concern.
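This distinction (expected on every mkdir, worrying on a lookup of an existing directory) can be applied when scanning a client log. The Python sketch below is hypothetical: classify_anomaly and its regex are written against the single anomalies message quoted in this report, and it assumes, as an unconfirmed heuristic, that a (null) path with an all-zero gfid indicates a directory still being created, while a real path or gfid points at an existing directory.

```python
import re

# Matches the dht_layout_normalize message quoted in this report.
ANOMALY_RE = re.compile(
    r"Found anomalies in (?P<path>.+?) \(gfid = (?P<gfid>[0-9a-f-]+)\)\. "
    r"Holes=(?P<holes>\d+) overlaps=(?P<overlaps>\d+)"
)
NULL_GFID = "00000000-0000-0000-0000-000000000000"


def classify_anomaly(line):
    """Return None for unrelated lines, else (verdict, path, holes, overlaps).

    Heuristic assumption: a "(null)" path with an all-zero gfid marks a
    directory still being created (the transient mkdir/lookup race); any
    other path or gfid is treated as an existing directory worth a look.
    """
    m = ANOMALY_RE.search(line)
    if m is None:
        return None
    transient = m["path"] == "(null)" and m["gfid"] == NULL_GFID
    verdict = "transient-mkdir-race" if transient else "existing-directory"
    return verdict, m["path"], int(m["holes"]), int(m["overlaps"])


line = ("I [MSGID: 109063] [dht-layout.c:693:dht_layout_normalize] 0-patchy-dht: "
        "Found anomalies in (null) (gfid = 00000000-0000-0000-0000-000000000000). "
        "Holes=1 overlaps=0")
print(classify_anomaly(line))  # ('transient-mkdir-race', '(null)', 1, 0)
```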

Comment 4 Shyamsundar 2018-03-13 01:42:29 UTC

(In reply to Raghavendra G from comment #3)
> > Note, on multiple mounts and unmounts, this did not go away.
> 
> During every mkdir this error msg can be seen (mounting/umounting has no
> impact). However, if you are seeing this error during lookup on an existing
> directory, it's a cause for concern.

Ah! Apologies, I meant that for the first message, which I repeat here:

> Post upgrade, client mount is seeing this error
> I [MSGID: 109005] [dht-selfheal.c:2328:dht_selfheal_directory] 0-patchy-dht:
> Directory selfheal failed: Unable to form layout for directory /

Also, the anomalies message, even though a false positive, should be suppressed, as it is just noise.

Further, I forgot to report one more error that appears in the logs for every mkdir; please consider it part of this bug report as well:

I [dict.c:491:dict_get] (-->/usr/lib64/glusterfs/4.0.0/xlator/cluster/distribute.so(+0x21740) [0x7fce5c1cc740] -->/usr/lib64/glusterfs/4.0.0/xlator/cluster/distribute.so(+0x42d55) [0x7fce5c1edd55] -->/lib64/libglusterfs.so.0(dict_get+0x10c) [0x7fce63d4187c] ) 0-dict: !this || key=trusted.glusterfs.dht.mds [Invalid argument]

Comment 5 Shyamsundar 2018-06-20 18:24:42 UTC

This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline gluster repository, request that it be reopened and the Version field be marked appropriately.
