Bug 1022759 - subvols-per-directory floods client logs with "disk layout missing" messages
Summary: subvols-per-directory floods client logs with "disk layout missing" messages
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.4.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-10-24 00:34 UTC by Kenno
Modified: 2015-10-07 12:14 UTC
CC: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-07 12:14:37 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kenno 2013-10-24 00:34:37 UTC
Description of problem:
When creating a distributed Gluster filesystem with subvols-per-directory set lower than the number of bricks (we set it to 1 in our case), the trusted.glusterfs.dht attribute of each subdirectory in the volume is created only on the bricks chosen to receive files and is entirely absent on the other bricks. Consequently, every time a subdirectory is listed on the client side, the following lines appear (#bricks - subvols-per-directory) times in the mount log in /var/log/glusterfs:
[2013-10-23 23:08:28.297937] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-glusterfs-volume-dht: /some/subdir - disk layout missing
[2013-10-23 23:08:28.297964] I [dht-common.c:623:dht_revalidate_cbk] 0-glusterfs-volume-dht: mismatching layouts for /some/subdir
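
For reference, the asymmetry can be inspected directly on the bricks with getfattr. This is only a sketch, reusing the brick path from the workaround below and the directory from the log lines above; adjust both to your layout:

# run on each server node; bricks chosen for the directory print a layout value,
# the others report "No such attribute", which is what triggers the messages above
getfattr -n trusted.glusterfs.dht -e hex /glusterbricks/brick1/some/subdir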

We are aware that these are harmless informational messages, but their sheer volume causes problems. We maintain a locate database of this Gluster filesystem, and each daily database update grows the log by more than 10 GB. That makes it hard to find other information in the log, and /var/log is mounted on a rather small partition.

The problem can be worked around by running the following one-liner on all the server nodes:
find /glusterbricks/brick* -type d -exec sh -c "getfattr -n trusted.glusterfs.dht -e hex '{}' 2>&1 >/dev/null | grep -q 'No such attribute$' && setfattr -n trusted.glusterfs.dht -v 0x00000001000000000000000000000000 '{}'" \;
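
For readability, the same workaround expanded into a small script. This is only a sketch: it keys off the getfattr exit status instead of grepping stderr for "No such attribute", so it treats any getfattr failure as a missing layout, and it assumes the same /glusterbricks/brick* paths.

#!/bin/sh
# For every directory on every brick, write the same zero-range layout value
# used in the one-liner above, but only where trusted.glusterfs.dht is absent.
find /glusterbricks/brick* -type d -exec sh -c '
    for dir in "$@"; do
        # skip directories that already carry a DHT layout
        getfattr -n trusted.glusterfs.dht -e hex "$dir" >/dev/null 2>&1 && continue
        setfattr -n trusted.glusterfs.dht \
                 -v 0x00000001000000000000000000000000 "$dir"
    done
' sh {} +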
However, this is not a very satisfactory workaround because:
- it is somewhat time-consuming,
- it needs to be repeated every time a significant number of directories are added to the filesystem,
- we dislike messing with Gluster's internals.

Version-Release number of selected component (if applicable):
glusterfs release 3.4.0

How reproducible:
Always, so far.

Steps to Reproduce:
1. Create distributed volume with multiple bricks
2. gluster volume set glusterfs-volume subvols-per-directory 1
3. gluster volume rebalance glusterfs-volume start force
4. Mount volume on a client and populate it with directories and files
5. Wait some time, then list one of these directories (on client)
6. Observe log in mount log in /var/log/glusterfs (on client)
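
As a concrete illustration of the steps above, the setup could look like this on two servers (hostnames, brick paths, and mount point are placeholders; a sketch, not verified):

# on one of the servers:
gluster volume create glusterfs-volume server1:/glusterbricks/brick1 server2:/glusterbricks/brick1
gluster volume start glusterfs-volume
gluster volume set glusterfs-volume subvols-per-directory 1
gluster volume rebalance glusterfs-volume start force
# on the client:
mount -t glusterfs server1:/glusterfs-volume /mnt/glusterfs
mkdir -p /mnt/glusterfs/some/subdir
# after a while, list the directory and watch the client mount log
ls /mnt/glusterfs/some/subdir
tail /var/log/glusterfs/mnt-glusterfs.log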

Actual results:
The log fills with the following messages:
[2013-10-23 23:08:28.297937] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-glusterfs-volume-dht: /some/subdir - disk layout missing
[2013-10-23 23:08:28.297964] I [dht-common.c:623:dht_revalidate_cbk] 0-glusterfs-volume-dht: mismatching layouts for /some/subdir

Expected results:
The log does not grow this quickly.

Additional info:
The problem can be worked around by running the following one-liner on each server:
find /bricks/brick* -type d -exec sh -c "getfattr -n trusted.glusterfs.dht -e hex '{}' 2>&1 >/dev/null | grep -q 'No such attribute$' && setfattr -n trusted.glusterfs.dht -v 0x00000001000000000000000000000000 '{}'" \;
but the workaround needs to be re-applied every time directories are added.

It would be nice if a developer could tell us whether this workaround will have side effects, i.e. whether there is a good reason trusted.glusterfs.dht is missing. If so, we feel the messages in the log should be suppressed.

Comment 2 Peter B. 2015-01-26 11:21:18 UTC
I think I'm experiencing the same log file flooding on our setup. The log file in /var/log/glusterfs/ grew to more than 5 GB in a few days.


We're running GlusterFS 3.4.6 (CentOS build) on RHEL 6.6 (64-bit) with 2 hosts in distributed mode.

Is there any update on this?
And is there anything I can change/fix with our configuration to make it run smoothly?

Thank you very much in advance.

Comment 3 Joonas Vilenius 2015-01-26 13:45:03 UTC
This is a somewhat known issue (log flooding), and I have also experienced trouble in production because of it.

Have you run fix-layout to repair the directory layout? That should stop the flood, since the structure is consistent after that operation. Depending on the size of the volume, it might take quite long.

You can also change the log level to WARNING to prevent the flood. This can be done either from the mount command / fstab (man mount.glusterfs, log-level=LOG-LEVEL) or with gluster volume set VOLNAME OPTION PARAMETER (see diagnostics.client-log-level). At least that mitigates the issue for now.
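
For reference, the two suggestions above correspond roughly to the following commands (VOLNAME is a placeholder; a sketch only):

# repair the directory layout across all bricks (may take long on large volumes)
gluster volume rebalance VOLNAME fix-layout start
# or raise the client log level so these INFO messages are no longer written
gluster volume set VOLNAME diagnostics.client-log-level WARNING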

Comment 4 Peter B. 2015-01-26 14:15:34 UTC
I must read up on "fix-layout", since I have never encountered or used this functionality before. Is there anything I can do to avoid this in the future?

It sounds like something in my setup is "not clean" or misconfigured, although the system was recently set up from scratch and the data was directly copied to GlusterFS. The data size is currently around 130 TB (and growing).


For now, I'll set the log-level to WARNING in /etc/fstab right away.
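
For the record, such an fstab entry would look roughly like this (server, volume, and mount point are placeholders):

server1:/glusterfs-volume  /mnt/glusterfs  glusterfs  defaults,_netdev,log-level=WARNING  0 0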

Thank you very much for your quick response!

Comment 5 Peter B. 2015-05-04 10:02:02 UTC
I finally have some time now to look at this issue again in our setup.
After reading Kenno's post once more, I'm still not sure what is causing this log message.

We've been using the glusterfs default volume options for our setup, and haven't touched the "subvols-per-directory" parameter.

Is there anything one can do to prevent this warning when setting up a fresh install?

Thanks in advance :)

Comment 6 Niels de Vos 2015-05-17 21:59:49 UTC
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained; at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. If updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" field below the comment box to "bugs".

If there is no response by the end of the month, this bug will get automatically closed.

Comment 7 Peter B. 2015-05-18 09:49:30 UTC
Thanks Niels for pointing that out.
Is it possible to upgrade directly from 3.4 to 3.7, or should it rather be done like a dist-upgrade: 3.4 > 3.5 > 3.6 > 3.7?

Comment 8 André Bauer 2015-07-24 08:40:23 UTC
I have the same error messages in 3.5.5.

Comment 9 Kaleb KEITHLEY 2015-10-07 12:14:37 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release, please reopen it and change the version, or open a new bug.

