Bug 1366495 - 1 mkdir generates tons of log messages from dht xlator
Summary: 1 mkdir generates tons of log messages from dht xlator
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
Assignee: Susant Kumar Palai
QA Contact:
URL:
Whiteboard:
Depends On: 1215816 1258267
Blocks: 1366496
TreeView+ depends on / blocked
 
Reported: 2016-08-12 06:44 UTC by Nithya Balachandran
Modified: 2017-03-06 17:22 UTC (History)
20 users (show)

Fixed In Version: glusterfs-3.10.0
Clone Of: 1258267
: 1366496 (view as bug list)
Environment:
Last Closed: 2017-03-06 17:22:10 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description Nithya Balachandran 2016-08-12 06:44:36 UTC
+++ This bug was initially created as a clone of Bug #1258267 +++

+++ This bug was initially created as a clone of Bug #1215816 +++

Description of problem:

a single mkdir command in a Gluster mountpoint generates all the below log messages, just think of what untar would do!

This should be assigned to the disperse translator but I can't find it in list.

Version-Release number of selected component (if applicable):

RHS 3.1 - glusterfs-3.7dev-0.1017.git7fb85e3.el6.x86_64
RHEL6.6 - kernel-2.6.32-504.el6.x86_64

How reproducible:

every time

Steps to Reproduce:
1. create distributed-disperse volume shown below
2. mount it and do "tail -f /var/log/glusterfs/your-mountpoint.log
3. mkdir /your/mountpoint/d

Actual results:

See log messages below, and imagine untarring a large tarball.  excessive logging slows down Gluster and can fill system disk leading to damage to volume

Why is default log level INFO instead of WARNING?  

ec translator:
- Why do we get "MIsmatching xdata in answers of 'LOOKUP'?
- Why do we get operation failed on some subvolumes?  Why isn't this an ERROR?
- Why do we get Mismatching dictionary in answers of 'GF_FOP_XATTROP'?  why isn't this a WARNING?  what is log level 'N' ? 

dht translator:
- why does it have to tell us directory layout?  If anyone is interested they can read it from the xattr.

Expected results:

NOTHING!  Gluster should not be normally be logging events that are commonplace and do not represent a significant failure or state change, unless the user has requested it to do so by raising the log level.

Additional info:

log messages generated from 1 mkdir command are:

[2015-04-27 19:45:14.177042] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-0: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.177282] W [ec-common.c:162:ec_check_status] 0-ec42-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=2F, bad=10)
[2015-04-27 19:45:14.177324] I [ec-heal.c:495:ec_heal_init] 0-ec: Healing '/', gfid 00000000-0000-0000-0000-000000000001
[2015-04-27 19:45:14.177281] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-1: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.177373] W [ec-common.c:162:ec_check_status] 0-ec42-disperse-1: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=1F, bad=20)
[2015-04-27 19:45:14.177406] I [ec-heal.c:495:ec_heal_init] 0-ec: Healing '/', gfid 00000000-0000-0000-0000-000000000001
[2015-04-27 19:45:14.178187] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-0: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.178316] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-1: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.178657] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-11: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.178745] W [ec-common.c:162:ec_check_status] 0-ec42-disperse-11: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=2F, bad=10)
[2015-04-27 19:45:14.178787] I [ec-heal.c:495:ec_heal_init] 0-ec: Healing '/', gfid 00000000-0000-0000-0000-000000000001
[2015-04-27 19:45:14.179487] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-11: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.234873] I [dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-ec42-dht: chunk size = 0xffffffff / 43428960 = 0x62
[2015-04-27 19:45:14.234906] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-9
[2015-04-27 19:45:14.234921] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-0
[2015-04-27 19:45:14.234934] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-1
[2015-04-27 19:45:14.234948] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-10
[2015-04-27 19:45:14.234963] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-11
[2015-04-27 19:45:14.234970] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-2
[2015-04-27 19:45:14.234978] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-3
[2015-04-27 19:45:14.234986] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-4
[2015-04-27 19:45:14.234993] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-5
[2015-04-27 19:45:14.235002] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-6
[2015-04-27 19:45:14.235009] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-7
[2015-04-27 19:45:14.235016] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-8
[2015-04-27 19:45:15.718230] N [ec-generic.c:1303:ec_combine_xattrop] 0-ec42-disperse-0: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-04-27 19:45:15.718811] N [ec-generic.c:1303:ec_combine_xattrop] 0-ec42-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-04-27 19:45:15.718846] N [ec-generic.c:1303:ec_combine_xattrop] 0-ec42-disperse-11: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-04-27 19:45:14.241533] I [MSGID: 109036] [dht-common.c:6476:dht_log_new_layout_for_dir_selfheal] 0-ec42-dht: Setting layout of /ben with [Subvol_name: ec42-disperse-0, Err: -1 , Start: 354669840 , Stop: 709339679 ], [Subvol_name: ec42-disperse-1, Err: -1 , Start: 709339680 , Stop: 1064009519 ], [Subvol_name: ec42-disperse-10, Err: -1 , Start: 1064009520 , Stop: 1418679359 ], [Subvol_name: ec42-disperse-11, Err: -1 , Start: 1418679360 , Stop: 1773349199 ], [Subvol_name: ec42-disperse-2, Err: -1 , Start: 1773349200 , Stop: 2128019039 ], [Subvol_name: ec42-disperse-3, Err: -1 , Start: 2128019040 , Stop: 2482688879 ], [Subvol_name: ec42-disperse-4, Err: -1 , Start: 2482688880 , Stop: 2837358719 ], [Subvol_name: ec42-disperse-5, Err: -1 , Start: 2837358720 , Stop: 3192028559 ], [Subvol_name: ec42-disperse-6, Err: -1 , Start: 3192028560 , Stop: 3546698399 ], [Subvol_name: ec42-disperse-7, Err: -1 , Start: 3546698400 , Stop: 3901368239 ], [Subvol_name: ec42-disperse-8, Err: -1 , Start: 3901368240 , Stop: 4294967295 ], [Subvol_name: ec42-disperse-9, Err: -1 , Start: 0 , Stop: 354669839 ], 

The mountpoint process is:

root     28619  0.0  0.3 1155156 193956 ?      Ssl  Apr26   0:09 /usr/sbin/glusterfs --volfile-server=gprfs022-10ge --volfile-id=/ec42 /mnt/ec42

The volume is:

[root@gprfs024 ~]# gluster v i
 
Volume Name: ec42
Type: Distributed-Disperse
Volume ID: 61ffe960-77b7-4f4d-9988-2fc3ca65b561
Status: Started
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: gprfs022-10ge:/bricks/b01/g
Brick2: gprfs024-10ge:/bricks/b01/g
Brick3: gprfs037-10ge:/bricks/b01/g
Brick4: gprfs038-10ge:/bricks/b01/g
Brick5: gprfs039-10ge:/bricks/b01/g
Brick6: gprfs040-10ge:/bricks/b01/g
Brick7: gprfs022-10ge:/bricks/b02/g
Brick8: gprfs024-10ge:/bricks/b02/g
Brick9: gprfs037-10ge:/bricks/b02/g
Brick10: gprfs038-10ge:/bricks/b02/g
Brick11: gprfs039-10ge:/bricks/b02/g
Brick12: gprfs040-10ge:/bricks/b02/g
Brick13: gprfs022-10ge:/bricks/b03/g
Brick14: gprfs024-10ge:/bricks/b03/g
Brick15: gprfs037-10ge:/bricks/b03/g
Brick16: gprfs038-10ge:/bricks/b03/g
Brick17: gprfs039-10ge:/bricks/b03/g
Brick18: gprfs040-10ge:/bricks/b03/g
Brick19: gprfs022-10ge:/bricks/b04/g
Brick20: gprfs024-10ge:/bricks/b04/g
Brick21: gprfs037-10ge:/bricks/b04/g
Brick22: gprfs038-10ge:/bricks/b04/g
Brick23: gprfs039-10ge:/bricks/b04/g
Brick24: gprfs040-10ge:/bricks/b04/g
Brick25: gprfs022-10ge:/bricks/b05/g
Brick26: gprfs024-10ge:/bricks/b05/g
Brick27: gprfs037-10ge:/bricks/b05/g
Brick28: gprfs038-10ge:/bricks/b05/g
Brick29: gprfs039-10ge:/bricks/b05/g
Brick30: gprfs040-10ge:/bricks/b05/g
Brick31: gprfs022-10ge:/bricks/b06/g
Brick32: gprfs024-10ge:/bricks/b06/g
Brick33: gprfs037-10ge:/bricks/b06/g
Brick34: gprfs038-10ge:/bricks/b06/g
Brick35: gprfs039-10ge:/bricks/b06/g
Brick36: gprfs040-10ge:/bricks/b06/g
Brick37: gprfs022-10ge:/bricks/b07/g
Brick38: gprfs024-10ge:/bricks/b07/g
Brick39: gprfs037-10ge:/bricks/b07/g
Brick40: gprfs038-10ge:/bricks/b07/g
Brick41: gprfs039-10ge:/bricks/b07/g
Brick42: gprfs040-10ge:/bricks/b07/g
Brick43: gprfs022-10ge:/bricks/b08/g
Brick44: gprfs024-10ge:/bricks/b08/g
Brick45: gprfs037-10ge:/bricks/b08/g
Brick46: gprfs038-10ge:/bricks/b08/g
Brick47: gprfs039-10ge:/bricks/b08/g
Brick48: gprfs040-10ge:/bricks/b08/g
Brick49: gprfs022-10ge:/bricks/b09/g
Brick50: gprfs024-10ge:/bricks/b09/g
Brick51: gprfs037-10ge:/bricks/b09/g
Brick52: gprfs038-10ge:/bricks/b09/g
Brick53: gprfs039-10ge:/bricks/b09/g
Brick54: gprfs040-10ge:/bricks/b09/g
Brick55: gprfs022-10ge:/bricks/b10/g
Brick56: gprfs024-10ge:/bricks/b10/g
Brick57: gprfs037-10ge:/bricks/b10/g
Brick58: gprfs038-10ge:/bricks/b10/g
Brick59: gprfs039-10ge:/bricks/b10/g
Brick60: gprfs040-10ge:/bricks/b10/g
Brick61: gprfs022-10ge:/bricks/b11/g
Brick62: gprfs024-10ge:/bricks/b11/g
Brick63: gprfs037-10ge:/bricks/b11/g
Brick64: gprfs038-10ge:/bricks/b11/g
Brick65: gprfs039-10ge:/bricks/b11/g
Brick66: gprfs040-10ge:/bricks/b11/g
Brick67: gprfs022-10ge:/bricks/b12/g
Brick68: gprfs024-10ge:/bricks/b12/g
Brick69: gprfs037-10ge:/bricks/b12/g
Brick70: gprfs038-10ge:/bricks/b12/g
Brick71: gprfs039-10ge:/bricks/b12/g
Brick72: gprfs040-10ge:/bricks/b12/g
Options Reconfigured:
cluster.lookup-unhashed: off
client.event-threads: 8
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on

--- Additional comment from RHEL Product and Program Management on 2015-04-27 16:22:40 EDT ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Rejy M Cyriac on 2015-08-07 02:58:48 EDT ---

Since this bug has been approved for the z-stream release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.1.z+', and has been marked for RHGS 3.1 Update 1 release through the Internal Whiteboard entry of '3.1.1', the Target Release is being set to 'RHGS 3.1.1'

--- Additional comment from Rejy M Cyriac on 2015-08-10 00:12:30 EDT ---

Attached to RHGS 3.1 Update 1 (z-stream) Tracker BZ

--- Additional comment from Ashish Pandey on 2015-08-19 06:14:42 EDT ---

Fix for this bug  is present in following patch  - 

https://code.engineering.redhat.com/gerrit/55582

--- Additional comment from Pranith Kumar K on 2015-08-30 14:48:55 EDT ---

All ec logs in the common code path are not appearing. Raising a new bug for dht.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-08-30 14:49:27 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs‑3.1.z' to '?'. 

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-08-30 14:49:27 EDT ---

Since this bug does not have release flag 'rhgs-3.1.z+', the Target Release is being automatically reset to '---'

--- Additional comment from Sakshi on 2015-10-30 03:41:04 EDT ---

Of the 2 dht related logs mentioned above:
1) Log in this function dht_log_new_layout_for_dir_selfheal is important and if the directory has been healed its new layout must be logged
2) Log in the function dht_selfheal_layout_new_directory has debug mode from 3.1 and I suppose should not be removed.

--- Additional comment from Pranith Kumar K on 2016-06-03 03:21:58 EDT ---

Sakshi,
        Ashish worked on this logging patch and I believe they are merged now. So this shouldn't happen.

Pranith

--- Additional comment from Nithya Balachandran on 2016-08-01 04:44:26 EDT ---

The following message should have their log levels changed to DEBUG:



In dht_fix_layout_of_directory ():
gf_msg (this->name, GF_LOG_INFO, 0,
                                DHT_MSG_SUBVOL_INFO,
                                "subvolume %d (%s): %u chunks", i,
                                priv->subvolumes[i]->name,
                                priv->du_stats[i].chunks);


This message alone took up about 2GB of a 2.9GB rebalance log file.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-08-09 07:17:39 EDT ---

Since this bug has been approved for the RHGS 3.2.0 release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.2.0+', and through the Internal Whiteboard entry of '3.2.0', the Target Release is being automatically set to 'RHGS 3.2.0'

Comment 1 Worker Ant 2016-08-30 07:23:59 UTC
REVIEW: http://review.gluster.org/15343 (cluster/dht: move layout logs to DEBUG level) posted (#1) for review on master by Susant Palai (spalai)

Comment 2 Worker Ant 2016-09-01 12:41:13 UTC
REVIEW: http://review.gluster.org/15343 (cluster/dht: move layout logs to DEBUG level) posted (#2) for review on master by Susant Palai (spalai)

Comment 3 Worker Ant 2016-09-06 06:51:36 UTC
REVIEW: http://review.gluster.org/15343 (cluster/dht: move layout logs to DEBUG level) posted (#3) for review on master by Atin Mukherjee (amukherj)

Comment 4 Worker Ant 2016-09-06 16:21:19 UTC
COMMIT: http://review.gluster.org/15343 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit 15c790b502ba92caa17f2d1870c3d75d547e6bad
Author: Susant Palai <spalai>
Date:   Tue Aug 30 12:44:04 2016 +0530

    cluster/dht: move layout logs to DEBUG level
    
    Change-Id: Iad96256218be643b272762b5638a3f6837aff28d
    BUG: 1366495
    Signed-off-by: Susant Palai <spalai>
    Reviewed-on: http://review.gluster.org/15343
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 5 Shyamsundar 2017-03-06 17:22:10 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/


Note You need to log in before you can comment on or make changes to this bug.