Bug 1215816 - 1 mkdir generates tons of log messages from dht and disperse xlators
Summary: 1 mkdir generates tons of log messages from dht and disperse xlators
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: RHGS 3.1.1
Assignee: Ashish Pandey
QA Contact: Bhaskarakiran
URL:
Whiteboard:
Depends On:
Blocks: 1251815 1258267 1366495 1366496
TreeView+ depends on / blocked
 
Reported: 2015-04-27 20:03 UTC by Ben England
Modified: 2016-11-23 23:11 UTC (History)
12 users (show)

Fixed In Version: glusterfs-3.7.1-13
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1258267 (view as bug list)
Environment:
Last Closed: 2015-10-05 07:08:29 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:1845 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.1 update 2015-10-05 11:06:22 UTC

Description Ben England 2015-04-27 20:03:41 UTC
Description of problem:

a single mkdir command in a Gluster mountpoint generates all the below log messages, just think of what untar would do!

This should be assigned to the disperse translator but I can't find it in list.

Version-Release number of selected component (if applicable):

RHS 3.1 - glusterfs-3.7dev-0.1017.git7fb85e3.el6.x86_64
RHEL6.6 - kernel-2.6.32-504.el6.x86_64

How reproducible:

every time

Steps to Reproduce:
1. create distributed-disperse volume shown below
2. mount it and do "tail -f /var/log/glusterfs/your-mountpoint.log
3. mkdir /your/mountpoint/d

Actual results:

See log messages below, and imagine untarring a large tarball.  excessive logging slows down Gluster and can fill system disk leading to damage to volume

Why is default log level INFO instead of WARNING?  

ec translator:
- Why do we get "MIsmatching xdata in answers of 'LOOKUP'?
- Why do we get operation failed on some subvolumes?  Why isn't this an ERROR?
- Why do we get Mismatching dictionary in answers of 'GF_FOP_XATTROP'?  why isn't this a WARNING?  what is log level 'N' ? 

dht translator:
- why does it have to tell us directory layout?  If anyone is interested they can read it from the xattr.

Expected results:

NOTHING!  Gluster should not be normally be logging events that are commonplace and do not represent a significant failure or state change, unless the user has requested it to do so by raising the log level.

Additional info:

log messages generated from 1 mkdir command are:

[2015-04-27 19:45:14.177042] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-0: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.177282] W [ec-common.c:162:ec_check_status] 0-ec42-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=2F, bad=10)
[2015-04-27 19:45:14.177324] I [ec-heal.c:495:ec_heal_init] 0-ec: Healing '/', gfid 00000000-0000-0000-0000-000000000001
[2015-04-27 19:45:14.177281] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-1: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.177373] W [ec-common.c:162:ec_check_status] 0-ec42-disperse-1: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=1F, bad=20)
[2015-04-27 19:45:14.177406] I [ec-heal.c:495:ec_heal_init] 0-ec: Healing '/', gfid 00000000-0000-0000-0000-000000000001
[2015-04-27 19:45:14.178187] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-0: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.178316] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-1: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.178657] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-11: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.178745] W [ec-common.c:162:ec_check_status] 0-ec42-disperse-11: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=2F, bad=10)
[2015-04-27 19:45:14.178787] I [ec-heal.c:495:ec_heal_init] 0-ec: Healing '/', gfid 00000000-0000-0000-0000-000000000001
[2015-04-27 19:45:14.179487] W [ec-combine.c:867:ec_combine_check] 0-ec42-disperse-11: Mismatching xdata in answers of 'LOOKUP'
[2015-04-27 19:45:14.234873] I [dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-ec42-dht: chunk size = 0xffffffff / 43428960 = 0x62
[2015-04-27 19:45:14.234906] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-9
[2015-04-27 19:45:14.234921] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-0
[2015-04-27 19:45:14.234934] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-1
[2015-04-27 19:45:14.234948] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-10
[2015-04-27 19:45:14.234963] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-11
[2015-04-27 19:45:14.234970] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-2
[2015-04-27 19:45:14.234978] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-3
[2015-04-27 19:45:14.234986] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-4
[2015-04-27 19:45:14.234993] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-5
[2015-04-27 19:45:14.235002] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-6
[2015-04-27 19:45:14.235009] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-7
[2015-04-27 19:45:14.235016] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-ec42-dht: assigning range size 0x1523d510 to ec42-disperse-8
[2015-04-27 19:45:15.718230] N [ec-generic.c:1303:ec_combine_xattrop] 0-ec42-disperse-0: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-04-27 19:45:15.718811] N [ec-generic.c:1303:ec_combine_xattrop] 0-ec42-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-04-27 19:45:15.718846] N [ec-generic.c:1303:ec_combine_xattrop] 0-ec42-disperse-11: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-04-27 19:45:14.241533] I [MSGID: 109036] [dht-common.c:6476:dht_log_new_layout_for_dir_selfheal] 0-ec42-dht: Setting layout of /ben with [Subvol_name: ec42-disperse-0, Err: -1 , Start: 354669840 , Stop: 709339679 ], [Subvol_name: ec42-disperse-1, Err: -1 , Start: 709339680 , Stop: 1064009519 ], [Subvol_name: ec42-disperse-10, Err: -1 , Start: 1064009520 , Stop: 1418679359 ], [Subvol_name: ec42-disperse-11, Err: -1 , Start: 1418679360 , Stop: 1773349199 ], [Subvol_name: ec42-disperse-2, Err: -1 , Start: 1773349200 , Stop: 2128019039 ], [Subvol_name: ec42-disperse-3, Err: -1 , Start: 2128019040 , Stop: 2482688879 ], [Subvol_name: ec42-disperse-4, Err: -1 , Start: 2482688880 , Stop: 2837358719 ], [Subvol_name: ec42-disperse-5, Err: -1 , Start: 2837358720 , Stop: 3192028559 ], [Subvol_name: ec42-disperse-6, Err: -1 , Start: 3192028560 , Stop: 3546698399 ], [Subvol_name: ec42-disperse-7, Err: -1 , Start: 3546698400 , Stop: 3901368239 ], [Subvol_name: ec42-disperse-8, Err: -1 , Start: 3901368240 , Stop: 4294967295 ], [Subvol_name: ec42-disperse-9, Err: -1 , Start: 0 , Stop: 354669839 ], 

The mountpoint process is:

root     28619  0.0  0.3 1155156 193956 ?      Ssl  Apr26   0:09 /usr/sbin/glusterfs --volfile-server=gprfs022-10ge --volfile-id=/ec42 /mnt/ec42

The volume is:

[root@gprfs024 ~]# gluster v i
 
Volume Name: ec42
Type: Distributed-Disperse
Volume ID: 61ffe960-77b7-4f4d-9988-2fc3ca65b561
Status: Started
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: gprfs022-10ge:/bricks/b01/g
Brick2: gprfs024-10ge:/bricks/b01/g
Brick3: gprfs037-10ge:/bricks/b01/g
Brick4: gprfs038-10ge:/bricks/b01/g
Brick5: gprfs039-10ge:/bricks/b01/g
Brick6: gprfs040-10ge:/bricks/b01/g
Brick7: gprfs022-10ge:/bricks/b02/g
Brick8: gprfs024-10ge:/bricks/b02/g
Brick9: gprfs037-10ge:/bricks/b02/g
Brick10: gprfs038-10ge:/bricks/b02/g
Brick11: gprfs039-10ge:/bricks/b02/g
Brick12: gprfs040-10ge:/bricks/b02/g
Brick13: gprfs022-10ge:/bricks/b03/g
Brick14: gprfs024-10ge:/bricks/b03/g
Brick15: gprfs037-10ge:/bricks/b03/g
Brick16: gprfs038-10ge:/bricks/b03/g
Brick17: gprfs039-10ge:/bricks/b03/g
Brick18: gprfs040-10ge:/bricks/b03/g
Brick19: gprfs022-10ge:/bricks/b04/g
Brick20: gprfs024-10ge:/bricks/b04/g
Brick21: gprfs037-10ge:/bricks/b04/g
Brick22: gprfs038-10ge:/bricks/b04/g
Brick23: gprfs039-10ge:/bricks/b04/g
Brick24: gprfs040-10ge:/bricks/b04/g
Brick25: gprfs022-10ge:/bricks/b05/g
Brick26: gprfs024-10ge:/bricks/b05/g
Brick27: gprfs037-10ge:/bricks/b05/g
Brick28: gprfs038-10ge:/bricks/b05/g
Brick29: gprfs039-10ge:/bricks/b05/g
Brick30: gprfs040-10ge:/bricks/b05/g
Brick31: gprfs022-10ge:/bricks/b06/g
Brick32: gprfs024-10ge:/bricks/b06/g
Brick33: gprfs037-10ge:/bricks/b06/g
Brick34: gprfs038-10ge:/bricks/b06/g
Brick35: gprfs039-10ge:/bricks/b06/g
Brick36: gprfs040-10ge:/bricks/b06/g
Brick37: gprfs022-10ge:/bricks/b07/g
Brick38: gprfs024-10ge:/bricks/b07/g
Brick39: gprfs037-10ge:/bricks/b07/g
Brick40: gprfs038-10ge:/bricks/b07/g
Brick41: gprfs039-10ge:/bricks/b07/g
Brick42: gprfs040-10ge:/bricks/b07/g
Brick43: gprfs022-10ge:/bricks/b08/g
Brick44: gprfs024-10ge:/bricks/b08/g
Brick45: gprfs037-10ge:/bricks/b08/g
Brick46: gprfs038-10ge:/bricks/b08/g
Brick47: gprfs039-10ge:/bricks/b08/g
Brick48: gprfs040-10ge:/bricks/b08/g
Brick49: gprfs022-10ge:/bricks/b09/g
Brick50: gprfs024-10ge:/bricks/b09/g
Brick51: gprfs037-10ge:/bricks/b09/g
Brick52: gprfs038-10ge:/bricks/b09/g
Brick53: gprfs039-10ge:/bricks/b09/g
Brick54: gprfs040-10ge:/bricks/b09/g
Brick55: gprfs022-10ge:/bricks/b10/g
Brick56: gprfs024-10ge:/bricks/b10/g
Brick57: gprfs037-10ge:/bricks/b10/g
Brick58: gprfs038-10ge:/bricks/b10/g
Brick59: gprfs039-10ge:/bricks/b10/g
Brick60: gprfs040-10ge:/bricks/b10/g
Brick61: gprfs022-10ge:/bricks/b11/g
Brick62: gprfs024-10ge:/bricks/b11/g
Brick63: gprfs037-10ge:/bricks/b11/g
Brick64: gprfs038-10ge:/bricks/b11/g
Brick65: gprfs039-10ge:/bricks/b11/g
Brick66: gprfs040-10ge:/bricks/b11/g
Brick67: gprfs022-10ge:/bricks/b12/g
Brick68: gprfs024-10ge:/bricks/b12/g
Brick69: gprfs037-10ge:/bricks/b12/g
Brick70: gprfs038-10ge:/bricks/b12/g
Brick71: gprfs039-10ge:/bricks/b12/g
Brick72: gprfs040-10ge:/bricks/b12/g
Options Reconfigured:
cluster.lookup-unhashed: off
client.event-threads: 8
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on

Comment 5 Pranith Kumar K 2015-08-30 18:48:55 UTC
All ec logs in the common code path are not appearing. Raising a new bug for dht.

Comment 6 Bhaskarakiran 2015-08-31 06:13:02 UTC
verified this on glusterfs-3.7.1-13 build and its now showing only 1 message for 1 mkdir. Moving this to verified.
 
[2015-08-31 06:16:41.241515] I [MSGID: 109036] [dht-common.c:7106:dht_log_new_layout_for_dir_selfheal] 13-vol1-dht: Setting layout of /renames with [Subvol_name: vol1-disperse-0, Err: -1 , Start: 2147474804 , Stop: 4294967295 , Hash: 1 ], [Subvol_n

Comment 8 errata-xmlrpc 2015-10-05 07:08:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html


Note You need to log in before you can comment on or make changes to this bug.