Description of problem:
-----------------------
Rebalance logs are huge and contain information that can be suppressed. Here is an example:

<snip>
[2017-07-20 03:47:00.891430] I [dht-rebalance.c:2858:gf_defrag_task] 0-DHT: Thread sleeping. current thread count: 2
[2017-07-20 03:47:00.891484] I [dht-rebalance.c:2858:gf_defrag_task] 0-DHT: Thread sleeping. current thread count: 2
[2017-07-20 03:47:00.891525] I [dht-rebalance.c:2858:gf_defrag_task] 0-DHT: Thread sleeping. current thread count: 2
[2017-07-20 03:47:00.892125] I [dht-rebalance.c:2858:gf_defrag_task] 0-DHT: Thread sleeping. current thread count: 2
[2017-07-20 03:47:00.892197] I [dht-rebalance.c:2858:gf_defrag_task] 0-DHT: Thread sleeping. current thread count: 2
[2017-07-20 03:47:00.913433] I [dht-rebalance.c:2858:gf_defrag_task] 0-DHT: Thread sleeping. current thread count: 2
</snip>

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.8.4-34

How reproducible:
-----------------
Every which way I try.

Additional info:
-----------------
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 1ad04ea0-4b0c-44d2-aa32-15ca6b1657eb
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (4 + 2) = 24
Transport-type: tcp
Bricks:
Brick1: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick2: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas009.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick4: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick5: gqas007.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick6: gqas003.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick7: gqas009.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick8: gqas016.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick9: gqas007.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick10: gqas003.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick11: gqas009.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick12: gqas016.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick13: gqas007.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick14: gqas003.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick15: gqas009.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick16: gqas016.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick17: gqas007.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick18: gqas003.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick19: gqas007.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick20: gqas003.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick21: gqas009.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick22: gqas016.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick23: gqas007.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick24: gqas003.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Options Reconfigured:
cluster.rebal-throttle: aggressive
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: off
This impacts the usability of Gluster in the field, so I am proposing this as a blocker for the current release.
Checked the rebalance log files. We log the thread count information whenever the rebalance queue count goes from 1 -> 0 (one message saying the thread is sleeping) and again whenever it goes from 0 -> 1 (waking up all the threads). Will send a patch to move these logs to DEBUG.
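For illustration only, here is a minimal, self-contained sketch of the INFO -> DEBUG demotion described above. It deliberately does not use GlusterFS's own logging helpers (gf_msg()/gf_msg_debug()); log_msg(), current_log_level and thread_count are hypothetical stand-ins, and the actual call sites in dht-rebalance.c may differ.

<snip>
/* Sketch: at the default (INFO) log level, a message demoted to DEBUG
 * is simply dropped, so the per-transition "Thread sleeping" updates
 * no longer flood the rebalance log. Names below are illustrative,
 * not the GlusterFS logging framework. */
#include <stdio.h>

typedef enum { LOG_ERROR, LOG_WARNING, LOG_INFO, LOG_DEBUG } log_level_t;

/* Default level of the rebalance log: DEBUG messages are suppressed. */
static log_level_t current_log_level = LOG_INFO;

#define log_msg(level, fmt, ...)                                        \
        do {                                                            \
                if ((level) <= current_log_level)                       \
                        fprintf(stderr, fmt "\n", __VA_ARGS__);         \
        } while (0)

int
main(void)
{
        int thread_count = 2;

        /* Before the patch: emitted at INFO on every queue transition,
         * so it shows up for each sleeping thread. */
        log_msg(LOG_INFO, "Thread sleeping. current thread count: %d",
                thread_count);

        /* After the patch: emitted at DEBUG, so it only appears when
         * the administrator explicitly raises the log level. */
        log_msg(LOG_DEBUG, "Thread sleeping. current thread count: %d",
                thread_count);

        return 0;
}
</snip>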
upstream patch: https://review.gluster.org/#/c/17866/1
upstream 3.12 patch: https://review.gluster.org/17893
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/113660/
Verified on glusterfs-3.8.4-36.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774