Description of problem:
-----------------------
Rebalance logs are huge and contain information that can be suppressed. Here is an example:

<snip>
[2017-07-20 03:47:00.891430] I [dht-rebalance.c:2858:gf_defrag_task] 0-DHT: Thread sleeping. current thread count: 2
[2017-07-20 03:47:00.891484] I [dht-rebalance.c:2858:gf_defrag_task] 0-DHT: Thread sleeping. current thread count: 2
[2017-07-20 03:47:00.891525] I [dht-rebalance.c:2858:gf_defrag_task] 0-DHT: Thread sleeping. current thread count: 2
[2017-07-20 03:47:00.892125] I [dht-rebalance.c:2858:gf_defrag_task] 0-DHT: Thread sleeping. current thread count: 2
[2017-07-20 03:47:00.892197] I [dht-rebalance.c:2858:gf_defrag_task] 0-DHT: Thread sleeping. current thread count: 2
[2017-07-20 03:47:00.913433] I [dht-rebalance.c:2858:gf_defrag_task] 0-DHT: Thread sleeping. current thread count: 2
</snip>

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.8.4-34

How reproducible:
-----------------
Every which way I try.

Additional info:
-----------------
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 1ad04ea0-4b0c-44d2-aa32-15ca6b1657eb
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (4 + 2) = 24
Transport-type: tcp
Bricks:
Brick1: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick2: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas009.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick4: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick5: gqas007.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick6: gqas003.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick7: gqas009.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick8: gqas016.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick9: gqas007.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick10: gqas003.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick11: gqas009.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick12: gqas016.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick13: gqas007.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick14: gqas003.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick15: gqas009.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick16: gqas016.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick17: gqas007.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick18: gqas003.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick19: gqas007.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick20: gqas003.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick21: gqas009.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick22: gqas016.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick23: gqas007.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick24: gqas003.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Options Reconfigured:
cluster.rebal-throttle: aggressive
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: off
This impacts the usability of Gluster in the field, so I am proposing this as a blocker for the current release.
Checked the rebalance log files. We log the thread count information whenever the rebalance queue count goes from 1 -> 0 (one message saying the thread is sleeping) and again whenever it goes from 0 -> 1 (waking up all the threads). Will send a patch to move these logs to DEBUG.
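For illustration only, here is a minimal, self-contained sketch of the INFO -> DEBUG demotion described above. It deliberately does not use GlusterFS's own logging helpers (gf_msg()/gf_msg_debug()); log_msg(), current_log_level and thread_count are hypothetical stand-ins, and the actual call sites in dht-rebalance.c may differ.

<snip>
/* Sketch: at the default (INFO) log level, a message demoted to DEBUG
 * is simply dropped, so the per-transition "Thread sleeping" updates
 * no longer flood the rebalance log. Names below are illustrative,
 * not the GlusterFS logging framework. */
#include <stdio.h>

typedef enum { LOG_ERROR, LOG_WARNING, LOG_INFO, LOG_DEBUG } log_level_t;

/* Default level of the rebalance log: DEBUG messages are suppressed. */
static log_level_t current_log_level = LOG_INFO;

#define log_msg(level, fmt, ...)                                        \
        do {                                                            \
                if ((level) <= current_log_level)                       \
                        fprintf(stderr, fmt "\n", __VA_ARGS__);         \
        } while (0)

int
main(void)
{
        int thread_count = 2;

        /* Before the patch: emitted at INFO on every queue transition,
         * so it shows up for each sleeping thread. */
        log_msg(LOG_INFO, "Thread sleeping. current thread count: %d",
                thread_count);

        /* After the patch: emitted at DEBUG, so it only appears when
         * the administrator explicitly raises the log level. */
        log_msg(LOG_DEBUG, "Thread sleeping. current thread count: %d",
                thread_count);

        return 0;
}
</snip>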
upstream patch: https://review.gluster.org/#/c/17866/1
upstream 3.12 patch: https://review.gluster.org/17893
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/113660/
Verified on glusterfs-3.8.4-36.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774