Bug 1631851

Summary: Why do all of the gluster logs contain the exact same messages and so many of them?
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Andrew Robinson <anrobins>
Component: core
Assignee: Ravishankar N <ravishankar>
Status: CLOSED ERRATA
QA Contact: Anees Patel <anepatel>
Severity: medium
Docs Contact:
Priority: medium
Version: rhgs-3.4
CC: anrobins, bkunal, khiremat, kramdoss, nbalacha, psony, ravishankar, rhs-bugs, sankarshan, sheggodu, srangana, storage-qa-internal, sunkumar
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.4.z Batch Update 3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.12.2-33
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-02-04 07:41:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Andrew Robinson 2018-09-21 17:35:34 UTC
Description of problem:

While trying to troubleshoot a customer problem, I found that the gluster logs all contain the same set of very verbose messages, repeated many times over. That raises these questions:

1. What is generating the messages?
2. What are they trying to tell me?
3. Why are the same messages repeated in so many different message logs?
4. How do I stop them so the logs become useful again?

Generating and uploading sosreports is a lot of trouble for this customer, and these messages seem to be the reason why.

Version-Release number of selected component (if applicable):

How reproducible:

Open nearly any log file in /var/log/glusterfs
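
A quick way to see the repetition (a sketch, not part of the original report; the set of log files present will differ per deployment) is to count the most frequent message IDs across the gluster logs; the repeated handshake/graph messages show up at the top of the list:

# grep -ho 'MSGID: [0-9]*' /var/log/glusterfs/*.log | sort | uniq -c | sort -rn | head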

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Messages in the heal logs pertain just to healing, messages in the rebalance logs pertain just to rebalancing, etc.

Additional info:

Examples and statistics to follow.

Comment 4 Amar Tumballi 2018-09-26 06:17:51 UTC
Thanks for these reports, Andrew!

Most of these messages exist because network disconnects have historically caused the majority of user issues with the project, and we needed to be able to attribute a user's problems to network disruption; hence each of these log files carries those messages. We would like to pick up these enhancements in a later cycle.

Comment 17 Nithya Balachandran 2018-10-29 07:48:07 UTC
Andrew,

I'm assuming your concern is mainly with the glfsheal logs.


I checked with Ravishankar (who handles AFR) and he explained that glfsheal is a gfapi client which connects to the bricks, executes the heal info command, and then disconnects. It is not a long-running daemon, so these steps are performed every time heal info is run. From the logs on rstore13, it looks like something is calling heal info periodically (every 3 to 5 minutes) on these volumes, causing these messages to be written to the logs. heal info has been run over 2K times on this volume.

[nbalacha@dhcp35-62 glusterfs]$ grep Final glfsheal-sequencers.log |wc -l
2266

[nbalacha@myserver glusterfs]$ grep -B2 Final glfsheal-sequencers.log


<snip>

[2018-09-22 23:07:15.785150] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:07:15.785325] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:10:15.684104] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:10:15.684301] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:13:15.811615] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:13:15.811761] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:15:15.589131] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:15:15.589249] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:18:15.567704] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:18:15.567949] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1


</snip>


So the number of times these messages show up depends on the number of times the heal info command was run.
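
As a sketch (using the client index and log file from the snippet above), the periodicity can be confirmed by pulling the timestamps of one of the repeated handshake messages and checking the interval between them; on rstore13 they are roughly 3 to 5 minutes apart:

# grep 'client-15: Server lk version' /var/log/glusterfs/glfsheal-sequencers.log | awk -F'[][]' '{print $2}' | tail -5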



Ravi has posted a patch upstream to suppress these logs. 

https://bugzilla.redhat.com/show_bug.cgi?id=1643519
https://review.gluster.org/#/c/glusterfs/+/21501/


I am assigning this to him based on the above.

Comment 26 Anees Patel 2018-12-24 07:16:35 UTC
Verified the fix per https://code.engineering.redhat.com/gerrit/#/c/158640/

Steps to verify:

1. Check the glfsheal log line count
55 ~]# wc -l /var/log/glusterfs/glfsheal-arbitervol.log
5305 /var/log/glusterfs/glfsheal-arbitervol.log

2. Execute heal info with the --nolog flag
 55 ~]# for i in {1..10}; do gluster v heal arbitervol info --nolog | head -2; done

3. Re-check log count
# wc -l /var/log/glusterfs/glfsheal-arbitervol.log
5305 /var/log/glusterfs/glfsheal-arbitervol.log

4. Execute
55 ~]# for i in {1..10}; do gluster v heal arbitervol info split-brain --nolog | head -2; done

5. Re-check log count 
# wc -l /var/log/glusterfs/glfsheal-arbitervol.log
5305 /var/log/glusterfs/glfsheal-arbitervol.log
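
The same verification can be expressed as one before/after comparison (a sketch, using the same volume name and log path as the steps above):

before=$(wc -l < /var/log/glusterfs/glfsheal-arbitervol.log)
for i in {1..10}; do gluster v heal arbitervol info --nolog > /dev/null; done
after=$(wc -l < /var/log/glusterfs/glfsheal-arbitervol.log)
# With the fix, --nolog suppresses glfsheal logging, so the count should not grow.
[ "$before" -eq "$after" ] && echo "PASS: no new glfsheal log lines" || echo "FAIL"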

The log line count stayed at 5305 throughout, so running heal info and heal info split-brain with --nolog did not add any messages. Setting this to verified.

Comment 27 Anees Patel 2018-12-26 05:24:52 UTC
Verified the fix as per the above comment on the latest build:

# rpm -qa | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-rdma-3.12.2-34.el7rhgs.x86_64
glusterfs-server-3.12.2-34.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-34.el7rhgs.x86_64
glusterfs-fuse-3.12.2-34.el7rhgs.x86_64
glusterfs-events-3.12.2-34.el7rhgs.x86_64
glusterfs-3.12.2-34.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-api-3.12.2-34.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-34.el7rhgs.x86_64
glusterfs-libs-3.12.2-34.el7rhgs.x86_64

Comment 29 errata-xmlrpc 2019-02-04 07:41:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0263