Description of problem:
In trying to troubleshoot a customer problem, I find the gluster logs all have the same set of very verbose messages, repeated very many times. That raises these questions:
1. What is generating the messages?
2. What are they trying to tell me?
3. Why are the same messages repeated in so many different message logs?
4. How do I stop them so the logs become useful again?

Generating and uploading sosreports is a lot of trouble for this customer, and these messages seem to be the reason why.

Version-Release number of selected component (if applicable):

How reproducible:
Open nearly any log file in /var/log/glusterfs.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
Messages in the heal logs pertain just to healing, messages in the rebalance logs pertain just to rebalancing, etc.

Additional info:
Examples and statistics to follow.
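To quantify which messages dominate, a quick tally of message IDs can help. This is a minimal sketch using standard grep/sort/uniq (not a gluster tool); it assumes the logs live under /var/log/glusterfs and that the messages carry an MSGID field as in the excerpts later in this bug:

# tally the most frequently occurring message IDs across all gluster logs
# grep -ohE 'MSGID: [0-9]+' /var/log/glusterfs/*.log | sort | uniq -c | sort -rn | head

The top few MSGIDs point at the repeated messages and the component emitting them.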
Thanks for these reports, Andrew! Most of these log messages exist mainly because network disconnects caused most of the user issues with the project, and we needed to be able to attribute a user's issues to network disruption; hence each of these log files carries those messages. We would like to pick up these enhancements in a later cycle.
Andrew, I'm assuming your concern is mainly with the glfsheal logs. I checked with Ravishankar (who handles AFR) and he explained that glfsheal is a gfapi client which connects to the bricks, executes the heal info command and then disconnects. It is not a long-running daemon, so the above steps are performed every time you run heal info.

From the logs on rstore13, it looks like something is calling heal info periodically (every 3 to 5 mins) on these volumes, causing these messages to be written to the logs. The heal info command has been run over 2K times on this volume:

[nbalacha@dhcp35-62 glusterfs]$ grep Final glfsheal-sequencers.log | wc -l
2266

[nbalacha@myserver glusterfs]$ grep -B2 Final glfsheal-sequencers.log
<snip>
[2018-09-22 23:07:15.785150] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:07:15.785325] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:10:15.684104] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:10:15.684301] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:13:15.811615] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:13:15.811761] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:15:15.589131] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:15:15.589249] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:18:15.567704] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:18:15.567949] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
</snip>

So the number of times these messages show up depends on the number of times the heal info command was run. Ravi has posted a patch upstream to suppress these logs:
https://bugzilla.redhat.com/show_bug.cgi?id=1643519
https://review.gluster.org/#/c/glusterfs/+/21501/

I am assigning this to him based on the above.
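For anyone wanting to confirm how frequently heal info is being invoked, the timestamps preceding each "Final graph" entry can be pulled out. A rough sketch with grep/awk, assuming the log layout shown in the snippet above:

$ grep -B2 'Final graph' glfsheal-sequencers.log | awk '/^\[20/ {print $1, $2}' | tail

The gaps between successive timestamps (roughly 3 minutes in the snippet above) show the polling interval of whatever is driving heal info.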
Verified the fix per https://code.engineering.redhat.com/gerrit/#/c/158640/

Steps to verify:

1. Check the glfsheal log line count:
55 ~]# wc -l /var/log/glusterfs/glfsheal-arbitervol.log
5305 /var/log/glusterfs/glfsheal-arbitervol.log

2. Execute heal info with the --nolog flag:
55 ~]# for i in {1..10}; do gluster v heal arbitervol info --nolog | head -2; done

3. Re-check the log line count:
# wc -l /var/log/glusterfs/glfsheal-arbitervol.log
5305 /var/log/glusterfs/glfsheal-arbitervol.log

4. Execute heal info split-brain with the --nolog flag:
55 ~]# for i in {1..10}; do gluster v heal arbitervol info split-brain --nolog | head -2; done

5. Re-check the log line count:
# wc -l /var/log/glusterfs/glfsheal-arbitervol.log
5305 /var/log/glusterfs/glfsheal-arbitervol.log

The log line count is unchanged after running heal info and heal info split-brain with --nolog. Setting this to verified.
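The same check can be scripted to compare the counts directly. A small sketch reusing the arbitervol volume and log path from the steps above (not part of the formal verification procedure):

# before=$(wc -l < /var/log/glusterfs/glfsheal-arbitervol.log)
# for i in {1..10}; do gluster v heal arbitervol info --nolog > /dev/null; done
# after=$(wc -l < /var/log/glusterfs/glfsheal-arbitervol.log)
# echo "before=$before after=$after"

If --nolog suppresses the glfsheal client logging as intended, the two counts match.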
Verified the fix as per the above comment on the latest build:

# rpm -qa | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-rdma-3.12.2-34.el7rhgs.x86_64
glusterfs-server-3.12.2-34.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-34.el7rhgs.x86_64
glusterfs-fuse-3.12.2-34.el7rhgs.x86_64
glusterfs-events-3.12.2-34.el7rhgs.x86_64
glusterfs-3.12.2-34.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-api-3.12.2-34.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-34.el7rhgs.x86_64
glusterfs-libs-3.12.2-34.el7rhgs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0263