Description of problem:
In trying to troubleshoot a customer problem, I find the gluster logs all have the same set of very verbose messages, repeated very many times. That raises these questions:
1. What is generating the messages?
2. What are they trying to tell me?
3. Why are the same messages repeated in so many different message logs?
4. How do I stop them so the logs become useful again?

Generating and uploading sosreports is a lot of trouble for this customer, and these messages seem to be the reason why.

Version-Release number of selected component (if applicable):

How reproducible:
Open nearly any log file in /var/log/glusterfs.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
Messages in the heal logs pertain just to healing, messages in the rebalance logs pertain just to rebalancing, etc.

Additional info:
Examples and statistics to follow.
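To quantify which messages dominate, a quick tally of message IDs can help. This is a minimal sketch using standard grep/sort/uniq (not a gluster tool); it assumes the logs live under /var/log/glusterfs and that the messages carry an MSGID field as in the excerpts later in this bug:

# tally the most frequently occurring message IDs across all gluster logs
# grep -ohE 'MSGID: [0-9]+' /var/log/glusterfs/*.log | sort | uniq -c | sort -rn | head

The top few MSGIDs point at the repeated messages and the component emitting them.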
Thanks for these reports, Andrew! Most of these log messages exist mainly because network disconnects caused most of the user issues with the project, and we needed to be able to attribute a user's issues to network disruption; hence each of these log files carries those messages. We would like to pick up these enhancements in a later cycle.
Andrew, I'm assuming your concern is mainly with the glfsheal logs. I checked with Ravishankar (who handles AFR) and he explained that glfsheal is a gfapi client which connects to the bricks, executes the heal info command and then disconnects. It is not a long-running daemon, so the above steps are performed every time you run heal info.

From the logs on rstore13, it looks like something is calling heal info periodically (every 3 to 5 mins) on these volumes, causing these messages to be written to the logs. The heal info command has been run over 2K times on this volume:

[nbalacha@dhcp35-62 glusterfs]$ grep Final glfsheal-sequencers.log | wc -l
2266

[nbalacha@myserver glusterfs]$ grep -B2 Final glfsheal-sequencers.log
<snip>
[2018-09-22 23:07:15.785150] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:07:15.785325] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:10:15.684104] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:10:15.684301] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:13:15.811615] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:13:15.811761] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:15:15.589131] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:15:15.589249] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:18:15.567704] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:18:15.567949] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
</snip>

So the number of times these messages show up depends on the number of times the heal info command was run. Ravi has posted a patch upstream to suppress these logs:
https://bugzilla.redhat.com/show_bug.cgi?id=1643519
https://review.gluster.org/#/c/glusterfs/+/21501/

I am assigning this to him based on the above.
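For anyone wanting to confirm how frequently heal info is being invoked, the timestamps preceding each "Final graph" entry can be pulled out. A rough sketch with grep/awk, assuming the log layout shown in the snippet above:

$ grep -B2 'Final graph' glfsheal-sequencers.log | awk '/^\[20/ {print $1, $2}' | tail

The gaps between successive timestamps (roughly 3 minutes in the snippet above) show the polling interval of whatever is driving heal info.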
Verified the fix per https://code.engineering.redhat.com/gerrit/#/c/158640/

Steps to verify:

1. Check the glfsheal log line count:
55 ~]# wc -l /var/log/glusterfs/glfsheal-arbitervol.log
5305 /var/log/glusterfs/glfsheal-arbitervol.log

2. Execute heal info with the --nolog flag:
55 ~]# for i in {1..10}; do gluster v heal arbitervol info --nolog | head -2; done

3. Re-check the log line count:
# wc -l /var/log/glusterfs/glfsheal-arbitervol.log
5305 /var/log/glusterfs/glfsheal-arbitervol.log

4. Execute heal info split-brain with the --nolog flag:
55 ~]# for i in {1..10}; do gluster v heal arbitervol info split-brain --nolog | head -2; done

5. Re-check the log line count:
# wc -l /var/log/glusterfs/glfsheal-arbitervol.log
5305 /var/log/glusterfs/glfsheal-arbitervol.log

The log line count is unchanged after running heal info and heal info split-brain with --nolog. Setting this to verified.
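The same check can be scripted to compare the counts directly. A small sketch reusing the arbitervol volume and log path from the steps above (not part of the formal verification procedure):

# before=$(wc -l < /var/log/glusterfs/glfsheal-arbitervol.log)
# for i in {1..10}; do gluster v heal arbitervol info --nolog > /dev/null; done
# after=$(wc -l < /var/log/glusterfs/glfsheal-arbitervol.log)
# echo "before=$before after=$after"

If --nolog suppresses the glfsheal client logging as intended, the two counts match.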
Verified the fix as per the above comment on the latest build:

# rpm -qa | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-rdma-3.12.2-34.el7rhgs.x86_64
glusterfs-server-3.12.2-34.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-34.el7rhgs.x86_64
glusterfs-fuse-3.12.2-34.el7rhgs.x86_64
glusterfs-events-3.12.2-34.el7rhgs.x86_64
glusterfs-3.12.2-34.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-api-3.12.2-34.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-34.el7rhgs.x86_64
glusterfs-libs-3.12.2-34.el7rhgs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0263