Bug 1631851
| Summary: | Why do all of the gluster logs contain the exact same messages and so many of them? | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Andrew Robinson <anrobins> |
| Component: | core | Assignee: | Ravishankar N <ravishankar> |
| Status: | CLOSED ERRATA | QA Contact: | Anees Patel <anepatel> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | rhgs-3.4 | CC: | anrobins, bkunal, khiremat, kramdoss, nbalacha, psony, ravishankar, rhs-bugs, sankarshan, sheggodu, srangana, storage-qa-internal, sunkumar |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.4.z Batch Update 3 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.12.2-33 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-02-04 07:41:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Andrew Robinson
2018-09-21 17:35:34 UTC
Thanks for these reports, Andrew! Most of these log messages exist because network disconnects caused the majority of the user-facing issues with this project, and we needed to be able to trace a user's issue back to a network disruption; hence each of these log files carries those messages. We would like to pick up these logging enhancements in a later cycle.

Andrew, I'm assuming your concern is mainly with the glfsheal logs. I checked with Ravishankar (who handles AFR), and he explained that glfsheal is a gfapi client which connects to the bricks, executes the heal info command, and then disconnects. It is not a long-running daemon, so the above steps are performed every time you run heal info. From the logs on rstore13, it looks like something is calling heal info periodically (every 3 to 5 minutes) on these volumes, causing these messages to be written to the logs. Heal info has been run over 2,000 times on this volume:

```
[nbalacha@dhcp35-62 glusterfs]$ grep Final glfsheal-sequencers.log | wc -l
2266
[nbalacha@myserver glusterfs]$ grep -B2 Final glfsheal-sequencers.log
<snip>
[2018-09-22 23:07:15.785150] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:07:15.785325] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:10:15.684104] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:10:15.684301] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:13:15.811615] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:13:15.811761] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:15:15.589131] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:15:15.589249] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
Final graph:
--
[2018-09-22 23:18:15.567704] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-sequencers-client-15: Server and Client lk-version numbers are not same, reopening the fds
[2018-09-22 23:18:15.567949] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-sequencers-client-15: Server lk version = 1
</snip>
```

So the number of times these messages show up depends on the number of times the heal info command was run. Ravi has posted a patch upstream to suppress these logs:

https://bugzilla.redhat.com/show_bug.cgi?id=1643519
https://review.gluster.org/#/c/glusterfs/+/21501/

I am assigning this to him based on the above.

Verified the fix per https://code.engineering.redhat.com/gerrit/#/c/158640/

Steps to verify:

1. Check the glfsheal log line count:

```
55 ~]# wc -l /var/log/glusterfs/glfsheal-arbitervol.log
5305 /var/log/glusterfs/glfsheal-arbitervol.log
```

2. Execute heal info with the --nolog flag:

```
55 ~]# for i in {1..10}; do gluster v heal arbitervol info --nolog | head -2; done
```

3. Re-check the log count:

```
# wc -l /var/log/glusterfs/glfsheal-arbitervol.log
5305 /var/log/glusterfs/glfsheal-arbitervol.log
```

4. Execute heal info split-brain with the --nolog flag:

```
55 ~]# for i in {1..10}; do gluster v heal arbitervol info split-brain --nolog | head -2; done
```

5. Re-check the log count:

```
# wc -l /var/log/glusterfs/glfsheal-arbitervol.log
5305 /var/log/glusterfs/glfsheal-arbitervol.log
```

Setting this to verified.
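The "count how often heal info ran" step above can be sketched as a small shell snippet. This is illustrative only: each glfsheal invocation emits one "Final graph" line, so counting those lines approximates the number of runs. The log content is simulated with a temp file here; on a live system you would point LOG at the real /var/log/glusterfs/glfsheal-&lt;volname&gt;.log for your volume.

```shell
# Simulated glfsheal log; a real one lives under /var/log/glusterfs/.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
[2018-09-22 23:07:15.785150] I [MSGID: 114047] client_setvolume_cbk
Final graph:
[2018-09-22 23:10:15.684104] I [MSGID: 114047] client_setvolume_cbk
Final graph:
EOF
# One "Final graph" line is written per heal info invocation,
# so this count approximates how many times heal info was run.
COUNT=$(grep -c 'Final graph' "$LOG")
echo "heal info appears to have been run $COUNT times"
```

On the affected volume in this report, the same grep reported 2266 runs, which matched the 3-to-5-minute periodic caller.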
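The verification steps above boil down to a before/after line-count comparison on the glfsheal log. A minimal sketch of that check follows; the log is simulated with a temp file, and the actual gluster command is left as a comment because it needs a live cluster. On a real system, LOG would be /var/log/glusterfs/glfsheal-&lt;volname&gt;.log and the commented command would be run between the two counts.

```shell
# Sketch: the glfsheal log should not grow across a --nolog run.
LOG=$(mktemp)
printf 'existing line 1\nexisting line 2\n' > "$LOG"
BEFORE=$(wc -l < "$LOG")
# On a live cluster, run the fixed command here:
#   gluster volume heal arbitervol info --nolog > /dev/null
AFTER=$(wc -l < "$LOG")
if [ "$BEFORE" -eq "$AFTER" ]; then
    echo "log unchanged at $AFTER lines"
else
    echo "log grew from $BEFORE to $AFTER lines"
fi
```

With the fix in glusterfs-3.12.2-33, the before and after counts stay equal (5305 in the verification above); without --nolog, each run appends new handshake messages.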
Verified the fix as per the above comment on the latest build:

```
# rpm -qa | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-rdma-3.12.2-34.el7rhgs.x86_64
glusterfs-server-3.12.2-34.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-34.el7rhgs.x86_64
glusterfs-fuse-3.12.2-34.el7rhgs.x86_64
glusterfs-events-3.12.2-34.el7rhgs.x86_64
glusterfs-3.12.2-34.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-api-3.12.2-34.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-34.el7rhgs.x86_64
glusterfs-libs-3.12.2-34.el7rhgs.x86_64
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0263