Bug 1385525 - Continuous warning messages logged when one of the cluster nodes is down on an SSL setup.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Mohit Agrawal
QA Contact: Byreddy
URL:
Whiteboard:
Depends On:
Blocks: 1351528 1386450 1387975 1387976
 
Reported: 2016-10-17 09:10 UTC by Byreddy
Modified: 2017-03-23 06:10 UTC
CC: 5 users

Fixed In Version: glusterfs-3.8.4-4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1386450 1387975 1387976
Environment:
Last Closed: 2017-03-23 06:10:46 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Byreddy 2016-10-17 09:10:29 UTC
Description of problem:
=======================
The warning messages below are logged continuously in the glusterd and volume mount logs on an SSL setup when glusterd on one of the cluster nodes is stopped, or the node itself is down.

The message frequency is very high; if a cluster node stays down for days for any reason, the messages will consume a lot of log storage.


[2016-10-17 08:15:55.008898] W [socket.c:590:__socket_rwv] 0-management: readv on 10.70.43.190:24007 failed (No data available)
[2016-10-17 08:15:55.009082] W [socket.c:590:__socket_rwv] 0-management: readv on 10.70.43.190:24007 failed (No data available)
[2016-10-17 08:15:55.009289] W [socket.c:590:__socket_rwv] 0-management: readv on 10.70.43.190:24007 failed (No data available)
[2016-10-17 08:15:55.009546] W [socket.c:590:__socket_rwv] 0-management: readv on 10.70.43.190:24007 failed (No data available)
[2016-10-17 08:15:55.009736] W [socket.c:590:__socket_rwv] 0-management: readv on 10.70.43.190:24007 failed (No data available) 





Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-2


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Set up a two-node cluster.
2. Create an SSL configuration.
3. Create and start a volume of any type.
4. FUSE-mount the volume.
5. Stop glusterd on the volfile server node, or shut that node down.
6. Check the glusterd logs on the peer nodes and the volume mount log for the warning messages above.
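A sketch of the steps above, assuming a two-node cluster (node1, node2) with the SSL keys and certificates already in place; the volume name, brick paths, and mount point are placeholders:

```shell
# On both nodes: enable SSL on the management path, then restart glusterd.
touch /var/lib/glusterd/secure-access
systemctl restart glusterd

# Step 1: form the two-node cluster (run on node1).
gluster peer probe node2

# Step 3: create and start a volume (replicated here; any type reproduces it),
# with SSL enabled on the I/O path.
gluster volume create testvol replica 2 node1:/bricks/b1 node2:/bricks/b1
gluster volume set testvol client.ssl on
gluster volume set testvol server.ssl on
gluster volume start testvol

# Step 4: FUSE-mount from a client.
mount -t glusterfs node1:/testvol /mnt/testvol

# Step 5: stop glusterd on the volfile server node.
ssh node1 systemctl stop glusterd

# Step 6: watch for the repeating warning on the peer node and the client.
grep 'readv on' /var/log/glusterfs/glusterd.log
grep 'readv on' /var/log/glusterfs/mnt-testvol.log
```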


Actual results:
===============
Continuous warning messages are logged when one of the cluster nodes is down on an SSL setup.


Expected results:
=================
The warning messages should be rate-limited, or not logged at all.


Additional info:
================
I also mounted the volume using a backup volfile server; the issue remains the same in the volume mount log.

Comment 2 Byreddy 2016-10-18 05:16:56 UTC
This issue is not present in the last GA build.

Comment 3 Mohit Agrawal 2016-10-18 11:26:58 UTC
Hi,

  When the glusterd process starts, it exposes a port (24007) on which peer nodes communicate with it. After glusterd is stopped on a node, one endpoint of the connection is closed while the other endpoint is still held open by the peer's socket. glusterd (socket_poller) on the other node continuously calls socket_event_poll_in, which calls __socket_rwv to read data from the socket until the buffer is drained. Because the remote endpoint of the socket is disconnected, every iteration prints the message "0-management: readv on 10.65.7.252:24007 failed (No data available)".

Changing the condition under which the message is logged will resolve the issue.


Regards
Mohit Agrawal

Comment 5 Atin Mukherjee 2016-10-25 11:25:34 UTC
upstream mainline : http://review.gluster.org/15677
upstream 3.9 : http://review.gluster.org/15711

Comment 7 Byreddy 2016-10-28 09:10:04 UTC
I am seeing a slight difference in the issue with build 3.8.4-3 in the same scenario; updating the result here.

The same readv messages appear in the mount log for the scenario mentioned above, BUT in the peer nodes' glusterd log the error messages below appear instead.


[2016-10-28 09:03:31.016907] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-10-28 09:03:31.017272] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-10-28 09:03:31.017368] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-10-28 09:03:34.026491] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-10-28 09:03:34.026861] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-10-28 09:03:34.027166] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-10-28 09:03:37.035856] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-10-28 09:03:37.036416] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-10-28 09:03:37.036545] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-10-28 09:03:40.045129] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-10-28 09:03:40.045473] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)

In the previous build, the readv messages appeared in both the peer nodes' glusterd log and the volume mount log.

Comment 8 Atin Mukherjee 2016-11-07 05:15:35 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/89220/

Comment 10 Byreddy 2016-11-15 10:08:37 UTC
Verified this bug using the glusterfs-3.8.4-5 build.
The fix is working well.

The reported issue is not seen with this build.

Moving to the verified state.

Comment 12 errata-xmlrpc 2017-03-23 06:10:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

