Bug 1395158

Summary: Getting continuous error messages when glusterd is down in SSL setup
Product: Red Hat Gluster Storage
Reporter: Byreddy <bsrirama>
Component: glusterd
Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED NOTABUG
QA Contact: Byreddy <bsrirama>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, bsrirama, moagrawa, rhs-bugs, storage-qa-internal, vbellur
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-17 04:26:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Byreddy 2016-11-15 10:04:14 UTC
Description of problem:
=======================
When glusterd is down on one node of a cluster in an SSL setup, the error messages below are logged continuously in the glusterd logs of all peer nodes.


[2016-11-15 09:47:05.474535] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:05.474913] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:05.475047] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:08.483896] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:08.484246] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:08.484356] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:11.493022] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:11.493301] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:11.493613] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:14.502569] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:14.502908] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:14.503219] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:17.511671] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:17.512033] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:17.512272] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:20.520948] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:20.521335] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:20.521542] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:23.530622] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:240


This will consume a large amount of log storage unnecessarily when nodes are taken down for maintenance.
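To put a rough number on the log growth, the sketch below extrapolates from the excerpt above. It assumes that the roughly 3-second retry interval and the three-line pattern seen in the excerpt stay constant for as long as the peer is down; the helper names are illustrative only and are not part of glusterfs.

```python
from datetime import datetime

# First retry cycle (three lines) plus the first line of the next cycle,
# copied from the log excerpt in this report.
SAMPLE = [
    "[2016-11-15 09:47:05.474535] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)",
    "[2016-11-15 09:47:05.474913] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)",
    "[2016-11-15 09:47:05.475047] E [socket.c:2436:socket_poller] 0-management: client setup failed",
    "[2016-11-15 09:47:08.483896] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)",
]


def timestamp(line):
    """Parse the bracketed timestamp at the start of a glusterd log line."""
    return datetime.strptime(line[1:27], "%Y-%m-%d %H:%M:%S.%f")


def estimate_daily_growth(lines):
    """Return (retry interval in s, bytes per retry cycle, bytes per day).

    Assumption: lines[0:3] form one retry cycle and lines[3] starts the next.
    """
    interval = (timestamp(lines[3]) - timestamp(lines[0])).total_seconds()
    bytes_per_cycle = sum(len(l) + 1 for l in lines[:3])  # +1 for the newline
    cycles_per_day = 86400 / interval
    return interval, bytes_per_cycle, cycles_per_day * bytes_per_cycle


interval, per_cycle, per_day = estimate_daily_growth(SAMPLE)
print(f"retry interval: {interval:.2f} s")
print(f"bytes per retry cycle: {per_cycle}")
print(f"approx. growth per unreachable peer: {per_day / 1e6:.1f} MB/day")
```

The figure is per unreachable peer and per logging node, so it scales with the number of peers that are down and with the length of the maintenance window.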



Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-5


How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create an SSL setup on a 2- or 3-node cluster.
2. Create one simple volume.
3. Stop glusterd on one of the cluster nodes.
4. Check the glusterd log on the peer nodes where glusterd is still running; the error messages above appear there.
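For step 4, a small script can make the repetition easier to see than reading the raw log. The following is a minimal sketch that strips the timestamp from each line and counts how often each remaining message occurs. The function name `summarize` is hypothetical, and the log path in the usage note is an assumption about a default installation, so adjust it for the system under test.

```python
import re
from collections import Counter

# Matches the leading "[YYYY-MM-DD HH:MM:SS.ffffff] " so that repeated
# messages compare equal once the timestamp is removed.
TS_PREFIX = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\] ")


def summarize(lines):
    """Count occurrences of each log message, ignoring timestamps."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[TS_PREFIX.sub("", line)] += 1
    return counts


# Two retry cycles copied from the excerpt in this report.
sample = [
    "[2016-11-15 09:47:05.474535] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)",
    "[2016-11-15 09:47:05.474913] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)",
    "[2016-11-15 09:47:05.475047] E [socket.c:2436:socket_poller] 0-management: client setup failed",
    "[2016-11-15 09:47:08.483896] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)",
    "[2016-11-15 09:47:08.484246] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)",
    "[2016-11-15 09:47:08.484356] E [socket.c:2436:socket_poller] 0-management: client setup failed",
]

counts = summarize(sample)
for message, n in counts.most_common():
    print(f"{n:6d}  {message}")
```

To run it against a live node, pass an open file instead of `sample`, for example `summarize(open("/var/log/glusterfs/etc-glusterfs-glusterd.vol.log"))`; that path is assumed here and may differ between installations.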

Actual results:
===============
Error messages are logged continuously when a peer's glusterd is down in an SSL setup.


Expected results:
=================
There should be a way to limit the flooding of these error messages.


Additional info:
================
This problem does not occur in a non-SSL setup.

Comment 3 Mohit Agrawal 2016-11-17 02:34:16 UTC
Hi,

I set up an SSL environment on the previous release (glusterfs-3.7.9-12.el7rhgs.x86_64) to check the logs when glusterd is down on one node.

Below are the messages logged in the previous release.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

[2016-11-17 02:05:06.260500] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
[2016-11-17 02:05:06.260554] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-11-17 02:05:06.260967] E [socket.c:3147:socket_connect] 0-management: connection attempt on 10.65.7.253:24007 failed, (Connection refused)
[2016-11-17 02:05:06.261012] W [socket.c:3221:socket_connect] 0-: failed to register the event

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


Below are the logs from the new release.

>>>>>>>>>>>>>>>>>>>>>>>>>>>

[2016-11-15 09:47:05.474535] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:05.474913] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:05.475047] E [socket.c:2436:socket_poller] 0-management: client setup failed

>>>>>>>>>>>>>>>>>>>>>>>>>>>>

This clearly shows that one message is common to both releases; the other two differ but are more informative than those in the previous release.
We made some changes to the socket code in 3.8, which is why the messages are more informative in the 3.8 release.
The logs above also show that the earlier release (3.7) emits more messages than 3.8 (four per failed connection attempt versus three), so I think this is expected behavior.
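The comparison can be checked mechanically. The sketch below parses the two excerpts quoted in this comment and tallies the messages per failed connection attempt by severity. The regular expression is an assumption about the log line layout inferred from these samples only, not a documented glusterfs format, and the names `LOG_RE` and `tally` are illustrative.

```python
import re
from collections import Counter

# Layout inferred from the samples: [timestamp] LEVEL [file:line:function] message
# The leading "[" is optional because pasted log lines sometimes lose it.
LOG_RE = re.compile(
    r"^\[?(?P<ts>\S+ \S+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:]+):(?P<line>\d+):(?P<func>[^\]]+)\] (?P<msg>.*)$"
)

RELEASE_3_7 = [
    "[2016-11-17 02:05:06.260500] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument",
    "[2016-11-17 02:05:06.260554] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument",
    "[2016-11-17 02:05:06.260967] E [socket.c:3147:socket_connect] 0-management: connection attempt on 10.65.7.253:24007 failed, (Connection refused)",
    "[2016-11-17 02:05:06.261012] W [socket.c:3221:socket_connect] 0-: failed to register the event",
]

RELEASE_3_8 = [
    "[2016-11-15 09:47:05.474535] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)",
    "[2016-11-15 09:47:05.474913] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)",
    "[2016-11-15 09:47:05.475047] E [socket.c:2436:socket_poller] 0-management: client setup failed",
]


def tally(lines):
    """Count parsed log lines by severity letter (E, W, ...)."""
    levels = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match:
            levels[match.group("level")] += 1
    return levels


old, new = tally(RELEASE_3_7), tally(RELEASE_3_8)
print("3.7:", sum(old.values()), "lines per attempt", dict(old))
print("3.8:", sum(new.values()), "lines per attempt", dict(new))
```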


Regards
Mohit Agrawal

Comment 5 Byreddy 2016-11-17 04:26:35 UTC
(In reply to Atin Mukherjee from comment #4)
> Byreddy - based on comment 3, I propose this bug to be closed once you
> retest this with rhgs-3.1.3.

I tested this on the 3.1.3 build and got similar messages when glusterd is down.

Closing as NOT A BUG.