Bug 1395158 - Getting continuous error messages when glusterd is down in SSL setup
Summary: Getting continuous error messages when glusterd is down in SSL setup
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Mohit Agrawal
QA Contact: Byreddy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-15 10:04 UTC by Byreddy
Modified: 2016-11-17 04:26 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-17 04:26:35 UTC
Target Upstream Version:



Description Byreddy 2016-11-15 10:04:14 UTC
Description of problem:
=======================
When glusterd is down on one of the cluster nodes in an SSL setup, the error messages below appear continuously in the glusterd logs of all peer nodes.


[2016-11-15 09:47:05.474535] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:05.474913] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:05.475047] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:08.483896] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:08.484246] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:08.484356] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:11.493022] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:11.493301] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:11.493613] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:14.502569] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:14.502908] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:14.503219] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:17.511671] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:17.512033] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:17.512272] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:20.520948] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:20.521335] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:20.521542] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:23.530622] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:240


This unnecessarily consumes a large amount of log storage when nodes are taken down for maintenance.



Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-5


How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create an SSL setup using a 2- or 3-node cluster (a minimal command sketch follows these steps).
2. Create one simple volume.
3. Stop glusterd on one of the cluster nodes.
4. Check the glusterd log on the peer nodes where glusterd is still running; you will see the error messages above.
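For reference, a minimal command sequence for these steps might look as follows. The hostnames, brick path, and volume name are illustrative, and the SSL certificates (glusterfs.pem, glusterfs.key, glusterfs.ca under /etc/ssl) are assumed to already be in place on every node:

# On every node: enable management-plane encryption
touch /var/lib/glusterd/secure-access
systemctl restart glusterd

# From one node: form the cluster and create a simple volume
gluster peer probe node2
gluster peer probe node3
gluster volume create testvol node1:/bricks/brick1 force
gluster volume start testvol

# Stop glusterd on one node
systemctl stop glusterd

# On the remaining nodes, watch the glusterd log for the repeating errors
tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log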

Actual results:
===============
Continuous error messages appear in the peer nodes' glusterd logs when glusterd is down on one node in an SSL setup.


Expected results:
=================
There should be a way to control the flooding of these error messages, for example by rate-limiting repeated connection failures.
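For illustration, the sketch below shows the every-Nth-call suppression pattern that the GF_LOG_OCCASIONALLY macro in libglusterfs/src/logging.h already implements (GF_UNIVERSAL_ANSWER, i.e. 42, is the interval there). The macro and counter names here are illustrative stand-ins, not the actual glusterd code:

/* Minimal sketch of an every-Nth-message log throttle, assuming the
 * reconnect loop calls this once per failed attempt. */
#include <stdio.h>

#define LOG_EVERY_NTH 42   /* glusterfs uses GF_UNIVERSAL_ANSWER (42) */

#define LOG_OCCASIONALLY(counter, ...)                        \
        do {                                                  \
                if (!((counter)++ % LOG_EVERY_NTH))           \
                        fprintf (stderr, __VA_ARGS__);        \
        } while (0)

int
main (void)
{
        static int reconnect_fail_count;  /* one counter per log site */
        int        attempt;

        /* Simulate 100 failed reconnects; only attempts 0, 42 and 84
         * actually reach the log. */
        for (attempt = 0; attempt < 100; attempt++)
                LOG_OCCASIONALLY (reconnect_fail_count,
                                  "E [socket.c] connection attempt failed (attempt %d)\n",
                                  attempt);

        return 0;
}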


Additional info:
================
In a non-SSL setup, this problem is not seen.

Comment 3 Mohit Agrawal 2016-11-17 02:34:16 UTC
Hi,

I set up an SSL environment on the previous release (glusterfs-3.7.9-12.el7rhgs.x86_64) to check the logs when glusterd is down on one node.

Below are the messages that appear in the previous release.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

[2016-11-17 02:05:06.260500] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
[2016-11-17 02:05:06.260554] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-11-17 02:05:06.260967] E [socket.c:3147:socket_connect] 0-management: connection attempt on 10.65.7.253:24007 failed, (Connection refused)
[2016-11-17 02:05:06.261012] W [socket.c:3221:socket_connect] 0-: failed to register the event

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


Below are the logs that appear in the new release.

>>>>>>>>>>>>>>>>>>>>>>>>>>>

[2016-11-15 09:47:05.474535] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:05.474913] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:05.475047] E [socket.c:2436:socket_poller] 0-management: client setup failed

>>>>>>>>>>>>>>>>>>>>>>>>>>>>

This clearly shows that one message is common, while the other two are different but more informative compared to the previous release. We made some changes to the socket code in 3.8, which is why the messages are more informative in the 3.8 release. From the logs above it is also clear that the earlier release (3.7) emits more messages per connection attempt than 3.8, so I think this is expected behavior.


Regards
Mohit Agrawal

Comment 5 Byreddy 2016-11-17 04:26:35 UTC
(In reply to Atin Mukherjee from comment #4)
> Byreddy - based on comment 3, I propose this bug to be closed once you
> retest this with rhgs-3.1.3.

I tested this on the 3.1.3 build and saw similar messages when glusterd is down.

Closing as NOT A BUG.

