Bug 1395158 - Getting continuous error messages when glusterd is down in SSL setup
Summary: Getting continuous error messages when glusterd is down in SSL setup
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Mohit Agrawal
QA Contact: Byreddy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-15 10:04 UTC by Byreddy
Modified: 2016-11-17 04:26 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-17 04:26:35 UTC
Target Upstream Version:



Description Byreddy 2016-11-15 10:04:14 UTC
Description of problem:
=======================
When glusterd is down on one of the cluster nodes in an SSL setup, the error messages below appear continuously in the glusterd logs of all peer nodes.


[2016-11-15 09:47:05.474535] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:05.474913] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:05.475047] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:08.483896] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:08.484246] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:08.484356] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:11.493022] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:11.493301] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:11.493613] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:14.502569] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:14.502908] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:14.503219] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:17.511671] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:17.512033] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:17.512272] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:20.520948] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:20.521335] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:20.521542] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:23.530622] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:240


This unnecessarily consumes a large amount of log storage when nodes are taken down for maintenance.



Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-5


How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create an SSL setup using a 2- or 3-node cluster (a minimal command sketch follows these steps).
2. Create one simple volume.
3. Stop glusterd on one of the cluster nodes.
4. Check the glusterd log on the peer nodes where glusterd is still running; you will see the error messages above.
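For reference, a minimal command sequence for these steps might look as follows. The hostnames, brick path, and volume name are illustrative, and the SSL certificates (glusterfs.pem, glusterfs.key, glusterfs.ca under /etc/ssl) are assumed to already be in place on every node:

# On every node: enable management-plane encryption
touch /var/lib/glusterd/secure-access
systemctl restart glusterd

# From one node: form the cluster and create a simple volume
gluster peer probe node2
gluster peer probe node3
gluster volume create testvol node1:/bricks/brick1 force
gluster volume start testvol

# Stop glusterd on one node
systemctl stop glusterd

# On the remaining nodes, watch the glusterd log for the repeating errors
tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log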

Actual results:
===============
Continuous error messages appear in the peer nodes' glusterd logs when glusterd is down on one node in an SSL setup.


Expected results:
=================
There should be a way to control the flooding of these error messages, for example by rate-limiting repeated connection failures.
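For illustration, the sketch below shows the every-Nth-call suppression pattern that the GF_LOG_OCCASIONALLY macro in libglusterfs/src/logging.h already implements (GF_UNIVERSAL_ANSWER, i.e. 42, is the interval there). The macro and counter names here are illustrative stand-ins, not the actual glusterd code:

/* Minimal sketch of an every-Nth-message log throttle, assuming the
 * reconnect loop calls this once per failed attempt. */
#include <stdio.h>

#define LOG_EVERY_NTH 42   /* glusterfs uses GF_UNIVERSAL_ANSWER (42) */

#define LOG_OCCASIONALLY(counter, ...)                        \
        do {                                                  \
                if (!((counter)++ % LOG_EVERY_NTH))           \
                        fprintf (stderr, __VA_ARGS__);        \
        } while (0)

int
main (void)
{
        static int reconnect_fail_count;  /* one counter per log site */
        int        attempt;

        /* Simulate 100 failed reconnects; only attempts 0, 42 and 84
         * actually reach the log. */
        for (attempt = 0; attempt < 100; attempt++)
                LOG_OCCASIONALLY (reconnect_fail_count,
                                  "E [socket.c] connection attempt failed (attempt %d)\n",
                                  attempt);

        return 0;
}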


Additional info:
================
In a non-SSL setup, this problem is not seen.

Comment 3 Mohit Agrawal 2016-11-17 02:34:16 UTC
Hi,

I set up an SSL environment on the previous release (glusterfs-3.7.9-12.el7rhgs.x86_64) to check the logs when glusterd is down on one node.

Below are the messages that appear in the previous release.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

[2016-11-17 02:05:06.260500] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
[2016-11-17 02:05:06.260554] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-11-17 02:05:06.260967] E [socket.c:3147:socket_connect] 0-management: connection attempt on 10.65.7.253:24007 failed, (Connection refused)
[2016-11-17 02:05:06.261012] W [socket.c:3221:socket_connect] 0-: failed to register the event

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


Below are the logs that appear in the new release.

>>>>>>>>>>>>>>>>>>>>>>>>>>>

[2016-11-15 09:47:05.474535] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:05.474913] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:05.475047] E [socket.c:2436:socket_poller] 0-management: client setup failed

>>>>>>>>>>>>>>>>>>>>>>>>>>>>

This clearly shows that one message is common, while the other two are different but more informative compared to the previous release. We made some changes to the socket code in 3.8, which is why the messages are more informative in the 3.8 release. From the logs above it is also clear that the earlier release (3.7) emits more messages per connection attempt than 3.8, so I think this is expected behavior.


Regards
Mohit Agrawal

Comment 5 Byreddy 2016-11-17 04:26:35 UTC
(In reply to Atin Mukherjee from comment #4)
> Byreddy - based on comment 3, I propose this bug to be closed once you
> retest this with rhgs-3.1.3.

I tested this on the 3.1.3 build and saw similar messages when glusterd is down.

Closing as NOT A BUG.

