Bug 1412930

Summary: [SSL] - when a node or glusternw is down all the clients logs are flooded with SSL connect error and client setup failed messages
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: RamaKasturi <knarra>
Component: core
Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED ERRATA
QA Contact: Bala Konda Reddy M <bmekala>
Severity: unspecified
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, asrivast, rhinduja, rhs-bugs, storage-qa-internal, vbellur
Target Release: RHGS 3.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: ssl
Fixed In Version: glusterfs-3.8.4-19
Last Closed: 2017-09-21 04:30:55 UTC
Type: Bug
Bug Blocks: 1277939, 1417147, 1433896

Description RamaKasturi 2017-01-13 07:13:46 UTC
Description of problem:
SSL is enabled on the volumes. Bring down one of the nodes, or the glusternw, in the cluster. Until the node or glusternw comes back up, the client logs are flooded with the messages below.

[2017-01-13 06:28:09.111117] E [socket.c:3135:socket_connect] 0-engine-client-0: connection attempt on 10.70.36.79:49153 failed, (No route to host)
[2017-01-13 06:28:09.111325] E [socket.c:353:ssl_setup_connection] 0-engine-client-0: SSL connect error (client: 10.70.36.79:49153)
[2017-01-13 06:28:09.111344] E [socket.c:2443:socket_poller] 0-engine-client-0: client setup failed
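To quantify the flood, the client log can be summarized per source function. The following is a minimal Python sketch, assuming the line layout of the sample above (timestamp, level, [file:line:function], translator name, message). The regular expression and the helper name count_by_function are illustrative assumptions and are not part of GlusterFS.

```python
import re
from collections import Counter

# Assumed layout, inferred from the sample lines in this report:
# [timestamp] LEVEL [file.c:line:function] translator: message
# Some lines carry an extra "[MSGID: n]" field before the source location.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?:MSGID: \d+\] \[)?(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>\S+): (?P<msg>.*)$"
)


def count_by_function(lines):
    """Count error-level (E) log lines per (function, translator) pair."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "E":
            counts[(match.group("func"), match.group("xlator"))] += 1
    return counts


# The three lines from the description, used as sample input.
sample = [
    "[2017-01-13 06:28:09.111117] E [socket.c:3135:socket_connect] "
    "0-engine-client-0: connection attempt on 10.70.36.79:49153 failed, "
    "(No route to host)",
    "[2017-01-13 06:28:09.111325] E [socket.c:353:ssl_setup_connection] "
    "0-engine-client-0: SSL connect error (client: 10.70.36.79:49153)",
    "[2017-01-13 06:28:09.111344] E [socket.c:2443:socket_poller] "
    "0-engine-client-0: client setup failed",
]

for (func, xlator), n in sorted(count_by_function(sample).items()):
    print(f"{n:6d}  {func}  {xlator}")
```

Running the same function over a full client log shows which of the three messages dominates during an outage.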


Version-Release number of selected component (if applicable):
glusterfs-3.8.4-11.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Enable SSL on the volumes of a three-node cluster.
2. Bring down one of the nodes or the glusternw.
3. Check the client logs.

Actual results:
Client logs are flooded with the messages shown in the description until the node or network comes back up.

Expected results:
Client logs should record a single instance of the failure to reach the port, and should not be flooded with these messages.
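
The expectation above amounts to logging a failure once per outage instead of once per retry. The following Python sketch illustrates that idea only; it is not how GlusterFS logging works, and the class name OncePerOutage, its methods, and the "connection restored" message are all hypothetical.

```python
class OncePerOutage:
    """Emit a failure message once per outage of a peer.

    Hypothetical helper: the first failure for a peer is emitted,
    later failures are suppressed until the peer is reachable again.
    """

    def __init__(self, emit):
        self._emit = emit    # callable that receives the message text
        self._down = set()   # peers currently known to be unreachable

    def failure(self, peer, message):
        # Only the first failure of an outage is logged.
        if peer not in self._down:
            self._down.add(peer)
            self._emit(f"{peer}: {message}")

    def success(self, peer):
        # A successful connect ends the outage and re-arms logging.
        if peer in self._down:
            self._down.discard(peer)
            self._emit(f"{peer}: connection restored")


logged = []
gate = OncePerOutage(logged.append)
for _ in range(5):
    gate.failure("10.70.36.79:49153", "No route to host")
print(logged)
```

The trade-off of this approach is that the log no longer shows that retries are still happening, which is one reason to keep at least some repeated messages.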

Additional info:

Comment 3 Atin Mukherjee 2017-02-27 06:08:29 UTC
Upstream patch: https://review.gluster.org/16767

Comment 5 Atin Mukherjee 2017-03-24 09:57:24 UTC
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/101323/

Comment 7 Bala Konda Reddy M 2017-06-19 15:18:34 UTC
BUILD : 3.8.4-26

Followed the steps mentioned in the description.
1. When the node is down or glusterd is stopped, the following error messages still appear in the client logs:
[2017-06-19 15:08:14.321261] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-cross2-client-0: disconnected from cross2-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2017-06-19 15:08:17.327234] E [socket.c:3219:socket_connect] 0-glusterfs: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:20.333123] E [socket.c:3219:socket_connect] 0-glusterfs: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:26.343173] E [socket.c:3219:socket_connect] 0-glusterfs: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:29.349299] E [socket.c:3219:socket_connect] 0-cross2-client-0: connection attempt on 10.70.37.135:49152 failed, (No route to host)
[2017-06-19 15:08:32.355294] E [socket.c:3219:socket_connect] 0-glusterfs: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:35.361269] E [socket.c:3219:socket_connect] 0-cross2-client-0: connection attempt on 10.70.37.135:49152 failed, (No route to host)
[2017-06-19 15:08:38.367180] E [socket.c:3219:socket_connect] 0-glusterfs: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:41.373326] E [socket.c:3219:socket_connect] 0-cross2-client-0: connection attempt on 10.70.37.135:49152 failed, (No route to host)

The following are the glusterd logs from one node that is part of the trusted storage pool:

[2017-06-19 15:08:16.360551] E [socket.c:3219:socket_connect] 0-management: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:22.370603] E [socket.c:3219:socket_connect] 0-management: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:25.376630] E [socket.c:3219:socket_connect] 0-management: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:31.386711] E [socket.c:3219:socket_connect] 0-management: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:37.400693] E [socket.c:3219:socket_connect] 0-management: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:40.406546] E [socket.c:3219:socket_connect] 0-management: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:46.416718] E [socket.c:3219:socket_connect] 0-management: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:49.422555] E [socket.c:3219:socket_connect] 0-management: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:55.432719] E [socket.c:3219:socket_connect] 0-management: connection attempt on 10.70.37.135:24007 failed, (No route to host)
[2017-06-19 15:08:58.439598] E [socket.c:3219:socket_connect] 0-management: connection attempt on 10.70.37.135:24007 failed, (No route to host)

Hence marking this as failed QA.

Comment 8 Bala Konda Reddy M 2017-06-19 15:23:00 UTC
The error messages on the client and server nodes are not related to SSL, but the logs are still flooded with the error messages mentioned in comment 7.

(In reply to Bala Konda Reddy M from comment #7)

Comment 9 Mohit Agrawal 2017-06-19 15:32:55 UTC
Hi Bala,

  I fixed the problem only to avoid the SSL-specific errors, not the socket_connect errors.
  As you can see in the logs in the problem description, there used to be ssl_setup_connection errors after a failed socket_connect, and the patch resolves that issue.
  I did not change the logging of the socket_connect error; I think the user needs it to learn the root cause of the failure to establish the connection.

  IMO it is working as expected.

Regards
Mohit Agrawal
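
The behaviour described in comment 9 can be modelled as follows: when the TCP connect fails, only the connect error is logged and no SSL setup is attempted. This Python sketch is a hypothetical model of that control flow, not the actual patch (the real change is the upstream patch linked in comment 3); the function attempt_connection and its callable parameters are assumptions made for illustration.

```python
import errno


def attempt_connection(tcp_connect, ssl_handshake, log, ssl_enabled=True):
    """Hypothetical model: run the SSL handshake only after TCP connect succeeds."""
    try:
        tcp_connect()
    except OSError as err:
        # The connect error stays in the log so the root cause is visible.
        log(f"connection attempt failed, ({err.strerror})")
        # Return early: no SSL setup, hence no follow-on SSL error messages.
        return False
    if ssl_enabled:
        try:
            ssl_handshake()
        except OSError:
            # In Python, ssl.SSLError is a subclass of OSError.
            log("SSL connect error")
            return False
    return True


def unreachable():
    # Simulates the "No route to host" failure seen in the logs.
    raise OSError(errno.EHOSTUNREACH, "No route to host")


messages = []
handshakes = []
result = attempt_connection(unreachable, lambda: handshakes.append(1), messages.append)
print(result, messages, handshakes)
```

With this flow a failed connect produces one log line per attempt instead of three, which matches the logs in comment 7 where only socket_connect errors remain.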

Comment 10 Atin Mukherjee 2017-06-19 15:36:35 UTC
Based on comment 9, moving this back to ON_QA

Comment 11 Mohit Agrawal 2017-06-19 15:38:46 UTC
Even without SSL you will see the same kind of error logs; these logs are not specific to SSL.



Regards
Mohit Agrawal

Comment 12 Bala Konda Reddy M 2017-06-20 02:55:17 UTC
As per Mohit's comment, the changes are specific to SSL and these error messages are expected, hence marking this as verified.


(In reply to Mohit Agrawal from comment #9)

Comment 14 errata-xmlrpc 2017-09-21 04:30:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
