Description of problem:
=======================
When glusterd is down on one cluster node in an SSL setup, the error messages below are logged continuously in the glusterd logs of all peer nodes.

[2016-11-15 09:47:05.474535] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:05.474913] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:05.475047] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:08.483896] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:08.484246] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:08.484356] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:11.493022] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:11.493301] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:11.493613] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:14.502569] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:14.502908] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:14.503219] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:17.511671] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:17.512033] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:17.512272] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:20.520948] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:20.521335] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:20.521542] E [socket.c:2436:socket_poller] 0-management: client setup failed
[2016-11-15 09:47:23.530622] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:240

These messages repeat roughly every 3 seconds and will consume a large amount of log storage unnecessarily when nodes are taken down for maintenance.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-5

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create an SSL setup using a 2- or 3-node cluster.
2. Create one simple volume.
3. Bring down glusterd on one of the cluster nodes.
4. Check the glusterd log on the peer nodes where glusterd is still running; the error messages above will appear.

Actual results:
===============
Error messages are logged continuously while a peer's glusterd is down in an SSL setup.

Expected results:
=================
There should be a way to keep these error messages from flooding the log.

Additional info:
================
This problem does not occur in a non-SSL setup.
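To put a rough number on the flood, the sketch below counts error lines per emitting function and measures the time span they cover. It is only an illustration: `LINE_RE`, `summarize`, and `SAMPLE` are hypothetical names, not part of glusterfs, and the regular expression assumes the log line layout shown in this report. The sample holds the first six lines quoted above.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, taken from the lines quoted in this report:
# [timestamp] LEVEL [file:line:function] domain: message
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)"
)


def summarize(lines):
    """Return (error count per function, seconds between first and last error)."""
    counts = Counter()
    stamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "E":
            continue  # skip non-matching lines and non-error levels
        func = m.group("src").split(":")[-1]  # e.g. "socket_connect"
        counts[func] += 1
        stamps.append(datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f"))
    span = (max(stamps) - min(stamps)).total_seconds() if len(stamps) > 1 else 0.0
    return counts, span


# First two retry cycles from the description.
SAMPLE = [
    "[2016-11-15 09:47:05.474535] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)",
    "[2016-11-15 09:47:05.474913] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)",
    "[2016-11-15 09:47:05.475047] E [socket.c:2436:socket_poller] 0-management: client setup failed",
    "[2016-11-15 09:47:08.483896] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)",
    "[2016-11-15 09:47:08.484246] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)",
    "[2016-11-15 09:47:08.484356] E [socket.c:2436:socket_poller] 0-management: client setup failed",
]

if __name__ == "__main__":
    counts, span = summarize(SAMPLE)
    for func, n in counts.items():
        print(f"{func}: {n}")
    print(f"span: {span:.3f} s")
```

Running the same function over a full glusterd log from a peer node would show how many of these lines accumulate during a maintenance window.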
Hi,

I set up an SSL environment on the previous release (glusterfs-3.7.9-12.el7rhgs.x86_64) to check the logs when glusterd is down on one node.

Below are the messages logged in the previous release.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2016-11-17 02:05:06.260500] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
[2016-11-17 02:05:06.260554] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-11-17 02:05:06.260967] E [socket.c:3147:socket_connect] 0-management: connection attempt on 10.65.7.253:24007 failed, (Connection refused)
[2016-11-17 02:05:06.261012] W [socket.c:3221:socket_connect] 0-: failed to register the event
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Below are the messages logged in the new release.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2016-11-15 09:47:05.474535] E [socket.c:3097:socket_connect] 0-management: connection attempt on 10.70.41.198:24007 failed, (Connection refused)
[2016-11-15 09:47:05.474913] E [socket.c:353:ssl_setup_connection] 0-management: SSL connect error (client: 10.70.41.198:24007)
[2016-11-15 09:47:05.475047] E [socket.c:2436:socket_poller] 0-management: client setup failed
>>>>>>>>>>>>>>>>>>>>>>>>>>>>

This clearly shows that one message (the "connection attempt ... failed" error) is common to both releases, while the other two differ and are more informative than those in the previous release. We made some changes to the socket code in 3.8, which is why the messages are more informative there.

The logs above also show that the earlier release (3.7) emits more messages per connection attempt (four) than 3.8 does (three), so I think this is expected behavior.

Regards
Mohit Agrawal
(In reply to Atin Mukherjee from comment #4)
> Byreddy - based on comment 3, I propose this bug to be closed once you
> retest this with rhgs-3.1.3.

I tested this on the 3.1.3 build and got similar messages when glusterd is down.

Closing as NOT A BUG.