Bug 1300241
| Summary: | `gluster peer status' on one node shows one peer as disconnected but it appears to be connected to other peers | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Shruti Sampat <ssampat> | 
| Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> | 
| Status: | CLOSED WORKSFORME | QA Contact: | SATHEESARAN <sasundar> | 
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.1 | CC: | olim, rcyriac, rhs-bugs, sasundar, storage-qa-internal, vbellur | 
| Target Milestone: | --- | Keywords: | ZStream | 
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-02-19 11:37:28 UTC | Type: | Bug | 
| Regression: | --- | Mount Type: | --- | 
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||

Description (Shruti Sampat, 2016-01-20 10:38:41 UTC)

After debugging the leftover setup, I found that rpc_clnt_reconnect was being triggered every 3 seconds, which means the node was trying to establish a connection with the disconnected peer. However, socket_connect() was returning a failure saying that the underlying transport is already connected. The code expects the socket to be set to -1, but it is set to 17. We tried to debug this problem further but could not conclude anything concrete. It seems that (DIS)CONNECT events raced, and because of this the notifyfn to the upper layer was never called. We would need to add a few logs to the code and try to reproduce the issue to get to the RCA. This is by no means a blocker for RHGS 3.1.2.

After going through the log files once again, it seems that multi-threaded epoll was enabled for GlusterD in this setup. Ideally, once GlusterD starts up, the final graph would look like this, with an INFO log:

```
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option transport.socket.listen-backlog 128
  7:     option rpc-auth-allow-insecure on
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:
+------------------------------------------------------------------------------+
[2016-02-11 13:36:36.167635] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
```

But in the logs from this setup I see the following:

```
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option transport.socket.listen-backlog 128
  7:     option rpc-auth-allow-insecure on
  8:     option ping-timeout 0
  9:     option transport.socket.read-fail-log off
 10:     option transport.socket.keepalive-interval 2
 11:     option transport.socket.keepalive-time 10
 12:     option transport-type rdma
 13:     option working-directory /var/lib/glusterd
 14: end-volume
 15:
+------------------------------------------------------------------------------+
[2016-01-19 12:38:35.428986] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-01-19 12:38:35.429108] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
```

The above indicates that the epoll thread count was set to 2, which is the default when the glusterd.vol file does not have the event-threads option set. Since this is not a supported configuration we'd need to close this bug.

(In reply to Atin Mukherjee from comment #6)
> Since this is not a supported configuration we'd need to close this bug.

I did an upgrade test from RHGS 3.0 to RHGS 3.1.0, and here are the results:

1. In RHGS 3.0, glusterd runs with a single epoll thread only.
2. In RHGS 3.0, there is no option available to configure the number of epoll threads for glusterd.

After upgrading to RHGS 3.1.0, there are no changes to the glusterd volfile, and the above observations hold true in RHGS 3.1.0 too.

After upgrading to RHGS 3.1.1, the glusterd volfile changes to include 'event-threads 1', which makes glusterd start with only 1 epoll thread. If this option is removed, glusterd starts with 2 epoll threads.

We cannot really reach a conclusion until we know how the 'event-threads 1' option got removed from the glusterd volfile (in glusterfs-3.7.5-14.el7rhgs).

As per the discussion with Shruti, it seems the glusterd.vol file did not have the event-threads option configured: in this container setup, the glusterd.vol file was shared and came from a different build. Hence closing this bug.