Bug 1104626
| Summary: | rebalance: peer probe fails to add a new peer when rebalance is in progress | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Saurabh <saujain> |
| Component: | glusterfs | Assignee: | Bug Updates Notification Mailing List <rhs-bugs> |
| Status: | CLOSED WORKSFORME | QA Contact: | Matt Zywusko <mzywusko> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.0 | CC: | amukherj, kparthas, mzywusko, nlevinki, nsathyan, saujain, vagarwal, vbellur |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-04-20 05:43:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Created attachment 902157 [details]
sosreport from where peer probe was executed

As per the logs, it looks like the network was flaky around 11:38, because of which the peer probe command bailed out:

[2014-06-04 10:38:26.033906] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(GLUSTERD-DUMP) op(DUMP(1)) xid = 0x1 sent = 2014-06-04 10:28:25.894138. timeout = 600 for 10.70.37.62:24007
[2014-06-04 10:38:26.034024] E [glusterd-handshake.c:1650:__glusterd_peer_dump_version_cbk] 0-: Error through RPC layer, retry again later

This doesn't look like an issue at the application layer. Could you retest and confirm the behaviour? Request you to close this bug, and kindly re-open it if the problem persists.

Per discussion with Atin, this works. Please reopen if you see this.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.

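The reasoning behind the call_bail diagnosis above can be checked mechanically: the entry records when the frame was sent, the configured timeout, and (in its own timestamp) when it was abandoned. The following is a minimal Python sketch, not part of glusterd, that extracts those fields from the quoted line. The regular expression and the function name `parse_call_bail` are assumptions made for illustration and are only known to fit the line quoted in this bug.

```python
import re
from datetime import datetime

# The call_bail entry quoted in the comment above, copied verbatim.
LOG_LINE = (
    "[2014-06-04 10:38:26.033906] E [rpc-clnt.c:201:call_bail] 0-management: "
    "bailing out frame type(GLUSTERD-DUMP) op(DUMP(1)) xid = 0x1 "
    "sent = 2014-06-04 10:28:25.894138. timeout = 600 for 10.70.37.62:24007"
)

# Hypothetical pattern: written against the one line above, not against
# the glusterfs sources, so other releases may format the entry differently.
PATTERN = re.compile(
    r"^\[(?P<logged>[\d-]+ [\d:.]+)\].*call_bail\].*"
    r"sent = (?P<sent>[\d-]+ [\d:.]+?)\. timeout = (?P<timeout>\d+) "
    r"for (?P<peer>\S+)$"
)

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_call_bail(line):
    """Return peer, timeout and seconds waited, or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    logged = datetime.strptime(match.group("logged"), TIME_FORMAT)
    sent = datetime.strptime(match.group("sent"), TIME_FORMAT)
    return {
        "peer": match.group("peer"),
        "timeout": int(match.group("timeout")),
        "waited": (logged - sent).total_seconds(),
    }


if __name__ == "__main__":
    info = parse_call_bail(LOG_LINE)
    print(info["peer"], info["timeout"], round(info["waited"], 2))
```

For the quoted entry the frame waited about 600.14 seconds, just past the 600-second timeout, before being bailed out. That matches the comment's reading that no reply arrived from 10.70.37.62:24007 within the timeout, although the log line alone does not show why the reply was missing.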
Created attachment 902156 [details]
new-node sosreport

Description of problem:
While rebalance was in progress, I tried to peer probe an RHS node; the peer probe was unsuccessful.

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.12-1.el6rhs.x86_64

How reproducible:
Seen once, on this run.

Steps to Reproduce:
Take a four-node cluster.
1. Create a 6x2 volume and start it.
2. Mount the volume over NFS.
3. Create some directories and files.
4. Once the data creation is finished, run add-brick and start rebalance.
5. While rebalance is going on, try to probe a new RHS node.
6. Run gluster peer status.

Actual results:
Step 6 result on the node where the peer probe command was executed:

[root@nfs1 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.37.215
Uuid: 77f03019-30a1-4e81-b8df-6613159c8890
State: Peer in Cluster (Connected)

Hostname: 10.70.37.44
Uuid: ad14a2bb-d39c-4bdf-93e8-32c7568c6d05
State: Peer in Cluster (Connected)

Hostname: 10.70.37.201
Uuid: 3aaa0a5e-91d9-46c9-bb46-a46947ddaca5
State: Peer in Cluster (Connected)

Hostname: rhsauto049.lab.eng.blr.redhat.com
Uuid: 821e3f6f-5438-41fb-8a5d-f060704d0e8a
State: Probe Sent to Peer (Connected)

Peer status from an already existing node of the cluster:

[root@nfs2 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.37.62
Uuid: cb4a3869-24e0-4817-be29-73621ff218cb
State: Peer in Cluster (Connected)

Hostname: 10.70.37.201
Uuid: 3aaa0a5e-91d9-46c9-bb46-a46947ddaca5
State: Peer in Cluster (Connected)

Hostname: 10.70.37.44
Uuid: ad14a2bb-d39c-4bdf-93e8-32c7568c6d05
State: Peer in Cluster (Connected)

gluster peer status from the new RHS node:

[root@rhsauto049 ~]# gluster peer status
[root@rhsauto049 ~]#

glusterd logs on the new node:

[2014-06-04 10:58:27.655442] I [glusterd-handler.c:1314:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2014-06-04 10:58:27.655567] I [socket.c:3148:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2014-06-04 10:58:27.655596] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 3) to rpc-transport (socket.management)
[2014-06-04 10:58:27.655621] E [glusterd-utils.c:410:glusterd_submit_reply] 0-: Reply submission failed
[2014-06-04 10:58:27.658359] I [socket.c:2239:socket_event_handler] 0-transport: disconnecting now
[2014-06-04 10:58:27.658409] I [glusterd-handler.c:1314:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2014-06-04 10:58:27.658501] I [socket.c:3148:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2014-06-04 10:58:27.658550] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 3) to rpc-transport (socket.management)
[2014-06-04 10:58:27.658583] E [glusterd-utils.c:410:glusterd_submit_reply] 0-: Reply submission failed
[2014-06-04 10:58:27.659642] I [socket.c:2239:socket_event_handler] 0-transport: disconnecting now
[2014-06-04 10:58:27.659699] I [glusterd-handler.c:1314:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2014-06-04 10:58:27.659838] I [socket.c:3148:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2014-06-04 10:58:27.659868] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 3) to rpc-transport (socket.management)
[2014-06-04 10:58:27.659892] E [glusterd-utils.c:410:glusterd_submit_reply] 0-: Reply submission failed

Expected results:
Peer probe is expected to pass, and the status should be the same on all nodes, which is not the case at this time.

Additional info:
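When retesting, the mismatch reported under "Actual results" can be spotted automatically rather than by eye. The sketch below is a hedged illustration in Python: it parses text in the "Hostname / Uuid / State" layout shown in this report and lists peers whose state is not "Peer in Cluster". The function names `parse_peer_status` and `pending_peers` are hypothetical, and the parser assumes only the output layout seen in this bug, which may differ in other gluster releases.

```python
# Output captured on nfs1 in this report (the node that ran the probe).
NFS1_STATUS = """\
Number of Peers: 4

Hostname: 10.70.37.215
Uuid: 77f03019-30a1-4e81-b8df-6613159c8890
State: Peer in Cluster (Connected)

Hostname: 10.70.37.44
Uuid: ad14a2bb-d39c-4bdf-93e8-32c7568c6d05
State: Peer in Cluster (Connected)

Hostname: 10.70.37.201
Uuid: 3aaa0a5e-91d9-46c9-bb46-a46947ddaca5
State: Peer in Cluster (Connected)

Hostname: rhsauto049.lab.eng.blr.redhat.com
Uuid: 821e3f6f-5438-41fb-8a5d-f060704d0e8a
State: Probe Sent to Peer (Connected)
"""


def parse_peer_status(text):
    """Parse 'gluster peer status' text into a list of dicts, one per peer.

    Assumption: each peer is described by Hostname, Uuid and State lines,
    in that order, as in the output quoted in this report.
    """
    peers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            current = {"hostname": line.split(":", 1)[1].strip()}
            peers.append(current)
        elif current is not None and line.startswith("Uuid:"):
            current["uuid"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
    return peers


def pending_peers(peers):
    """Return peers that have not reached the 'Peer in Cluster' state."""
    return [
        peer for peer in peers
        if not peer.get("state", "").startswith("Peer in Cluster")
    ]


if __name__ == "__main__":
    for peer in pending_peers(parse_peer_status(NFS1_STATUS)):
        print(peer["hostname"], "->", peer["state"])
```

Run against the nfs1 output, this reports only rhsauto049.lab.eng.blr.redhat.com, still in "Probe Sent to Peer (Connected)". The same functions could be applied to the output collected from each node to confirm that all nodes agree once the probe succeeds.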