Bug 968160

Summary: glusterd: 'gluster peer status' shows all peers as 'Peer in Cluster', but on server x, server y is shown as 'Disconnected' (server y is connected from the other peers, and server x is shown as connected from server y), and glusterd does not appear to be trying to reconnect
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rachana Patel <racpatel>
Component: glusterd
Assignee: Nagaprasad Sathyanarayana <nsathyan>
Status: CLOSED EOL
QA Contact: Matt Zywusko <mzywusko>
Severity: high
Priority: medium
Version: 2.0
CC: mzywusko, rhs-bugs, smohan, vbellur
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2015-12-03 17:16:47 UTC
Type: Bug

Description Rachana Patel 2013-05-29 07:02:58 UTC
Description of problem:
glusterd: 'gluster peer status' shows all peers as 'Peer in Cluster', but on server x, server y is shown as 'Disconnected' (server y is connected from the other peers, and server x is shown as connected from server y), and glusterd does not appear to be trying to reconnect.

Version-Release number of selected component (if applicable):
glusterfs-server-3.3.0.9rhs-1.el6rhs.x86_64

Steps to Reproduce:
1. Had a cluster of 3 peers with 3 different volumes.
2. Did not bring down any server or any Gluster-related process (neither glusterd nor a brick process), and did not perform any remove-brick, peer detach, or new peer probe.
3. While checking volume status, noticed that one brick was missing from the output (rhsauto031.lab.eng.blr.redhat.com:/rhs/brick2 is absent from server x's 'gluster v status dist' output below), so checked the status on all servers.

Server x (rhsauto018), which shows server y as disconnected:
[root@rhsauto018 rpm]# gluster peer status
Number of Peers: 2

Hostname: 10.70.37.13
Port: 24007
Uuid: aed6e08b-574d-4087-8727-02278f5ac996
State: Peer in Cluster (Connected)

Hostname: rhsauto031.lab.eng.blr.redhat.com
Uuid: dbfa1c95-d845-409c-a81c-ad96b8d8cb6c
State: Peer in Cluster (Disconnected)       <-------------------------
[root@rhsauto018 rpm]# ^C
[root@rhsauto018 rpm]# gluster volume rebalance dist status
                                    Node Rebalanced-files          size       scanned      failures         status
                               ---------      -----------   -----------   -----------   -----------   ------------
                               localhost             1737    404750336        22291         3155      completed
                             10.70.37.13             3373    545259520        22203         1901      completed


[root@rhsauto018 rpm]# gluster v status dist
Status of volume: dist
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2	24012	Y	13645
Brick rhsauto038.lab.eng.blr.redhat.com:/rhs/brick2	24012	Y	13525
Brick rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5	24015	Y	14404
NFS Server on localhost					38467	Y	14410
NFS Server on 10.70.37.13				38467	Y	14330


From server y (rhsauto031), server x is shown as connected:
[root@rhsauto031 rpm]# gluster peer status
Number of Peers: 2

Hostname: 10.70.37.13
Port: 24007
Uuid: aed6e08b-574d-4087-8727-02278f5ac996
State: Peer in Cluster (Connected)

Hostname: rhsauto018.lab.eng.blr.redhat.com
Uuid: 0e0d5e30-8e92-4173-8b0d-33ce8b65bff8
State: Peer in Cluster (Connected)               <-----------------------------
[root@rhsauto031 rpm]# gluster volume rebalance dist status
                                    Node Rebalanced-files          size       scanned      failures         status
                               ---------      -----------   -----------   -----------   -----------   ------------
                               localhost                0            0            0            0    not started
       rhsauto018.lab.eng.blr.redhat.com             1737    404750336        22291         3155      completed
                             10.70.37.13             3373    545259520        22203         1901      completed

[root@rhsauto031 rpm]# gluster v status dist
Status of volume: dist
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2	24012	Y	13645
Brick rhsauto038.lab.eng.blr.redhat.com:/rhs/brick2	24012	Y	13525
Brick rhsauto031.lab.eng.blr.redhat.com:/rhs/brick2	24015	Y	13480
Brick rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5	24015	Y	14404
NFS Server on localhost					38467	Y	14194
NFS Server on 10.70.37.13				38467	Y	14330
NFS Server on rhsauto018.lab.eng.blr.redhat.com		38467	Y	14410


Other server (rhsauto038):
[root@rhsauto038 rpm]# gluster peer status
Number of Peers: 2

Hostname: rhsauto031.lab.eng.blr.redhat.com
Port: 24007
Uuid: dbfa1c95-d845-409c-a81c-ad96b8d8cb6c
State: Peer in Cluster (Connected)

Hostname: rhsauto018.lab.eng.blr.redhat.com
Port: 24007
Uuid: 0e0d5e30-8e92-4173-8b0d-33ce8b65bff8
State: Peer in Cluster (Connected)

[root@rhsauto038 rpm]# gluster volume rebalance dist status
                                    Node Rebalanced-files          size       scanned      failures         status
                               ---------      -----------   -----------   -----------   -----------   ------------
                               localhost             3373    545259520        22203         1901      completed
       rhsauto031.lab.eng.blr.redhat.com                0            0            0            0    not started
       rhsauto018.lab.eng.blr.redhat.com             1737    404750336        22291         3155      completed

[root@rhsauto038 rpm]# gluster v status dist
Status of volume: dist
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2	24012	Y	13645
Brick rhsauto038.lab.eng.blr.redhat.com:/rhs/brick2	24012	Y	13525
Brick rhsauto031.lab.eng.blr.redhat.com:/rhs/brick2	24015	Y	13480
Brick rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5	24015	Y	14404
NFS Server on localhost					38467	Y	14330
NFS Server on rhsauto031.lab.eng.blr.redhat.com		38467	Y	14194
NFS Server on rhsauto018.lab.eng.blr.redhat.com		38467	Y	14410

Expected results:
If a peer is disconnected, it should be reported as disconnected on all servers, and glusterd should keep trying to reconnect to it as long as that server is up and running.
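The expected behaviour amounts to a symmetry check: if node a sees peer b as Disconnected, node b should not still see a as Connected. Below is a minimal Python sketch of that check, assuming the 'gluster peer status' text layout shown in this report. The regex and the helper names parse_peer_status and find_asymmetric_links are hypothetical and are not part of glusterd or the gluster CLI. Peers are matched by UUID rather than hostname because a peer may be listed by IP address (10.70.37.13) instead of by hostname. A node's own UUID does not appear in its own output, so here it is taken from what the other nodes report.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical helpers; not part of glusterd or the gluster CLI.
# Matches one peer block of `gluster peer status` output in the layout shown in
# this report. The "Port:" line is optional: it is absent for the Disconnected peer.
PEER_RE = re.compile(
    r"Hostname:\s*\S+\s*\n"
    r"(?:Port:\s*\d+\s*\n)?"
    r"Uuid:\s*(?P<uuid>[0-9a-f-]+)\s*\n"
    r"State:[^(\n]*\((?P<conn>Connected|Disconnected)\)"
)


def parse_peer_status(text: str) -> Dict[str, bool]:
    """Map each listed peer UUID to True (Connected) or False (Disconnected)."""
    return {m.group("uuid"): m.group("conn") == "Connected"
            for m in PEER_RE.finditer(text)}


def find_asymmetric_links(outputs: Dict[str, str]) -> List[Tuple[str, str]]:
    """outputs maps each node's own UUID to its `gluster peer status` text.
    Returns sorted (a, b) pairs where node a reports b as Disconnected
    while node b still reports a as Connected."""
    views = {uuid: parse_peer_status(text) for uuid, text in outputs.items()}
    pairs = []
    for a, peers in views.items():
        for b, connected in peers.items():
            if not connected and views.get(b, {}).get(a) is True:
                pairs.append((a, b))
    return sorted(pairs)


# UUIDs and (trimmed) outputs taken from the report above.
X_UUID = "0e0d5e30-8e92-4173-8b0d-33ce8b65bff8"  # rhsauto018 (server x)
Y_UUID = "dbfa1c95-d845-409c-a81c-ad96b8d8cb6c"  # rhsauto031 (server y)
Z_UUID = "aed6e08b-574d-4087-8727-02278f5ac996"  # 10.70.37.13

X_OUT = """Hostname: 10.70.37.13
Port: 24007
Uuid: aed6e08b-574d-4087-8727-02278f5ac996
State: Peer in Cluster (Connected)

Hostname: rhsauto031.lab.eng.blr.redhat.com
Uuid: dbfa1c95-d845-409c-a81c-ad96b8d8cb6c
State: Peer in Cluster (Disconnected)
"""

Y_OUT = """Hostname: 10.70.37.13
Port: 24007
Uuid: aed6e08b-574d-4087-8727-02278f5ac996
State: Peer in Cluster (Connected)

Hostname: rhsauto018.lab.eng.blr.redhat.com
Uuid: 0e0d5e30-8e92-4173-8b0d-33ce8b65bff8
State: Peer in Cluster (Connected)
"""

# Prints a single pair, (X_UUID, Y_UUID): server x sees y as Disconnected,
# while y still sees x as Connected.
print(find_asymmetric_links({X_UUID: X_OUT, Y_UUID: Y_OUT}))
```

Run against the outputs in this report, the only asymmetric link reported is rhsauto018 to rhsauto031, which matches the inconsistency described here.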

Additional info:

Comment 6 Vivek Agarwal 2015-12-03 17:16:47 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.