Description of problem:
The magic block on my XFS disk failed, so I ran xfs_repair. After xfs_repair I rebuilt the partition table. The data on my disk was preserved. Next I restarted the service and checked the gluster volume status:

[root@node1 ~]# gluster volume status BlockStorage1-3
Locking failed on data3. Please check log file for details.
Locking failed on data0. Please check log file for details.

Version-Release number of selected component (if applicable):
[root@node1 ~]# rpm -qa | grep gluster
glusterfs-3.6.2-1.el7.x86_64
glusterfs-api-3.6.2-1.el7.x86_64
glusterfs-fuse-3.6.2-1.el7.x86_64
glusterfs-server-3.6.2-1.el7.x86_64
glusterfs-libs-3.6.2-1.el7.x86_64
glusterfs-cli-3.6.2-1.el7.x86_64

[root@node1 ~]# gluster volume info BlockStorage1-3
Volume Name: BlockStorage1-3
Type: Replicate
Volume ID: fd146b57-6a49-497b-8aa0-b324dd50e79a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: data3.os.ptl.ru:/data/glusterfs/disk1/BlockStorage1-3
Brick2: data1.os.ptl.ru:/data/glusterfs/disk1/BlockStorage1-3
Brick3: data0.os.ptl.ru:/data/glusterfs/disk1/BlockStorage1-3
Options Reconfigured:
auth.allow: 10.0.2.*

[root@node1 ~]# gluster peer status
Number of Peers: 2

Hostname: data3
Uuid: b1b583d0-f884-47ad-9376-28a625e39d15
State: Peer in Cluster (Connected)

Hostname: data0
Uuid: ad5425db-0b88-48a7-90b7-a585609ce95d
State: Peer in Cluster (Connected)

After 20-30 seconds I tried again and the peers had disconnected:

[root@node1 ~]# gluster peer status
Number of Peers: 2

Hostname: data3
Uuid: b1b583d0-f884-47ad-9376-28a625e39d15
State: Peer in Cluster (Disconnected)

Hostname: data0
Uuid: ad5425db-0b88-48a7-90b7-a585609ce95d
State: Peer in Cluster (Disconnected)

After another 20-30 seconds I tried again and the peers had reconnected:

[root@node1 ~]# gluster peer status
Number of Peers: 2

Hostname: data3.os.ptl.ru
Uuid: b1b583d0-f884-47ad-9376-28a625e39d15
State: Peer in Cluster (Connected)

Hostname: data0
Uuid: ad5425db-0b88-48a7-90b7-a585609ce95d
State: Peer in Cluster (Connected)

How reproducible:
1/1

Steps to Reproduce:
1. xfs_repair /dev/sdb1
2. fdisk /dev/sdb (delete the partition, create a new partition)
3. Check gluster volume status and logs

Actual results:
The peers are disconnected and then reconnected again.

Expected results:
The peers should not be disconnected.

Additional info:
Logs:
[2015-02-04 09:28:24.748664] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30600
[2015-02-04 09:28:41.655675] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30600
[2015-02-04 09:28:14.187049] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer ad5425db-0b88-48a7-90b7-a585609ce95d, in Peer in Cluster state, has disconnected from glusterd.
[2015-02-04 09:29:10.194630] C [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-management: server 10.0.2.101:24007 has not responded in the last 30 seconds, disconnecting.
[2015-02-04 09:29:10.195189] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f2cbad514c6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2cbab2401e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2cbab2412e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7f2cbab25a92] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f2cbab26248] ))))) 0-management: forced unwinding frame type(Peer mgmt) op(--(2)) called at 2015-02-04 09:28:10.277497 (xid=0x9c)
[2015-02-04 09:29:10.195369] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f2cbad514c6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2cbab2401e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2cbab2412e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7f2cbab25a92] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f2cbab26248] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-02-04 09:28:40.192112 (xid=0x9d)
[2015-02-04 09:29:10.195400] W [rpc-clnt-ping.c:154:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-02-04 09:29:10.195426] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer 2414717a-7615-4b0a-9940-e5e82592482c, in Peer in Cluster state, has disconnected from glusterd.
[2015-02-04 09:29:10.195638] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f2cbad514c6] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x3f1)[0x7f2cabd89521] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x1a2)[0x7f2cabd01442] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)[0x7f2cabcfa01c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x90)[0x7f2cbab26290] ))))) 0-management: Lock for vol BlockStorage1-3 not held
Logs from data3 are in the previous post.
Logs from data0: http://fpaste.org/181287/43641142/
Logs from data1: http://fpaste.org/181288/30437371/
[root@data0 ~]# netstat -ntap | grep -i glus
tcp 0 0 0.0.0.0:24007 0.0.0.0:* LISTEN 8277/glusterd
tcp 0 0 10.0.2.3:24007 10.0.2.103:1021 ESTABLISHED 8277/glusterd
tcp 0 1964 10.0.2.3:24007 10.0.2.103:1023 ESTABLISHED 8277/glusterd
tcp 0 1964 10.0.2.3:24007 10.0.2.101:1023 ESTABLISHED 8277/glusterd
tcp 0 4996 10.0.2.3:1016 10.0.2.103:24007 ESTABLISHED 8277/glusterd
tcp 0 5060 10.0.2.3:1018 10.0.2.101:24007 ESTABLISHED 8277/glusterd
tcp 0 0 10.0.2.3:24007 10.0.2.101:1021 ESTABLISHED 8277/glusterd
tcp 0 0 10.0.2.3:24007 10.0.2.101:1022 ESTABLISHED 8277/glusterd
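In the netstat output above, the third column (Send-Q) is non-zero on several established connections, which means glusterd has queued data that the peer has not yet acknowledged. A minimal sketch for picking out such connections, assuming the `netstat -ntap` column layout shown above; the function name `stuck_sendq` is made up for illustration:

```shell
# Hypothetical helper: read `netstat -ntap` lines on stdin and print the
# established TCP connections whose Send-Q (3rd column) is non-zero.
stuck_sendq() {
    awk '$1 ~ /^tcp/ && $6 == "ESTABLISHED" && $3 > 0 {
        printf "%s -> %s send-q=%s\n", $4, $5, $3
    }'
}

# Usage on a live node:
#   netstat -ntap | grep -i glus | stuck_sendq
```

A Send-Q that stays non-zero across repeated runs suggests the data is not reaching the peer; a value that appears once and then drains is normal.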
I solved this problem. Sorry, the problem was not in gluster. I had configured the network interfaces on the storage nodes with MTU=9000 and enabled jumbo frames on the switch between the nodes, but I did not save the configuration on the switch. After a reboot the configuration was reset and packets no longer got through from node to node. Sorry again :)
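A jumbo-frame mismatch like this can usually be confirmed by pinging a peer with the don't-fragment bit set and a payload sized to the interface MTU. A sketch, assuming IPv4 and the Linux iputils `ping` options `-M do`, `-s` and `-c`; the helper names are hypothetical, and the peer address is the one from the logs above:

```shell
# ICMP payload size that exactly fills one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload() {
    echo $(( $1 - 28 ))
}

# Print (do not run) a ping command that probes the path to host $1
# with unfragmentable packets sized for MTU $2.
mtu_probe_cmd() {
    echo "ping -M do -c 3 -s $(icmp_payload "$2") $1"
}

# Usage: print the probe for a 9000-byte MTU, then run it by hand.
#   mtu_probe_cmd 10.0.2.101 9000
```

If the probe for MTU 9000 gets no replies while the same probe for MTU 1500 succeeds, some hop on the path is not passing jumbo frames.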