Bug 1189027 - Gluster volume crash after rebuild partition table on XFS disk
Summary: Gluster volume crash after rebuild partition table on XFS disk
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-02-04 09:30 UTC by Qtolokon
Modified: 2015-02-04 13:17 UTC
CC List: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-02-04 13:17:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Qtolokon 2015-02-04 09:30:12 UTC
Description of problem:
The XFS magic block on my disk was corrupted, so I ran xfs_repair. After xfs_repair I rebuilt the partition table; the data on the disk was preserved. I then restarted the service and checked the gluster volume status:

[root@node1 ~]# gluster volume status BlockStorage1-3
Locking failed on data3. Please check log file for details.
Locking failed on data0. Please check log file for details.
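
A hedged aside for anyone hitting the same "Locking failed" message: the log file it refers to is the glusterd management log, which on a 3.x RPM install is typically /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (the exact filename is an assumption and can differ by version and packaging). A minimal way to look at it after re-running the command:

# Re-run the command that reported the locking failure
gluster volume status BlockStorage1-3
# Then inspect the management log for the lock failure details
# (path typical for 3.x RPM installs; adjust to your layout)
tail -n 100 /var/log/glusterfs/etc-glusterfs-glusterd.vol.log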

Version-Release number of selected component (if applicable):
[root@node1 ~]# rpm -qa | grep gluster
glusterfs-3.6.2-1.el7.x86_64
glusterfs-api-3.6.2-1.el7.x86_64
glusterfs-fuse-3.6.2-1.el7.x86_64
glusterfs-server-3.6.2-1.el7.x86_64
glusterfs-libs-3.6.2-1.el7.x86_64
glusterfs-cli-3.6.2-1.el7.x86_64

[root@node1 ~]# gluster volume info BlockStorage1-3
 
Volume Name: BlockStorage1-3
Type: Replicate
Volume ID: fd146b57-6a49-497b-8aa0-b324dd50e79a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: data3.os.ptl.ru:/data/glusterfs/disk1/BlockStorage1-3
Brick2: data1.os.ptl.ru:/data/glusterfs/disk1/BlockStorage1-3
Brick3: data0.os.ptl.ru:/data/glusterfs/disk1/BlockStorage1-3
Options Reconfigured:
auth.allow: 10.0.2.*

[root@node1 ~]# gluster peer status
Number of Peers: 2

Hostname: data3
Uuid: b1b583d0-f884-47ad-9376-28a625e39d15
State: Peer in Cluster (Connected)

Hostname: data0
Uuid: ad5425db-0b88-48a7-90b7-a585609ce95d
State: Peer in Cluster (Connected)


After 20-30 seconds I tried again, and the gluster peers had disconnected:
[root@node1 ~]# gluster peer status
Number of Peers: 2

Hostname: data3
Uuid: b1b583d0-f884-47ad-9376-28a625e39d15
State: Peer in Cluster (Disconnected)

Hostname: data0
Uuid: ad5425db-0b88-48a7-90b7-a585609ce95d
State: Peer in Cluster (Disconnected)

After another 20-30 seconds I tried again, and the peers had reconnected:

[root@node1 ~]# gluster peer status
Number of Peers: 2

Hostname: data3.os.ptl.ru
Uuid: b1b583d0-f884-47ad-9376-28a625e39d15
State: Peer in Cluster (Connected)

Hostname: data0
Uuid: ad5425db-0b88-48a7-90b7-a585609ce95d
State: Peer in Cluster (Connected)
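
A sketch (not from the original report) of one way to watch the peers flap between Connected and Disconnected, assuming the standard watch utility is installed:

# Poll peer status every 5 seconds and watch the State field change
watch -n 5 gluster peer status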

How reproducible:
1/1

Steps to Reproduce:
1. xfs_repair /dev/sdb1
2. fdisk /dev/sdb (delete the partition, create a new partition)
3. Check gluster volume status and logs (an expanded command sketch follows below)
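
For clarity, an expanded sketch of the reproduction steps above. The brick device /dev/sdb1 and mount point /data/glusterfs/disk1 are taken from the volume info; the unmount/remount and service restart are assumptions about what was done, not confirmed in the report:

# 1. Take the brick filesystem offline and repair it (mount point assumed)
umount /data/glusterfs/disk1
xfs_repair /dev/sdb1

# 2. Rebuild the partition table: in fdisk, delete the partition and recreate
#    it at the same start sector so the existing XFS data stays intact
fdisk /dev/sdb

# 3. Remount the brick, restart glusterd and check the volume
mount /dev/sdb1 /data/glusterfs/disk1
systemctl restart glusterd
gluster volume status BlockStorage1-3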

Actual results:

The peers disconnect and then reconnect again.

Expected results:

The peers should not disconnect.

Additional info:

Logs:
[2015-02-04 09:28:24.748664] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30600
[2015-02-04 09:28:41.655675] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30600
[2015-02-04 09:28:14.187049] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer ad5425db-0b88-48a7-90b7-a585609ce95d, in Peer in Cluster state, has disconnected from glusterd.
[2015-02-04 09:29:10.194630] C [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-management: server 10.0.2.101:24007 has not responded in the last 30 seconds, disconnecting.
[2015-02-04 09:29:10.195189] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f2cbad514c6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2cbab2401e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2cbab2412e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7f2cbab25a92] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f2cbab26248] ))))) 0-management: forced unwinding frame type(Peer mgmt) op(--(2)) called at 2015-02-04 09:28:10.277497 (xid=0x9c)
[2015-02-04 09:29:10.195369] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f2cbad514c6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2cbab2401e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2cbab2412e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7f2cbab25a92] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f2cbab26248] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-02-04 09:28:40.192112 (xid=0x9d)
[2015-02-04 09:29:10.195400] W [rpc-clnt-ping.c:154:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-02-04 09:29:10.195426] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer 2414717a-7615-4b0a-9940-e5e82592482c, in Peer in Cluster state, has disconnected from glusterd.
[2015-02-04 09:29:10.195638] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f2cbad514c6] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x3f1)[0x7f2cabd89521] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x1a2)[0x7f2cabd01442] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)[0x7f2cabcfa01c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x90)[0x7f2cbab26290] ))))) 0-management: Lock for vol BlockStorage1-3 not held

Comment 1 Qtolokon 2015-02-04 09:56:18 UTC
The logs from data3 are in the previous comment.

Logs from data0: http://fpaste.org/181287/43641142/

Logs from data1: http://fpaste.org/181288/30437371/

Comment 2 Qtolokon 2015-02-04 10:16:19 UTC
[root@data0 ~]# netstat -ntap | grep -i glus
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      8277/glusterd       
tcp        0      0 10.0.2.3:24007          10.0.2.103:1021         ESTABLISHED 8277/glusterd       
tcp        0   1964 10.0.2.3:24007          10.0.2.103:1023         ESTABLISHED 8277/glusterd       
tcp        0   1964 10.0.2.3:24007          10.0.2.101:1023         ESTABLISHED 8277/glusterd       
tcp        0   4996 10.0.2.3:1016           10.0.2.103:24007        ESTABLISHED 8277/glusterd       
tcp        0   5060 10.0.2.3:1018           10.0.2.101:24007        ESTABLISHED 8277/glusterd       
tcp        0      0 10.0.2.3:24007          10.0.2.101:1021         ESTABLISHED 8277/glusterd       
tcp        0      0 10.0.2.3:24007          10.0.2.101:1022         ESTABLISHED 8277/glusterd

Comment 3 Qtolokon 2015-02-04 13:17:20 UTC
I solved this problem.
Sorry, the problem is not in Gluster.

I had configured the network interfaces on the storage nodes with MTU=9000 and enabled jumbo frames on the switch between the nodes, but I had not saved the configuration on the switch. After a reboot the switch configuration was reset and large packets no longer made it from node to node. Sorry again :)
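
For anyone debugging a similar setup, a hedged sketch of how to confirm that jumbo frames really pass end-to-end (the interface name eth1 is an assumption):

# Check the configured MTU on the storage interface (name assumed)
ip link show eth1

# Verify that full 9000-byte frames traverse the switch:
# 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do forbids fragmentation
ping -M do -s 8972 -c 3 data0

If the large ping fails while a normal ping works, something on the path is dropping jumbo frames, which is exactly what happened here once the unsaved switch configuration was lost on reboot.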

