Bug 1215002 - glusterd crashed on the node when tried to detach a tier after restoring data from the snapshot.
Summary: glusterd crashed on the node when tried to detach a tier after restoring data...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: snapshot
Version: mainline
Hardware: x86_64
OS: Linux
Severity: urgent
Priority: urgent
Target Milestone: ---
Assignee: Mohammed Rafi KC
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: qe_tracker_everglades 1223207 1224147 1226032
 
Reported: 2015-04-24 05:46 UTC by Triveni Rao
Modified: 2016-06-16 12:54 UTC (History)
7 users (show)

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1223207 1224147 1226032 (view as bug list)
Environment:
Last Closed: 2016-06-16 12:54:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Triveni Rao 2015-04-24 05:46:37 UTC
Description of problem:
glusterd crashed on the node when I tried to detach a tier after restoring data from a snapshot.

Version-Release number of selected component (if applicable):

[root@rhsqa14-vm1 ~]# rpm -qa| grep gluster
glusterfs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-devel-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-geo-replication-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-resource-agents-3.7dev-0.952.gita7f1d08.el6.noarch
glusterfs-debuginfo-3.7dev-0.952.gita7f1d08.el6.x86_64
glusterfs-libs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-api-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-fuse-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-extra-xlators-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-regression-tests-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-rdma-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-cli-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-server-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-api-devel-3.7dev-0.994.gitf522001.el6.x86_64
[root@rhsqa14-vm1 ~]# 

[root@rhsqa14-vm1 ~]# glusterfs --version
glusterfs 3.7dev built on Apr 13 2015 07:14:26
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@rhsqa14-vm1 ~]# 


Steps to Reproduce:
1. Create a normal distributed-replicate (distrep) volume.
2. Attach a tier to it.
3. FUSE-mount the volume and add some data.
4. Create a snapshot and activate it.
5. Access the snapshot on the mount point and check that everything is fine.
6. Delete a few files/directories from the mount point.
7. Stop the volume and restore the snapshot.
8. Start the volume and check that the data is available.

Actual results:

glusterd crashed. In addition:
10. The restored data was available on the mount point.
11. Detaching the tier removed the snapshot that had been created while the tier was attached.
12. On the gluster nodes, peer status shows peers in the Rejected state.
13. No operations can be performed on the volumes.

[root@rhsqa14-vm1 ~]# gluster v create Mint replica 2 10.70.46.233:/rhs/brick3/M1 10.70.46.236:/rhs/brick3/M1 10.70.46.233:/rhs/brick4/M1 10.70.46.236:/rhs/brick4/M1 force
volume create: Mint: failed: Host 10.70.46.236 is not in 'Peer in Cluster' state
[root@rhsqa14-vm1 ~]#


[root@rhsqa14-vm1 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.46.240
Uuid: 0f69be9f-0055-41ba-89e8-34ef4c33b521
State: Peer Rejected (Connected)

Hostname: 10.70.46.243
Uuid: cd48cc5a-2f4c-4d53-847e-c67c2f7aefd9
State: Peer Rejected (Connected)

Hostname: 10.70.46.236
Uuid: 76dd61a5-e5e0-4a93-8f1d-8d5de71fca14
State: Peer Rejected (Connected)
[root@rhsqa14-vm1 ~]#

Expected results:


Additional info:

I have uploaded the sosreport here:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/007/


log messages:

[root@rhsqa14-vm1 ~]# less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
[2015-04-20 05:03:08.285701] I [glusterd-handler.c:1317:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-04-20 05:03:08.289612] I [glusterd-handler.c:1317:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-04-20 05:03:08.292597] I [glusterd-handler.c:1317:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-04-20 05:03:08.295527] I [glusterd-handler.c:1317:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-04-20 05:49:21.790376] E [glusterd-snapshot.c:5331:glusterd_snapshot_status_prevalidate] 0-management: Snapshot (mix_snap) does not exist
[2015-04-20 05:49:21.790678] W [glusterd-snapshot.c:7769:glusterd_snapshot_prevalidate] 0-management: Snapshot status validation failed
[2015-04-20 05:49:21.790720] W [glusterd-mgmt.c:155:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed
[2015-04-20 05:49:21.790741] E [glusterd-mgmt.c:691:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node
[2015-04-20 05:49:21.790761] E [glusterd-mgmt.c:1945:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed
[2015-04-20 05:49:51.841947] E [glusterd-snapshot.c:3386:glusterd_handle_snapshot_info] 0-management: Snapshot (mix_snap) does not exist
[2015-04-20 05:49:51.841978] W [glusterd-snapshot.c:8336:glusterd_handle_snapshot_fn] 0-management: Snapshot info failed
[2015-04-20 05:52:56.699396] E [glusterd-snapshot.c:3386:glusterd_handle_snapshot_info] 0-management: Snapshot (mix) does not exist
[2015-04-20 05:52:56.699447] W [glusterd-snapshot.c:8336:glusterd_handle_snapshot_fn] 0-management: Snapshot info failed
[2015-04-20 05:55:06.886334] E [glusterd-snapshot.c:3386:glusterd_handle_snapshot_info] 0-management: Snapshot (mix_snap) does not exist
[2015-04-20 05:55:06.886385] W [glusterd-snapshot.c:8336:glusterd_handle_snapshot_fn] 0-management: Snapshot info failed
[2015-04-20 06:12:10.157110] W [glusterd-snapshot-utils.c:312:glusterd_snap_volinfo_find] 0-management: Snap volume b8482da332324836b016225ae8c2e669.10.70.46.233.var-run-gluster-snaps-b8482da332324836b016225ae8c2e669-brick2-mix not found
[2015-04-20 06:12:10.190691] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /var/run/gluster/snaps/b8482da332324836b016225ae8c2e669/brick2/mix on port 49163
[2015-04-20 06:12:10.192862] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-20 06:12:10.217393] W [glusterd-snapshot-utils.c:312:glusterd_snap_volinfo_find] 0-management: Snap volume b8482da332324836b016225ae8c2e669.10.70.46.233.var-run-gluster-snaps-b8482da332324836b016225ae8c2e669-brick3-mix not found
[2015-04-20 06:12:10.251479] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /var/run/gluster/snaps/b8482da332324836b016225ae8c2e669/brick3/mix on port 49164
[2015-04-20 06:12:10.253502] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-20 06:12:10.273797] W [glusterd-snapshot-utils.c:312:glusterd_snap_volinfo_find] 0-management: Snap volume b8482da332324836b016225ae8c2e669.10.70.46.233.var-run-gluster-snaps-b8482da332324836b016225ae8c2e669-brick5-mix not found
[2015-04-20 06:12:10.298518] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /var/run/gluster/snaps/b8482da332324836b016225ae8c2e669/brick5/mix on port 49165
[2015-04-20 06:12:10.300905] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-20 06:21:54.350244] I [glusterd-handler.c:1317:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-04-20 06:28:10.400329] W [socket.c:3059:socket_connect] 0-snapd: Ignore failed connection attempt on , (No such file or directory) 
[2015-04-20 06:28:11.887406] I [glusterd-utils.c:3981:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2015-04-20 06:28:11.888200] I [glusterd-utils.c:3986:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
...skipping...
)[0x7f4aa1adcfc0] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x356f410143] ))))) 0-management: Lock for vol glusterfs_shared_storage not held
[2015-04-24 05:07:26.977951] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x356f022140] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x481)[0x7f4aa1b78fa1] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2a0)[0x7f4aa1af48c0] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f4aa1adcfc0] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x356f410143] ))))) 0-management: Lock for vol mix not held
[2015-04-24 05:07:26.978375] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x356f022140] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x481)[0x7f4aa1b78fa1] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2a0)[0x7f4aa1af48c0] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f4aa1adcfc0] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x356f410143] ))))) 0-management: Lock for vol test not held
[2015-04-24 05:07:26.978788] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x356f022140] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x481)[0x7f4aa1b78fa1] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2a0)[0x7f4aa1af48c0] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f4aa1adcfc0] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x356f410143] ))))) 0-management: Lock for vol testing not held
[2015-04-24 05:07:26.979191] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x356f022140] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x481)[0x7f4aa1b78fa1] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2a0)[0x7f4aa1af48c0] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f4aa1adcfc0] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x356f410143] ))))) 0-management: Lock for vol tri not held
[2015-04-24 05:07:34.023839] I [glusterd-rpc-ops.c:463:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: cd48cc5a-2f4c-4d53-847e-c67c2f7aefd9, host: 10.70.46.243, port: 0
[2015-04-24 05:07:34.409049] I [glusterd-handshake.c:1151:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30700
[2015-04-24 05:07:36.182749] I [glusterd-handshake.c:1151:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30700
[2015-04-24 05:07:53.414628] I [glusterd-handler.c:2337:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: cd48cc5a-2f4c-4d53-847e-c67c2f7aefd9
[2015-04-24 05:07:53.445409] E [MSGID: 106010] [glusterd-utils.c:2608:glusterd_compare_friend_volume] 0-management: Version of Cksums everglades differ. local cksum = 2068641408, remote cksum = 1397023877 on peer 10.70.46.243
[2015-04-24 05:07:53.445844] I [glusterd-handler.c:3491:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.46.243 (0), ret: 0
[2015-04-24 05:08:14.940876] I [glusterd-rpc-ops.c:463:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 76dd61a5-e5e0-4a93-8f1d-8d5de71fca14, host: 10.70.46.236, port: 0
[2015-04-24 05:08:15.261211] I [glusterd-handler.c:2337:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 76dd61a5-e5e0-4a93-8f1d-8d5de71fca14
[2015-04-24 05:08:15.291703] E [MSGID: 106010] [glusterd-utils.c:2608:glusterd_compare_friend_volume] 0-management: Version of Cksums everglades differ. local cksum = 2068641408, remote cksum = 1397023877 on peer 10.70.46.236
[2015-04-24 05:08:15.292301] I [glusterd-handler.c:3491:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.46.236 (0), ret: 0
[2015-04-24 05:11:12.917953] I [glusterd-handler.c:1262:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[root@rhsqa14-vm1 ~]#

Comment 1 Anand Avati 2015-05-12 13:26:16 UTC
REVIEW: http://review.gluster.org/10761 (glusterd: function to create duplicate should copy subvol_count) posted (#1) for review on master by mohammed rafi  kc (rkavunga)

Comment 2 Anand Avati 2015-05-12 14:52:00 UTC
REVIEW: http://review.gluster.org/10761 (glusterd: function to create duplicate of volinfo should copy subvol_count) posted (#2) for review on master by mohammed rafi  kc (rkavunga)

Comment 3 Anand Avati 2015-05-28 05:38:52 UTC
REVIEW: http://review.gluster.org/10761 (glusterd: function to create duplicate of volinfo should copy subvol_count) posted (#3) for review on master by mohammed rafi  kc (rkavunga)

Comment 4 Nagaprasad Sathyanarayana 2015-10-25 15:17:03 UTC
The fix for this BZ is already present in a GlusterFS release. You can find a clone of this BZ that was fixed in a GlusterFS release and closed. Hence, closing this mainline BZ as well.

Comment 5 Niels de Vos 2016-06-16 12:54:19 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

