Bug 1226032 - glusterd crashed on the node when trying to detach a tier after restoring data from the snapshot.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: snapshot
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Mohammed Rafi KC
QA Contact:
URL:
Whiteboard:
Depends On: 1215002
Blocks: qe_tracker_everglades 1223207 1224147
Reported: 2015-05-28 19:20 UTC by Mohammed Rafi KC
Modified: 2015-06-02 08:24 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.7.1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1215002
Environment:
Last Closed: 2015-06-02 08:03:53 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Mohammed Rafi KC 2015-05-28 19:20:30 UTC
+++ This bug was initially created as a clone of Bug #1215002 +++

Description of problem:
I saw glusterd crash on the node when I tried to detach a tier after restoring data from the snapshot.

Version-Release number of selected component (if applicable):

[root@rhsqa14-vm1 ~]# rpm -qa| grep gluster
glusterfs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-devel-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-geo-replication-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-resource-agents-3.7dev-0.952.gita7f1d08.el6.noarch
glusterfs-debuginfo-3.7dev-0.952.gita7f1d08.el6.x86_64
glusterfs-libs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-api-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-fuse-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-extra-xlators-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-regression-tests-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-rdma-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-cli-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-server-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-api-devel-3.7dev-0.994.gitf522001.el6.x86_64
[root@rhsqa14-vm1 ~]# 

[root@rhsqa14-vm1 ~]# glusterfs --version
glusterfs 3.7dev built on Apr 13 2015 07:14:26
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@rhsqa14-vm1 ~]# 


Steps to Reproduce:
1. Create a normal distributed-replicate (distrep) volume.
2. Attach a tier to it.
3. FUSE-mount the volume and add some data.
4. Create a snapshot and activate it.
5. Access the snapshot from the mount point and check that everything is fine.
6. Delete a few files/directories from the mount point.
7. Stop the volume and restore the snapshot.
8. Start the volume and check that the data is available.
9. Detach the tier.

Actual results:

glusterd crashed. In addition:
1. The restored data was available on the mount point.
2. Detaching the tier removed the snapshot that had been created while the tier was attached.
3. On the gluster nodes, peer status shows the peers in the Peer Rejected state.
4. No operations can be performed on the volumes.

[root@rhsqa14-vm1 ~]# gluster v create Mint replica 2 10.70.46.233:/rhs/brick3/M1 10.70.46.236:/rhs/brick3/M1 10.70.46.233:/rhs/brick4/M1 10.70.46.236:/rhs/brick4/M1 force
volume create: Mint: failed: Host 10.70.46.236 is not in 'Peer in Cluster' state
[root@rhsqa14-vm1 ~]#


[root@rhsqa14-vm1 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.46.240
Uuid: 0f69be9f-0055-41ba-89e8-34ef4c33b521
State: Peer Rejected (Connected)

Hostname: 10.70.46.243
Uuid: cd48cc5a-2f4c-4d53-847e-c67c2f7aefd9
State: Peer Rejected (Connected)

Hostname: 10.70.46.236
Uuid: 76dd61a5-e5e0-4a93-8f1d-8d5de71fca14
State: Peer Rejected (Connected)
[root@rhsqa14-vm1 ~]#

Expected results:


Additional info:

I have uploaded the sosreport here:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/007/


log messages:

[root@rhsqa14-vm1 ~]# less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
[2015-04-20 05:03:08.285701] I [glusterd-handler.c:1317:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-04-20 05:03:08.289612] I [glusterd-handler.c:1317:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-04-20 05:03:08.292597] I [glusterd-handler.c:1317:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-04-20 05:03:08.295527] I [glusterd-handler.c:1317:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-04-20 05:49:21.790376] E [glusterd-snapshot.c:5331:glusterd_snapshot_status_prevalidate] 0-management: Snapshot (mix_snap) does not exist
[2015-04-20 05:49:21.790678] W [glusterd-snapshot.c:7769:glusterd_snapshot_prevalidate] 0-management: Snapshot status validation failed
[2015-04-20 05:49:21.790720] W [glusterd-mgmt.c:155:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed
[2015-04-20 05:49:21.790741] E [glusterd-mgmt.c:691:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node
[2015-04-20 05:49:21.790761] E [glusterd-mgmt.c:1945:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed
[2015-04-20 05:49:51.841947] E [glusterd-snapshot.c:3386:glusterd_handle_snapshot_info] 0-management: Snapshot (mix_snap) does not exist
[2015-04-20 05:49:51.841978] W [glusterd-snapshot.c:8336:glusterd_handle_snapshot_fn] 0-management: Snapshot info failed
[2015-04-20 05:52:56.699396] E [glusterd-snapshot.c:3386:glusterd_handle_snapshot_info] 0-management: Snapshot (mix) does not exist
[2015-04-20 05:52:56.699447] W [glusterd-snapshot.c:8336:glusterd_handle_snapshot_fn] 0-management: Snapshot info failed
[2015-04-20 05:55:06.886334] E [glusterd-snapshot.c:3386:glusterd_handle_snapshot_info] 0-management: Snapshot (mix_snap) does not exist
[2015-04-20 05:55:06.886385] W [glusterd-snapshot.c:8336:glusterd_handle_snapshot_fn] 0-management: Snapshot info failed
[2015-04-20 06:12:10.157110] W [glusterd-snapshot-utils.c:312:glusterd_snap_volinfo_find] 0-management: Snap volume b8482da332324836b016225ae8c2e669.10.70.46.233.var-run-gluster-snaps-b8482da332324836b016225ae8c2e669-brick2-mix not found
[2015-04-20 06:12:10.190691] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /var/run/gluster/snaps/b8482da332324836b016225ae8c2e669/brick2/mix on port 49163
[2015-04-20 06:12:10.192862] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-20 06:12:10.217393] W [glusterd-snapshot-utils.c:312:glusterd_snap_volinfo_find] 0-management: Snap volume b8482da332324836b016225ae8c2e669.10.70.46.233.var-run-gluster-snaps-b8482da332324836b016225ae8c2e669-brick3-mix not found
[2015-04-20 06:12:10.251479] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /var/run/gluster/snaps/b8482da332324836b016225ae8c2e669/brick3/mix on port 49164
[2015-04-20 06:12:10.253502] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-20 06:12:10.273797] W [glusterd-snapshot-utils.c:312:glusterd_snap_volinfo_find] 0-management: Snap volume b8482da332324836b016225ae8c2e669.10.70.46.233.var-run-gluster-snaps-b8482da332324836b016225ae8c2e669-brick5-mix not found
[2015-04-20 06:12:10.298518] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /var/run/gluster/snaps/b8482da332324836b016225ae8c2e669/brick5/mix on port 49165
[2015-04-20 06:12:10.300905] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-04-20 06:21:54.350244] I [glusterd-handler.c:1317:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-04-20 06:28:10.400329] W [socket.c:3059:socket_connect] 0-snapd: Ignore failed connection attempt on , (No such file or directory) 
[2015-04-20 06:28:11.887406] I [glusterd-utils.c:3981:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2015-04-20 06:28:11.888200] I [glusterd-utils.c:3986:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
...skipping...
)[0x7f4aa1adcfc0] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x356f410143] ))))) 0-management: Lock for vol glusterfs_shared_storage not held
[2015-04-24 05:07:26.977951] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x356f022140] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x481)[0x7f4aa1b78fa1] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2a0)[0x7f4aa1af48c0] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f4aa1adcfc0] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x356f410143] ))))) 0-management: Lock for vol mix not held
[2015-04-24 05:07:26.978375] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x356f022140] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x481)[0x7f4aa1b78fa1] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2a0)[0x7f4aa1af48c0] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f4aa1adcfc0] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x356f410143] ))))) 0-management: Lock for vol test not held
[2015-04-24 05:07:26.978788] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x356f022140] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x481)[0x7f4aa1b78fa1] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2a0)[0x7f4aa1af48c0] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f4aa1adcfc0] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x356f410143] ))))) 0-management: Lock for vol testing not held
[2015-04-24 05:07:26.979191] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x356f022140] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x481)[0x7f4aa1b78fa1] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2a0)[0x7f4aa1af48c0] (--> /usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f4aa1adcfc0] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x356f410143] ))))) 0-management: Lock for vol tri not held
[2015-04-24 05:07:34.023839] I [glusterd-rpc-ops.c:463:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: cd48cc5a-2f4c-4d53-847e-c67c2f7aefd9, host: 10.70.46.243, port: 0
[2015-04-24 05:07:34.409049] I [glusterd-handshake.c:1151:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30700
[2015-04-24 05:07:36.182749] I [glusterd-handshake.c:1151:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30700
[2015-04-24 05:07:53.414628] I [glusterd-handler.c:2337:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: cd48cc5a-2f4c-4d53-847e-c67c2f7aefd9
[2015-04-24 05:07:53.445409] E [MSGID: 106010] [glusterd-utils.c:2608:glusterd_compare_friend_volume] 0-management: Version of Cksums everglades differ. local cksum = 2068641408, remote cksum = 1397023877 on peer 10.70.46.243
[2015-04-24 05:07:53.445844] I [glusterd-handler.c:3491:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.46.243 (0), ret: 0
[2015-04-24 05:08:14.940876] I [glusterd-rpc-ops.c:463:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 76dd61a5-e5e0-4a93-8f1d-8d5de71fca14, host: 10.70.46.236, port: 0
[2015-04-24 05:08:15.261211] I [glusterd-handler.c:2337:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 76dd61a5-e5e0-4a93-8f1d-8d5de71fca14
[2015-04-24 05:08:15.291703] E [MSGID: 106010] [glusterd-utils.c:2608:glusterd_compare_friend_volume] 0-management: Version of Cksums everglades differ. local cksum = 2068641408, remote cksum = 1397023877 on peer 10.70.46.236
[2015-04-24 05:08:15.292301] I [glusterd-handler.c:3491:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.46.236 (0), ret: 0
[2015-04-24 05:11:12.917953] I [glusterd-handler.c:1262:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[root@rhsqa14-vm1 ~]#

--- Additional comment from Anand Avati on 2015-05-12 09:26:16 EDT ---

REVIEW: http://review.gluster.org/10761 (glusterd: function to create duplicate should copy subvol_count) posted (#1) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Anand Avati on 2015-05-12 10:52:00 EDT ---

REVIEW: http://review.gluster.org/10761 (glusterd: function to create duplicate of volinfo should copy subvol_count) posted (#2) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Anand Avati on 2015-05-28 01:38:52 EDT ---

REVIEW: http://review.gluster.org/10761 (glusterd: function to create duplicate of volinfo should copy subvol_count) posted (#3) for review on master by mohammed rafi  kc (rkavunga)

Comment 1 Anand Avati 2015-05-28 19:23:42 UTC
REVIEW: http://review.gluster.org/10982 (glusterd: function to create duplicate of volinfo should copy subvol_count) posted (#1) for review on release-3.7 by mohammed rafi  kc (rkavunga)

Comment 2 Anand Avati 2015-06-01 04:42:23 UTC
COMMIT: http://review.gluster.org/10982 committed in release-3.7 by Kaushal M (kaushal) 
------
commit 661142d7c934f36236909137f496095eb88a4129
Author: Mohammed Rafi KC <rkavunga>
Date:   Tue May 12 18:49:15 2015 +0530

    glusterd: function to create duplicate of volinfo should copy subvol_count
    
            Back port of http://review.gluster.org/10761
    
    When we create a duplicate volfile from an existing volfile,
    we are not copying the variable subvol_count to the new
    volfile.
    
     >Change-Id: I943aa7fdf1a2ca5bf57522cb2402b6b3165501ac
     >BUG: 1215002
     >Signed-off-by: Mohammed Rafi KC <rkavunga>
     >Reviewed-on: http://review.gluster.org/10761
     >Reviewed-by: Atin Mukherjee <amukherj>
     >Tested-by: Gluster Build System <jenkins.com>
     >Tested-by: NetBSD Build System
    
    Change-Id: I3c58018833ad84fba13e1d17755f5dadbb01a5d3
    BUG: 1226032
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/10982
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Joseph Fernandes
    Reviewed-by: Kaushal M <kaushal>

Comment 3 Niels de Vos 2015-06-02 08:03:53 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.1, please reopen this bug report.

glusterfs-3.7.1 has been announced on the Gluster Packaging mailing list [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.packaging/1
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

