Description of problem:
=======================

From a use-case point of view: created a geo-rep session, paused it, and tried to create a snapshot. The snapshot hangs and times out after the 2-minute CLI/barrier timeout.

The problem occurs when changelog.changelog is on. Tried the following on a cleaned-up system:

1. Create a volume
2. Set changelog.changelog to on
3. Create a snapshot; it times out as follows:

[root@georep1 scripts]# gluster snapshot create snapa master
Error : Request timed out
Snapshot command failed
[root@georep1 scripts]#

Brick Log snippet:
===================

[2015-05-26 17:34:59.595211] I [changelog.c:2043:notify] 0-master-changelog: Barrier on notification
[2015-05-26 17:34:59.595394] I [changelog-helpers.c:838:changelog_snap_logging_start] 0-master-changelog: Now starting to log in call path
[2015-05-26 17:34:59.595410] E [changelog.c:2064:notify] 0-master-changelog: Received another barrier on notification when last one is not served yet
[2015-05-26 17:34:59.595434] I [socket.c:3432:socket_submit_reply] 0-socket.glusterfsd: not connected (priv->connected = -1)
[2015-05-26 17:34:59.595464] E [rpcsvc.c:1312:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: Gluster Brick operations, ProgVers: 2, Proc: 10) to rpc-transport (socket.glusterfsd)
[2015-05-26 17:34:59.595480] E [glusterfsd-mgmt.c:149:glusterfs_submit_reply] 0-glusterfs: Reply submission failed
[2015-05-26 17:34:59.595501] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-05-26 17:34:59.596373] E [socket.c:3421:socket_submit_reply] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x31c5624fb0] (--> /usr/lib64/glusterfs/3.7.0/rpc-transport/socket.so(+0x6f2f)[0x7fa8cdec7f2f] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_transport_submit+0x76)[0x31c5a089a6] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_submit_generic+0x1c8)[0x31c5a091f8] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_error_reply+0x66)[0x31c5a09726] ))))) 0-socket: invalid argument: this->private
[2015-05-26 17:34:59.596394] E [rpcsvc.c:1312:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: Gluster Brick operations, ProgVers: 2, Proc: 10) to rpc-transport (socket.glusterfsd)
[2015-05-26 17:34:59.596568] C [mem-pool.c:560:mem_put] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x31c5624fb0] (--> /usr/lib64/libglusterfs.so.0(mem_put+0x105)[0x31c5655895] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_submit_generic+0x256)[0x31c5a09286] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_error_reply+0x66)[0x31c5a09726] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_check_and_reply_error+0x6b)[0x31c5a0979b] ))))) 0-mem-pool: mem_put called on freed ptr 0x6d2d84 of mem pool 0x6d1610
[2015-05-26 17:34:59.597962] W [rpcsvc.c:571:rpcsvc_check_and_reply_error] 0-rpcsvc: failed to queue error reply
[2015-05-26 17:34:59.598024] E [barrier.c:522:notify] 0-master-barrier: Already enabled
[2015-05-26 17:34:59.598381] I [changelog.c:1989:notify] 0-master-changelog: Barrier off notification
[2015-05-26 17:34:59.598688] I [changelog-helpers.c:860:changelog_snap_logging_stop] 0-master-changelog: Stopped to log in call path
[2015-05-26 17:34:59.598713] E [changelog.c:2030:notify] 0-master-changelog: Changelog barrier already disabled

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.0-2.el6rhs.x86_64

How reproducible:
=================

Always

Steps to Reproduce:

Way1:
=====

1. Create master and slave volumes
2. Create a geo-replication session between them
3. Start and pause the geo-rep session
4. Try to create a snapshot. It fails

Way2:
=====

1. Create a volume
2. Set the volume option changelog.changelog to on
3. Try to create a snapshot. It fails

Actual results:
===============

Snapshot creation fails with a timeout

Expected results:
=================

Snapshot creation should succeed

Additional info:
================

Snapshot creation while geo-replication is paused used to work with the upstream 3.7 beta1 build. Hence it's a recent regression.
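The two reproduction paths above can be sketched as small shell helpers. This is only a sketch under assumptions: the volumes (and, for Way1, the geo-rep session) are assumed to already exist, the function names and the GLUSTER override variable are hypothetical, and the brick log path must be supplied by the caller. The log check simply looks for the changelog error message quoted in the brick log snippet.

```shell
#!/bin/sh
# Hypothetical helpers sketching the reproduction steps from this report.
# GLUSTER may be overridden (e.g. for a dry run); it defaults to the real CLI.
GLUSTER="${GLUSTER:-gluster}"

# Way1: start and pause an existing geo-rep session, then try a snapshot.
# Arguments: master volume, slave spec (host::volume), snapshot name.
repro_way1() {
    "$GLUSTER" volume geo-replication "$1" "$2" start || return 1
    "$GLUSTER" volume geo-replication "$1" "$2" pause || return 1
    "$GLUSTER" snapshot create "$3" "$1"
}

# Way2: turn changelog on for an existing volume, then try a snapshot.
# Arguments: volume name, snapshot name.
repro_way2() {
    "$GLUSTER" volume set "$1" changelog.changelog on || return 1
    "$GLUSTER" snapshot create "$2" "$1"
}

# Succeeds if the given brick log contains the double-barrier error
# that appears in the log snippet of this report.
has_double_barrier() {
    grep -q 'Received another barrier on notification when last one is not served yet' "$1"
}
```

For example, `repro_way2 master snapa` followed by `has_double_barrier <path-to-brick-log>` would be expected to reproduce the timeout and confirm the symptom on an affected build. Setting `GLUSTER=echo` prints the commands instead of running them.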
Upstream (master): http://review.gluster.org/10951
Upstream (3.7): http://review.gluster.org/10988
Downstream: https://code.engineering.redhat.com/gerrit/#/c/49689/
Verified with build: glusterfs-3.7.1-3.el6rhs.x86_64

[root@georep1 ~]# gluster volume geo-replication master 10.70.46.154::slave stop
Stopping geo-replication session between master & 10.70.46.154::slave has been successful
[root@georep1 ~]# gluster volume geo-replication master 10.70.46.154::slave delete
Deleting geo-replication session between master & 10.70.46.154::slave has been successful
[root@georep1 ~]# gluster snapshot list
snap1_GMT-2015.06.18-13.54.17
snap2_GMT-2015.06.18-14.00.01
[root@georep1 ~]# gluster volume info master | grep change
changelog.changelog: on
[root@georep1 ~]# time gluster snapshot create snapr master
snapshot create: success: Snap snapr_GMT-2015.06.18-14.04.29 created successfully

real    0m13.162s
user    0m0.106s
sys     0m0.026s
[root@georep1 ~]# gluster snapshot list
snap1_GMT-2015.06.18-13.54.17
snap2_GMT-2015.06.18-14.00.01
snapr_GMT-2015.06.18-14.04.29
[root@georep1 ~]# gluster snapshot activate snapr_GMT-2015.06.18-14.04.29
Snapshot activate: snapr_GMT-2015.06.18-14.04.29: Snap activated successfully
[root@georep1 ~]#
[root@georep1 ~]# gluster snapshot info snapr_GMT-2015.06.18-14.04.29 | grep -i "status"
        Status                    : Started
[root@georep1 ~]#

Moving the bug to verified state.
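The manual verification above could be scripted roughly as follows. This is a sketch, not the procedure actually used for verification: the function names and the GLUSTER override variable are hypothetical, and the string matches rely only on the CLI output shown in the transcript.

```shell
#!/bin/sh
# Hypothetical verification helpers; GLUSTER defaults to the real CLI.
GLUSTER="${GLUSTER:-gluster}"

# Succeeds if the given CLI output contains the success message
# seen in the verification transcript.
snapshot_create_succeeded() {
    printf '%s\n' "$1" | grep -q 'snapshot create: success:'
}

# Checks that changelog is on for the volume, then creates a snapshot
# and reports the result. Arguments: volume name, snapshot name.
# Returns 0 on PASS, 1 on FAIL, 2 if the precondition is not met.
verify_snapshot() {
    if ! "$GLUSTER" volume info "$1" | grep -q 'changelog.changelog: on'; then
        echo "SKIP: changelog.changelog is not on for $1"
        return 2
    fi
    out=$("$GLUSTER" snapshot create "$2" "$1" 2>&1)
    if snapshot_create_succeeded "$out"; then
        echo "PASS: $out"
    else
        echo "FAIL: $out"
        return 1
    fi
}
```

Calling `verify_snapshot master snapr` on a fixed build would be expected to print a PASS line; on an affected build the create command times out and the helper reports FAIL.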
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html