Bug 1225542 - [geo-rep]: snapshot creation timesout even if geo-replication is in pause/stop/delete state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1223636 1225543
Reported: 2015-05-27 16:13 UTC by Aravinda VK
Modified: 2016-06-16 13:05 UTC (History)

Fixed In Version: glusterfs-3.8rc2
Clone Of: 1225338
Clones: 1225543
Environment:
Last Closed: 2016-06-16 13:05:25 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Aravinda VK 2015-05-27 16:13:41 UTC
+++ This bug was initially created as a clone of Bug #1225338 +++

Description of problem:
=======================

From a use-case point of view: created a geo-rep session, paused it, and tried to create a snapshot. The snapshot hangs and times out after the 2-minute CLI/barrier timeout.

The problem is with changelog.changelog being on. Tried the following on a cleaned-up system:

1. Create a volume
2. Set changelog.changelog to on
3. Create a snapshot; it times out:

[root@georep1 scripts]# gluster snapshot create snapa master
Error : Request timed out
Snapshot command failed
[root@georep1 scripts]# 

Brick Log snippet:
===================

[2015-05-26 17:34:59.595211] I [changelog.c:2043:notify] 0-master-changelog: Barrier on notification
[2015-05-26 17:34:59.595394] I [changelog-helpers.c:838:changelog_snap_logging_start] 0-master-changelog: Now starting to log in call path
[2015-05-26 17:34:59.595410] E [changelog.c:2064:notify] 0-master-changelog: Received another barrier on notification when last one is not served yet
[2015-05-26 17:34:59.595434] I [socket.c:3432:socket_submit_reply] 0-socket.glusterfsd: not connected (priv->connected = -1)
[2015-05-26 17:34:59.595464] E [rpcsvc.c:1312:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: Gluster Brick operations, ProgVers: 2, Proc: 10) to rpc-transport (socket.glusterfsd)
[2015-05-26 17:34:59.595480] E [glusterfsd-mgmt.c:149:glusterfs_submit_reply] 0-glusterfs: Reply submission failed
[2015-05-26 17:34:59.595501] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-05-26 17:34:59.596373] E [socket.c:3421:socket_submit_reply] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x31c5624fb0] (--> /usr/lib64/glusterfs/3.7.0/rpc-transport/socket.so(+0x6f2f)[0x7fa8cdec7f2f] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_transport_submit+0x76)[0x31c5a089a6] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_submit_generic+0x1c8)[0x31c5a091f8] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_error_reply+0x66)[0x31c5a09726] ))))) 0-socket: invalid argument: this->private
[2015-05-26 17:34:59.596394] E [rpcsvc.c:1312:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: Gluster Brick operations, ProgVers: 2, Proc: 10) to rpc-transport (socket.glusterfsd)
[2015-05-26 17:34:59.596568] C [mem-pool.c:560:mem_put] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x31c5624fb0] (--> /usr/lib64/libglusterfs.so.0(mem_put+0x105)[0x31c5655895] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_submit_generic+0x256)[0x31c5a09286] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_error_reply+0x66)[0x31c5a09726] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_check_and_reply_error+0x6b)[0x31c5a0979b] ))))) 0-mem-pool: mem_put called on freed ptr 0x6d2d84 of mem pool 0x6d1610
[2015-05-26 17:34:59.597962] W [rpcsvc.c:571:rpcsvc_check_and_reply_error] 0-rpcsvc: failed to queue error reply
[2015-05-26 17:34:59.598024] E [barrier.c:522:notify] 0-master-barrier: Already enabled
[2015-05-26 17:34:59.598381] I [changelog.c:1989:notify] 0-master-changelog: Barrier off notification
[2015-05-26 17:34:59.598688] I [changelog-helpers.c:860:changelog_snap_logging_stop] 0-master-changelog: Stopped to log in call path
[2015-05-26 17:34:59.598713] E [changelog.c:2030:notify] 0-master-changelog: Changelog barrier already disabled



Version-Release number of selected component (if applicable):
=============================================================



How reproducible:
=================

always


Steps to Reproduce:

Way1:
=====
1. Create the master and slave volumes
2. Create a geo-replication session between them
3. Start and then pause the geo-rep session
4. Try to create a snapshot. It fails

Way2:
=====
1. Create a volume
2. Set the volume option changelog.changelog on
3. Try to create the snapshot. It fails
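
For convenience, the Way2 steps can be written out as CLI invocations. This is only a sketch: the brick path is hypothetical, the explicit "volume start" step is an assumption (the report does not list it), and the bricks are assumed to meet the usual snapshot prerequisites. The volume and snapshot names are taken from the session shown in the description. The helper only builds and prints the commands; it does not run them.

```python
import shlex


def way2_commands(volume="master", snapshot="snapa",
                  brick="georep1:/bricks/brick1/master"):
    """Return the gluster CLI invocations for the Way2 reproduction.

    The brick path is a hypothetical placeholder, and the 'volume start'
    step is an assumption that the report does not spell out.
    """
    return [
        ["gluster", "volume", "create", volume, brick],
        ["gluster", "volume", "start", volume],
        ["gluster", "volume", "set", volume, "changelog.changelog", "on"],
        ["gluster", "snapshot", "create", snapshot, volume],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in way2_commands():
        print(shlex.join(cmd))
```

On an affected build, the last command is the one expected to end with "Error : Request timed out".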

Actual results:
===============

Snapshot creation fails with timeout


Expected results:
=================

Snapshot creation should succeed


Additional info:
================

Comment 1 Kotresh HR 2015-05-27 18:26:22 UTC
Upstream Patch Sent:
http://review.gluster.org/#/c/10951/

Comment 2 Anand Avati 2015-05-28 08:25:44 UTC
REVIEW: http://review.gluster.org/10951 (featuress/changelog: On snapshot, notify irrespective of failures) posted (#2) for review on master by Kotresh HR (khiremat)

Comment 3 Anand Avati 2015-05-28 08:29:07 UTC
REVIEW: http://review.gluster.org/10951 (features/changelog: On snapshot, notify irrespective of failures) posted (#3) for review on master by Kotresh HR (khiremat)

Comment 4 Anand Avati 2015-05-29 06:26:39 UTC
REVIEW: http://review.gluster.org/10951 (featuress/changelog: On snapshot, notify irrespective of failures) posted (#4) for review on master by Kotresh HR (khiremat)

Comment 5 Anand Avati 2015-05-31 14:48:34 UTC
COMMIT: http://review.gluster.org/10951 committed in master by Venky Shankar (vshankar) 
------
commit d76e9b83454786e6845d0cad3c2c0695815fae1b
Author: Kotresh HR <khiremat>
Date:   Wed May 27 16:27:25 2015 +0530

    featuress/changelog: On snapshot, notify irrespective of failures
    
    During snapshot, changelog barrier is enabled and a
    explicit rollover of changelog is initiated. During
    rollover of changelog, if any error or changelog is
    empty, the notification was not sent to reconfigure
    and hence snapshot was failing because of timeout.
    This patch addresses it by sending notification
    irrespective of failures and sends error if any
    back to barrier.
    
    Change-Id: I898af624b44555281a9e43c69066077e0e121c17
    BUG: 1225542
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: http://review.gluster.org/10951
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Aravinda VK <avishwan>
    Reviewed-by: Venky Shankar <vshankar>
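
The behaviour described in the commit message can be illustrated with a small model. This is not the changelog translator code: the class, method names, and the 0.2 s timeout are all hypothetical stand-ins (the real timeout is the 2-minute CLI/barrier timeout), and treating an empty changelog as a non-error after the fix is an assumption. It only shows why a waiter times out when the rollover path skips the notification, and how always notifying (and passing any error back) avoids that.

```python
import threading


class ChangelogSketch:
    """Toy model of 'barrier on' waiting for an explicit changelog rollover."""

    def __init__(self, notify_on_failure):
        # False models the old behaviour, True models the patched behaviour.
        self.notify_on_failure = notify_on_failure
        self.cond = threading.Condition()
        self.done = False
        self.error = None

    def rollover(self, outcome):
        """outcome is 'ok', 'empty' (nothing to roll over) or 'error'."""
        if outcome != "ok" and not self.notify_on_failure:
            return  # old behaviour: the waiter is never woken up
        with self.cond:
            self.done = True
            # Assumption: only a real error is reported back to the barrier.
            self.error = "rollover failed" if outcome == "error" else None
            self.cond.notify_all()

    def barrier_on(self, outcome, timeout=0.2):
        """Start the rollover and wait for its notification."""
        worker = threading.Thread(target=self.rollover, args=(outcome,))
        worker.start()
        with self.cond:
            woke = self.cond.wait_for(lambda: self.done, timeout=timeout)
        worker.join()
        if not woke:
            return "timeout"
        return "error" if self.error else "ok"


if __name__ == "__main__":
    for patched in (False, True):
        for outcome in ("ok", "empty", "error"):
            result = ChangelogSketch(patched).barrier_on(outcome)
            print(f"patched={patched} rollover={outcome} -> {result}")
```

In this model the unpatched variant times out for both the empty and the error case, which corresponds to the snapshot timeout reported above, while the patched variant always returns promptly.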

Comment 6 Niels de Vos 2016-06-16 13:05:25 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

