Bug 1225338 - [geo-rep]: snapshot creation timesout even if geo-replication is in pause/stop/delete state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.0
Assignee: Kotresh HR
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On: 1225543
Blocks: 1202842 1223636
Reported: 2015-05-27 07:24 UTC by Rahul Hinduja
Modified: 2015-07-29 04:53 UTC

Fixed In Version: glusterfs-3.7.1-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1225542 (view as bug list)
Environment:
Last Closed: 2015-07-29 04:53:16 UTC
Embargoed:



Links
System: Red Hat Product Errata
ID: RHSA-2015:1495
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Gluster Storage 3.1 update
Last Updated: 2015-07-29 08:26:26 UTC

Description Rahul Hinduja 2015-05-27 07:24:42 UTC
Description of problem:
=======================

From a use-case point of view: created a geo-rep session, paused it, and tried to create a snapshot. The snapshot hangs and times out after the 2-minute cli/barrier timeout.

The problem occurs whenever changelog.changelog is on. Tried the following on a cleaned-up system:

1. Create a volume
2. Set changelog.changelog to on
3. Create a snapshot. It times out:

[root@georep1 scripts]# gluster snapshot create snapa master
Error : Request timed out
Snapshot command failed
[root@georep1 scripts]# 

Brick Log snippet:
===================

[2015-05-26 17:34:59.595211] I [changelog.c:2043:notify] 0-master-changelog: Barrier on notification
[2015-05-26 17:34:59.595394] I [changelog-helpers.c:838:changelog_snap_logging_start] 0-master-changelog: Now starting to log in call path
[2015-05-26 17:34:59.595410] E [changelog.c:2064:notify] 0-master-changelog: Received another barrier on notification when last one is not served yet
[2015-05-26 17:34:59.595434] I [socket.c:3432:socket_submit_reply] 0-socket.glusterfsd: not connected (priv->connected = -1)
[2015-05-26 17:34:59.595464] E [rpcsvc.c:1312:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: Gluster Brick operations, ProgVers: 2, Proc: 10) to rpc-transport (socket.glusterfsd)
[2015-05-26 17:34:59.595480] E [glusterfsd-mgmt.c:149:glusterfs_submit_reply] 0-glusterfs: Reply submission failed
[2015-05-26 17:34:59.595501] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-05-26 17:34:59.596373] E [socket.c:3421:socket_submit_reply] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x31c5624fb0] (--> /usr/lib64/glusterfs/3.7.0/rpc-transport/socket.so(+0x6f2f)[0x7fa8cdec7f2f] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_transport_submit+0x76)[0x31c5a089a6] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_submit_generic+0x1c8)[0x31c5a091f8] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_error_reply+0x66)[0x31c5a09726] ))))) 0-socket: invalid argument: this->private
[2015-05-26 17:34:59.596394] E [rpcsvc.c:1312:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: Gluster Brick operations, ProgVers: 2, Proc: 10) to rpc-transport (socket.glusterfsd)
[2015-05-26 17:34:59.596568] C [mem-pool.c:560:mem_put] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x31c5624fb0] (--> /usr/lib64/libglusterfs.so.0(mem_put+0x105)[0x31c5655895] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_submit_generic+0x256)[0x31c5a09286] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_error_reply+0x66)[0x31c5a09726] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_check_and_reply_error+0x6b)[0x31c5a0979b] ))))) 0-mem-pool: mem_put called on freed ptr 0x6d2d84 of mem pool 0x6d1610
[2015-05-26 17:34:59.597962] W [rpcsvc.c:571:rpcsvc_check_and_reply_error] 0-rpcsvc: failed to queue error reply
[2015-05-26 17:34:59.598024] E [barrier.c:522:notify] 0-master-barrier: Already enabled
[2015-05-26 17:34:59.598381] I [changelog.c:1989:notify] 0-master-changelog: Barrier off notification
[2015-05-26 17:34:59.598688] I [changelog-helpers.c:860:changelog_snap_logging_stop] 0-master-changelog: Stopped to log in call path
[2015-05-26 17:34:59.598713] E [changelog.c:2030:notify] 0-master-changelog: Changelog barrier already disabled
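
A quick way to check whether a brick hit the same condition is to count the "Received another barrier on notification" errors in its log. The following is only a sketch: the function name count_double_barrier is made up for illustration, and the log path is whatever the brick actually writes to (commonly somewhere under /var/log/glusterfs/bricks/, but that is an assumption about the install).

```shell
#!/bin/sh
# Hypothetical helper: count brick-log lines showing a second barrier-on
# notification arriving before the previous one was served, which is the
# error seen in the snippet above.
# Usage: count_double_barrier /path/to/brick.log
count_double_barrier() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps callers alive under "set -e".
    grep -c 'Received another barrier on notification' "$1" || true
}
```

A non-zero count on a brick of the snapshotted volume, close to the time of the failed snapshot, would match the behaviour reported here.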



Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.0-2.el6rhs.x86_64

How reproducible:
=================

always


Steps to Reproduce:

Way1:
=====
1. Create master and slave volume
2. Create geo-replication between them
3. Start and Pause the geo-rep session
4. Try to create the snapshot. It fails

Way2:
=====
1. Create a volume
2. Set the volume option changelog.changelog on
3. Try to create the snapshot. It fails
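
Both ways can be scripted roughly as below. This is a sketch, not the exact commands used for this report: the function names repro_way1 and repro_way2 and the GLUSTER variable are made up for illustration, and it assumes the volumes already exist and are started, the bricks support snapshots, and passwordless SSH to the slave is already set up for "create push-pem".

```shell
#!/bin/sh
# GLUSTER is a hypothetical override so the CLI binary can be swapped out.
GLUSTER="${GLUSTER:-gluster}"

# Way 1: create, start and pause a geo-rep session, then snapshot the master.
# Args: master_vol slave_host::slave_vol snap_name
repro_way1() {
    "$GLUSTER" volume geo-replication "$1" "$2" create push-pem &&
    "$GLUSTER" volume geo-replication "$1" "$2" start &&
    "$GLUSTER" volume geo-replication "$1" "$2" pause &&
    "$GLUSTER" snapshot create "$3" "$1"
}

# Way 2: no geo-rep session at all, only the changelog option turned on.
# Args: vol snap_name
repro_way2() {
    "$GLUSTER" volume set "$1" changelog.changelog on &&
    "$GLUSTER" snapshot create "$2" "$1"
}

# Example calls (names taken from this report):
#   repro_way1 master 10.70.46.154::slave snapa
#   repro_way2 master snapa
```

On an affected build the final "snapshot create" in either function is expected to fail with "Request timed out".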

Actual results:
===============

Snapshot creation fails with timeout


Expected results:
=================

Snapshot creation should succeed


Additional info:
================

Snapshot creation while geo-replication is paused used to work with the upstream 3.7 beta1 build. Hence it's a recent regression.

Comment 3 Kotresh HR 2015-06-02 09:16:44 UTC
Upstream (master):
http://review.gluster.org/10951

Upstream (3.7):
http://review.gluster.org/10988

Downstream:
https://code.engineering.redhat.com/gerrit/#/c/49689/

Comment 6 Rahul Hinduja 2015-06-18 08:44:31 UTC
Verified with build:  glusterfs-3.7.1-3.el6rhs.x86_64


[root@georep1 ~]# gluster volume geo-replication master 10.70.46.154::slave stop
Stopping geo-replication session between master & 10.70.46.154::slave has been successful
[root@georep1 ~]# gluster volume geo-replication master 10.70.46.154::slave delete
Deleting geo-replication session between master & 10.70.46.154::slave has been successful
[root@georep1 ~]# gluster snapshot list
snap1_GMT-2015.06.18-13.54.17
snap2_GMT-2015.06.18-14.00.01
[root@georep1 ~]# gluster volume info master | grep change
changelog.changelog: on
[root@georep1 ~]# time gluster snapshot create snapr master
snapshot create: success: Snap snapr_GMT-2015.06.18-14.04.29 created successfully

real	0m13.162s
user	0m0.106s
sys	0m0.026s
[root@georep1 ~]# gluster snapshot list
snap1_GMT-2015.06.18-13.54.17
snap2_GMT-2015.06.18-14.00.01
snapr_GMT-2015.06.18-14.04.29
[root@georep1 ~]# gluster snapshot activate snapr_GMT-2015.06.18-14.04.29
Snapshot activate: snapr_GMT-2015.06.18-14.04.29: Snap activated successfully
[root@georep1 ~]#
[root@georep1 ~]# gluster snapshot info  snapr_GMT-2015.06.18-14.04.29 | grep -i "status"
	Status                    : Started
[root@georep1 ~]# 
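
The timing check above can be wrapped in a small script for repeated runs. A sketch only: verify_snapshot_create, GLUSTER and CLI_TIMEOUT_SECS are names made up for illustration, and the 120-second default is simply the 2-minute timeout mentioned in the description, not a value read from the installed build.

```shell
#!/bin/sh
# Hypothetical overrides: the CLI binary and the timeout to compare against.
GLUSTER="${GLUSTER:-gluster}"
CLI_TIMEOUT_SECS="${CLI_TIMEOUT_SECS:-120}"

# Create a snapshot and report whether it succeeded in less time than the
# CLI timeout. Args: vol snap_name
verify_snapshot_create() {
    start=$(date +%s)
    "$GLUSTER" snapshot create "$2" "$1"
    rc=$?
    elapsed=$(( $(date +%s) - start ))
    echo "exit=$rc elapsed=${elapsed}s"
    # Succeed only if the CLI succeeded and finished before the timeout.
    [ "$rc" -eq 0 ] && [ "$elapsed" -lt "$CLI_TIMEOUT_SECS" ]
}

# Example call (names taken from this report):
#   verify_snapshot_create master snapr
```

With the fixed build the call should return success after a few seconds, in line with the 13-second run shown above.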

Moving the bug to verified state

Comment 7 errata-xmlrpc 2015-07-29 04:53:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

