Bug 1105102

Summary: CTDB: Adding volume set option in hook script causes delay in glusterd operations.

Product: [Red Hat Storage] Red Hat Gluster Storage
Component: samba
Version: rhgs-3.0
Reporter: surabhi <sbhaloth>
Assignee: Raghavendra Talur <rtalur>
QA Contact: surabhi <sbhaloth>
CC: asrivast, ira, nlevinki, pgurusid, ssamanta, vagarwal
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Target Milestone: ---
Target Release: RHGS 3.0.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.6.0.17-1.el6rhs
Doc Type: Bug Fix
Type: Bug
Clones: 1105118
Last Closed: 2014-09-22 19:40:40 UTC
Bug Depends On: 1092242, 1294224
Bug Blocks: 1105118

Description surabhi 2014-06-05 11:48:06 UTC
Description of problem:
In a CTDB setup, when we start a volume, the hook scripts are supposed to mount the CTDB volume on /gluster/lock and add the corresponding entry to fstab.

With the following change in the hook script, where we add a volume set option (network.ping-timeout) for the CTDB volume and then do the mount and add the entry to fstab, the mount is delayed and the behaviour is inconsistent.

Script location:
/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh

function add_ping_timeout () {
    volname=$1
    value=$2
    gluster volume set $volname network.ping-timeout $value
}

sleep 5
# Make sure ping-timeout is not default for CTDB volume
add_ping_timeout $VOL $PING_TIMEOUT_SECS;
mount -t glusterfs `hostname`:$VOL "$CTDB_MNT" && \
    add_fstab_entry $VOL $CTDB_MNT

In the CTDB customer scenario these hook scripts run on multiple nodes (scaling) at the same time, so the gluster volume set command is issued concurrently from every node; this delays gluster operations, and we get the following error:

ctdb: failed: Another transaction is in progress. Please try again after sometime.

*****************
If the ping-timeout value needs to be set for the CTDB volume, it would be better to set it manually once, after creating the CTDB volume, instead of handling it in the hook script; a sketch of that manual step follows.
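
For illustration only, a minimal sketch of the manual alternative; the volume name ctdb and the timeout value 10 are assumptions, not values specified in this bug:

    # Run once, from any one node, after creating the CTDB meta volume.
    # glusterd propagates the setting cluster-wide, so there is no
    # per-node race on the volume-set transaction.
    gluster volume set ctdb network.ping-timeout 10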

Version-Release number of selected component (if applicable):
glusterfs-geo-replication-3.6.0.11-1.el6rhs.x86_64
glusterfs-api-3.6.0.11-1.el6rhs.x86_64
glusterfs-server-3.6.0.11-1.el6rhs.x86_64
glusterfs-debuginfo-3.6.0.11-1.el6rhs.x86_64
glusterfs-libs-3.6.0.11-1.el6rhs.x86_64
glusterfs-fuse-3.6.0.11-1.el6rhs.x86_64
samba-glusterfs-3.6.9-168.2.el6rhs.x86_64
glusterfs-devel-3.6.0.11-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.11-1.el6rhs.x86_64
glusterfs-api-devel-3.6.0.11-1.el6rhs.x86_64
glusterfs-cli-3.6.0.11-1.el6rhs.x86_64
glusterfs-3.6.0.11-1.el6rhs.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Create a CTDB volume
2. Start the volume (a minimal sketch of these commands follows)
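
For reference, a sketch of the reproduction commands; the volume name ctdb, the replica count, and the node/brick paths are illustrative assumptions:

    # Assumed hostnames and brick paths; replace with your own.
    gluster volume create ctdb replica 3 node1:/bricks/ctdb node2:/bricks/ctdb node3:/bricks/ctdb
    # Starting the volume triggers S29CTDBsetup.sh on every node.
    gluster volume start ctdb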


Actual results:
The CTDB volume starts, but the mount takes a long time, any gluster operation
is delayed, and we get the following error:

ctdb: failed: Another transaction is in progress. Please try again after sometime.

Expected results:
On all the nodes the CTDB volume should be mounted as soon as the volume is started, and none of the glusterd operations should be delayed.


Additional info:
We can set the ping-timeout option manually for the CTDB volume, or it should be handled outside the hook scripts, because running it from the hook script may cause issues in a multi-node environment.

Comment 3 Raghavendra Talur 2014-06-12 10:55:50 UTC
Patch posted at https://code.engineering.redhat.com/gerrit/26746.

Comment 4 surabhi 2014-06-17 09:38:02 UTC
With the fix in glusterfs-server-3.6.0.17-1.el6rhs.x86_64, ping-timeout is passed as a mount option instead, so the fstab entry becomes:

IP:/ctdb_vol /gluster/lock glusterfs _netdev,defaults,transport=tcp,xlator-option=*client*.ping-timeout=10 0 0

I tried creating a new CTDB volume and starting it. There is no delay seen in gluster operations, and the volume is mounted on /gluster/lock on all the nodes.
Marking the BZ Verified.
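
For clarity, the manual mount command implied by that fstab entry; the server address IP and the volume name ctdb_vol are placeholders carried over from the entry above:

    # ping-timeout is set per client at mount time, so no glusterd
    # volume-set transaction is needed while the hook script runs.
    mount -t glusterfs -o transport=tcp,xlator-option=*client*.ping-timeout=10 IP:/ctdb_vol /gluster/lock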

Comment 6 errata-xmlrpc 2014-09-22 19:40:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html