Description of problem:
Snapshot daemon failed to run on a newly created dist-rep volume with USS enabled. "uss enable" succeeded, but volume status showed that the snapshot daemon was not running.

Note: I am using a workaround: restarting glusterd on the nodes where snapd failed to start fixes the issue.

Version-Release number of selected component (if applicable):
[root@rhsqa14-vm1 ~]# glusterfs --version
glusterfs 3.7.1 built on Jun 9 2015 02:31:54
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

[root@rhsqa14-vm1 ~]# rpm -qa | grep gluster
glusterfs-3.7.1-1.el6rhs.x86_64
glusterfs-cli-3.7.1-1.el6rhs.x86_64
glusterfs-libs-3.7.1-1.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-1.el6rhs.x86_64
glusterfs-fuse-3.7.1-1.el6rhs.x86_64
glusterfs-server-3.7.1-1.el6rhs.x86_64
glusterfs-api-3.7.1-1.el6rhs.x86_64
[root@rhsqa14-vm1 ~]#

How reproducible:
Easily.

Steps to Reproduce:
1. Create a dist-rep volume on 2 nodes.
2. Enable USS: gluster v set tier_test features.uss enable
3. Check gluster v status tier_test.

Additional info:

[root@rhsqa14-vm1 ~]# gluster v create tier_test replica 2 10.70.47.165:/rhs/brick1/l0 10.70.47.163:/rhs/brick1/l0 10.70.47.165:/rhs/brick2/l0 10.70.47.163:/rhs/brick2/l0 force
volume create: tier_test: success: please start the volume to access data

[root@rhsqa14-vm1 ~]# gluster v start tier_test
volume start: tier_test: success

[root@rhsqa14-vm1 ~]# gluster v info tier_test

Volume Name: tier_test
Type: Distributed-Replicate
Volume ID: 819de17d-8abb-4372-879f-81fd677b0d0e
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/rhs/brick1/l0
Brick2: 10.70.47.163:/rhs/brick1/l0
Brick3: 10.70.47.165:/rhs/brick2/l0
Brick4: 10.70.47.163:/rhs/brick2/l0
Options Reconfigured:
performance.readdir-ahead: on

[root@rhsqa14-vm1 ~]# gluster v status tier_test
Status of volume: tier_test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/rhs/brick1/l0           49169     0          Y       12488
Brick 10.70.47.163:/rhs/brick1/l0           49169     0          Y       12160
Brick 10.70.47.165:/rhs/brick2/l0           49170     0          Y       12506
Brick 10.70.47.163:/rhs/brick2/l0           49170     0          Y       12180
NFS Server on localhost                     2049      0          Y       12525
Self-heal Daemon on localhost               N/A       N/A        Y       12548
NFS Server on 10.70.47.159                  2049      0          Y       10123
Self-heal Daemon on 10.70.47.159            N/A       N/A        Y       10132
NFS Server on 10.70.46.2                    2049      0          Y       32083
Self-heal Daemon on 10.70.46.2              N/A       N/A        Y       32092
NFS Server on 10.70.47.163                  2049      0          Y       12204
Self-heal Daemon on 10.70.47.163            N/A       N/A        Y       12216

Task Status of Volume tier_test
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsqa14-vm1 ~]# ./options.sh tier_test
volume set: success
volume quota : success
volume set: success
volume quota : success
volume set: success

[root@rhsqa14-vm1 ~]# gluster v status tier_test
Status of volume: tier_test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/rhs/brick1/l0           49169     0          Y       12488
Brick 10.70.47.163:/rhs/brick1/l0           49169     0          Y       12160
Brick 10.70.47.165:/rhs/brick2/l0           49170     0          Y       12506
Brick 10.70.47.163:/rhs/brick2/l0           49170     0          Y       12180
Snapshot Daemon on localhost                49171     0          Y       12767
NFS Server on localhost                     2049      0          Y       12775
Self-heal Daemon on localhost               N/A       N/A        Y       12548
Quota Daemon on localhost                   N/A       N/A        Y       12673
Snapshot Daemon on 10.70.47.163             49171     0          Y       12400
NFS Server on 10.70.47.163                  2049      0          Y       12408
Self-heal Daemon on 10.70.47.163            N/A       N/A        Y       12216
Quota Daemon on 10.70.47.163                N/A       N/A        Y       12313
Snapshot Daemon on 10.70.46.2               N/A       N/A        N       N/A
NFS Server on 10.70.46.2                    2049      0          Y       32250
Self-heal Daemon on 10.70.46.2              N/A       N/A        Y       32092
Quota Daemon on 10.70.46.2                  N/A       N/A        Y       32173
Snapshot Daemon on 10.70.47.159             N/A       N/A        N       N/A
NFS Server on 10.70.47.159                  2049      0          Y       10289
Self-heal Daemon on 10.70.47.159            N/A       N/A        Y       10132
Quota Daemon on 10.70.47.159                N/A       N/A        Y       10213

Task Status of Volume tier_test
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsqa14-vm1 ~]#

snapd logs:

[root@rhsqa14-vm3 ~]# less /var/log/glusterfs/snaps/tier_test/snapd.log

[2015-06-11 08:23:06.598143] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.1 (args: /usr/sbin/glusterfsd -s localhost --volfile-id snapd/tier_test -p /var/lib/glusterd/vols/tier_test/run/tier_test-snapd.pid -l /var/log/glusterfs/snaps/tier_test/snapd.log --brick-name snapd-tier_test -S /var/run/gluster/c9799260d198bbfad617e96cf0af7f84.socket --brick-port 49155 --xlator-option tier_test-server.listen-port=49155 --no-mem-accounting)
[2015-06-11 08:23:06.598227] E [MSGID: 100017] [glusterfsd.c:1880:glusterfs_pidfile_setup] 0-glusterfsd: pidfile /var/lib/glusterd/vols/tier_test/run/tier_test-snapd.pid open failed [No such file or directory]
/var/log/glusterfs/snaps/tier_test/snapd.log (END)
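The pidfile error above indicates that the per-volume run directory snapd writes its pidfile into is missing on the affected node. A minimal check-and-workaround sketch, assuming the path from the log above and the glusterd-restart workaround noted in the description (the exact commands are illustrative, not taken verbatim from the report):

# Run on a node where "gluster v status" shows the Snapshot Daemon as Online = N

# 1. Confirm the per-volume run directory holding the snapd pidfile is missing
ls -ld /var/lib/glusterd/vols/tier_test/run

# 2. Workaround from the description: restart glusterd on that node so snapd is respawned
service glusterd restart

# 3. Re-check that the Snapshot Daemon now reports Online = Y
gluster v status tier_test | grep "Snapshot Daemon"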
This bug is hit when the cluster has a node that does not host any brick of the volume.
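A rough way to spot such nodes (illustrative sketch, not from the report; it assumes the peer names printed by "gluster pool list" use the same addresses as the brick lines in "gluster v info"):

# Peers currently in the cluster
gluster pool list

# Hosts that actually carry bricks of the volume; any peer absent from this
# list has no brick on the volume, which is the case that triggers the bug
gluster v info tier_test | awk -F': ' '/^Brick[0-9]+:/ {split($2, a, ":"); print a[1]}' | sort -u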
Mainline   - http://review.gluster.org/#/c/11227/
3.7        - http://review.gluster.org/#/c/11291/
Downstream - https://code.engineering.redhat.com/gerrit/51027
Version : glusterfs-3.7.1-4.el6rhs.x86_64

Created a 2-brick volume in a 4-node cluster; the snapshot daemon is running on all the nodes in the cluster. Marking the bug 'Verified'.

gluster v status vol3
Status of volume: vol3
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick inception.lab.eng.blr.redhat.com:/rhs/brick10/b10 49173     0          Y       10213
Brick rhs-arch-srv4.lab.eng.blr.redhat.com:/rhs/brick6/b6 49167   0          Y       26414
Snapshot Daemon on localhost                            49171     0          Y       3865
NFS Server on localhost                                 2049      0          Y       3873
Self-heal Daemon on localhost                           N/A       N/A        Y       3837
Snapshot Daemon on 10.70.34.50                          49174     0          Y       10262
NFS Server on 10.70.34.50                               2049      0          Y       10270
Self-heal Daemon on 10.70.34.50                         N/A       N/A        Y       10239
Snapshot Daemon on rhs-arch-srv3.lab.eng.blr.redhat.com 49169     0          Y       30266
NFS Server on rhs-arch-srv3.lab.eng.blr.redhat.com      2049      0          Y       30278
Self-heal Daemon on rhs-arch-srv3.lab.eng.blr.redhat.com N/A      N/A        Y       30249
Snapshot Daemon on rhs-arch-srv4.lab.eng.blr.redhat.com 49168     0          Y       26459
NFS Server on rhs-arch-srv4.lab.eng.blr.redhat.com      2049      0          Y       26471
Self-heal Daemon on rhs-arch-srv4.lab.eng.blr.redhat.com N/A      N/A        Y       26436

Task Status of Volume vol3
------------------------------------------------------------------------------
There are no active volume tasks
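A quick scripted variant of the manual check above (not part of the original verification) is to filter the status output for the Snapshot Daemon lines and confirm each node reports Online = Y:

# Expect one "Snapshot Daemon" entry per node in the cluster, each with Online = Y
gluster v status vol3 | grep "Snapshot Daemon"
gluster v status vol3 | grep -c "Snapshot Daemon"   # count should equal the number of peers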
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html