Description of problem:
Snapshot daemon failed to run on a newly created dist-rep volume with USS enabled. "uss enable" succeeded, but volume status showed that the snapshot daemon was not running.

Note: I am using a workaround: restarting glusterd on the nodes where snapd failed to start fixes the issue.

Version-Release number of selected component (if applicable):
[root@rhsqa14-vm1 ~]# glusterfs --version
glusterfs 3.7.1 built on Jun 9 2015 02:31:54
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

[root@rhsqa14-vm1 ~]# rpm -qa | grep gluster
glusterfs-3.7.1-1.el6rhs.x86_64
glusterfs-cli-3.7.1-1.el6rhs.x86_64
glusterfs-libs-3.7.1-1.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-1.el6rhs.x86_64
glusterfs-fuse-3.7.1-1.el6rhs.x86_64
glusterfs-server-3.7.1-1.el6rhs.x86_64
glusterfs-api-3.7.1-1.el6rhs.x86_64
[root@rhsqa14-vm1 ~]#

How reproducible:
Easily.

Steps to Reproduce:
1. Create a dist-rep volume on 2 nodes.
2. Enable USS: gluster v set tier_test features.uss enable
3. Check gluster v status tier_test.

Additional info:

[root@rhsqa14-vm1 ~]# gluster v create tier_test replica 2 10.70.47.165:/rhs/brick1/l0 10.70.47.163:/rhs/brick1/l0 10.70.47.165:/rhs/brick2/l0 10.70.47.163:/rhs/brick2/l0 force
volume create: tier_test: success: please start the volume to access data

[root@rhsqa14-vm1 ~]# gluster v start tier_test
volume start: tier_test: success

[root@rhsqa14-vm1 ~]# gluster v info tier_test

Volume Name: tier_test
Type: Distributed-Replicate
Volume ID: 819de17d-8abb-4372-879f-81fd677b0d0e
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/rhs/brick1/l0
Brick2: 10.70.47.163:/rhs/brick1/l0
Brick3: 10.70.47.165:/rhs/brick2/l0
Brick4: 10.70.47.163:/rhs/brick2/l0
Options Reconfigured:
performance.readdir-ahead: on

[root@rhsqa14-vm1 ~]# gluster v status tier_test
Status of volume: tier_test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/rhs/brick1/l0           49169     0          Y       12488
Brick 10.70.47.163:/rhs/brick1/l0           49169     0          Y       12160
Brick 10.70.47.165:/rhs/brick2/l0           49170     0          Y       12506
Brick 10.70.47.163:/rhs/brick2/l0           49170     0          Y       12180
NFS Server on localhost                     2049      0          Y       12525
Self-heal Daemon on localhost               N/A       N/A        Y       12548
NFS Server on 10.70.47.159                  2049      0          Y       10123
Self-heal Daemon on 10.70.47.159            N/A       N/A        Y       10132
NFS Server on 10.70.46.2                    2049      0          Y       32083
Self-heal Daemon on 10.70.46.2              N/A       N/A        Y       32092
NFS Server on 10.70.47.163                  2049      0          Y       12204
Self-heal Daemon on 10.70.47.163            N/A       N/A        Y       12216

Task Status of Volume tier_test
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsqa14-vm1 ~]# ./options.sh tier_test
volume set: success
volume quota : success
volume set: success
volume quota : success
volume set: success

[root@rhsqa14-vm1 ~]# gluster v status tier_test
Status of volume: tier_test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/rhs/brick1/l0           49169     0          Y       12488
Brick 10.70.47.163:/rhs/brick1/l0           49169     0          Y       12160
Brick 10.70.47.165:/rhs/brick2/l0           49170     0          Y       12506
Brick 10.70.47.163:/rhs/brick2/l0           49170     0          Y       12180
Snapshot Daemon on localhost                49171     0          Y       12767
NFS Server on localhost                     2049      0          Y       12775
Self-heal Daemon on localhost               N/A       N/A        Y       12548
Quota Daemon on localhost                   N/A       N/A        Y       12673
Snapshot Daemon on 10.70.47.163             49171     0          Y       12400
NFS Server on 10.70.47.163                  2049      0          Y       12408
Self-heal Daemon on 10.70.47.163            N/A       N/A        Y       12216
Quota Daemon on 10.70.47.163                N/A       N/A        Y       12313
Snapshot Daemon on 10.70.46.2               N/A       N/A        N       N/A
NFS Server on 10.70.46.2                    2049      0          Y       32250
Self-heal Daemon on 10.70.46.2              N/A       N/A        Y       32092
Quota Daemon on 10.70.46.2                  N/A       N/A        Y       32173
Snapshot Daemon on 10.70.47.159             N/A       N/A        N       N/A
NFS Server on 10.70.47.159                  2049      0          Y       10289
Self-heal Daemon on 10.70.47.159            N/A       N/A        Y       10132
Quota Daemon on 10.70.47.159                N/A       N/A        Y       10213

Task Status of Volume tier_test
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsqa14-vm1 ~]#

snapd logs:

[root@rhsqa14-vm3 ~]# less /var/log/glusterfs/snaps/tier_test/snapd.log

[2015-06-11 08:23:06.598143] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.1 (args: /usr/sbin/glusterfsd -s localhost --volfile-id snapd/tier_test -p /var/lib/glusterd/vols/tier_test/run/tier_test-snapd.pid -l /var/log/glusterfs/snaps/tier_test/snapd.log --brick-name snapd-tier_test -S /var/run/gluster/c9799260d198bbfad617e96cf0af7f84.socket --brick-port 49155 --xlator-option tier_test-server.listen-port=49155 --no-mem-accounting)
[2015-06-11 08:23:06.598227] E [MSGID: 100017] [glusterfsd.c:1880:glusterfs_pidfile_setup] 0-glusterfsd: pidfile /var/lib/glusterd/vols/tier_test/run/tier_test-snapd.pid open failed [No such file or directory]
/var/log/glusterfs/snaps/tier_test/snapd.log (END)
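The pidfile error above indicates that the per-volume run directory snapd writes its pidfile into is missing on the affected node. A minimal check-and-workaround sketch, assuming the path from the log above and the glusterd-restart workaround noted in the description (the exact commands are illustrative, not taken verbatim from the report):

# Run on a node where "gluster v status" shows the Snapshot Daemon as Online = N

# 1. Confirm the per-volume run directory holding the snapd pidfile is missing
ls -ld /var/lib/glusterd/vols/tier_test/run

# 2. Workaround from the description: restart glusterd on that node so snapd is respawned
service glusterd restart

# 3. Re-check that the Snapshot Daemon now reports Online = Y
gluster v status tier_test | grep "Snapshot Daemon"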
This bug is hit when the cluster has a node that does not host any brick of the volume.
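A rough way to spot such nodes (illustrative sketch, not from the report; it assumes the peer names printed by "gluster pool list" use the same addresses as the brick lines in "gluster v info"):

# Peers currently in the cluster
gluster pool list

# Hosts that actually carry bricks of the volume; any peer absent from this
# list has no brick on the volume, which is the case that triggers the bug
gluster v info tier_test | awk -F': ' '/^Brick[0-9]+:/ {split($2, a, ":"); print a[1]}' | sort -u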
Mainline   - http://review.gluster.org/#/c/11227/
3.7        - http://review.gluster.org/#/c/11291/
Downstream - https://code.engineering.redhat.com/gerrit/51027
Version : glusterfs-3.7.1-4.el6rhs.x86_64

Created a 2-brick volume in a 4-node cluster; the snapshot daemon is running on all the nodes in the cluster. Marking the bug 'Verified'.

gluster v status vol3
Status of volume: vol3
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick inception.lab.eng.blr.redhat.com:/rhs/brick10/b10 49173     0          Y       10213
Brick rhs-arch-srv4.lab.eng.blr.redhat.com:/rhs/brick6/b6 49167   0          Y       26414
Snapshot Daemon on localhost                            49171     0          Y       3865
NFS Server on localhost                                 2049      0          Y       3873
Self-heal Daemon on localhost                           N/A       N/A        Y       3837
Snapshot Daemon on 10.70.34.50                          49174     0          Y       10262
NFS Server on 10.70.34.50                               2049      0          Y       10270
Self-heal Daemon on 10.70.34.50                         N/A       N/A        Y       10239
Snapshot Daemon on rhs-arch-srv3.lab.eng.blr.redhat.com 49169     0          Y       30266
NFS Server on rhs-arch-srv3.lab.eng.blr.redhat.com      2049      0          Y       30278
Self-heal Daemon on rhs-arch-srv3.lab.eng.blr.redhat.com N/A      N/A        Y       30249
Snapshot Daemon on rhs-arch-srv4.lab.eng.blr.redhat.com 49168     0          Y       26459
NFS Server on rhs-arch-srv4.lab.eng.blr.redhat.com      2049      0          Y       26471
Self-heal Daemon on rhs-arch-srv4.lab.eng.blr.redhat.com N/A      N/A        Y       26436

Task Status of Volume vol3
------------------------------------------------------------------------------
There are no active volume tasks
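A quick scripted variant of the manual check above (not part of the original verification) is to filter the status output for the Snapshot Daemon lines and confirm each node reports Online = Y:

# Expect one "Snapshot Daemon" entry per node in the cluster, each with Online = Y
gluster v status vol3 | grep "Snapshot Daemon"
gluster v status vol3 | grep -c "Snapshot Daemon"   # count should equal the number of peers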
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html