Description of problem: After the volume is stopped and started, browsing the .snaps directory from the CIFS mount fails.

Version-Release number of selected component (if applicable):

[root@gqas005 ~]# rpm -qa | grep gluster
gluster-nagios-common-0.1.4-1.el6rhs.noarch
glusterfs-fuse-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.42.1-1.el6rhs.x86_64
gluster-nagios-addons-0.1.14-1.el6rhs.x86_64
samba-glusterfs-3.6.509-169.4.el6rhs.x86_64
rhs-tests-rhs-tests-beaker-rhs-gluster-qe-libs-dev-bturner-2.37-0.noarch
glusterfs-libs-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-api-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-cli-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.42.1-1.el6rhs.x86_64
vdsm-gluster-4.14.7.3-1.el6rhs.noarch
glusterfs-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-server-3.6.0.42.1-1.el6rhs.x86_64
[root@gqas005 ~]#

How reproducible: Intermittent

Steps to Reproduce:
1. Create a 6x2 distributed-replicate volume and start it
2. Mount the volume on the Windows client
3. Enable USS and run I/O at the CIFS mount point
4. Create 256 snapshots for the volume
5. While the 256 snaps are being accessed and are present under <Drive>:\.snaps, stop the gluster volume
6. Start the gluster volume again and try to access the .snaps directory

Actual results:
snapd is down and .snaps is not accessible: snapd fails to bind its previous listen port ("Address already in use" in the snapd.log excerpt below). The same thing can happen when snapd is killed for some reason and the kernel still holds the port (for example, a socket lingering in TIME_WAIT). Ideally, if snapd cannot get its old port, it should bind to a different free port so that .snaps stays accessible from the client; a sketch of such a fallback follows below.

Expected results:
snapd should come up successfully.

Additional info:
Workaround: re-enabling USS on the volume brought snapd back up (see the transcript below).
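The port fallback suggested under "Actual results" can be sketched in a few lines of C. This is an illustration only, not GlusterFS source: the function name is hypothetical, and a real fix would also have to advertise the newly chosen port back to glusterd (the way the --brick-port / listen-port handshake in the snapd.log below does), which is why the sketch reports the bound port to the caller.

/* Illustrative sketch only (not GlusterFS source): try the remembered
 * port first, and on EADDRINUSE fall back to port 0 so the kernel
 * assigns any free port. */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns a listening fd, or -1 on failure. *chosen is set to the port
 * actually bound, so it can be advertised back to glusterd. */
static int listen_with_fallback(uint16_t preferred, uint16_t *chosen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(preferred);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        if (errno != EADDRINUSE)
            goto err;
        /* Preferred port is held (e.g. in TIME_WAIT): let the kernel pick. */
        addr.sin_port = htons(0);
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
            goto err;
    }

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0 ||
        listen(fd, SOMAXCONN) < 0)
        goto err;

    *chosen = ntohs(addr.sin_port);
    return fd;
err:
    close(fd);
    return -1;
}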
snapd.log shows the following:
=============================
[2015-02-10 09:42:43.617728] W [options.c:898:xl_opt_validate] 0-testvol1-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2015-02-10 09:42:43.617825] E [socket.c:711:__socket_server_bind] 0-tcp.testvol1-server: binding to failed: Address already in use
[2015-02-10 09:42:43.617840] E [socket.c:714:__socket_server_bind] 0-tcp.testvol1-server: Port is already in use
[2015-02-10 09:42:43.617859] W [rpcsvc.c:1531:rpcsvc_transport_create] 0-rpc-service: listening on transport failed
[2015-02-10 09:42:43.617872] W [server.c:911:init] 0-testvol1-server: creation of listener failed
[2015-02-10 09:42:43.617883] E [xlator.c:406:xlator_init] 0-testvol1-server: Initialization of volume 'testvol1-server' failed, review your volfile again
[2015-02-10 09:42:43.617894] E [graph.c:322:glusterfs_graph_init] 0-testvol1-server: initializing translator failed
[2015-02-10 09:42:43.617904] E [graph.c:525:glusterfs_graph_activate] 0-graph: init failed
[2015-02-10 09:42:43.618221] W [glusterfsd.c:1183:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-02-10 10:31:33.327204] I [MSGID: 100030] [glusterfsd.c:2016:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.6.0.42.1 (args: /usr/sbin/glusterfsd -s localhost --volfile-id snapd/testvol1 -p /var/lib/glusterd/vols/testvol1/run/testvol1-snapd.pid -l /var/log/glusterfs/snaps/testvol1/snapd.log --brick-name snapd-testvol1 -S /var/run/c3bc0889c974e54aaf844607b33c8054.socket --brick-port 49959 --xlator-option testvol1-server.listen-port=49959 --no-mem-accounting)
[2015-02-10 10:31:34.202665] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2015-02-10 10:31:34.702169] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2015-02-10 10:31:35.160011] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2015-02-10 10:31:35.187382] I [graph.c:269:gf_add_cmdline_options] 0-testvol1-server: adding option 'listen-port' for volume 'testvol1-server' with value '49959'
[2015-02-10 10:31:35.225011] I [rpcsvc.c:2142:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2015-02-10 10:31:35.225094] W [options.c:898:xl_opt_validate] 0-testvol1-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2015-02-10 10:31:35.234311] W [graph.c:344:_log_if_unknown_option] 0-testvol1-server: option 'rpc-auth.auth-glusterfs' is not recognized
[2015-02-10 10:31:35.234355] W [graph.c:344:_log_if_unknown_option] 0-testvol1-server: option 'rpc-auth.auth-unix' is not recognized
[2015-02-10 10:31:35.234386] W [graph.c:344:_log_if_unknown_option] 0-testvol1-server: option 'rpc-auth.auth-null' is not recognized

[root@gqas005 ~]# gluster volume info

Volume Name: testvol1
Type: Distributed-Replicate
Volume ID: df8c4ec8-714f-4c58-8a34-65fe8c170dd9
Status: Started
Snap Volume: no
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick1/b1
Brick2: gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick2/b2
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick3/b3
Brick4: gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick4/b4
Brick5: gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick5/b5
Brick6: gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick6/b6
Brick7: gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick7/b7
Brick8: gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick8/b8
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick9/b9
Brick10: gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick10/b10
Brick11: gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick11/b11
Brick12: gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick12/b12
Options Reconfigured:
server.allow-insecure: enable
storage.batch-fsync-delay-usec: 0
features.quota: on
features.uss: enable
performance.readdir-ahead: on
features.show-snapshot-directory: enable
performance.stat-prefetch: enable
performance.io-cache: enable
features.quota-deem-statfs: enable
features.barrier: disable
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@gqas005 ~]#

[root@gqas005 ~]# gluster volume status testvol1
Status of volume: testvol1
Gluster process                                            Port   Online  Pid
------------------------------------------------------------------------------
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick1/b1    49153  Y       25691
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick2/b2    49153  Y       23636
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick3/b3    49153  Y       22686
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick4/b4    49154  Y       27794
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick5/b5    49155  Y       27813
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick6/b6    49154  Y       22697
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick7/b7    49154  Y       25709
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick8/b8    49154  Y       23647
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick9/b9    49155  Y       22708
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick10/b10  49156  Y       27824
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick11/b11  49155  Y       25721
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick12/b12  49155  Y       23658
Snapshot Daemon on localhost                               N/A    N       N/A   ---> snapd is down
NFS Server on localhost                                    2049   Y       27844
Self-heal Daemon on localhost                              N/A    Y       27855
Quota Daemon on localhost                                  N/A    Y       27862
Snapshot Daemon on gqas009.sbu.lab.eng.bos.redhat.com      49925  Y       25733
NFS Server on gqas009.sbu.lab.eng.bos.redhat.com           2049   Y       25758
Self-heal Daemon on gqas009.sbu.lab.eng.bos.redhat.com     N/A    Y       25793
Quota Daemon on gqas009.sbu.lab.eng.bos.redhat.com         N/A    Y       25830
Snapshot Daemon on gqas006.sbu.lab.eng.bos.redhat.com      49925  Y       22720
NFS Server on gqas006.sbu.lab.eng.bos.redhat.com           2049   Y       22727
Self-heal Daemon on gqas006.sbu.lab.eng.bos.redhat.com     N/A    Y       22734
Quota Daemon on gqas006.sbu.lab.eng.bos.redhat.com         N/A    Y       22741
Snapshot Daemon on gqas012.sbu.lab.eng.bos.redhat.com      49925  Y       23670
NFS Server on gqas012.sbu.lab.eng.bos.redhat.com           2049   Y       23678
Self-heal Daemon on gqas012.sbu.lab.eng.bos.redhat.com     N/A    Y       23685
Quota Daemon on gqas012.sbu.lab.eng.bos.redhat.com         N/A    Y       23692

Task Status of Volume testvol1
------------------------------------------------------------------------------
There are no active volume tasks

[root@gqas005 ~]# gluster vol set testvol1 features.uss enable force
Usage: volume set <VOLNAME> <KEY> <VALUE>
[root@gqas005 ~]# gluster vol set testvol1 features.uss enable
volume set: success

[root@gqas005 ~]# gluster vol status
Status of volume: testvol1
Gluster process                                            Port   Online  Pid
------------------------------------------------------------------------------
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick1/b1    49153  Y       25691
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick2/b2    49153  Y       23636
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick3/b3    49153  Y       22686
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick4/b4    49154  Y       27794
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick5/b5    49155  Y       27813
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick6/b6    49154  Y       22697
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick7/b7    49154  Y       25709
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick8/b8    49154  Y       23647
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick9/b9    49155  Y       22708
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick10/b10  49156  Y       27824
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick11/b11  49155  Y       25721
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick12/b12  49155  Y       23658
Snapshot Daemon on localhost                               49959  Y       29088
NFS Server on localhost                                    2049   Y       27844
Self-heal Daemon on localhost                              N/A    Y       27855
Quota Daemon on localhost                                  N/A    Y       27862
Snapshot Daemon on gqas006.sbu.lab.eng.bos.redhat.com      49925  Y       22720
NFS Server on gqas006.sbu.lab.eng.bos.redhat.com           2049   Y       22727
Self-heal Daemon on gqas006.sbu.lab.eng.bos.redhat.com     N/A    Y       22734
Quota Daemon on gqas006.sbu.lab.eng.bos.redhat.com         N/A    Y       22741
Snapshot Daemon on gqas009.sbu.lab.eng.bos.redhat.com      49925  Y       25733
NFS Server on gqas009.sbu.lab.eng.bos.redhat.com           2049   Y       25758
Self-heal Daemon on gqas009.sbu.lab.eng.bos.redhat.com     N/A    Y       25793
Quota Daemon on gqas009.sbu.lab.eng.bos.redhat.com         N/A    Y       25830
Snapshot Daemon on gqas012.sbu.lab.eng.bos.redhat.com      49925  Y       23670
NFS Server on gqas012.sbu.lab.eng.bos.redhat.com           2049   Y       23678
Self-heal Daemon on gqas012.sbu.lab.eng.bos.redhat.com     N/A    Y       23685
Quota Daemon on gqas012.sbu.lab.eng.bos.redhat.com         N/A    Y       23692

Task Status of Volume testvol1
------------------------------------------------------------------------------
There are no active volume tasks

[root@gqas005 ~]# gluster volume stop testvol1
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: testvol1: success
[root@gqas005 ~]# gluster volume status
Volume testvol1 is not started
[root@gqas005 ~]# gluster volume start testvol1
volume start: testvol1: success
[root@gqas005 ~]# gluster volume status
Status of volume: testvol1
Gluster process                                            Port   Online  Pid
------------------------------------------------------------------------------
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick1/b1    49153  Y       27126
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick2/b2    49153  Y       25068
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick3/b3    49153  Y       24140
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick4/b4    49154  Y       30889
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick5/b5    49155  Y       30900
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick6/b6    49154  Y       24151
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick7/b7    49154  Y       27137
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick8/b8    49154  Y       25083
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick9/b9    49155  Y       24162
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick10/b10  49156  Y       30911
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick11/b11  49155  Y       27148
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick12/b12  49155  Y       25095
Snapshot Daemon on localhost                               49959  Y       30923
NFS Server on localhost                                    2049   Y       30930
Self-heal Daemon on localhost                              N/A    Y       30937
Quota Daemon on localhost                                  N/A    Y       30944
Snapshot Daemon on gqas009.sbu.lab.eng.bos.redhat.com      49925  Y       27160
NFS Server on gqas009.sbu.lab.eng.bos.redhat.com           N/A    N       N/A
Self-heal Daemon on gqas009.sbu.lab.eng.bos.redhat.com     N/A    N       N/A
Quota Daemon on gqas009.sbu.lab.eng.bos.redhat.com         N/A    N       N/A
Snapshot Daemon on gqas012.sbu.lab.eng.bos.redhat.com      49925  Y       25108
NFS Server on gqas012.sbu.lab.eng.bos.redhat.com           2049   Y       25116
Self-heal Daemon on gqas012.sbu.lab.eng.bos.redhat.com     N/A    Y       25124
Quota Daemon on gqas012.sbu.lab.eng.bos.redhat.com         N/A    Y       25131
Snapshot Daemon on gqas006.sbu.lab.eng.bos.redhat.com      49925  Y       24174
NFS Server on gqas006.sbu.lab.eng.bos.redhat.com           2049   Y       24181
Self-heal Daemon on gqas006.sbu.lab.eng.bos.redhat.com     N/A    Y       24188
Quota Daemon on gqas006.sbu.lab.eng.bos.redhat.com         N/A    Y       24195

Task Status of Volume testvol1
------------------------------------------------------------------------------

--- Additional comment from Mohammed Rafi KC on 2017-06-30 12:11:44 EDT ---

RCA: On a Windows client, .snaps is treated as a special directory, whereas on all other clients it is a virtual directory. For this reason, a lookup on the root has to return the snapshot entries to a Windows client. Even though .snaps is a special directory, it does not have a dedicated inode, which means it gets a different gfid each time, and that gfid is not present on the backend. When snapd restarts, the gfid on the backend is lost, but the client still holds the older gfid, so the client's lookup fails with ESTALE. Usually, when we get an ESTALE error on a lookup, we retry with a new inode, but that retry is missing in this code path, because this was a special lookup issued during readdirp on the root. A small model of the retry pattern follows below.
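The missing ESTALE retry can be pictured with this small, self-contained model (names and structure hypothetical, not snapview-client code): the "server" hands out a new id on every restart, like snapd does for the virtual .snaps inode, and a lookup wrapper retries once with a fresh handle instead of surfacing ESTALE to the client.

/* Toy model (hypothetical, not GlusterFS source) of revalidate-on-ESTALE. */
#include <errno.h>
#include <stdio.h>

static int server_generation = 1;      /* bumped on each "snapd restart" */

/* Lookup succeeds only if the caller's cached id matches the current one. */
static int lookup(int cached_id, int *fresh_id)
{
    if (cached_id != server_generation) {
        *fresh_id = server_generation; /* what a fresh lookup would link */
        return -ESTALE;
    }
    *fresh_id = cached_id;
    return 0;
}

/* The fix in miniature: on ESTALE, retry once with a fresh handle. */
static int lookup_with_revalidate(int *cached_id)
{
    int fresh = 0;
    int ret = lookup(*cached_id, &fresh);

    if (ret == -ESTALE) {
        *cached_id = fresh;            /* drop stale handle, take new one */
        ret = lookup(*cached_id, &fresh);
    }
    return ret;
}

int main(void)
{
    int scratch = 0;
    int cached = server_generation;    /* first lookup linked the inode */

    server_generation++;               /* simulate a snapd restart */

    printf("without retry: %d\n", lookup(cached, &scratch));        /* -ESTALE */
    printf("with retry:    %d\n", lookup_with_revalidate(&cached)); /* 0 */
    return 0;
}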
COMMIT: https://review.gluster.org/17689 committed in master by Atin Mukherjee (amukherj)
------
commit ecd92d42bbd9249aa637b1ad3000aa242308cb04
Author: Mohammed Rafi KC <rkavunga>
Date: Fri Jun 30 20:17:20 2017 +0530

    svs: implement CHILD_UP notify in snapview-server

    protocol/server expects a CHILD_UP event to successfully configure the
    graph. In a regular brick graph, posix is the xlator that decides to
    initiate the notification to its parent that the child is up. But in
    the snapd graph there is no posix, hence the CHILD_UP notification was
    missing. Ideally, each xlator should initiate the CHILD_UP event
    whenever it sees that it is the last child xlator.

    Change-Id: Icccdb9fe920c265cadaf9f91c040a0831b4b78fc
    BUG: 1467513
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: https://review.gluster.org/17689
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Raghavendra Bhat <raghavendra>
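To see what the commit means, here is a tiny model of upward event propagation in plain C (hypothetical names; the real xlator notify API is different): the graph only comes up if some leaf originates CHILD_UP, and in the snapd graph that leaf is snapview-server rather than posix.

/* Minimal model (not the GlusterFS xlator API) of upward CHILD_UP
 * propagation: protocol/server only finishes configuring the graph
 * once the event reaches it from below. */
#include <stdio.h>

struct xlator {
    const char *name;
    struct xlator *parent;   /* who we notify */
    struct xlator *child;    /* NULL for the leaf */
    int child_up;
};

/* Deliver CHILD_UP to an xlator; it records it and passes it upward. */
static void notify_child_up(struct xlator *xl)
{
    xl->child_up = 1;
    printf("%s: got CHILD_UP\n", xl->name);
    if (xl->parent)
        notify_child_up(xl->parent);
}

/* The fix in words: whichever xlator finds itself with no child must
 * originate the event, instead of assuming posix is present to do it. */
static void xlator_init_leaf(struct xlator *xl)
{
    if (xl->child == NULL && xl->parent)
        notify_child_up(xl->parent);
}

int main(void)
{
    struct xlator server = { "protocol/server", NULL, NULL, 0 };
    struct xlator svs    = { "snapview-server", &server, NULL, 0 };

    server.child = &svs;
    xlator_init_leaf(&svs);  /* server now sees CHILD_UP and can serve */
    return 0;
}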
COMMIT: https://review.gluster.org/17690 committed in master by Atin Mukherjee (amukherj)
------
commit 70a5dfdea4980dea5da5b5008a16fd155a3adf34
Author: Mohammed Rafi KC <rkavunga>
Date: Mon Jul 3 12:45:38 2017 +0530

    svc: send revalidate lookup on special dir

    The .snaps directory is a virtual directory that does not exist on the
    backend. Even though it is a special dentry, it does not have a
    dedicated inode, so its inode number is always random; it gets a
    different inode number whenever the snapd process restarts. With a
    Windows client, the show-snapshot-directory feature requires a lookup
    on the .snaps directory after readdirp on the root. If snapd restarted
    after such a lookup, subsequent lookups fail because the linked inode
    is stale. This patch does a revalidate lookup with a new inode.

    Change-Id: If97c07ecb307cefe7c86be8ebd05e28cbf678d1f
    BUG: 1467513
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: https://review.gluster.org/17690
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Smoke: Gluster Build System <jenkins.org>
REVIEW: https://review.gluster.org/17691 (uss/svc: fix double free on xdata dictionary) posted (#2) for review on master by Atin Mukherjee (amukherj)
COMMIT: https://review.gluster.org/17691 committed in master by Atin Mukherjee (amukherj)
------
commit 26241777bf59c7d64c582ce09e557bc2dc97dabb
Author: Mohammed Rafi KC <rkavunga>
Date: Mon Jul 3 16:37:01 2017 +0530

    uss/svc: fix double free on xdata dictionary

    We were taking an unref on the wrong dictionary, which resulted in
    invalid memory access.

    Change-Id: Ic25a6c209ecd72c9056dfcb79fabcfc650dd3c1e
    BUG: 1467513
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: https://review.gluster.org/17691
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
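The bug class fixed here is worth spelling out with a toy refcount sketch (hypothetical code, not GlusterFS's dict implementation, though dict_ref/dict_unref behave analogously): if a handler unrefs the caller's xdata instead of the dictionary it created itself, the caller's own later unref frees the object a second time.

/* Toy refcount sketch of "unref on the wrong dictionary". */
#include <stdlib.h>

struct dict {
    int refs;
};

static struct dict *dict_new(void)
{
    struct dict *d = calloc(1, sizeof(*d));
    if (d)
        d->refs = 1;
    return d;
}

static void dict_unref(struct dict *d)
{
    if (d && --d->refs == 0)
        free(d);
}

/* Buggy shape: the handler creates a local dictionary but unrefs the
 * caller's xdata on the way out; the caller's own unref then frees
 * xdata a second time. The fix is to unref 'local' instead. */
static void handle_request(struct dict *xdata)
{
    struct dict *local = dict_new();
    /* ... fill and use 'local' ... */
    dict_unref(local);       /* correct: release what we created       */
    /* dict_unref(xdata); */ /* buggy variant: double free in caller   */
    (void)xdata;
}

int main(void)
{
    struct dict *xdata = dict_new();
    handle_request(xdata);
    dict_unref(xdata);       /* caller releases its own reference */
    return 0;
}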
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/