Bug 1168873

Summary: [USS]: If uss is disabled and then enabled, ls from the virtual snap world hangs
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rahul Hinduja <rhinduja>
Component: snapshot
Assignee: Vijaikumar Mallikarjuna <vmallika>
Status: CLOSED ERRATA
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high
Priority: high
Version: rhgs-3.0
CC: rhs-bugs, rjoseph, smohan, storage-qa-internal, vagarwal, vmallika
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.0.3
Hardware: x86_64
OS: Linux
Whiteboard: USS
Fixed In Version: glusterfs-3.6.0.35
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2015-01-15 13:43:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Blocks: 1162694

Description Rahul Hinduja 2014-11-28 09:45:50 UTC
Description of problem:
=======================

If you are in the virtual snap world (.snaps) and USS is disabled, ls from it errors out as expected. But if you then re-enable USS, which starts a new snapd process, and try to do ls again, it hangs. It should either re-establish the connection or gracefully error out.

USS is enabled and we are inside the virtual snap world from client:
====================================================================

[root@wingo vol2]# cd .snaps
[root@wingo .snaps]# ls
rs1  rs2
[root@wingo .snaps]# cd rs1
[root@wingo rs1]# ls
etc.1  etc.2
[root@wingo rs1]# cd etc.1
[root@wingo etc.1]#
[root@wingo etc.1]# pwd
/mnt/vol2/.snaps/rs1/etc.1
[root@wingo etc.1]#

Disable the USS and try to ls from inside, it errors as expected:
=================================================================

[root@inception ~]# gluster v set vol2 uss off
volume set: success

[root@wingo etc.1]# pwd
/mnt/vol2/.snaps/rs1/etc.1
[root@wingo etc.1]# ls
ls: cannot open directory .: No such file or directory
[root@wingo etc.1]# ls
ls: cannot open directory .: No such file or directory
[root@wingo etc.1]#

Re-enable the USS and try to do ls:
===================================

[root@inception ~]# gluster v set vol2 uss on
volume set: success
[root@inception ~]# 
[root@wingo etc.1]# ls
^C
^C^C
^C
^C
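
One way to confirm that re-enabling USS did start a fresh snapd for the volume (a hedged sketch; the exact status output can differ by build) is to check from a server node:

gluster volume status vol2            # with USS on, a "Snapshot Daemon" entry should be listed
ps aux | grep '[s]napd' | grep vol2   # or look for the snapd process directly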


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.6.0.34-1.el6rhs.x86_64


How reproducible:
=================

always


Steps to Reproduce:
===================
1. Create a 4-node cluster
2. Create a 2x2 volume
3. Mount on the client (FUSE and NFS)
4. From FUSE, do cp -rf /etc etc.1
5. From NFS, do cp -rf /etc etc.2
6. Create 2 snapshots, rs1 and rs2
7. Activate snapshots rs1 and rs2
8. Enable USS
9. cd to the virtual .snaps directory from FUSE and NFS
From FUSE: cd .snaps/rs1/etc.1
From NFS: cd .snaps/rs2/etc.2
10. ls from both the above directories; it should list the entries
11. Disable USS
12. ls again; it should error out
13. Enable USS
14. ls again (a consolidated command sketch of these steps follows below)
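
A hedged sketch of the commands for these steps (hostnames node1-node4, brick paths /bricks/b1-b4, and the mount points are placeholders for this environment; snapshot create also assumes the bricks sit on thin-provisioned LVM):

# 1-2. Create a 2x2 (distributed-replicate) volume on the 4-node cluster and start it
gluster volume create vol2 replica 2 \
    node1:/bricks/b1 node2:/bricks/b2 node3:/bricks/b3 node4:/bricks/b4
gluster volume start vol2

# 3. Mount on the client over FUSE and over gluster NFS (NFSv3)
mount -t glusterfs node1:/vol2 /mnt/vol2
mount -t nfs -o vers=3 node1:/vol2 /mnt/vol2-nfs

# 4-5. Populate data from both mounts
cp -rf /etc /mnt/vol2/etc.1
cp -rf /etc /mnt/vol2-nfs/etc.2

# 6-8. Create and activate the two snapshots, then enable USS
gluster snapshot create rs1 vol2
gluster snapshot create rs2 vol2
gluster snapshot activate rs1
gluster snapshot activate rs2
gluster volume set vol2 uss on

# 9-14. From the FUSE mount (similarly from NFS with rs2/etc.2), toggle USS
cd /mnt/vol2/.snaps/rs1/etc.1 && ls   # lists entries
gluster volume set vol2 uss off       # run on a server node; ls now errors out as expected
gluster volume set vol2 uss on        # run on a server node; ls now hangs on 3.6.0.34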

Actual results:
===============

It hangs from both the FUSE and NFS mounts

Expected results:
=================

It should either re-establish the connection and list the entries, or it should gracefully error out

Comment 1 Rahul Hinduja 2014-11-28 09:48:10 UTC
nfs: server inception.lab.eng.blr.redhat.com not responding, still trying
INFO: task ls:6757 blocked for more than 120 seconds.
      Not tainted 2.6.32-504.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls            D 0000000000000000     0  6757   6339 0x00000084
 ffff880119e85a68 0000000000000082 0000000000000001 0000000000000246
 ffff880119e859f8 0000000000000206 ffff880119e859f8 ffffffff8109ece7
 ffff880119e85a08 ffff880119594d98 ffff8801196f65f8 ffff880119e85fd8
Call Trace:
 [<ffffffff8109ece7>] ? finish_wait+0x67/0x80
 [<ffffffff8109ee2e>] ? prepare_to_wait+0x4e/0x80
 [<ffffffffa00e433d>] __fuse_request_send+0xed/0x2b0 [fuse]
 [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa00e4512>] fuse_request_send+0x12/0x20 [fuse]
 [<ffffffffa00e7ccc>] fuse_do_getattr+0x10c/0x2c0 [fuse]
 [<ffffffffa00e7efd>] fuse_update_attributes+0x7d/0x90 [fuse]
 [<ffffffffa00e9038>] fuse_permission+0xb8/0x1f0 [fuse]
 [<ffffffff8119e1e3>] __link_path_walk+0xb3/0x1000
 [<ffffffff8119c5b5>] ? path_init+0x185/0x250
 [<ffffffff8119f3ea>] path_walk+0x6a/0xe0
 [<ffffffff8119f5fb>] filename_lookup+0x6b/0xc0
 [<ffffffff8122d466>] ? security_file_alloc+0x16/0x20
 [<ffffffff811a0ad4>] do_filp_open+0x104/0xd20
 [<ffffffff811a3752>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff81298eea>] ? strncpy_from_user+0x4a/0x90
 [<ffffffff811adf82>] ? alloc_fd+0x92/0x160
 [<ffffffff8118ae07>] do_sys_open+0x67/0x130
 [<ffffffff8118af10>] sys_open+0x20/0x30
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[root@wingo ~]#

Comment 4 Vijaikumar Mallikarjuna 2014-11-28 11:08:35 UTC
I am not able to re-create this problem in the latest downstream code

Comment 6 Vijaikumar Mallikarjuna 2014-12-02 13:49:56 UTC
We were not able to re-create this problem with the below setup:

Installed glusterfs-3.6.0.35
Created 4 node cluster
Created 2x2 volume
Followed the instructions mentioned in the description

Comment 7 Rahul Hinduja 2014-12-03 07:34:55 UTC
This issue is easily reproducible with build: glusterfs-3.6.0.34

With build glusterfs-3.6.0.35, after re-enabling USS, it errors out with "No such file or directory" and no hang is observed.

Comment 8 Vijaikumar Mallikarjuna 2014-12-03 09:28:42 UTC
Patch https://code.engineering.redhat.com/gerrit/#/c/37398/ has fixed this issue

Comment 11 errata-xmlrpc 2015-01-15 13:43:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html