Bug 1163209 - [USS]: cd to snap directory from fuse/nfs hangs OR takes too long when a node is brought offline
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: snapshot
Version: 3.0
Hardware: x86_64 Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.0.3
Assigned To: Raghavendra Bhat
QA Contact: Rahul Hinduja
Whiteboard: USS
Keywords: ZStream
Depends On: 1174205 1175751
Blocks: 1162694
 
Reported: 2014-11-12 08:30 EST by Rahul Hinduja
Modified: 2016-09-17 09:03 EDT
CC List: 10 users

See Also:
Fixed In Version: glusterfs-3.6.0.39-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-01-15 08:42:19 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers:
Tracker ID: Red Hat Product Errata RHBA-2015:0038
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Storage 3.0 enhancement and bug fix update #3
Last Updated: 2015-01-15 13:35:28 EST

Description Rahul Hinduja 2014-11-12 08:30:04 EST
Description of problem:
=======================

In a cluster of 4 nodes, when one node is brought offline and glusterd on another node is brought down, cd to the snap directory from a fuse/nfs mount either hangs or takes too long.


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.6.0.32-1.el6rhs.x86_64


How reproducible:
=================
always


Steps to Reproduce:
===================
1. Create a 4 node cluster (node1 to node4)
2. Create and start a 2x2 volume with one brick from each node (node1 to node4)
3. Mount the volume on a client from node1 and populate data on it (for example, at /mnt/)
4. Create 2 snapshots of the volume
5. Bring down node2
6. Kill glusterd on node4
7. Change the snapshot directory to snap-directory
8. Enable USS on the volume
9. From the client, access snap-directory (cd /mnt/snap-directory) over fuse and nfs (an illustrative command sequence is sketched below)
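
For reference, a rough command sequence for steps 2-9 could look like the sketch below. The volume name vol0, the brick paths, the nfs mount point and the exact way glusterd is killed are illustrative assumptions, not taken from this report:

 gluster volume create vol0 replica 2 node1:/bricks/b1 node2:/bricks/b1 node3:/bricks/b2 node4:/bricks/b2
 gluster volume start vol0
 mount -t glusterfs node1:/vol0 /mnt/            # fuse mount on the client
 mount -t nfs -o vers=3 node1:/vol0 /mnt/nvol0   # nfs mount on the client
 gluster snapshot create snap1 vol0
 gluster snapshot create snap2 vol0
 # bring down node2, then on node4: pkill glusterd
 gluster volume set vol0 snapshot-directory snap-directory
 gluster volume set vol0 features.uss enable
 cd /mnt/snap-directory                          # from the fuse mount; repeat from the nfs mount

Depending on the build, the snapshot-directory value may have to start with a dot; the name above simply mirrors step 7.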

Actual results:
===============

In two tries, the following results were observed:

1. cd from fuse hung and cd from nfs took too long (more than 2 mins)
2. cd from both fuse and nfs took too long
3. Once in snap-directory, cd to the snapshots took too long


Expected results:
=================

cd from both fuse and nfs should succeed without any hang or noticeable delay


Additional info:
================


Bricks on node1 and node2 form a replica pair, as do the bricks on node3 and node4
Comment 4 senaik 2014-11-13 06:48:36 EST
Version : glusterfs 3.6.0.32
=======

Another scenario where cd to .snaps from an NFS mount hangs.

1) Fuse and NFS mount a 2x2 dist-rep volume, and enable USS

2) Create 256 snapshots in a loop while IO is going on (a sketch of the snapshot loop follows this comment's steps):
 for i in {1..150} ; do cp -rvf /var/log/glusterfs f_log.$i ; done
 for i in {1..150} ; do cp -rvf /var/log/glusterfs n_log.$i ; done

3) After snapshot creation is complete, cd to .snaps from the fuse and NFS mounts.
 From the fuse mount, .snaps was accessible; then, while accessing .snaps from the NFS mount, it failed with an IO error

4) Checked gluster volume status of the volume; it showed that snapd on the server (through which the volume was mounted) was down

Log messages reported :
~~~~~~~~~~~~~~~~~~~~~~
[2014-11-12 13:32:35.074996] E [rpcsvc.c:617:rpcsvc_handle_rpc_call] 0-glusterd: Request received from non-privileged port. Failing request
[2014-11-12 13:32:35.106171] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick snapd-vol1 on port 49170
[2014-11-12 13:32:35.957462] W [socket.c:529:__socket_rwv] 0-management: readv on /var/run/22f16287a2b97835e475c3bbf5501834.socket failed (No data available)
[2014-11-12 13:32:36.109356] I [MSGID: 106006] [glusterd-handler.c:4238:__glusterd_snapd_rpc_notify] 0-management: snapd for volume vol1 has disconnected from glusterd.

5) Restarted glusterd and accessed .snaps - successful

6) Accessed .snaps from the fuse and nfs mounts again; while trying to cd to .snaps from the NFS mount, snapd on the server always went down

7) Tried to stop the volume, start it again and then access .snaps. From the fuse mount it was successful, but from the NFS mount cd to .snaps hung
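
For reference, the snapshot loop in step 2 and the status check in step 4 would roughly take the form sketched below; the volume name vol1 comes from the log messages above, while the snap name prefix is an illustrative placeholder:

 for i in {1..256} ; do gluster snapshot create snap$i vol1 ; done
 gluster volume status vol1    # check the Snapshot Daemon (snapd) entry for the server the volume was mounted through

The status check is where snapd for vol1 on the mounted-from server was seen to be down.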
Comment 6 Vijaikumar Mallikarjuna 2014-12-02 08:54:17 EST
We were not able to re-create this problem with the below setup:

Installed glusterfs-3.6.0.35
Created 4 node cluster
Created 2x2 volume
Followed the instructions mentioned in the description
Comment 7 Vijaikumar Mallikarjuna 2014-12-03 04:33:22 EST
Patch https://code.engineering.redhat.com/gerrit/#/c/37398/ has fixed this issue.
Comment 8 Rahul Hinduja 2014-12-08 06:40:36 EST
Able to recreate the issue with exactly the same steps on build: glusterfs-3.6.0.36-1.el6.x86_64

From fuse it took more than a minute, and from NFS it took more than 3 minutes

From Fuse:
==========
[root@wingo vol0]# pwd
/mnt/vol0
[root@wingo vol0]# time cd .snaps

real    1m3.043s
user    0m0.000s
sys     0m0.000s
[root@wingo .snaps]#


From NFS:
=========
[root@wingo ~]# cd /mnt/nvol0
[root@wingo nvol0]# 
[root@wingo nvol0]# time cd .snaps

real    3m3.043s
user    0m0.000s
sys     0m0.002s
[root@wingo .snaps]# 
[root@wingo .snaps]# rpm -qa | grep glusterfs-3.6.0.36-1.el6.x86_64
glusterfs-3.6.0.36-1.el6.x86_64
[root@wingo .snaps]# 


In general, when uss is turned on while a node is down, cd to .snaps takes too long.

Moving back to assigned state
Comment 9 senaik 2014-12-09 06:51:15 EST
Version : glusterfs 3.6.0.36 
========

Another scenario where cd to .snaps hangs and sometimes fails with "Transport endpoint not connected" from the fuse mount and "I/O Error" from the NFS mount


- Create a 2x2 dist-rep volume
- Fuse and NFS mount the volume & enable USS
- Create some IO
- Take a few snapshots
- Bring down glusterd on node2
- Activate one of the snapshots
- From both fuse and nfs mounts, cd to .snaps and list the snaps --> it hangs
- From a different terminal, cd to .snaps and list the snaps; it fails with "Transport endpoint not connected" from the fuse mount and "I/O Error" from the NFS mount (an illustrative command sketch follows this list)
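
A rough sketch of the glusterd and snapshot-activation steps above; the snapshot name is a placeholder and the service command is an assumption about how glusterd was brought down:

 service glusterd stop                  # on node2
 gluster snapshot activate <snapname>   # on a node where glusterd is still running
 cd .snaps && ls                        # from the fuse mount and from the nfs mount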


[root@dhcp-0-97 .snaps]# ll
ls: reading directory .: Transport endpoint is not connected
total 0
[root@dhcp-0-97 .snaps]# ll
ls: cannot open directory .: Transport endpoint is not connected


[root@dhcp-0-97 .snaps]# ll
ls: cannot open directory .: Input/output error
[root@dhcp-0-97 .snaps]# pwd
/mnt/vol0_nfs/nfs_etc.1/.snaps


Based on Comment 8 and Comment 9, changing the severity of this bug to Urgent since the issue is reproduced quite often
Comment 12 senaik 2014-12-30 02:22:14 EST
Version: glusterfs 3.6.0.40
=======
Repeated the steps as mentioned in the Description, Comment 8 and Comment 9; unable to reproduce the issue.
The issue mentioned in Comment4 is tracked by bz 1163750

Marking the bug as 'Verified'
Comment 14 errata-xmlrpc 2015-01-15 08:42:19 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html
