Bug 1163209
| Summary: | [USS]: cd to snap directory from fuse/nfs hangs OR takes too long when a node is brought offline | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rahul Hinduja <rhinduja> |
| Component: | snapshot | Assignee: | Raghavendra Bhat <rabhat> |
| Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | rhgs-3.0 | CC: | amainkar, nsathyan, rabhat, rhs-bugs, rjoseph, senaik, storage-qa-internal, surs, vagarwal, vmallika |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.0.3 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | USS | ||
| Fixed In Version: | glusterfs-3.6.0.39-1 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-01-15 13:42:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1174205, 1175751 | ||
| Bug Blocks: | 1162694 | ||
Description
Rahul Hinduja
2014-11-12 13:30:04 UTC
Version : glusterfs 3.6.0.32
=======
Another scenario where cd to .snaps from an NFS mount hangs.
1) Fuse and NFS mount a 2x2 dist-rep volume, and enable USS
2) Create 256 snapshots in a loop while IO is going on:
for i in {1..150} ; do cp -rvf /var/log/glusterfs f_log.$i ; done
for i in {1..150} ; do cp -rvf /var/log/glusterfs n_log.$i ; done
3) After snapshot creation is complete, cd to .snaps from the fuse and NFS mounts.
From the fuse mount, .snaps was accessible; accessing .snaps from the NFS mount then failed with an I/O error.
4) Checked gluster v status of the volume; it showed that snapd on the server (through which the volume was mounted) was down.
Log messages reported:
~~~~~~~~~~~~~~~~~~~~~~
[2014-11-12 13:32:35.074996] E [rpcsvc.c:617:rpcsvc_handle_rpc_call] 0-glusterd: Request received from non-privileged port. Failing request
[2014-11-12 13:32:35.106171] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick snapd-vol1 on port 49170
[2014-11-12 13:32:35.957462] W [socket.c:529:__socket_rwv] 0-management: readv on /var/run/22f16287a2b97835e475c3bbf5501834.socket failed (No data available)
[2014-11-12 13:32:36.109356] I [MSGID: 106006] [glusterd-handler.c:4238:__glusterd_snapd_rpc_notify] 0-management: snapd for volume vol1 has disconnected from glusterd.
5) Restarted glusterd and accessed .snaps - successful
6) Accessed .snaps from the fuse and NFS mounts again; while trying to cd to .snaps from the NFS mount, snapd on the server always went down.
7) Tried to stop the volume, start it again, and then access .snaps. From the fuse mount it was successful, but from the NFS mount cd to .snaps hung.
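The setup in the numbered steps above can be sketched as a small script. This is a dry-run sketch, not commands taken from the report: the volume name "vol1", the snapshot names, and the GLUSTER override are assumptions.

```shell
#!/bin/sh
# Dry-run sketch of steps 1-2 above. Volume name "vol1" and snapshot
# names are assumptions; set GLUSTER=gluster to run the real commands
# instead of echoing them.
GLUSTER=${GLUSTER:-"echo gluster"}
VOL=vol1

# Step 1: enable USS on the (already mounted) 2x2 dist-rep volume
$GLUSTER volume set "$VOL" features.uss enable

# Step 2: create 256 snapshots in a loop while IO runs on the mounts
i=1
while [ "$i" -le 256 ]; do
    $GLUSTER snapshot create "snap_$i" "$VOL"
    i=$((i + 1))
done
```

With the default dry-run setting this only prints the gluster commands, which makes it safe to review before pointing it at a real cluster.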
We were not able to re-create this problem with the below setup:
- Installed glusterfs-3.6.0.35
- Created a 4 node cluster
- Created a 2x2 volume
- Followed the instructions mentioned in the description
Patch https://code.engineering.redhat.com/gerrit/#/c/37398/ has fixed this issue.

Able to recreate the issue with exactly the same steps on build glusterfs-3.6.0.36-1.el6.x86_64. From Fuse it took more than a minute, and from NFS it took more than 3 minutes.
From Fuse:
==========
[root@wingo vol0]# pwd
/mnt/vol0
[root@wingo vol0]# time cd .snaps

real    1m3.043s
user    0m0.000s
sys     0m0.000s
[root@wingo .snaps]#
From NFS:
=========
[root@wingo ~]# cd /mnt/nvol0
[root@wingo nvol0]# time cd .snaps

real    3m3.043s
user    0m0.000s
sys     0m0.002s
[root@wingo .snaps]#
[root@wingo .snaps]# rpm -qa | grep glusterfs-3.6.0.36-1.el6.x86_64
glusterfs-3.6.0.36-1.el6.x86_64
[root@wingo .snaps]#
In general, with USS on and a node down, cd to .snaps takes too long. Moving back to assigned state.

Version : glusterfs 3.6.0.36
========
Another scenario where cd to .snaps hangs and sometimes fails with "Transport endpoint is not connected" from the Fuse mount and "I/O error" from the NFS mount:
- Create a 2x2 dist-rep volume
- Fuse and NFS mount the volume & enable USS
- Create some IO
- Take a few snapshots
- Bring down glusterd on node2
- Activate one of the snapshots
- From both fuse and NFS mounts, cd to .snaps and list the snaps --> it hangs
- From a different terminal, cd to .snaps and list the snaps; it fails with "Transport endpoint is not connected" from the Fuse mount and "I/O error" from the NFS mount
[root@dhcp-0-97 .snaps]# ll
ls: reading directory .: Transport endpoint is not connected
total 0
[root@dhcp-0-97 .snaps]# ll
ls: cannot open directory .: Transport endpoint is not connected
[root@dhcp-0-97 .snaps]# ll
ls: cannot open directory .: Input/output error
[root@dhcp-0-97 .snaps]# pwd
/mnt/vol0_nfs/nfs_etc.1/.snaps

Based on Comment8 and Comment9, changing the severity of this bug to Urgent since the issue is reproduced quite often.

Version : glusterfs 3.6.0.40
=======
Repeated the steps as mentioned in the Description, Comment8 and Comment9; unable to reproduce the issue. The issue mentioned in Comment4 is tracked by bz 1163750. Marking the bug as 'Verified'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html
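The scenario where glusterd is brought down on one node before activating a snapshot can likewise be sketched as a dry-run script. The node name "node2", volume name "vol0", and snapshot names are assumptions for illustration, not values from the report.

```shell
#!/bin/sh
# Dry-run sketch of the glusterd-down scenario: take a few snapshots,
# stop glusterd on one node, then activate a snapshot. Node "node2",
# volume "vol0", and snapshot names are assumptions; set GLUSTER=gluster
# and SSH=ssh to execute for real.
GLUSTER=${GLUSTER:-"echo gluster"}
SSH=${SSH:-"echo ssh"}
VOL=vol0

# Take a few snapshots while IO runs on the fuse/NFS mounts
n=0
for s in snap_a snap_b snap_c; do
    $GLUSTER snapshot create "$s" "$VOL"
    n=$((n + 1))
done

# Bring down glusterd on node2, then activate one of the snapshots
$SSH node2 "service glusterd stop"
$GLUSTER snapshot activate snap_a
# At this point, cd to .snaps on the mounts reproduced the hang / errors
```

Keeping the commands behind the GLUSTER/SSH variables means the same script documents the repro and can be pasted into a test harness without touching a live cluster by default.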