Bug 1159173 - [USS] Logging is completely messed up after enabling USS and carrying out a few cases; snapd and NFS logs grew to 47 GB in 3 hours
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: snapshot
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.0.3
Assignee: Sachin Pandit
QA Contact: Rahul Hinduja
URL:
Whiteboard: USS
Depends On: 1166197 1175736
Blocks: 1154965 1162694
 
Reported: 2014-10-31 06:39 UTC by Rahul Hinduja
Modified: 2016-09-17 12:55 UTC
CC: 8 users

Fixed In Version: glusterfs-3.6.0.35-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-15 13:41:37 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2015:0038 (priority: normal, status: SHIPPED_LIVE): Red Hat Storage 3.0 enhancement and bug fix update #3, last updated 2015-01-15 18:35:28 UTC

Description Rahul Hinduja 2014-10-31 06:39:03 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Rahul Hinduja 2014-10-31 07:01:48 UTC
Description of problem:
=======================

snapd logs are filled with following error messages. 

From snapX*.log
================

[root@inception ~]# tail -100 /var/log/glusterfs/snap2-c0e76752-8ec5-42be-8b1f-13a13670aed8.log | grep "2014-10-30 14:55:01"
[2014-10-30 14:44:55.934297] W [glfs-[2014-10-30 14:55:01.085073] W [glfs-handleops.c:1086:glfs_h_create_from_handle] 0-meta-autoload: inode refresh of 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6 failed: No such file or directory
[2014-10-30 14:55:01.085139] E [snapview-server.c:153:svs_lookup_gfid] 0-vol2-snapview-server: failed to do lookup and get the handle on the snapshot (null) (path: <gfid:5677aeba-c1dd-43c6-a65b-ecd7ef0557b6>, gfid: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6)
[2014-10-30 14:55:01.085142] W [glfs-handleops.c:1086:glfs_h_create_from_handle] 0-meta-autoload: inode refresh of 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6 failed: No such file or directory
[2014-10-30 14:55:01.085161] E [snapview-server.c:153:svs_lookup_gfid] 0-vol2-snapview-server: failed to do lookup and get the handle on the snapshot (null) (path: <gfid:5677aeba-c1dd-43c6-a65b-ecd7ef0557b6>, gfid: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6)
[2014-10-30 14:55:01.085237] W [glfs-handleops.c:1086:glfs_h_create_from_handle] 0-meta-autoload: inode refresh of 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6 failed: No such file or directory
[2014-10-30 14:55:01.085261] E [snapview-server.c:153:svs_lookup_gfid] 0-vol2-snapview-server: failed to do lookup and get the handle on the snapshot (null) (path: <gfid:5677aeba-c1dd-43c6-a65b-ecd7ef0557b6>, gfid: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6)
[2014-10-30 14:55:01.085334] W [glfs-handleops.c:1086:glfs_h_create_from_handle] 0-meta-autoload: inode refresh of 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6 failed: No such file or directory
[2014-10-30 14:55:01.085353] E [snapview-server.c:153:svs_lookup_gfid] 0-vol2-snapview-server: failed to do lookup and get the handle on the snapshot (null) (path: <gfid:5677aeba-c1dd-43c6-a65b-ecd7ef0557b6>, gfid: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6)
[2014-10-30 14:55:01.085399] W [glfs-handleops.c:1086:glfs_h_create_from_handle] 0-meta-autoload: inode refresh of 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6 failed: No such file or directory
[2014-10-30 14:55:01.085427] E [snapview-server.c:153:svs_lookup_gfid] 0-vol2-snapview-server: failed to do lookup and get the handle on the snapshot (null) (path: <gfid:5677aeba-c1dd-43c6-a65b-ecd7ef0557b6>, gfid: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6)
[2014-10-30 14:55:01.085665] W [glfs-handleops.c:1086:glfs_h_create_from_handle] 0-meta-autoload: inode refresh of 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6 failed: No such file or directory
[2014-10-30 14:55:01.085688] E [snapview-server.c:153:svs_lookup_gfid] 0-vol2-snapview-server: failed to do lookup and get the handle on the snapshot (null) (path: <gfid:5677aeba-c1dd-43c6-a65b-ecd7ef0557b6>, gfid: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6)
[2014-10-30 14:55:01.085695] W [glfs-handleops.c:1086:glfs_h_create_from_handle] 0-meta-autoload: inode refresh of 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6 failed: No such file or directory
[2014-10-30 14:55:01.085714] E [snapview-server.c:153:svs_lookup_gfid] 0-vol2-snapview-server: failed to do lookup and get the handle on the snapshot (null) (path: <gfid:5677aeba-c1dd-43c6-a65b-ecd7ef0557b6>, gfid: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6)
[2014-10-30 14:55:01.085736] W [glfs-handleops.c:1086:glfs_h_create_from_handle] 0-meta-autoload: inode refresh of 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6 failed: No such file or directory
[2014-10-30 14:55:01.085752] E [snapview-server.c:153:svs_lookup_gfid] 0-vol2-snapview-server: failed to do lookup and get the handle on the snapshot (null) (path: <gfid:5677aeba-c1dd-43c6-a65b-ecd7ef0557b6>, gfid: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6)
[2014-10-30 14:55:01.085782] W [glfs-handleops.c:1086:glfs_h_create_from_handle] 0-meta-autoload: inode refresh of 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6 failed: No such file or directory
[2014-10-30 14:55:01.085802] E [snapview-server.c:153:svs_lookup_gfid] 0-vol2-snapview-server: failed to do lookup and get the handle on the snapshot (null) (path: <gfid:5677aeba-c1dd-43c6-a65b-ecd7ef0557b6>, gfid: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6)
[2014-10-30 14:55:01.085827] W [glfs-handleops.c:1086:glfs_h_create_from_handle] 0-meta-autoload: inode refresh of 5677aeba-c1dd-43c6-a65b-ecd7ef
[root@inception ~]#


From volx-snapd.log:
====================

[root@inception ~]# tail -100 /var/log/glusterfs/vol2-snapd.log | grep "2014-10-30 14:44:25"
[2014-10-30 14:43:56.844035] W [server-resolve.c:122:res[2014-10-30 14:44:25.930293] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.934178] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.934209] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.934226] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.934304] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.934357] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.934418] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.934426] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.934438] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.934562] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.934921] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.935004] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.935039] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.935108] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.935202] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.935237] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.935324] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.935362] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.935469] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.935695] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.935881] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.935902] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.935954] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.936063] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.936099] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[2014-10-30 14:44:25.936224] W [server-resolve[2014-10-30 14:44:55.932522] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol2-server: 5677aeba-c1dd-43c6-a65b-ecd7ef0557b6: failed to resolve (Stale file handle)
[root@inception ~]#


Note:
=====

This volume of log messages is generated every second (a command to check the per-second rate is sketched after the Size figures below).

There are only two files, a and b, with the following content:

[root@wingo vol2]# cat a b
abc
def

ajkadj
skjfksfj
[root@wingo vol2]# 

All of these logs started filling up after hitting the issues described in BZ 1158883 and BZ 1158898.


Size:
=====
[root@inception ~]# du -sh /var/log/glusterfs/snap2-c0e76752-8ec5-42be-8b1f-13a13670aed8.log
23G	/var/log/glusterfs/snap2-c0e76752-8ec5-42be-8b1f-13a13670aed8.log
[root@inception ~]# du -sh /var/log/glusterfs/vol2-snapd.log 
9.6G	/var/log/glusterfs/vol2-snapd.log
[root@inception ~]# 
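
To back up the per-second rate noted above, the log timestamps can be bucketed into one-second intervals. A minimal sketch, not from the original report, assuming every message line starts with a "[YYYY-MM-DD HH:MM:SS.microseconds]" timestamp as in the excerpts above:

# count messages per one-second bucket, busiest seconds first
awk '/^\[/ { print substr($0, 2, 19) }' \
    /var/log/glusterfs/snap2-c0e76752-8ec5-42be-8b1f-13a13670aed8.log \
    | sort | uniq -c | sort -rn | head

The same pipeline run against vol2-snapd.log and nfs.log gives the per-second rates for the other two logs.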





Version-Release number of selected component (if applicable):
==============================================================

glusterfs-3.6.0.30-1.el6rhs.x86_64

Steps to Reproduce:
===================
1. Carry out the scenario mentioned in BZ 1158883 and BZ 1158898.

Actual results:
===============

Logs grew to 33 GB within hours, and there is no space left on the root device, which is 50 GB.

Expected results:
=================

First, we need to investigate why these error messages appear at all. Even if they are valid error messages, the rate at which they are logged is simply not acceptable: they fill the entire file system.

Comment 3 Rahul Hinduja 2014-10-31 07:09:33 UTC
Looking into dmesg, I observed that the machine has also hit an OOM condition. To highlight: this is a physical machine with 32 GB of memory and 24 processors.

Since there is no space left on the machine, I have copied /var/log to my local system for investigation.

nrpe invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
nrpe cpuset=/ mems_allowed=0-1
Pid: 2580, comm: nrpe Not tainted 2.6.32-504.el6.x86_64 #1
Call Trace:
 [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
 [<ffffffff81127300>] ? dump_header+0x90/0x1b0
 [<ffffffff8122ea2c>] ? security_real_capable_noaudit+0x3c/0x70
 [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0
 [<ffffffff811276c1>] ? select_bad_process+0xe1/0x120
 [<ffffffff81127bc0>] ? out_of_memory+0x220/0x3c0
 [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0
 [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110
 [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90
 [<ffffffff811240de>] ? find_get_page+0x1e/0xa0
 [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500
 [<ffffffff8114eae4>] ? __do_fault+0x54/0x530
 [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00
 [<ffffffff811d2db2>] ? fsnotify_clear_marks_by_inode+0x32/0xf0
 [<ffffffff81012bfe>] ? copy_user_generic+0xe/0x20
 [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300
 [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
 [<ffffffff81014a79>] ? read_tsc+0x9/0x20
 [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
 [<ffffffff811a5728>] ? poll_select_copy_remaining+0xf8/0x150
 [<ffffffff8152ffbe>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8152d375>] ? page_fault+0x25/0x30
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
CPU    4: hi:    0, btch:   1 usd:   0
CPU    5: hi:    0, btch:   1 usd:   0
CPU    6: hi:    0, btch:   1 usd:   0
CPU    7: hi:    0, btch:   1 usd:   0
CPU    8: hi:    0, btch:   1 usd:   0
CPU    9: hi:    0, btch:   1 usd:   0
CPU   10: hi:    0, btch:   1 usd:   0
CPU   11: hi:    0, btch:   1 usd:   0
CPU   12: hi:    0, btch:   1 usd:   0
CPU   13: hi:    0, btch:   1 usd:   0
CPU   14: hi:    0, btch:   1 usd:   0
CPU   15: hi:    0, btch:   1 usd:   0
CPU   16: hi:    0, btch:   1 usd:   0
CPU   17: hi:    0, btch:   1 usd:   0
CPU   18: hi:    0, btch:   1 usd:   0
CPU   19: hi:    0, btch:   1 usd:   0
CPU   20: hi:    0, btch:   1 usd:   0
CPU   21: hi:    0, btch:   1 usd:   0
CPU   22: hi:    0, btch:   1 usd:   0
CPU   23: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd: 168
CPU    1: hi:  186, btch:  31 usd:  79
CPU    2: hi:  186, btch:  31 usd: 176
CPU    3: hi:  186, btch:  31 usd:  73
CPU    4: hi:  186, btch:  31 usd: 162
CPU    5: hi:  186, btch:  31 usd:  37
CPU    6: hi:  186, btch:  31 usd: 100
CPU    7: hi:  186, btch:  31 usd:  73
CPU    8: hi:  186, btch:  31 usd:  65
CPU    9: hi:  186, btch:  31 usd:  38
CPU   10: hi:  186, btch:  31 usd:  70
CPU   11: hi:  186, btch:  31 usd:  26
CPU   12: hi:  186, btch:  31 usd: 167
CPU   13: hi:  186, btch:  31 usd:  76
CPU   14: hi:  186, btch:  31 usd: 157
CPU   15: hi:  186, btch:  31 usd:  54
CPU   16: hi:  186, btch:  31 usd: 174
CPU   17: hi:  186, btch:  31 usd: 173
CPU   18: hi:  186, btch:  31 usd:  61
CPU   19: hi:  186, btch:  31 usd:  27
CPU   20: hi:  186, btch:  31 usd:  68
CPU   21: hi:  186, btch:  31 usd:  70
CPU   22: hi:  186, btch:  31 usd:  58
CPU   23: hi:  186, btch:  31 usd:  85
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 183
CPU    1: hi:  186, btch:  31 usd: 169
CPU    2: hi:  186, btch:  31 usd: 157
CPU    3: hi:  186, btch:  31 usd: 163
CPU    4: hi:  186, btch:  31 usd: 169
CPU    5: hi:  186, btch:  31 usd:  91
CPU    6: hi:  186, btch:  31 usd: 102
CPU    7: hi:  186, btch:  31 usd: 184
CPU    8: hi:  186, btch:  31 usd: 171
CPU    9: hi:  186, btch:  31 usd: 113
CPU   10: hi:  186, btch:  31 usd: 146
CPU   11: hi:  186, btch:  31 usd:  43
CPU   12: hi:  186, btch:  31 usd: 162
CPU   13: hi:  186, btch:  31 usd: 101
CPU   14: hi:  186, btch:  31 usd: 163
CPU   15: hi:  186, btch:  31 usd: 168
CPU   16: hi:  186, btch:  31 usd: 172
CPU   17: hi:  186, btch:  31 usd: 176
CPU   18: hi:  186, btch:  31 usd: 150
CPU   19: hi:  186, btch:  31 usd: 137
CPU   20: hi:  186, btch:  31 usd: 154
CPU   21: hi:  186, btch:  31 usd:  90
CPU   22: hi:  186, btch:  31 usd:  56
CPU   23: hi:  186, btch:  31 usd:  26
Node 1 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 175
CPU    1: hi:  186, btch:  31 usd: 165
CPU    2: hi:  186, btch:  31 usd: 174
CPU    3: hi:  186, btch:  31 usd: 168
CPU    4: hi:  186, btch:  31 usd: 103
CPU    5: hi:  186, btch:  31 usd: 164
CPU    6: hi:  186, btch:  31 usd: 161
CPU    7: hi:  186, btch:  31 usd:  83
CPU    8: hi:  186, btch:  31 usd:  85
CPU    9: hi:  186, btch:  31 usd: 166
CPU   10: hi:  186, btch:  31 usd: 148
CPU   11: hi:  186, btch:  31 usd: 149
CPU   12: hi:  186, btch:  31 usd: 158
CPU   13: hi:  186, btch:  31 usd:  64
CPU   14: hi:  186, btch:  31 usd:  61
CPU   15: hi:  186, btch:  31 usd: 163
CPU   16: hi:  186, btch:  31 usd: 166
CPU   17: hi:  186, btch:  31 usd:  62
CPU   18: hi:  186, btch:  31 usd: 107
CPU   19: hi:  186, btch:  31 usd: 168
CPU   20: hi:  186, btch:  31 usd: 159
CPU   21: hi:  186, btch:  31 usd: 174
CPU   22: hi:  186, btch:  31 usd:  91
CPU   23: hi:  186, btch:  31 usd: 185
active_anon:7303138 inactive_anon:752638 isolated_anon:0
 active_file:235 inactive_file:17 isolated_file:0
 unevictable:9402 dirty:0 writeback:0 unstable:0
 free:39368 slab_reclaimable:3946 slab_unreclaimable:22451
 mapped:1702 shmem:0 pagetables:25452 bounce:0

Comment 4 Rahul Hinduja 2014-10-31 07:20:32 UTC
The NFS logs have also grown to 14 GB. In one second, NFS logged 8788 entries for "client-rpc-fops.c:2761:client3_3_lookup_cbk":

[root@inception ~]# tail -10000 /var/log/glusterfs/nfs.log | grep "2014-10-30 14:43:56" | grep "client-rpc-fops.c:2761:client3_3_lookup_cbk" | wc
   8788  140620 2100480
[root@inception ~]# 
[root@inception ~]# du -sh /var/log/glusterfs/nfs.log 
14G	/var/log/glusterfs/nfs.log
[root@inception ~]#
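
Until the excessive logging itself is fixed, aggressive size-based rotation of the glusterfs logs is one way to keep /var/log from exhausting the root device. The following is only a stopgap sketch and not from the original report; the file names and thresholds are assumptions:

# hypothetical emergency logrotate policy for the runaway glusterfs logs
cat > /etc/logrotate.d/glusterfs-emergency <<'EOF'
/var/log/glusterfs/*.log {
    size 500M
    rotate 2
    compress
    missingok
    notifempty
    copytruncate
}
EOF

# logrotate normally runs once a day from cron; at this logging rate that is
# far too slow, so check the size every 10 minutes (hypothetical cron entry)
echo '*/10 * * * * root /usr/sbin/logrotate /etc/logrotate.d/glusterfs-emergency' \
    > /etc/cron.d/glusterfs-emergency-rotate

copytruncate is used so the gluster daemons do not have to be signalled to reopen their log files; a few lines may be lost at the moment of truncation, which is acceptable for a stopgap.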

Comment 5 Sachin Pandit 2014-11-06 09:33:43 UTC
Part of the problem has been solved and a patch has been sent for review upstream. I'll update this bug once we make further progress.

Comment 8 senaik 2014-11-13 10:31:14 UTC
Another scenario where the snapd log grows very large and is filled with 'Stale file handle' messages.

Steps:
======
- Created 256 snapshots on the volume while I/O was going on.
- cd to .snaps from the FUSE and NFS mounts.
- While the NFS server is down, try to cd to .snaps and list the snapshots.
- Check the snapd log; it is filled with the log messages below (a command-level sketch of this scenario follows the du output):

[2014-11-13 09:06:26.442062] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol1-server: b1d45859-bb42-4057-95e5-b667705e2045: failed to resolve (Stale file handle)
[2014-11-13 09:06:26.442101] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol1-server: b1d45859-bb42-4057-95e5-b667705e2045: failed to resolve (Stale file handle)
[2014-11-13 09:06:26.442112] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol1-server: b1d45859-bb42-4057-95e5-b667705e2045: failed to resolve (Stale file handle)
[2014-11-13 09:06:26.442155] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol1-server: b1d45859-bb42-4057-95e5-b667705e2045: failed to resolve (Stale file handle)
[2014-11-13 09:06:26.442624] W [server-resolve.c:122:resolve_gfid_cbk] 0-vol1-server: b1d45859-bb42-4057-95e5-b667705e2045: failed to resolve (Stale file handle)


du -sh /var/log/glusterfs/vol1-snapd.log
5.1G	/var/log/glusterfs/vol1-snapd.log
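
Expressed as commands, the scenario above looks roughly like the sketch below (volume name, server name, and mount points are placeholders; the report does not say how the NFS server was brought down, so that step is only indicated by a comment):

# create 256 snapshots while I/O is running on the volume
for i in $(seq 1 256); do gluster snapshot create snap$i vol1; done

# FUSE and NFS mounts of the same volume
mount -t glusterfs server1:/vol1 /mnt/vol1-fuse
mount -t nfs -o vers=3 server1:/vol1 /mnt/vol1-nfs

# browse the snapshots through USS from both mounts
ls /mnt/vol1-fuse/.snaps
ls /mnt/vol1-nfs/.snaps

# ... bring the NFS server down (method not specified in the report) ...

# retry entering .snaps and listing the snapshots, then check the snapd log
ls /mnt/vol1-fuse/.snaps
du -sh /var/log/glusterfs/vol1-snapd.log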

Comment 10 Sachin Pandit 2014-11-19 06:46:17 UTC
This issue is consistently reproducible with the scenario below (a command-level sketch follows at the end of this comment).

1) Create a volume (2x2).
2) Enable USS and create a few snapshots.
3) Enter any of the snapshots from the USS world (say /<mnt>/.snaps/snap2).
4) Stop the volume.
5) Restore to snapshot snap2.
6) Start the volume.
7) Try "ls" from the directory snap2.

The command hangs and snapd.log fills with numerous log entries, around 100 entries per second.

We are still working on root-causing the issue; I'll update this bug as and when we make progress.
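
For reference, a command-level sketch of the scenario above (the server names, brick paths, volume name, and mount point are placeholders and not from the report; exact CLI prompts and snapshot-name handling may differ by version):

# 1) create and start a 2x2 distributed-replicated volume, then mount it
gluster volume create vol2 replica 2 \
    server1:/rhs/brick1/b1 server2:/rhs/brick1/b2 \
    server3:/rhs/brick2/b3 server4:/rhs/brick2/b4
gluster volume start vol2
mount -t glusterfs server1:/vol2 /mnt/vol2

# 2) enable USS and create a few snapshots (depending on configuration,
#    a snapshot may need explicit activation before it shows up under .snaps)
gluster volume set vol2 features.uss enable
gluster snapshot create snap1 vol2
gluster snapshot create snap2 vol2
gluster snapshot activate snap2

# 3) enter one of the snapshots through the USS virtual directory
cd /mnt/vol2/.snaps/snap2

# 4-6) stop the volume, restore it to snap2, start it again
#      (the CLI asks for confirmation on stop and restore)
gluster volume stop vol2
gluster snapshot restore snap2
gluster volume start vol2

# 7) from the still-open snap2 directory, run ls; this is where the hang
#    and the log flood were observed
ls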

Comment 11 Vijaikumar Mallikarjuna 2014-11-21 06:23:00 UTC
*** Bug 1115951 has been marked as a duplicate of this bug. ***

Comment 12 Sachin Pandit 2014-11-21 07:08:09 UTC
The NFS client was looking for the snapshot in the wrong place and was not updating the subvolume once the proper path was resolved. The patch that resolves https://bugzilla.redhat.com/show_bug.cgi?id=1165704 also fixes this issue.

Comment 13 Rahul Hinduja 2014-12-17 11:36:39 UTC
Verified with build: glusterfs-3.6.0.38-1.el6rhs.x86_64

Performed the steps mentioned in comment 10 and in BZ 1158883 and BZ 1158898. The total size of the logs is 8.9 MB:

[root@inception ~]# du -sh /var/log/glusterfs
8.9M	/var/log/glusterfs
[root@inception ~]# 

Moving the bug to the VERIFIED state.

Comment 15 errata-xmlrpc 2015-01-15 13:41:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html

