Bug 1394752

Summary: Seeing error messages [snapview-client.c:283:gf_svc_lookup_cbk] and [dht-helper.c:1666:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x5d75c)
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Nag Pavan Chilakam <nchilaka>
Component: distribute Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED ERRATA QA Contact: Prasad Desala <tdesala>
Severity: medium Docs Contact:
Priority: unspecified    
Version: rhgs-3.2 CC: amukherj, rhinduja, rhs-bugs, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1395261 (view as bug list) Environment:
Last Closed: 2017-03-23 06:18:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1395261, 1395510, 1395517    
Bug Blocks: 1351528    

Description Nag Pavan Chilakam 2016-11-14 11:57:40 UTC
On my systemic setup, I am seeing a lot of error messages on my clients, as shown below:


[2016-11-14 02:43:50.274000] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-sysvol-snapview-client: Lookup failed on normal graph with error Transport endpoint is not connected
[2016-11-14 02:43:50.275390] E [dht-helper.c:1666:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x5d75c) [0x7f2a4ee4175c] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x4623c) [0x7f2a4eba023c] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x99b0) [0x7f2a4eb639b0] ) 0-sysvol-dht: invalid argument: inode [Invalid argument]


I see these messages repeating in bulk about every 20 minutes.
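To check how often the messages recur, the client log can be grouped into fixed time windows. The following is a minimal sketch, not part of the original report: the helper name `bucket_errors` and the 20-minute window are illustrative assumptions, and the regular expression only assumes the log-line prefix seen in the two sample lines above.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the "[timestamp] LEVEL [file:line:function]" prefix seen in the
# sample client log lines. Anything after that prefix is ignored.
LOG_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\] "
    r"(?P<level>[A-Z]) \[(?P<src>[^\]]+)\]"
)


def bucket_errors(lines, bucket_minutes=20):
    """Count E-level messages per (window start, source location).

    bucket_minutes is assumed to divide 60 evenly (hypothetical helper).
    """
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("level") != "E":
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        # Round the timestamp down to the start of its window.
        start = ts.replace(minute=ts.minute - ts.minute % bucket_minutes, second=0)
        counts[(start.strftime("%Y-%m-%d %H:%M"), m.group("src"))] += 1
    return counts


# The two log lines quoted in this report.
sample = [
    "[2016-11-14 02:43:50.274000] E [snapview-client.c:283:gf_svc_lookup_cbk] "
    "0-sysvol-snapview-client: Lookup failed on normal graph with error "
    "Transport endpoint is not connected",
    "[2016-11-14 02:43:50.275390] E [dht-helper.c:1666:dht_inode_ctx_time_update] "
    "(-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x5d75c) [0x7f2a4ee4175c] "
    "-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x4623c) [0x7f2a4eba023c] "
    "-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x99b0) [0x7f2a4eb639b0] ) "
    "0-sysvol-dht: invalid argument: inode [Invalid argument]",
]

for (window, src), n in sorted(bucket_errors(sample).items()):
    print(window, src, n)
```

Run against the full client log (for example, the lines read from the mount log file), windows with large counts separated by quiet windows would support the "in bulk, about every 20 minutes" observation.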

Volume info is as below:
Volume Name: sysvol
Type: Distributed-Replicate
Volume ID: b1ef4d84-0614-4d5d-9e2e-b19183996e43
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/sysvol
Brick2: 10.70.37.108:/rhs/brick1/sysvol
Brick3: 10.70.35.3:/rhs/brick1/sysvol
Brick4: 10.70.37.66:/rhs/brick1/sysvol
Brick5: 10.70.35.191:/rhs/brick2/sysvol
Brick6: 10.70.37.108:/rhs/brick2/sysvol
Brick7: 10.70.35.3:/rhs/brick2/sysvol
Brick8: 10.70.37.66:/rhs/brick2/sysvol
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.stat-prefetch: on
performance.cache-invalidation: on
cluster.shd-max-threads: 10
features.cache-invalidation-timeout: 400
features.cache-invalidation: on
performance.md-cache-timeout: 300
features.uss: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@dhcp35-191 ~]# gluster v status
Status of volume: sysvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.191:/rhs/brick1/sysvol       N/A       N/A        N       N/A  
Brick 10.70.37.108:/rhs/brick1/sysvol       49152     0          Y       27848
Brick 10.70.35.3:/rhs/brick1/sysvol         N/A       N/A        N       N/A  
Brick 10.70.37.66:/rhs/brick1/sysvol        49152     0          Y       28853
Brick 10.70.35.191:/rhs/brick2/sysvol       49153     0          Y       18344
Brick 10.70.37.108:/rhs/brick2/sysvol       N/A       N/A        N       N/A  
Brick 10.70.35.3:/rhs/brick2/sysvol         49153     0          Y       11727
Brick 10.70.37.66:/rhs/brick2/sysvol        N/A       N/A        N       N/A  
Snapshot Daemon on localhost                49154     0          Y       18461
Self-heal Daemon on localhost               N/A       N/A        Y       18364
Quota Daemon on localhost                   N/A       N/A        Y       18410
Snapshot Daemon on 10.70.35.3               49154     0          Y       11826
Self-heal Daemon on 10.70.35.3              N/A       N/A        Y       11747
Quota Daemon on 10.70.35.3                  N/A       N/A        Y       11779
Snapshot Daemon on 10.70.37.66              49154     0          Y       28970
Self-heal Daemon on 10.70.37.66             N/A       N/A        Y       28892
Quota Daemon on 10.70.37.66                 N/A       N/A        Y       28923
Snapshot Daemon on 10.70.37.108             49154     0          Y       27965
Self-heal Daemon on 10.70.37.108            N/A       N/A        Y       27887
Quota Daemon on 10.70.37.108                N/A       N/A        Y       27918
 
Task Status of Volume sysvol
------------------------------------------------------------------------------
There are no active volume tasks
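The status output above shows four bricks offline. As a rough aid for reading it, the sketch below groups the brick rows into replica sets and lists the offline brick in each. It rests on two assumptions that are not stated in this report: that the rows of `gluster v status` appear in the same order as the bricks in the volume info, and that each replica set is formed from consecutive bricks in that order (replica count 2, from "4 x 2 = 8"). The function name `offline_by_replica_set` is hypothetical.

```python
import re

# Matches a brick row: "Brick host:/path  <tcp port>  <rdma port>  <Y|N>  <pid>".
BRICK_RE = re.compile(r"^Brick (\S+)\s+\S+\s+\S+\s+([YN])\s+\S+\s*$")


def offline_by_replica_set(status_text, replica_count=2):
    """Return {replica set index: [offline brick names]} (hypothetical helper).

    Assumes consecutive brick rows belong to the same replica set.
    """
    bricks = []
    for line in status_text.splitlines():
        m = BRICK_RE.match(line)
        if m:
            bricks.append((m.group(1), m.group(2) == "Y"))
    result = {}
    for i in range(0, len(bricks), replica_count):
        group = bricks[i:i + replica_count]
        result[i // replica_count] = [name for name, online in group if not online]
    return result


# Brick rows copied from the status output in this report.
status = """\
Brick 10.70.35.191:/rhs/brick1/sysvol       N/A       N/A        N       N/A  
Brick 10.70.37.108:/rhs/brick1/sysvol       49152     0          Y       27848
Brick 10.70.35.3:/rhs/brick1/sysvol         N/A       N/A        N       N/A  
Brick 10.70.37.66:/rhs/brick1/sysvol        49152     0          Y       28853
Brick 10.70.35.191:/rhs/brick2/sysvol       49153     0          Y       18344
Brick 10.70.37.108:/rhs/brick2/sysvol       N/A       N/A        N       N/A  
Brick 10.70.35.3:/rhs/brick2/sysvol         49153     0          Y       11727
Brick 10.70.37.66:/rhs/brick2/sysvol        N/A       N/A        N       N/A  
"""

for index, offline in offline_by_replica_set(status).items():
    print(index, offline)
```

Under those assumptions, each of the four replica sets has exactly one brick down and one brick up.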




Client I/O patterns can be found at:
https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=760435885

Comment 3 Nithya Balachandran 2016-11-16 04:30:04 UTC
Upstream patch:

Master: http://review.gluster.org/#/c/15847/

Comment 4 Nithya Balachandran 2016-11-16 06:05:59 UTC
Patches:

Upstream:
master: http://review.gluster.org/15847
release-3.8 : http://review.gluster.org/15850
release-3.9 : http://review.gluster.org/15851


Downstream:
https://code.engineering.redhat.com/gerrit/#/c/90283/

Comment 8 Prasad Desala 2016-12-13 05:39:10 UTC
Verified this BZ on glusterfs version 3.8.4-8.el7rhgs.x86_64.
Followed the same steps mentioned in Comment 2 and did not see the errors reported in this BZ.

Moving this BZ to Verified.

Comment 10 errata-xmlrpc 2017-03-23 06:18:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html