On my systemic setup, I am seeing a lot of error messages on my clients, as below:

[2016-11-14 02:43:50.274000] E [snapview-client.c:283:gf_svc_lookup_cbk] 0-sysvol-snapview-client: Lookup failed on normal graph with error Transport endpoint is not connected
[2016-11-14 02:43:50.275390] E [dht-helper.c:1666:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x5d75c) [0x7f2a4ee4175c] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x4623c) [0x7f2a4eba023c] -->/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x99b0) [0x7f2a4eb639b0] ) 0-sysvol-dht: invalid argument: inode [Invalid argument]

These messages repeat in bulk about every 20 minutes.

Vol info is as below:

Volume Name: sysvol
Type: Distributed-Replicate
Volume ID: b1ef4d84-0614-4d5d-9e2e-b19183996e43
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/sysvol
Brick2: 10.70.37.108:/rhs/brick1/sysvol
Brick3: 10.70.35.3:/rhs/brick1/sysvol
Brick4: 10.70.37.66:/rhs/brick1/sysvol
Brick5: 10.70.35.191:/rhs/brick2/sysvol
Brick6: 10.70.37.108:/rhs/brick2/sysvol
Brick7: 10.70.35.3:/rhs/brick2/sysvol
Brick8: 10.70.37.66:/rhs/brick2/sysvol
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.stat-prefetch: on
performance.cache-invalidation: on
cluster.shd-max-threads: 10
features.cache-invalidation-timeout: 400
features.cache-invalidation: on
performance.md-cache-timeout: 300
features.uss: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

[root@dhcp35-191 ~]# gluster v status
Status of volume: sysvol
Gluster process                            TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.191:/rhs/brick1/sysvol      N/A       N/A        N       N/A
Brick 10.70.37.108:/rhs/brick1/sysvol      49152     0          Y       27848
Brick 10.70.35.3:/rhs/brick1/sysvol        N/A       N/A        N       N/A
Brick 10.70.37.66:/rhs/brick1/sysvol       49152     0          Y       28853
Brick 10.70.35.191:/rhs/brick2/sysvol      49153     0          Y       18344
Brick 10.70.37.108:/rhs/brick2/sysvol      N/A       N/A        N       N/A
Brick 10.70.35.3:/rhs/brick2/sysvol        49153     0          Y       11727
Brick 10.70.37.66:/rhs/brick2/sysvol       N/A       N/A        N       N/A
Snapshot Daemon on localhost               49154     0          Y       18461
Self-heal Daemon on localhost              N/A       N/A        Y       18364
Quota Daemon on localhost                  N/A       N/A        Y       18410
Snapshot Daemon on 10.70.35.3              49154     0          Y       11826
Self-heal Daemon on 10.70.35.3             N/A       N/A        Y       11747
Quota Daemon on 10.70.35.3                 N/A       N/A        Y       11779
Snapshot Daemon on 10.70.37.66             49154     0          Y       28970
Self-heal Daemon on 10.70.37.66            N/A       N/A        Y       28892
Quota Daemon on 10.70.37.66                N/A       N/A        Y       28923
Snapshot Daemon on 10.70.37.108            49154     0          Y       27965
Self-heal Daemon on 10.70.37.108           N/A       N/A        Y       27887
Quota Daemon on 10.70.37.108               N/A       N/A        Y       27918

Task Status of Volume sysvol
------------------------------------------------------------------------------
There are no active volume tasks

Client IO patterns can be found at:
https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=760435885
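For anyone triaging a similar setup, the two checks below are a quick way to correlate the offline bricks with the error bursts. This is a hedged sketch: /var/log/glusterfs/mnt-sysvol.log is a hypothetical client log path (substitute the log file for your own mount point), and the awk field positions assume the status layout shown above.

# List the bricks that "gluster volume status" reports as offline
# (Online column == N); field positions assume the 6-column layout above.
gluster volume status sysvol | awk '/^Brick/ && $(NF-1) == "N" {print $2}'

# Count error occurrences per minute in the client log to confirm the
# roughly-every-20-minutes bursts. Log path is an assumption.
grep 'dht_inode_ctx_time_update' /var/log/glusterfs/mnt-sysvol.log | awk '{print $1, substr($2, 1, 5)}' | uniq -c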
Upstream patch (master): http://review.gluster.org/#/c/15847/
Patches:

Upstream:
  master:      http://review.gluster.org/15847
  release-3.8: http://review.gluster.org/15850
  release-3.9: http://review.gluster.org/15851

Downstream:
  https://code.engineering.redhat.com/gerrit/#/c/90283/
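A quick way to tell whether an installed client already carries the downstream fix is to compare the installed package NVR against the build on which this was verified (3.8.4-8.el7rhgs, per the verification comment below):

# Print the installed glusterfs client package versions.
rpm -q glusterfs glusterfs-fuse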
Verified this BZ on glusterfs version 3.8.4-8.el7rhgs.x86_64. Followed the same steps mentioned in Comment 2 and did not see the errors reported here. Moving this BZ to Verified.
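For a re-check after updating, the same hypothetical client log path as above can be grepped for both error strings from the original report; with the fixed build under the same brick-down workload, both counts should stay at 0:

# Both paths below are assumptions; substitute your mount's log file.
grep -c 'invalid argument: inode' /var/log/glusterfs/mnt-sysvol.log
grep -c 'Lookup failed on normal graph' /var/log/glusterfs/mnt-sysvol.log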
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html