1381831 – dom_md/ids is always reported in the self-heal info

Summary: dom_md/ids is always reported in the self-heal info

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	arbiter
Sub Component:
Version:	rhgs-3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	RHGS 3.2.0
Assignee:	Ravishankar N
QA Contact:	RamaKasturi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1351528

Reported:	2016-10-05 07:32 UTC by RamaKasturi
Modified:	2021-06-10 11:34 UTC
CC List:	8 users
Fixed In Version:	glusterfs-3.8.4-3
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-03-23 06:08:12 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:0486	0	normal	SHIPPED_LIVE	Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update	2017-03-23 09:18:45 UTC

Description RamaKasturi 2016-10-05 07:32:21 UTC

Description of problem:
I see that /4f007a3a-612f-40db-8b07-6666e8259957/dom_md/ids is always reported in the self-heal info.

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-2.el7rhgs.x86_64

How reproducible:
Always; the file mentioned in the description is listed in the self-heal info every time.

Steps to Reproduce:
1. Install the HC (hyperconverged) stack on RHV-H with arbiter volumes.
2. Enable SSL on the volumes.
3. Create VMs on the volumes.
4. Run 'gluster volume heal engine info' and 'gluster volume heal vmstore info'
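Step 4 prints one block per brick. To see at a glance which bricks still hold pending entries, that output can be reduced to one line per brick. A minimal bash/awk sketch, assuming the 'Brick <host>:<path>' and 'Number of entries: N' layout shown under Actual results; the helper name heal_info_summary is hypothetical and not part of the gluster CLI:

```bash
#!/usr/bin/env bash
# heal_info_summary (hypothetical helper): reads the output of
# 'gluster volume heal <vol> info' on stdin and prints "<brick> <count>"
# for every brick, using the "Brick ..." and "Number of entries: N" lines.
heal_info_summary() {
  awk '
    /^Brick /             { brick = $2 }       # e.g. Brick 10.70.36.80:/rhgs/brick3/vmstore
    /^Number of entries:/ { print brick, $4 }  # 4th field holds the count
  '
}

# Usage:
#   gluster volume heal vmstore info | heal_info_summary
```

For the vmstore output below, this prints one line per brick with the counts 0, 1 and 1.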

Actual results:

[root@rhsqa-grafton2 ~]# gluster volume heal vmstore info
Brick 10.70.36.79:/rhgs/brick3/vmstore
Status: Connected
Number of entries: 0

Brick 10.70.36.80:/rhgs/brick3/vmstore
/4f007a3a-612f-40db-8b07-6666e8259957/dom_md/ids
Status: Connected
Number of entries: 1

Brick 10.70.36.81:/rhgs/brick3/vmstore
/4f007a3a-612f-40db-8b07-6666e8259957/dom_md/ids
Status: Connected
Number of entries: 1

[root@rhsqa-grafton2 ~]# gluster volume heal engine info
Brick 10.70.36.79:/rhgs/brick1/engine
Status: Connected
Number of entries: 0

Brick 10.70.36.80:/rhgs/brick1/engine
/53c84f1e-3643-45aa-805e-8c9e92ee3098/dom_md/ids
/53c84f1e-3643-45aa-805e-8c9e92ee3098/images/f0c14312-7e49-464f-9660-3b629fb8b538/7efedd28-7a43-4142-8bf7-fe468376626f
Status: Connected
Number of entries: 2

Brick 10.70.36.81:/rhgs/brick1/engine
/53c84f1e-3643-45aa-805e-8c9e92ee3098/dom_md/ids
/53c84f1e-3643-45aa-805e-8c9e92ee3098/images/f0c14312-7e49-464f-9660-3b629fb8b538/7efedd28-7a43-4142-8bf7-fe468376626f
Status: Connected
Number of entries: 2

The file requires a data heal and is always reported in the heal info.

Expected results:
If a heal is pending, the file should get healed and should not keep appearing in the heal info.

Additional info:

Comment 2 RamaKasturi 2016-10-05 07:36:53 UTC

Extended attributes of the files reported in heal info:

Extended attributes of the file on the engine volume, from all nodes:
=====================================================================

[root@rhsqa-grafton1 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine//53c84f1e-3643-45aa-805e-8c9e92ee3098/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine//53c84f1e-3643-45aa-805e-8c9e92ee3098/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x080000000000000057f3b7260004776c
trusted.gfid=0x496e047d725f4a0b87a131f47be477a9
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000


[root@rhsqa-grafton2 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine//53c84f1e-3643-45aa-805e-8c9e92ee3098/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine//53c84f1e-3643-45aa-805e-8c9e92ee3098/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x000000010000000000000000
trusted.bit-rot.version=0x040000000000000057f3b085000093a5
trusted.gfid=0x496e047d725f4a0b87a131f47be477a9
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000

[root@rhsqa-grafton3 ~]# getfattr -d -m . -e hex /rhgs/brick1/engine//53c84f1e-3643-45aa-805e-8c9e92ee3098/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1/engine//53c84f1e-3643-45aa-805e-8c9e92ee3098/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x000000050000000000000000
trusted.bit-rot.version=0x040000000000000057f3b0850000a228
trusted.gfid=0x496e047d725f4a0b87a131f47be477a9
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000

Extended attributes of the file on the vmstore volume, from all nodes:
=================================================================

[root@rhsqa-grafton1 ~]# getfattr -d -m . -e hex /rhgs/brick3/vmstore/4f007a3a-612f-40db-8b07-6666e8259957/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick3/vmstore/4f007a3a-612f-40db-8b07-6666e8259957/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x070000000000000057f3b7260004756f
trusted.gfid=0x16a892f912294233aea514be469b926d
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000

[root@rhsqa-grafton2 ~]# getfattr -d -m . -e hex /rhgs/brick3/vmstore/4f007a3a-612f-40db-8b07-6666e8259957/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick3/vmstore/4f007a3a-612f-40db-8b07-6666e8259957/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.vmstore-client-0=0x000000060000000000000000
trusted.bit-rot.version=0x030000000000000057f3b08a00027c98
trusted.gfid=0x16a892f912294233aea514be469b926d
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000

[root@rhsqa-grafton3 ~]# getfattr -d -m . -e hex /rhgs/brick3/vmstore/4f007a3a-612f-40db-8b07-6666e8259957/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick3/vmstore/4f007a3a-612f-40db-8b07-6666e8259957/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.vmstore-client-0=0x000000060000000000000000
trusted.bit-rot.version=0x030000000000000057f3b08a00027f8e
trusted.gfid=0x16a892f912294233aea514be469b926d
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000
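The shard xattrs above can be cross-checked against the volume options. A minimal bash sketch (bash 4 or later; the helper names are hypothetical). It assumes the layout used by the shard translator: block-size is one 64-bit big-endian byte count, and file-size holds four 64-bit big-endian slots, of which slot 0 is the file size in bytes and slot 2 the number of 512-byte blocks. Treat that layout as an assumption to confirm against the shard source for the installed version.

```bash
#!/usr/bin/env bash
# Decoders for the shard xattrs printed by 'getfattr -d -m . -e hex'.

# trusted.glusterfs.shard.block-size: a single 64-bit big-endian byte count.
decode_shard_block_size() {
  local hex=${1#0x}
  echo "$((16#$hex))"
}

# trusted.glusterfs.shard.file-size: four 64-bit big-endian slots (assumed
# layout: slot 0 = size in bytes, slot 2 = 512-byte block count).
decode_shard_file_size() {
  local hex=${1#0x}
  if [ "${#hex}" -ne 64 ]; then
    echo "expected 32 bytes, got $(( ${#hex} / 2 ))" >&2
    return 1
  fi
  # Each slot is 16 hex digits; slots 1 and 3 are ignored.
  printf 'size=%d blocks=%d\n' "$((16#${hex:0:16}))" "$((16#${hex:32:16}))"
}
```

For example, decode_shard_block_size 0x0000000020000000 prints 536870912, i.e. 512 MiB, which matches 'features.shard-block-size: 512MB' in the volume info below.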

Comment 3 RamaKasturi 2016-10-05 08:26:30 UTC

gluster volume info details:
==================================
[root@rhsqa-grafton3 ~]# gluster volume info engine
 
Volume Name: engine
Type: Replicate
Volume ID: 03c68517-4be1-45e3-b788-87e10d73f3ee
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.79:/rhgs/brick1/engine
Brick2: 10.70.36.80:/rhgs/brick1/engine
Brick3: 10.70.36.81:/rhgs/brick1/engine (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
network.ping-timeout: 30
user.cifs: off
performance.strict-o-direct: on
auth.ssl-allow: 10.70.36.79,10.70.36.80,10.70.36.81
client.ssl: on
server.ssl: on

[root@rhsqa-grafton3 ~]# gluster volume info data
 
Volume Name: data
Type: Replicate
Volume ID: 03454b82-d4ea-4cf5-85c3-29bee7afd87f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.79:/rhgs/brick2/data
Brick2: 10.70.36.80:/rhgs/brick2/data
Brick3: 10.70.36.81:/rhgs/brick2/data (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
network.ping-timeout: 30
user.cifs: off
performance.strict-o-direct: on
auth.ssl-allow: 10.70.36.79,10.70.36.80,10.70.36.81
client.ssl: on
server.ssl: on

[root@rhsqa-grafton3 ~]# gluster volume info vmstore
 
Volume Name: vmstore
Type: Replicate
Volume ID: 16fb0e38-4a9c-4468-8a51-fa8dc5a8dc06
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.79:/rhgs/brick3/vmstore
Brick2: 10.70.36.80:/rhgs/brick3/vmstore
Brick3: 10.70.36.81:/rhgs/brick3/vmstore (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
network.ping-timeout: 30
user.cifs: off
performance.strict-o-direct: on
auth.ssl-allow: 10.70.36.79,10.70.36.80,10.70.36.81
client.ssl: on
server.ssl: on
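To reproduce this configuration on a test volume, the reconfigured options can be applied one by one with 'gluster volume set <VOLNAME> <KEY> <VALUE>'. A minimal bash sketch, assuming that the shard, self-heal and SSL options copied from the output above are the subset relevant to this bug; the function name apply_hc_options and the GLUSTER_CMD override used for a dry run are hypothetical. Changing the SSL options on an existing volume may also require restarting it.

```bash
#!/usr/bin/env bash
# apply_hc_options (hypothetical): applies a subset of the "Options Reconfigured"
# listed above to one volume. Set GLUSTER_CMD=echo to print the commands instead.
apply_hc_options() {
  local vol=$1
  local -a opts=(
    "features.shard on"
    "features.shard-block-size 512MB"
    "cluster.data-self-heal-algorithm full"
    "cluster.locking-scheme granular"
    "cluster.shd-max-threads 8"
    "cluster.shd-wait-qlength 10000"
    "auth.ssl-allow 10.70.36.79,10.70.36.80,10.70.36.81"
    "client.ssl on"
    "server.ssl on"
  )
  local kv key value
  for kv in "${opts[@]}"; do
    read -r key value <<<"$kv"   # split "key value" on the first space
    ${GLUSTER_CMD:-gluster} volume set "$vol" "$key" "$value" || return 1
  done
}
```

Running it with GLUSTER_CMD=echo first shows exactly which 'gluster volume set' commands would be issued; stopping at the first failure keeps a half-applied configuration easy to spot.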

Comment 4 RamaKasturi 2016-10-05 08:27:46 UTC

The sosreports are available at: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1381822/

Comment 5 RamaKasturi 2016-10-05 12:22:31 UTC

The following errors are seen in the engine mount log:
================================================

[2016-10-04 13:27:42.529680] I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-engine-replicate-0: no subvolumes up
[2016-10-04 13:27:42.529730] E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-engine-shard: stat failed: 496e047d-725f-4a0b-87a1-31f47be477a9 [Transport endpoint is not connected]
[2016-10-04 13:27:42.716866] I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-engine-replicate-0: no subvolumes up
[2016-10-04 13:27:42.716918] E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-engine-shard: stat failed: d27d0848-cf13-4aa6-a012-d2cc8f3b9a6a [Transport endpoint is not connected]
[2016-10-04 13:27:43.030224] E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-engine-shard: stat failed: 496e047d-725f-4a0b-87a1-31f47be477a9 [Transport endpoint is not connected]
[2016-10-04 13:27:47.534122] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 339820: FSTAT() /53c84f1e-3643-45aa-805e-8c9e92ee3098/dom_md/ids => -1 (Transport endpoint is not connected)
[2016-10-04 13:27:57.229171] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 339862: FSTAT() /53c84f1e-3643-45aa-805e-8c9e92ee3098/images/f0c14312-7e49-464f-9660-3b629fb8b538/7efedd28-7a43-4142-8bf7-fe468376626f => -1 (Transport endpoint is not connected)
[2016-10-04 13:28:07.050637] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 339904: FSTAT() /53c84f1e-3643-45aa-805e-8c9e92ee3098/dom_md/ids => -1 (Transport endpoint is not connected)
[2016-10-04 13:28:10.753676] E [socket.c:2309:socket_connect_finish] 0-engine-client-0: connection to 10.70.36.79:24007 failed (Connection refused)
[2016-10-04 13:28:11.990676] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.36.80 (No data available)
[2016-10-04 13:28:11.990716] I [glusterfsd-mgmt.c:1959:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.70.36.81
[2016-10-04 13:28:14.774318] E [socket.c:2309:socket_connect_finish] 0-engine-client-2: connection to 10.70.36.81:24007 failed (Connection refused)
[2016-10-04 13:28:14.778947] E [socket.c:2309:socket_connect_finish] 0-engine-client-1: connection to 10.70.36.80:24007 failed (Connection refused)
[2016-10-04 13:28:16.745810] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 339946: FSTAT() /53c84f1e-3643-45aa-805e-8c9e92ee3098/images/f0c14312-7e49-464f-9660-3b629fb8b538/7efedd28-7a43-4142-8bf7-fe468376626f => -1 (Transport endpoint is not connected)
[2016-10-04 13:28:22.814018] E [socket.c:2309:socket_connect_finish] 0-glusterfs: connection to 10.70.36.81:24007 failed (Connection refused)
[2016-10-04 13:28:22.814070] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.36.81 (Transport endpoint is not connected)
[2016-10-04 13:28:22.814084] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers

Comment 6 RamaKasturi 2016-10-05 12:24:47 UTC

The following error messages are seen in the vmstore mount log:
======================================================
[2016-10-04 13:28:10.152365] E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-vmstore-shard: stat failed: 16a892f9-1229-4233-aea5-14be469b926d [Transport endpoint is not connected]
[2016-10-04 13:28:10.652808] I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-vmstore-replicate-0: no subvolumes up
[2016-10-04 13:28:10.652858] E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-vmstore-shard: stat failed: 16a892f9-1229-4233-aea5-14be469b926d [Transport endpoint is not connected]
[2016-10-04 13:28:11.153299] I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-vmstore-replicate-0: no subvolumes up
[2016-10-04 13:28:11.153350] E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-vmstore-shard: stat failed: 16a892f9-1229-4233-aea5-14be469b926d [Transport endpoint is not connected]
[2016-10-04 13:28:11.228410] I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-vmstore-replicate-0: no subvolumes up
[2016-10-04 13:28:11.653757] I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-vmstore-replicate-0: no subvolumes up
[2016-10-04 13:28:11.653771] E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-vmstore-shard: stat failed: 16a892f9-1229-4233-aea5-14be469b926d [Transport endpoint is not connected]
[2016-10-04 13:28:11.990719] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.36.80 (No data available)
[2016-10-04 13:28:11.990762] I [glusterfsd-mgmt.c:1959:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.70.36.81
[2016-10-04 13:28:12.154230] I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-vmstore-replicate-0: no subvolumes up
[2016-10-04 13:28:12.154303] E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-vmstore-shard: stat failed: 16a892f9-1229-4233-aea5-14be469b926d [Transport endpoint is not connected]
[2016-10-04 13:28:12.654727] I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-vmstore-replicate-0: no subvolumes up
[2016-10-04 13:28:12.654780] E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-vmstore-shard: stat failed: 16a892f9-1229-4233-aea5-14be469b926d [Transport endpoint is not connected]
[2016-10-04 13:28:13.155210] I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-vmstore-replicate-0: no subvolumes up
[2016-10-04 13:28:13.155264] E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-vmstore-shard: stat failed: 16a892f9-1229-4233-aea5-14be469b926d [Transport endpoint is not connected]
[2016-10-04 13:28:13.610936] I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-vmstore-replicate-0: no subvolumes up
[2016-10-04 13:28:13.655747] I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-vmstore-replicate-0: no subvolumes up
[2016-10-04 13:28:13.655762] E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-vmstore-shard: stat failed: 16a892f9-1229-4233-aea5-14be469b926d [Transport endpoint is not connected]
[2016-10-04 13:28:13.918741] E [socket.c:2309:socket_connect_finish] 0-vmstore-client-1: connection to 10.70.36.80:24007 failed (Connection refused)
[2016-10-04 13:28:14.156240] I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-vmstore-replicate-0: no subvolumes up
[2016-10-04 13:28:14.156317] E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-vmstore-shard: stat failed: 16a892f9-1229-4233-aea5-14be469b926d [Transport endpoint is not connected]
[2016-10-04 13:28:16.928912] E [socket.c:2309:socket_connect_finish] 0-vmstore-client-2: connection to 10.70.36.81:24007 failed (Connection refused)
[2016-10-04 13:28:22.958364] E [socket.c:2309:socket_connect_finish] 0-glusterfs: connection to 10.70.36.81:24007 failed (Connection refused)
[2016-10-04 13:28:22.958424] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.36.81 (Transport endpoint is not connected)
[2016-10-04 13:28:22.958438] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
The message "I [MSGID: 108006] [afr-common.c:4439:afr_local_init] 0-vmstore-replicate-0: no subvolumes up" repeated 24 times between [2016-10-04 13:28:14.156240] and [2016-10-04 13:28:24.665181]
The message "E [MSGID: 133014] [shard.c:1129:shard_common_stat_cbk] 0-vmstore-shard: stat failed: 16a892f9-1229-4233-aea5-14be469b926d [Transport endpoint is not connected]" repeated 21 times between [2016-10-04 13:28:14.156317] and [2016-10-04 13:28:24.665196]
[2016-10-04 13:28:24.784468] W [glusterfsd.c:1288:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f0a063fcdc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f0a07a92c45] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7f0a07a92abb] ) 0-: received signum (15), shutting down

Comment 7 Ravishankar N 2016-10-05 17:10:56 UTC

Hi Kasturi, at a first glance, the afr xattrs for the 'ids' file (comment #2) seem to indicate a pending heal on the first brick, which is why heal info shows these entries. I'm not sure what the issue is. Is the file not getting healed when you run 'gluster vol heal volname' when all bricks and shds are up?
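For reference, the value of a trusted.afr.<volname>-client-<N> xattr is the AFR pending changelog: three 32-bit big-endian counters for data, metadata and entry operations, recorded on one brick against brick N (client-0 is Brick1). In comment #2, Brick2 and the arbiter both carry a non-zero data counter against client-0, i.e. both consider Brick1 to need a data heal. A minimal bash sketch of the decoding (bash 4 or later; the function name is hypothetical):

```bash
#!/usr/bin/env bash
# decode_afr_changelog (hypothetical helper): splits a 12-byte trusted.afr.*
# value, as printed by 'getfattr -e hex', into its three pending counters.
decode_afr_changelog() {
  local hex=${1#0x}
  if [ "${#hex}" -ne 24 ]; then
    echo "expected 12 bytes, got $(( ${#hex} / 2 ))" >&2
    return 1
  fi
  # Each counter is 8 hex digits: data, then metadata, then entry.
  printf 'data=%d metadata=%d entry=%d\n' \
    "$((16#${hex:0:8}))" "$((16#${hex:8:8}))" "$((16#${hex:16:8}))"
}
```

For example, decode_afr_changelog 0x000000010000000000000000 prints data=1 metadata=0 entry=0.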

Comment 8 RamaKasturi 2016-10-06 06:41:41 UTC

Ravi, I have not run 'gluster vol heal <vol_name>', as I was expecting the files to get healed automatically. 'gluster volume status' shows that all bricks and the shd are up. My setup is currently down due to an SSL issue; I will get back to you on this once that issue is resolved.

Comment 12 Ravishankar N 2016-11-08 04:55:25 UTC

Hi Kasturi, looks like the sos reports in comment #4 are for BZ 1381822. Could you provide the links to the one for this BZ?

Comment 13 RamaKasturi 2016-11-08 05:35:29 UTC

Hi Ravi,
 
   The logs are the same; I named the directory after the other BZ.

Thanks
kasturi

Comment 14 Ravishankar N 2016-11-08 12:36:38 UTC

The client logs show frequent disconnects from the bricks. In some cases the client cannot reconnect to client-0 after a disconnect, because glusterd does not serve it the port number of the brick to connect to (BZ 1381822: too many open files in glusterd). This is likely why the ids file constantly needs healing on client-0. Moving the BZ to ON_QA.
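The disconnect pattern can be quantified from the mount logs quoted in comments #5 and #6. Port 24007 is the glusterd management port, from which the client obtains each brick's port, so failures there show up as 'connection to <host>:24007 failed' lines. A rough bash sketch, assuming the log line format shown in those comments; the function name is hypothetical and the mount log path is left to the caller:

```bash
#!/usr/bin/env bash
# count_connect_failures (hypothetical): reads a glusterfs client log on stdin
# and prints how often each client xlator failed to connect, most frequent first.
count_connect_failures() {
  grep -o '[0-9]-[A-Za-z0-9_.-]*-client-[0-9]*: connection to [^ ]* failed' \
    | awk '{ sub(/:$/, "", $1); n[$1 " -> " $4]++ }
           END { for (k in n) print n[k], k }' \
    | sort -rn
}
```

Lines from the management connection ('0-glusterfs: connection to ...') are deliberately skipped, since they do not name a brick.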

Comment 15 RamaKasturi 2016-12-15 07:11:38 UTC

Verified; works fine with build glusterfs-3.8.4-8.el7rhgs.x86_64.

Procedure 1:
================

1) Killed one of the bricks in the vmstore volume.

2) Created a new VM.

3) Brought the downed brick back up by running 'gluster volume start <vol_name> force'.

4) Once the brick was back up, all the entries were healed, and after some time 'gluster volume heal vmstore info' reported no entries.

Procedure 2:
==================

1) Created VMs on the setup.

2) Started running I/O using dd inside the VMs.

3) Killed one of the data bricks.

4) After some time, brought the brick back up.

5) Heal completed successfully, and heal info reported zero entries.
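Both procedures end by watching heal info until it reports no entries. A minimal bash sketch of that check as a polling loop, assuming the 'Number of entries: N' lines shown in the description; wait_for_heal, its timeout and interval arguments, and the GLUSTER_CMD override are hypothetical and not part of the gluster CLI:

```bash
#!/usr/bin/env bash
# wait_for_heal (hypothetical): polls 'gluster volume heal <vol> info' until every
# brick reports "Number of entries: 0", or gives up after <timeout> seconds.
# Usage: wait_for_heal <vol> [timeout_s] [interval_s]
wait_for_heal() {
  local vol=$1 timeout=${2:-600} interval=${3:-10}
  local deadline=$(( SECONDS + timeout )) pending
  while :; do
    # Sum the per-brick counts; any non-numeric count, or no count lines at
    # all (e.g. the command failed), is treated as pending.
    pending=$(${GLUSTER_CMD:-gluster} volume heal "$vol" info \
      | awk '/^Number of entries:/ { seen = 1; if ($4 ~ /^[0-9]+$/) s += $4; else s++ }
             END { print s + (seen ? 0 : 1) }')
    if [ "$pending" -eq 0 ]; then
      echo "$vol: no pending heals"
      return 0
    fi
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "$vol: $pending entries still pending after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

Treating missing output as pending avoids reporting a volume as healed when the heal info command itself fails.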

Comment 17 errata-xmlrpc 2017-03-23 06:08:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
