Bug 1451720 - cifs Client triggering heals while deletion of files
Summary: cifs Client triggering heals while deletion of files
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ravishankar N
QA Contact: Anees Patel
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-05-17 11:18 UTC by Nag Pavan Chilakam
Modified: 2023-09-14 03:57 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-15 11:17:25 UTC
Embargoed:


Attachments

Description Nag Pavan Chilakam 2017-05-17 11:18:18 UTC
Description of problem:
========================
When I issued rm -rf on the root of the volume from a CIFS mount, I saw the CIFS log being populated with a lot of self-heal messages.
There is no necessity for healing during deletes.
I checked the heal info of the volume and no heals are pending.

Version-Release number of selected component (if applicable):
========
3.8.4-25

How reproducible:
===========
2/2


Steps to Reproduce:
1. Have a 4-node setup with CTDB/Samba enabled
2. Create a 6x2 volume
3. Enable the nl-cache and parallel-readdir options
4. Populate a lot of files into 3 different directories
5. Do negative lookups from two different clients for some time
6. While doing negative lookups from one client, do rm -rf * from another client (sketched below)
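
A rough shell sketch of the above steps (host, volume, and share names and the file counts are illustrative, not the exact ones used):

# 6x2 volume: with replica 2, consecutive brick pairs form the replica sets
gluster volume create repvol replica 2 \
  node1:/bricks/b0 node2:/bricks/b0 node3:/bricks/b0 node4:/bricks/b0 \
  node1:/bricks/b1 node2:/bricks/b1 node3:/bricks/b1 node4:/bricks/b1 \
  node1:/bricks/b2 node2:/bricks/b2 node3:/bricks/b2 node4:/bricks/b2
gluster volume start repvol
gluster volume set repvol performance.nl-cache on
gluster volume set repvol performance.parallel-readdir on

# on each client, mount the samba share exported via the ctdb VIP
mount -t cifs //ctdb-vip/gluster-repvol /mnt/repvol -o username=admin

# populate files into 3 directories
for d in dir1 dir2 dir3; do mkdir -p /mnt/repvol/$d; touch /mnt/repvol/$d/file.{1..10000}; done

# negative lookups, run from two clients in parallel
for i in {1..100000}; do stat /mnt/repvol/dir1/no-such-file.$i 2>/dev/null; done

# while the lookups run on one client, from the other client:
rm -rf /mnt/repvol/*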

Seeing the below messages in the CIFS/Samba mount log:


The message "W [MSGID: 138004] [nl-cache.c:415:nlc_unlink_cbk] 0-saturday-saturday-nl-cache: Failed to get GET_LINK_COUNT from dict" repeated 367 times between [2017-05-17 11:15:42.274741] and [2017-05-17 11:15:56.343824]
[2017-05-17 11:15:56.368917] I [MSGID: 108031] [afr-common.c:2255:afr_local_discovery_cbk] 0-saturday-saturday-replicate-0: selecting local read_child saturday-saturday-client-0
[2017-05-17 11:15:56.369378] I [MSGID: 108031] [afr-common.c:2255:afr_local_discovery_cbk] 0-saturday-saturday-replicate-4: selecting local read_child saturday-saturday-client-8
[2017-05-17 11:15:56.369554] I [MSGID: 108031] [afr-common.c:2255:afr_local_discovery_cbk] 0-saturday-saturday-replicate-2: selecting local read_child saturday-saturday-client-4
The message "I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-saturday-saturday-replicate-4: performing metadata selfheal on 2d79c1cf-798e-4ce7-820f-ce999f9cb089" repeated 11 times between [2017-05-17 11:14:56.916777] and [2017-05-17 11:15:39.798577]
The message "I [MSGID: 108026] [afr-self-heal-common.c:1212:afr_log_selfheal] 0-saturday-saturday-replicate-4: Completed metadata selfheal on 2d79c1cf-798e-4ce7-820f-ce999f9cb089. sources=[0]  sinks=1 " repeated 11 times between [2017-05-17 11:14:56.934556] and [2017-05-17 11:15:39.815418]
The message "I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-saturday-saturday-replicate-4: performing metadata selfheal on 79a010f9-7259-4d67-a793-73bbf6731078" repeated 11 times between [2017-05-17 11:14:56.984354] and [2017-05-17 11:15:41.298009]
The message "I [MSGID: 108026] [afr-self-heal-common.c:1212:afr_log_selfheal] 0-saturday-saturday-replicate-4: Completed metadata selfheal on 79a010f9-7259-4d67-a793-73bbf6731078. sources=[0]  sinks=1 " repeated 11 times between [2017-05-17 11:14:56.994990] and [2017-05-17 11:15:41.312922]

Comment 2 Nag Pavan Chilakam 2017-05-17 11:19:41 UTC
[root@dhcp47-127 samba]# gluster v list
gctdb
saturday-saturday
[root@dhcp47-127 samba]# gluster v heal saturday-saturday info
Brick dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick0
Status: Connected
Number of entries: 0

Brick dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick1
Status: Connected
Number of entries: 0

Brick dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick2
Status: Connected
Number of entries: 0

Brick dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick3
Status: Connected
Number of entries: 0

Brick dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick4
Status: Connected
Number of entries: 0

Brick dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick5
Status: Connected
Number of entries: 0

Brick dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick6
Status: Connected
Number of entries: 0

Brick dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick7
Status: Connected
Number of entries: 0

Brick dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick8
Status: Connected
Number of entries: 0

Brick dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick9
Status: Connected
Number of entries: 0

Brick dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick10
Status: Connected
Number of entries: 0

Brick dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick11
Status: Connected
Number of entries: 0

[root@dhcp47-127 samba]# gluster v status saturday-saturday
Status of volume: saturday-saturday
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp47-127.lab.eng.blr.redhat.com:/br
icks/brick0/saturday-saturday_brick0        49153     0          Y       22466
Brick dhcp46-181.lab.eng.blr.redhat.com:/br
icks/brick0/saturday-saturday_brick1        49153     0          Y       3740 
Brick dhcp46-47.lab.eng.blr.redhat.com:/bri
cks/brick0/saturday-saturday_brick2         49153     0          Y       13002
Brick dhcp47-140.lab.eng.blr.redhat.com:/br
icks/brick0/saturday-saturday_brick3        49153     0          Y       18266
Brick dhcp47-127.lab.eng.blr.redhat.com:/br
icks/brick1/saturday-saturday_brick4        49154     0          Y       22469
Brick dhcp46-181.lab.eng.blr.redhat.com:/br
icks/brick1/saturday-saturday_brick5        49154     0          Y       3750 
Brick dhcp46-47.lab.eng.blr.redhat.com:/bri
cks/brick1/saturday-saturday_brick6         49154     0          Y       13011
Brick dhcp47-140.lab.eng.blr.redhat.com:/br
icks/brick1/saturday-saturday_brick7        49154     0          Y       18276
Brick dhcp47-127.lab.eng.blr.redhat.com:/br
icks/brick2/saturday-saturday_brick8        49155     0          Y       22484
Brick dhcp46-181.lab.eng.blr.redhat.com:/br
icks/brick2/saturday-saturday_brick9        49155     0          Y       3758 
Brick dhcp46-47.lab.eng.blr.redhat.com:/bri
cks/brick2/saturday-saturday_brick10        49155     0          Y       13020
Brick dhcp47-140.lab.eng.blr.redhat.com:/br
icks/brick2/saturday-saturday_brick11       49155     0          Y       18284
Snapshot Daemon on localhost                49162     0          Y       22546
Self-heal Daemon on localhost               N/A       N/A        Y       22449
Snapshot Daemon on dhcp46-181.lab.eng.blr.r
edhat.com                                   49162     0          Y       3821 
Self-heal Daemon on dhcp46-181.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3723 
Snapshot Daemon on dhcp46-47.lab.eng.blr.re
dhat.com                                    49162     0          Y       13083
Self-heal Daemon on dhcp46-47.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       12985
Snapshot Daemon on dhcp47-140.lab.eng.blr.r
edhat.com                                   49162     0          Y       18348
Self-heal Daemon on dhcp47-140.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       18249
 
Task Status of Volume saturday-saturday
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp47-127 samba]# gluster v info saturday-saturday
 
Volume Name: saturday-saturday
Type: Distributed-Replicate
Volume ID: 4a24c34c-1144-4f07-9763-6e232c037a67
Status: Started
Snapshot Count: 2
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick0
Brick2: dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick1
Brick3: dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick2
Brick4: dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick3
Brick5: dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick4
Brick6: dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick5
Brick7: dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick6
Brick8: dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick7
Brick9: dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick8
Brick10: dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick9
Brick11: dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick10
Brick12: dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick11
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.nl-cache: on
features.barrier: disable
features.show-snapshot-directory: enable
features.uss: enable
transport.address-family: inet
nfs.disable: on
server.allow-insecure: on
performance.stat-prefetch: on
storage.batch-fsync-delay-usec: 0
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
performance.cache-samba-metadata: on
performance.parallel-readdir: on
[root@dhcp47-127 samba]#

Comment 3 Ravishankar N 2017-05-17 11:29:03 UTC
Nag, 

1. Can you attach the sosreports of the clients and the servers?

2. "I have check the heal info of the volume and no heals are pending" -> Could this be because you ran heal info *before* there was any fop failure and therefore no pending heals (OR) because you ran heal info *after* the client side heal completed?

Comment 4 Nag Pavan Chilakam 2017-05-17 11:29:49 UTC
[root@dhcp47-127 samba]# rpm -qa|egrep "gluster|smb|samba|cifs"
samba-libs-4.6.3-0.el7rhgs.x86_64
libsmbclient-4.6.3-0.el7rhgs.x86_64
samba-winbind-clients-4.6.3-0.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-libs-3.8.4-25.el7rhgs.x86_64
samba-common-libs-4.6.3-0.el7rhgs.x86_64
samba-winbind-4.6.3-0.el7rhgs.x86_64
samba-devel-4.6.3-0.el7rhgs.x86_64
glusterfs-cli-3.8.4-25.el7rhgs.x86_64
samba-common-4.6.3-0.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
samba-python-4.6.3-0.el7rhgs.x86_64
samba-test-libs-4.6.3-0.el7rhgs.x86_64
samba-dc-4.6.3-0.el7rhgs.x86_64
samba-pidl-4.6.3-0.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-25.el7rhgs.x86_64
glusterfs-server-3.8.4-25.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
cifs-utils-6.2-9.el7.x86_64
samba-4.6.3-0.el7rhgs.x86_64
samba-dc-libs-4.6.3-0.el7rhgs.x86_64
samba-test-4.6.3-0.el7rhgs.x86_64
samba-krb5-printing-4.6.3-0.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-24.el7rhgs.x86_64
glusterfs-api-3.8.4-25.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-25.el7rhgs.x86_64
python-gluster-3.8.4-25.el7rhgs.noarch
samba-client-libs-4.6.3-0.el7rhgs.x86_64
samba-winbind-modules-4.6.3-0.el7rhgs.x86_64
glusterfs-fuse-3.8.4-25.el7rhgs.x86_64
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
samba-common-tools-4.6.3-0.el7rhgs.x86_64
samba-client-4.6.3-0.el7rhgs.x86_64
samba-winbind-krb5-locator-4.6.3-0.el7rhgs.x86_64
glusterfs-3.8.4-25.el7rhgs.x86_64
glusterfs-rdma-3.8.4-25.el7rhgs.x86_64

Comment 5 Nag Pavan Chilakam 2017-05-17 11:39:49 UTC
(In reply to Ravishankar N from comment #3)
> Nag, 
> 
> 1. Can you attach the sosreports of the clients and the servers?
> 

I will be attaching the sosreports soon

> 2. "I have check the heal info of the volume and no heals are pending" ->
> Could this be because you ran heal info *before* there was any fop failure
> and therefore no pending heals (OR) because you ran heal info *after* the
> client side heal completed?

I also ran heal info while the heal messages were getting displayed, but the heal info showed zero entries.

Comment 6 Nag Pavan Chilakam 2017-05-17 12:34:17 UTC
sosreports http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1451720/

Comment 7 Ravishankar N 2017-05-18 11:09:49 UTC
Observations from the samba mount log 'glusterfs-saturday-saturday.10.70.47.15.log' 

1.
The message "W [MSGID: 138004] [nl-cache.c:415:nlc_unlink_cbk] 0-saturday-saturday-nl-cache: Failed to get GET_LINK_COUNT from dict" repeated 367 times

nlc_unlink_cbk() unconditionally logs these messages without checking whether the op-ret was zero. Poornima will be sending a fix for it. But according to the bug, the rm -rf was done from only *one* client, so I don't see a reason why the unlink would fail or why the link count would not be present in the dict.
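
A quick way to sanity-check that from the same mount log (a hedged sketch; 'remote operation failed' is the usual client-side fop failure message):

LOG=glusterfs-saturday-saturday.10.70.47.15.log
grep -c 'Failed to get GET_LINK_COUNT' "$LOG"   # how often the warning fired
grep -c 'remote operation failed' "$LOG"        # were there any actual fop failures?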


2. The log contains 2 sets of connects for the bricks despite only a single mount, indicating some possible stale mounts. There is also one set of disconnects after these 2 sets of connects:

#grep -nE "Connected to|disconnected from" glusterfs-saturday-saturday.10.70.47.15.log

26:[2017-05-16 10:57:47.862354] I [MSGID: 104024] [glfs-mgmt.c:801:mgmt_rpc_notify] 0-glfs-mgmt: disconnected from remote-host: localhost
430:[2017-05-16 10:57:51.921229] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-0: Connected to saturday-saturday-client-0, attached to remote volume '/bricks/brick0/saturday-saturday_brick0'.
446:[2017-05-16 10:57:51.970914] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-4: Connected to saturday-saturday-client-4, attached to remote volume '/bricks/brick1/saturday-saturday_brick4'.
461:[2017-05-16 10:57:52.021976] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-8: Connected to saturday-saturday-client-8, attached to remote volume '/bricks/brick2/saturday-saturday_brick8'.
474:[2017-05-16 10:57:52.209561] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-5: Connected to saturday-saturday-client-5, attached to remote volume '/bricks/brick1/saturday-saturday_brick5'.
476:[2017-05-16 10:57:52.209884] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-9: Connected to saturday-saturday-client-9, attached to remote volume '/bricks/brick2/saturday-saturday_brick9'.
478:[2017-05-16 10:57:52.211898] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-1: Connected to saturday-saturday-client-1, attached to remote volume '/bricks/brick0/saturday-saturday_brick1'.
483:[2017-05-16 10:57:52.222755] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-2: Connected to saturday-saturday-client-2, attached to remote volume '/bricks/brick0/saturday-saturday_brick2'.
486:[2017-05-16 10:57:52.224107] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-6: Connected to saturday-saturday-client-6, attached to remote volume '/bricks/brick1/saturday-saturday_brick6'.
489:[2017-05-16 10:57:52.225482] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-10: Connected to saturday-saturday-client-10, attached to remote volume '/bricks/brick2/saturday-saturday_brick10'.
495:[2017-05-16 10:57:52.247176] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-7: Connected to saturday-saturday-client-7, attached to remote volume '/bricks/brick1/saturday-saturday_brick7'.
497:[2017-05-16 10:57:52.248433] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-3: Connected to saturday-saturday-client-3, attached to remote volume '/bricks/brick0/saturday-saturday_brick3'.
499:[2017-05-16 10:57:52.249413] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-11: Connected to saturday-saturday-client-11, attached to remote volume '/bricks/brick2/saturday-saturday_brick11'.
508:[2017-05-16 10:57:56.315455] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-snapd-client: Connected to saturday-saturday-snapd-client, attached to remote volume 'snapd-saturday-saturday'.

-----------------------


560:[2017-05-16 12:52:03.614988] I [MSGID: 104024] [glfs-mgmt.c:801:mgmt_rpc_notify] 0-glfs-mgmt: disconnected from remote-host: localhost
963:[2017-05-16 12:52:07.776297] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-0: Connected to saturday-saturday-client-0, attached to remote volume '/bricks/brick0/saturday-saturday_brick0'.
974:[2017-05-16 12:52:07.821816] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-1: Connected to saturday-saturday-client-1, attached to remote volume '/bricks/brick0/saturday-saturday_brick1'.
981:[2017-05-16 12:52:07.834578] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-3: Connected to saturday-saturday-client-3, attached to remote volume '/bricks/brick0/saturday-saturday_brick3'.
984:[2017-05-16 12:52:07.835603] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-2: Connected to saturday-saturday-client-2, attached to remote volume '/bricks/brick0/saturday-saturday_brick2'.
994:[2017-05-16 12:52:07.862550] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-5: Connected to saturday-saturday-client-5, attached to remote volume '/bricks/brick1/saturday-saturday_brick5'.
1001:[2017-05-16 12:52:07.874481] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-4: Connected to saturday-saturday-client-4, attached to remote volume '/bricks/brick1/saturday-saturday_brick4'.
1007:[2017-05-16 12:52:07.889969] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-6: Connected to saturday-saturday-client-6, attached to remote volume '/bricks/brick1/saturday-saturday_brick6'.
1011:[2017-05-16 12:52:07.891142] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-7: Connected to saturday-saturday-client-7, attached to remote volume '/bricks/brick1/saturday-saturday_brick7'.
1019:[2017-05-16 12:52:07.910413] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-8: Connected to saturday-saturday-client-8, attached to remote volume '/bricks/brick2/saturday-saturday_brick8'.
1027:[2017-05-16 12:52:07.925364] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-9: Connected to saturday-saturday-client-9, attached to remote volume '/bricks/brick2/saturday-saturday_brick9'.
1029:[2017-05-16 12:52:07.926389] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-10: Connected to saturday-saturday-client-10, attached to remote volume '/bricks/brick2/saturday-saturday_brick10'.
1032:[2017-05-16 12:52:07.927940] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-11: Connected to saturday-saturday-client-11, attached to remote volume '/bricks/brick2/saturday-saturday_brick11'.
1038:[2017-05-16 12:52:07.940793] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-snapd-client: Connected to saturday-saturday-snapd-client, attached to remote volume 'snapd-saturday-saturday'.

-----------------------


1064:[2017-05-16 12:52:39.506624] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-0: disconnected from saturday-saturday-client-0. Client process will keep trying to connect to glusterd until brick's port is available
1065:[2017-05-16 12:52:39.506673] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-1: disconnected from saturday-saturday-client-1. Client process will keep trying to connect to glusterd until brick's port is available
1067:[2017-05-16 12:52:39.507435] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-2: disconnected from saturday-saturday-client-2. Client process will keep trying to connect to glusterd until brick's port is available
1068:[2017-05-16 12:52:39.507471] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-3: disconnected from saturday-saturday-client-3. Client process will keep trying to connect to glusterd until brick's port is available
1070:[2017-05-16 12:52:39.507690] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-4: disconnected from saturday-saturday-client-4. Client process will keep trying to connect to glusterd until brick's port is available
1071:[2017-05-16 12:52:39.507724] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-5: disconnected from saturday-saturday-client-5. Client process will keep trying to connect to glusterd until brick's port is available
1073:[2017-05-16 12:52:39.507907] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-6: disconnected from saturday-saturday-client-6. Client process will keep trying to connect to glusterd until brick's port is available
1074:[2017-05-16 12:52:39.507939] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-7: disconnected from saturday-saturday-client-7. Client process will keep trying to connect to glusterd until brick's port is available
1076:[2017-05-16 12:52:39.508482] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-8: disconnected from saturday-saturday-client-8. Client process will keep trying to connect to glusterd until brick's port is available
1077:[2017-05-16 12:52:39.508548] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-9: disconnected from saturday-saturday-client-9. Client process will keep trying to connect to glusterd until brick's port is available
1079:[2017-05-16 12:52:39.508748] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-10: disconnected from saturday-saturday-client-10. Client process will keep trying to connect to glusterd until brick's port is available
1080:[2017-05-16 12:52:39.508782] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-11: disconnected from saturday-saturday-client-11. Client process will keep trying to connect to glusterd until brick's port is available
1082:[2017-05-16 12:52:39.508986] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-snapd-client: disconnected from saturday-saturday-snapd-client. Client process will keep trying to connect to glusterd until brick's port is available
-----------------------

3. Almost all of the 84,000+ metadata heals came after the disconnect happened, which is strange.

4. The volume also had snapshots enabled earlier (although .snaps was inaccessible via the cifs mount for this build).

I found some of the gfids that were metadata-healed on the snapshotted bricks:

[root@dhcp46-47 snaps]# find . -name 6ec14ef7-ed47-49c8-8075-6d330e1eef82
./891ba64df6cb411c96f94fef0559a63b/brick7/saturday-saturday_brick6/.glusterfs/6e/c1/6ec14ef7-ed47-49c8-8075-6d330e1eef82
./fb141e06f966496c9045bdc21ac32d9e/brick7/saturday-saturday_brick6/.glusterfs/6e/c1/6ec14ef7-ed47-49c8-8075-6d330e1eef82
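
For reference, a hedged sketch of resolving such a gfid to its path on a brick via its .glusterfs link (the brick root here is illustrative):

GFID=6ec14ef7-ed47-49c8-8075-6d330e1eef82
BRICK=/bricks/brick1/saturday-saturday_brick6
LINK="$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
find "$BRICK" -samefile "$LINK" 2>/dev/null   # regular files: the .glusterfs entry is a hardlink
readlink "$LINK"                              # directories: the .glusterfs entry is a symlink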


Nag, could you see if you can re-create this on a clean setup, possibly without snapshots/uss?

Comment 8 Nag Pavan Chilakam 2017-05-19 14:31:00 UTC
Ravi, I am able to see this "directory not empty" issue with healing on the client side, even on a fuse setup without snapshots.
This is the same setup where we were seeing the brick-full issues today.

Fuse mount:
[root@dhcp35-103 aarthy-perf-tool]# 
[root@dhcp35-103 aarthy-perf-tool]#  for j in 1;do for i in {1..25};do rm -rf /mnt/cross3-$i/* ;done;done
rm: cannot remove ‘/mnt/cross3-6/dir1’: Directory not empty
rm: cannot remove ‘/mnt/cross3-25/dir1’: Directory not empty
[root@dhcp35-103 aarthy-perf-tool]#

Fuse log for the cross3-6 volume mount:

[2017-05-19 14:24:18.102723] I [MSGID: 108026] [afr-self-heal-entry.c:840:afr_selfheal_entry_do] 0-cross3-6-replicate-0: performing entry selfheal on 1be0b7b0-0603-4ffc-b056-2683b4b25fca

1be0b7b0-0603-4ffc-b056-2683b4b25fca is the gfid of dir1

[root@dhcp35-45 glusterfs]# gluster v heal cross3-6 info
Brick 10.70.35.45:/rhs/brick6/cross3-6
/dir1 
Status: Connected
Number of entries: 1

Brick 10.70.35.130:/rhs/brick6/cross3-6
Status: Connected
Number of entries: 0

Brick 10.70.35.122:/rhs/brick6/cross3-6
Status: Connected
Number of entries: 0


n1:
[root@dhcp35-45 glusterfs]# getfattr -d -m . -e hex /rhs/brick6/cross3-6/dir1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick6/cross3-6/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.cross3-6-client-1=0x000000000000000000000001
trusted.afr.cross3-6-client-2=0x000000000000000000000001
trusted.gfid=0x1be0b7b006034ffcb0562683b4b25fca
trusted.glusterfs.dht=0x000000010000000000000000ffffffff


n2:
[root@dhcp35-130 glusterfs]# getfattr -d -m . -e hex /rhs/brick6/cross3-6/dir1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick6/cross3-6/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x1be0b7b006034ffcb0562683b4b25fca
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

n3:
[root@dhcp35-122 glusterfs]# getfattr -d -m . -e hex /rhs/brick6/cross3-6/dir1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick6/cross3-6/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x1be0b7b006034ffcb0562683b4b25fca
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
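
The trusted.afr.* values on n1 mark pending operations against the other two bricks; the 24 hex digits are three big-endian 32-bit counters (data/metadata/entry). A quick sketch of decoding one:

X=000000000000000000000001   # trusted.afr.cross3-6-client-1 with the 0x dropped
echo "data=$((16#${X:0:8})) metadata=$((16#${X:8:8})) entry=$((16#${X:16:8}))"
# -> data=0 metadata=0 entry=1: one pending entry heal against that client's brick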


n1:
[root@dhcp35-45 glusterfs]# ll /rhs/brick6/cross3-6/dir1
total 0
-rw-r--r--. 2 root root 0 May 18 14:05 file.10388
-rw-r--r--. 2 root root 0 May 18 14:05 file.10389
-rw-r--r--. 2 root root 0 May 18 14:05 file.10390
-rw-r--r--. 2 root root 0 May 18 14:05 file.10391
-rw-r--r--. 2 root root 0 May 18 14:05 file.10392
-rw-r--r--. 2 root root 0 May 18 14:05 file.10393
-rw-r--r--. 2 root root 0 May 18 14:05 file.10394
-rw-r--r--. 2 root root 0 May 18 14:05 file.10395
-rw-r--r--. 2 root root 0 May 18 14:05 file.10396
-rw-r--r--. 2 root root 0 May 18 14:05 file.10397
-rw-r--r--. 2 root root 0 May 18 14:05 file.10398
-rw-r--r--. 2 root root 0 May 18 14:05 file.10399
-rw-r--r--. 2 root root 0 May 18 14:05 file.10400
-rw-r--r--. 2 root root 0 May 18 14:05 file.10401
-rw-r--r--. 2 root root 0 May 18 14:05 file.10402
-rw-r--r--. 2 root root 0 May 18 14:05 file.10403
-rw-r--r--. 2 root root 0 May 18 14:05 file.10404
-rw-r--r--. 2 root root 0 May 18 14:05 file.10405
-rw-r--r--. 2 root root 0 May 18 14:05 file.10406
-rw-r--r--. 2 root root 0 May 18 14:05 file.10407
-rw-r--r--. 2 root root 0 May 18 14:05 file.10408
-rw-r--r--. 2 root root 0 May 18 14:05 file.10409
-rw-r--r--. 2 root root 0 May 18 14:05 file.10410
-rw-r--r--. 2 root root 0 May 18 14:05 file.10411
-rw-r--r--. 2 root root 0 May 18 14:05 file.10412
-rw-r--r--. 2 root root 0 May 18 14:05 file.10413
-rw-r--r--. 2 root root 0 May 18 14:05 file.10414
-rw-r--r--. 2 root root 0 May 18 14:05 file.10415
-rw-r--r--. 2 root root 0 May 18 14:05 file.10416
-rw-r--r--. 2 root root 0 May 18 14:05 file.10417
-rw-r--r--. 2 root root 0 May 18 14:05 file.10418
-rw-r--r--. 2 root root 0 May 18 14:05 file.10419
-rw-r--r--. 2 root root 0 May 18 14:05 file.10420
-rw-r--r--. 2 root root 0 May 18 14:05 file.10421
-rw-r--r--. 2 root root 0 May 18 14:05 file.10422
-rw-r--r--. 1 root root 0 May 18 14:05 file.10423  =====> extra file, not seen on n2/n3


n2:
[root@dhcp35-130 glusterfs]# ll /rhs/brick6/cross3-6/dir1
total 0
-rw-r--r--. 2 root root 0 May 18 14:05 file.10388
-rw-r--r--. 2 root root 0 May 18 14:05 file.10389
-rw-r--r--. 2 root root 0 May 18 14:05 file.10390
-rw-r--r--. 2 root root 0 May 18 14:05 file.10391
-rw-r--r--. 2 root root 0 May 18 14:05 file.10392
-rw-r--r--. 2 root root 0 May 18 14:05 file.10393
-rw-r--r--. 2 root root 0 May 18 14:05 file.10394
-rw-r--r--. 2 root root 0 May 18 14:05 file.10395
-rw-r--r--. 2 root root 0 May 18 14:05 file.10396
-rw-r--r--. 2 root root 0 May 18 14:05 file.10397
-rw-r--r--. 2 root root 0 May 18 14:05 file.10398
-rw-r--r--. 2 root root 0 May 18 14:05 file.10399
-rw-r--r--. 2 root root 0 May 18 14:05 file.10400
-rw-r--r--. 2 root root 0 May 18 14:05 file.10401
-rw-r--r--. 2 root root 0 May 18 14:05 file.10402
-rw-r--r--. 2 root root 0 May 18 14:05 file.10403
-rw-r--r--. 2 root root 0 May 18 14:05 file.10404
-rw-r--r--. 2 root root 0 May 18 14:05 file.10405
-rw-r--r--. 2 root root 0 May 18 14:05 file.10406
-rw-r--r--. 2 root root 0 May 18 14:05 file.10407
-rw-r--r--. 2 root root 0 May 18 14:05 file.10408
-rw-r--r--. 2 root root 0 May 18 14:05 file.10409
-rw-r--r--. 2 root root 0 May 18 14:05 file.10410
-rw-r--r--. 2 root root 0 May 18 14:05 file.10411
-rw-r--r--. 2 root root 0 May 18 14:05 file.10412
-rw-r--r--. 2 root root 0 May 18 14:05 file.10413
-rw-r--r--. 2 root root 0 May 18 14:05 file.10414
-rw-r--r--. 2 root root 0 May 18 14:05 file.10415
-rw-r--r--. 2 root root 0 May 18 14:05 file.10416
-rw-r--r--. 2 root root 0 May 18 14:05 file.10417
-rw-r--r--. 2 root root 0 May 18 14:05 file.10418
-rw-r--r--. 2 root root 0 May 18 14:05 file.10419
-rw-r--r--. 2 root root 0 May 18 14:05 file.10420
-rw-r--r--. 2 root root 0 May 18 14:05 file.10421
-rw-r--r--. 2 root root 0 May 18 14:05 file.10422

n3:
[root@dhcp35-122 glusterfs]# ll /rhs/brick6/cross3-6/dir1
total 0
-rw-r--r--. 2 root root 0 May 18 14:05 file.10388
-rw-r--r--. 2 root root 0 May 18 14:05 file.10389
-rw-r--r--. 2 root root 0 May 18 14:05 file.10390
-rw-r--r--. 2 root root 0 May 18 14:05 file.10391
-rw-r--r--. 2 root root 0 May 18 14:05 file.10392
-rw-r--r--. 2 root root 0 May 18 14:05 file.10393
-rw-r--r--. 2 root root 0 May 18 14:05 file.10394
-rw-r--r--. 2 root root 0 May 18 14:05 file.10395
-rw-r--r--. 2 root root 0 May 18 14:05 file.10396
-rw-r--r--. 2 root root 0 May 18 14:05 file.10397
-rw-r--r--. 2 root root 0 May 18 14:05 file.10398
-rw-r--r--. 2 root root 0 May 18 14:05 file.10399
-rw-r--r--. 2 root root 0 May 18 14:05 file.10400
-rw-r--r--. 2 root root 0 May 18 14:05 file.10401
-rw-r--r--. 2 root root 0 May 18 14:05 file.10402
-rw-r--r--. 2 root root 0 May 18 14:05 file.10403
-rw-r--r--. 2 root root 0 May 18 14:05 file.10404
-rw-r--r--. 2 root root 0 May 18 14:05 file.10405
-rw-r--r--. 2 root root 0 May 18 14:05 file.10406
-rw-r--r--. 2 root root 0 May 18 14:05 file.10407
-rw-r--r--. 2 root root 0 May 18 14:05 file.10408
-rw-r--r--. 2 root root 0 May 18 14:05 file.10409
-rw-r--r--. 2 root root 0 May 18 14:05 file.10410
-rw-r--r--. 2 root root 0 May 18 14:05 file.10411
-rw-r--r--. 2 root root 0 May 18 14:05 file.10412
-rw-r--r--. 2 root root 0 May 18 14:05 file.10413
-rw-r--r--. 2 root root 0 May 18 14:05 file.10414
-rw-r--r--. 2 root root 0 May 18 14:05 file.10415
-rw-r--r--. 2 root root 0 May 18 14:05 file.10416
-rw-r--r--. 2 root root 0 May 18 14:05 file.10417
-rw-r--r--. 2 root root 0 May 18 14:05 file.10418
-rw-r--r--. 2 root root 0 May 18 14:05 file.10419
-rw-r--r--. 2 root root 0 May 18 14:05 file.10420
-rw-r--r--. 2 root root 0 May 18 14:05 file.10421
-rw-r--r--. 2 root root 0 May 18 14:05 file.10422
[root@dhcp35-122 glusterfs]#
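
Regarding the extra file.10423 on n1: its link count of 1 (versus 2 for the others) suggests its .glusterfs gfid hardlink is missing, i.e. the create was likely caught mid-flight by the rm -rf. A quick way to confirm, sketched:

stat -c '%h %n' /rhs/brick6/cross3-6/dir1/file.10423                   # healthy brick files have >= 2 links
getfattr -n trusted.gfid -e hex /rhs/brick6/cross3-6/dir1/file.10423   # does it have a gfid at all?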

Comment 15 Atin Mukherjee 2018-11-11 21:27:32 UTC
The needinfo has been pending for months now. Can we please get this addressed?

Comment 18 Red Hat Bugzilla 2023-09-14 03:57:44 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

