Description of problem:
========================
When I issued an rm -rf of the root of the volume from a cifs mount, I see that the cifs log is populating a lot of selfheal messages. There is no necessity for healing during deletes. I have checked the heal info of the volume and no heals are pending.

Version-Release number of selected component (if applicable):
========
3.8.4-25

How reproducible:
===========
2/2

Steps to Reproduce:
1. Have a 4-node setup with ctdb samba enabled
2. Created a 6x2 volume
3. Enabled nl-cache and parallel-readdir options
4. Populated a lot of files into 3 different directories
5. Did negative lookups from two different clients for some time
6. While doing negative lookups from one client, did rm -rf * from another client

Seeing the below messages in the cifs samba mount log:

The message "W [MSGID: 138004] [nl-cache.c:415:nlc_unlink_cbk] 0-saturday-saturday-nl-cache: Failed to get GET_LINK_COUNT from dict" repeated 367 times between [2017-05-17 11:15:42.274741] and [2017-05-17 11:15:56.343824]
[2017-05-17 11:15:56.368917] I [MSGID: 108031] [afr-common.c:2255:afr_local_discovery_cbk] 0-saturday-saturday-replicate-0: selecting local read_child saturday-saturday-client-0
[2017-05-17 11:15:56.369378] I [MSGID: 108031] [afr-common.c:2255:afr_local_discovery_cbk] 0-saturday-saturday-replicate-4: selecting local read_child saturday-saturday-client-8
[2017-05-17 11:15:56.369554] I [MSGID: 108031] [afr-common.c:2255:afr_local_discovery_cbk] 0-saturday-saturday-replicate-2: selecting local read_child saturday-saturday-client-4
The message "I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-saturday-saturday-replicate-4: performing metadata selfheal on 2d79c1cf-798e-4ce7-820f-ce999f9cb089" repeated 11 times between [2017-05-17 11:14:56.916777] and [2017-05-17 11:15:39.798577]
The message "I [MSGID: 108026] [afr-self-heal-common.c:1212:afr_log_selfheal] 0-saturday-saturday-replicate-4: Completed metadata selfheal on 2d79c1cf-798e-4ce7-820f-ce999f9cb089. sources=[0] sinks=1 " repeated 11 times between [2017-05-17 11:14:56.934556] and [2017-05-17 11:15:39.815418]
The message "I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-saturday-saturday-replicate-4: performing metadata selfheal on 79a010f9-7259-4d67-a793-73bbf6731078" repeated 11 times between [2017-05-17 11:14:56.984354] and [2017-05-17 11:15:41.298009]
The message "I [MSGID: 108026] [afr-self-heal-common.c:1212:afr_log_selfheal] 0-saturday-saturday-replicate-4: Completed metadata selfheal on 79a010f9-7259-4d67-a793-73bbf6731078. sources=[0] sinks=1 " repeated 11 times between [2017-05-17 11:14:56.994990] and [2017-05-17 11:15:41.312922]
[root@dhcp47-127 samba]# gluster v list
gctdb
saturday-saturday

[root@dhcp47-127 samba]# gluster v heal saturday-saturday info
Brick dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick0
Status: Connected
Number of entries: 0

Brick dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick1
Status: Connected
Number of entries: 0

Brick dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick2
Status: Connected
Number of entries: 0

Brick dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick3
Status: Connected
Number of entries: 0

Brick dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick4
Status: Connected
Number of entries: 0

Brick dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick5
Status: Connected
Number of entries: 0

Brick dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick6
Status: Connected
Number of entries: 0

Brick dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick7
Status: Connected
Number of entries: 0

Brick dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick8
Status: Connected
Number of entries: 0

Brick dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick9
Status: Connected
Number of entries: 0

Brick dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick10
Status: Connected
Number of entries: 0

Brick dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick11
Status: Connected
Number of entries: 0

[root@dhcp47-127 samba]# gluster v status saturday-saturday
Status of volume: saturday-saturday
Gluster process                                                                   TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick0   49153     0          Y       22466
Brick dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick1   49153     0          Y       3740
Brick dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick2    49153     0          Y       13002
Brick dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick3   49153     0          Y       18266
Brick dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick4   49154     0          Y       22469
Brick dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick5   49154     0          Y       3750
Brick dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick6    49154     0          Y       13011
Brick dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick7   49154     0          Y       18276
Brick dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick8   49155     0          Y       22484
Brick dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick9   49155     0          Y       3758
Brick dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick10   49155     0          Y       13020
Brick dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick11  49155     0          Y       18284
Snapshot Daemon on localhost                                                      49162     0          Y       22546
Self-heal Daemon on localhost                                                     N/A       N/A        Y       22449
Snapshot Daemon on dhcp46-181.lab.eng.blr.redhat.com                              49162     0          Y       3821
Self-heal Daemon on dhcp46-181.lab.eng.blr.redhat.com                             N/A       N/A        Y       3723
Snapshot Daemon on dhcp46-47.lab.eng.blr.redhat.com                               49162     0          Y       13083
Self-heal Daemon on dhcp46-47.lab.eng.blr.redhat.com                              N/A       N/A        Y       12985
Snapshot Daemon on dhcp47-140.lab.eng.blr.redhat.com                              49162     0          Y       18348
Self-heal Daemon on dhcp47-140.lab.eng.blr.redhat.com                             N/A       N/A        Y       18249

Task Status of Volume saturday-saturday
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp47-127 samba]# gluster v info saturday-saturday

Volume Name: saturday-saturday
Type: Distributed-Replicate
Volume ID: 4a24c34c-1144-4f07-9763-6e232c037a67
Status: Started
Snapshot Count: 2
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick0
Brick2: dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick1
Brick3: dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick2
Brick4: dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick0/saturday-saturday_brick3
Brick5: dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick4
Brick6: dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick5
Brick7: dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick6
Brick8: dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick1/saturday-saturday_brick7
Brick9: dhcp47-127.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick8
Brick10: dhcp46-181.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick9
Brick11: dhcp46-47.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick10
Brick12: dhcp47-140.lab.eng.blr.redhat.com:/bricks/brick2/saturday-saturday_brick11
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.nl-cache: on
features.barrier: disable
features.show-snapshot-directory: enable
features.uss: enable
transport.address-family: inet
nfs.disable: on
server.allow-insecure: on
performance.stat-prefetch: on
storage.batch-fsync-delay-usec: 0
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
performance.cache-samba-metadata: on
performance.parallel-readdir: on
[root@dhcp47-127 samba]#
Nag,

1. Can you attach the sosreports of the clients and the servers?

2. "I have checked the heal info of the volume and no heals are pending" -> Could this be because you ran heal info *before* there was any fop failure and therefore no pending heals, (OR) because you ran heal info *after* the client-side heal completed?
[root@dhcp47-127 samba]# rpm -qa | egrep "gluster|smb|samba|cifs"
samba-libs-4.6.3-0.el7rhgs.x86_64
libsmbclient-4.6.3-0.el7rhgs.x86_64
samba-winbind-clients-4.6.3-0.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-libs-3.8.4-25.el7rhgs.x86_64
samba-common-libs-4.6.3-0.el7rhgs.x86_64
samba-winbind-4.6.3-0.el7rhgs.x86_64
samba-devel-4.6.3-0.el7rhgs.x86_64
glusterfs-cli-3.8.4-25.el7rhgs.x86_64
samba-common-4.6.3-0.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
samba-python-4.6.3-0.el7rhgs.x86_64
samba-test-libs-4.6.3-0.el7rhgs.x86_64
samba-dc-4.6.3-0.el7rhgs.x86_64
samba-pidl-4.6.3-0.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-25.el7rhgs.x86_64
glusterfs-server-3.8.4-25.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
cifs-utils-6.2-9.el7.x86_64
samba-4.6.3-0.el7rhgs.x86_64
samba-dc-libs-4.6.3-0.el7rhgs.x86_64
samba-test-4.6.3-0.el7rhgs.x86_64
samba-krb5-printing-4.6.3-0.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-24.el7rhgs.x86_64
glusterfs-api-3.8.4-25.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-25.el7rhgs.x86_64
python-gluster-3.8.4-25.el7rhgs.noarch
samba-client-libs-4.6.3-0.el7rhgs.x86_64
samba-winbind-modules-4.6.3-0.el7rhgs.x86_64
glusterfs-fuse-3.8.4-25.el7rhgs.x86_64
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
samba-common-tools-4.6.3-0.el7rhgs.x86_64
samba-client-4.6.3-0.el7rhgs.x86_64
samba-winbind-krb5-locator-4.6.3-0.el7rhgs.x86_64
glusterfs-3.8.4-25.el7rhgs.x86_64
glusterfs-rdma-3.8.4-25.el7rhgs.x86_64
(In reply to Ravishankar N from comment #3)
> Nag,
>
> 1. Can you attach the sosreports of the clients and the servers?

I will be attaching the sosreports soon.

> 2. "I have checked the heal info of the volume and no heals are pending" ->
> Could this be because you ran heal info *before* there was any fop failure
> and therefore no pending heals (OR) because you ran heal info *after* the
> client side heal completed?

I also ran heal info while the heal messages were getting displayed, but heal info showed zero entries.
sosreports: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1451720/
Observations from the samba mount log 'glusterfs-saturday-saturday.10.70.47.15.log':

1. The message "W [MSGID: 138004] [nl-cache.c:415:nlc_unlink_cbk] 0-saturday-saturday-nl-cache: Failed to get GET_LINK_COUNT from dict" repeated 367 times.

nlc_unlink_cbk() unconditionally logs these messages without checking whether the op_ret was zero. Poornima will be sending a fix for it. But according to the bug, the rm -rf was done only from *one* client, so I don't see a reason why the unlink would fail or the link count would be absent from the dict.

2. The log contains two sets of connects for the bricks despite only a single mount, indicating a possible stale mount. There is also one set of disconnects after these two connects:

# grep -nE "Connected to|disconnected from" glusterfs-saturday-saturday.10.70.47.15.log
26:[2017-05-16 10:57:47.862354] I [MSGID: 104024] [glfs-mgmt.c:801:mgmt_rpc_notify] 0-glfs-mgmt: disconnected from remote-host: localhost
430:[2017-05-16 10:57:51.921229] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-0: Connected to saturday-saturday-client-0, attached to remote volume '/bricks/brick0/saturday-saturday_brick0'.
446:[2017-05-16 10:57:51.970914] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-4: Connected to saturday-saturday-client-4, attached to remote volume '/bricks/brick1/saturday-saturday_brick4'.
461:[2017-05-16 10:57:52.021976] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-8: Connected to saturday-saturday-client-8, attached to remote volume '/bricks/brick2/saturday-saturday_brick8'.
474:[2017-05-16 10:57:52.209561] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-5: Connected to saturday-saturday-client-5, attached to remote volume '/bricks/brick1/saturday-saturday_brick5'.
476:[2017-05-16 10:57:52.209884] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-9: Connected to saturday-saturday-client-9, attached to remote volume '/bricks/brick2/saturday-saturday_brick9'.
478:[2017-05-16 10:57:52.211898] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-1: Connected to saturday-saturday-client-1, attached to remote volume '/bricks/brick0/saturday-saturday_brick1'.
483:[2017-05-16 10:57:52.222755] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-2: Connected to saturday-saturday-client-2, attached to remote volume '/bricks/brick0/saturday-saturday_brick2'.
486:[2017-05-16 10:57:52.224107] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-6: Connected to saturday-saturday-client-6, attached to remote volume '/bricks/brick1/saturday-saturday_brick6'.
489:[2017-05-16 10:57:52.225482] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-10: Connected to saturday-saturday-client-10, attached to remote volume '/bricks/brick2/saturday-saturday_brick10'.
495:[2017-05-16 10:57:52.247176] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-7: Connected to saturday-saturday-client-7, attached to remote volume '/bricks/brick1/saturday-saturday_brick7'.
497:[2017-05-16 10:57:52.248433] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-3: Connected to saturday-saturday-client-3, attached to remote volume '/bricks/brick0/saturday-saturday_brick3'.
499:[2017-05-16 10:57:52.249413] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-11: Connected to saturday-saturday-client-11, attached to remote volume '/bricks/brick2/saturday-saturday_brick11'.
508:[2017-05-16 10:57:56.315455] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-snapd-client: Connected to saturday-saturday-snapd-client, attached to remote volume 'snapd-saturday-saturday'.
-----------------------
560:[2017-05-16 12:52:03.614988] I [MSGID: 104024] [glfs-mgmt.c:801:mgmt_rpc_notify] 0-glfs-mgmt: disconnected from remote-host: localhost
963:[2017-05-16 12:52:07.776297] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-0: Connected to saturday-saturday-client-0, attached to remote volume '/bricks/brick0/saturday-saturday_brick0'.
974:[2017-05-16 12:52:07.821816] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-1: Connected to saturday-saturday-client-1, attached to remote volume '/bricks/brick0/saturday-saturday_brick1'.
981:[2017-05-16 12:52:07.834578] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-3: Connected to saturday-saturday-client-3, attached to remote volume '/bricks/brick0/saturday-saturday_brick3'.
984:[2017-05-16 12:52:07.835603] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-2: Connected to saturday-saturday-client-2, attached to remote volume '/bricks/brick0/saturday-saturday_brick2'.
994:[2017-05-16 12:52:07.862550] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-5: Connected to saturday-saturday-client-5, attached to remote volume '/bricks/brick1/saturday-saturday_brick5'.
1001:[2017-05-16 12:52:07.874481] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-4: Connected to saturday-saturday-client-4, attached to remote volume '/bricks/brick1/saturday-saturday_brick4'.
1007:[2017-05-16 12:52:07.889969] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-6: Connected to saturday-saturday-client-6, attached to remote volume '/bricks/brick1/saturday-saturday_brick6'.
1011:[2017-05-16 12:52:07.891142] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-7: Connected to saturday-saturday-client-7, attached to remote volume '/bricks/brick1/saturday-saturday_brick7'.
1019:[2017-05-16 12:52:07.910413] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-8: Connected to saturday-saturday-client-8, attached to remote volume '/bricks/brick2/saturday-saturday_brick8'.
1027:[2017-05-16 12:52:07.925364] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-9: Connected to saturday-saturday-client-9, attached to remote volume '/bricks/brick2/saturday-saturday_brick9'.
1029:[2017-05-16 12:52:07.926389] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-10: Connected to saturday-saturday-client-10, attached to remote volume '/bricks/brick2/saturday-saturday_brick10'.
1032:[2017-05-16 12:52:07.927940] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-client-11: Connected to saturday-saturday-client-11, attached to remote volume '/bricks/brick2/saturday-saturday_brick11'.
1038:[2017-05-16 12:52:07.940793] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-saturday-saturday-snapd-client: Connected to saturday-saturday-snapd-client, attached to remote volume 'snapd-saturday-saturday'.
-----------------------
1064:[2017-05-16 12:52:39.506624] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-0: disconnected from saturday-saturday-client-0. Client process will keep trying to connect to glusterd until brick's port is available
1065:[2017-05-16 12:52:39.506673] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-1: disconnected from saturday-saturday-client-1. Client process will keep trying to connect to glusterd until brick's port is available
1067:[2017-05-16 12:52:39.507435] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-2: disconnected from saturday-saturday-client-2. Client process will keep trying to connect to glusterd until brick's port is available
1068:[2017-05-16 12:52:39.507471] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-3: disconnected from saturday-saturday-client-3. Client process will keep trying to connect to glusterd until brick's port is available
1070:[2017-05-16 12:52:39.507690] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-4: disconnected from saturday-saturday-client-4. Client process will keep trying to connect to glusterd until brick's port is available
1071:[2017-05-16 12:52:39.507724] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-5: disconnected from saturday-saturday-client-5. Client process will keep trying to connect to glusterd until brick's port is available
1073:[2017-05-16 12:52:39.507907] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-6: disconnected from saturday-saturday-client-6. Client process will keep trying to connect to glusterd until brick's port is available
1074:[2017-05-16 12:52:39.507939] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-7: disconnected from saturday-saturday-client-7. Client process will keep trying to connect to glusterd until brick's port is available
1076:[2017-05-16 12:52:39.508482] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-8: disconnected from saturday-saturday-client-8. Client process will keep trying to connect to glusterd until brick's port is available
1077:[2017-05-16 12:52:39.508548] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-9: disconnected from saturday-saturday-client-9. Client process will keep trying to connect to glusterd until brick's port is available
1079:[2017-05-16 12:52:39.508748] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-10: disconnected from saturday-saturday-client-10. Client process will keep trying to connect to glusterd until brick's port is available
1080:[2017-05-16 12:52:39.508782] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-client-11: disconnected from saturday-saturday-client-11. Client process will keep trying to connect to glusterd until brick's port is available
1082:[2017-05-16 12:52:39.508986] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-saturday-saturday-snapd-client: disconnected from saturday-saturday-snapd-client. Client process will keep trying to connect to glusterd until brick's port is available
-----------------------

3. Almost all of the 84,000+ metadata heals came after the disconnect happened, which is strange.

4. The volume also had snapshots enabled earlier (although .snaps was inaccessible via the cifs mount for this build). Some of the gfids that were metadata healed were found on the snapshotted bricks:

[root@dhcp46-47 snaps]# find . -name 6ec14ef7-ed47-49c8-8075-6d330e1eef82
./891ba64df6cb411c96f94fef0559a63b/brick7/saturday-saturday_brick6/.glusterfs/6e/c1/6ec14ef7-ed47-49c8-8075-6d330e1eef82
./fb141e06f966496c9045bdc21ac32d9e/brick7/saturday-saturday_brick6/.glusterfs/6e/c1/6ec14ef7-ed47-49c8-8075-6d330e1eef82

Nag, could you see if you can re-create this on a clean setup, possibly without snapshots/uss?
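The fix alluded to in observation 1 amounts to guarding the warning in the unlink callback: a failed unlink never carries a link count in the reply dict, so its absence is only worth logging when op_ret is zero. A minimal sketch of that guard logic, written in Python purely for illustration (the real change would be in the C translator nl-cache.c; should_warn and the dict shape here are illustrative assumptions, not Gluster APIs):

```python
import logging

logging.basicConfig(format="%(levelname)s %(message)s")
log = logging.getLogger("nl-cache")

def should_warn(op_ret: int, xdata: dict) -> bool:
    """Warn only when the unlink succeeded (op_ret == 0) yet the reply
    dict still lacks the link count; a failed unlink carries no count,
    so its absence there is expected."""
    return op_ret == 0 and "GET_LINK_COUNT" not in xdata

def nlc_unlink_cbk(op_ret: int, xdata: dict) -> int:
    # Illustrative stand-in for the callback: the original logged the
    # warning unconditionally; the guarded version checks op_ret first.
    if should_warn(op_ret, xdata):
        log.warning("Failed to get GET_LINK_COUNT from dict")
    elif op_ret == 0:
        link_count = xdata["GET_LINK_COUNT"]
        # ...nl-cache would invalidate its negative entries using
        # link_count here...
    return op_ret
```

With this guard, the 367 repeats seen during rm -rf of failing unlinks would not be logged at all; only a successful unlink with a genuinely missing count would warn.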
Ravi, I am able to see this "Directory not empty" error, with healing on the client side, even on a fuse setup without snapshots. This is the same setup on which we were seeing the brick-full issues today.

fuse mount:
[root@dhcp35-103 aarthy-perf-tool]# for j in 1; do for i in {1..25}; do rm -rf /mnt/cross3-$i/*; done; done
rm: cannot remove ‘/mnt/cross3-6/dir1’: Directory not empty
rm: cannot remove ‘/mnt/cross3-25/dir1’: Directory not empty
[root@dhcp35-103 aarthy-perf-tool]#

fuse log for the cross3-6 volume mount:
[2017-05-19 14:24:18.102723] I [MSGID: 108026] [afr-self-heal-entry.c:840:afr_selfheal_entry_do] 0-cross3-6-replicate-0: performing entry selfheal on 1be0b7b0-0603-4ffc-b056-2683b4b25fca

1be0b7b0-0603-4ffc-b056-2683b4b25fca is the gfid of dir1.

[root@dhcp35-45 glusterfs]# gluster v heal cross3-6 info
Brick 10.70.35.45:/rhs/brick6/cross3-6
/dir1
Status: Connected
Number of entries: 1

Brick 10.70.35.130:/rhs/brick6/cross3-6
Status: Connected
Number of entries: 0

Brick 10.70.35.122:/rhs/brick6/cross3-6
Status: Connected
Number of entries: 0

n1:
[root@dhcp35-45 glusterfs]# getfattr -d -m . -e hex /rhs/brick6/cross3-6/dir1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick6/cross3-6/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.cross3-6-client-1=0x000000000000000000000001
trusted.afr.cross3-6-client-2=0x000000000000000000000001
trusted.gfid=0x1be0b7b006034ffcb0562683b4b25fca
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

n2:
[root@dhcp35-130 glusterfs]# getfattr -d -m . -e hex /rhs/brick6/cross3-6/dir1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick6/cross3-6/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x1be0b7b006034ffcb0562683b4b25fca
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

n3:
[root@dhcp35-122 glusterfs]# getfattr -d -m . -e hex /rhs/brick6/cross3-6/dir1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick6/cross3-6/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x1be0b7b006034ffcb0562683b4b25fca
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

n1:
[root@dhcp35-45 glusterfs]# ll /rhs/brick6/cross3-6/dir1
total 0
-rw-r--r--. 2 root root 0 May 18 14:05 file.10388
-rw-r--r--. 2 root root 0 May 18 14:05 file.10389
-rw-r--r--. 2 root root 0 May 18 14:05 file.10390
-rw-r--r--. 2 root root 0 May 18 14:05 file.10391
-rw-r--r--. 2 root root 0 May 18 14:05 file.10392
-rw-r--r--. 2 root root 0 May 18 14:05 file.10393
-rw-r--r--. 2 root root 0 May 18 14:05 file.10394
-rw-r--r--. 2 root root 0 May 18 14:05 file.10395
-rw-r--r--. 2 root root 0 May 18 14:05 file.10396
-rw-r--r--. 2 root root 0 May 18 14:05 file.10397
-rw-r--r--. 2 root root 0 May 18 14:05 file.10398
-rw-r--r--. 2 root root 0 May 18 14:05 file.10399
-rw-r--r--. 2 root root 0 May 18 14:05 file.10400
-rw-r--r--. 2 root root 0 May 18 14:05 file.10401
-rw-r--r--. 2 root root 0 May 18 14:05 file.10402
-rw-r--r--. 2 root root 0 May 18 14:05 file.10403
-rw-r--r--. 2 root root 0 May 18 14:05 file.10404
-rw-r--r--. 2 root root 0 May 18 14:05 file.10405
-rw-r--r--. 2 root root 0 May 18 14:05 file.10406
-rw-r--r--. 2 root root 0 May 18 14:05 file.10407
-rw-r--r--. 2 root root 0 May 18 14:05 file.10408
-rw-r--r--. 2 root root 0 May 18 14:05 file.10409
-rw-r--r--. 2 root root 0 May 18 14:05 file.10410
-rw-r--r--. 2 root root 0 May 18 14:05 file.10411
-rw-r--r--. 2 root root 0 May 18 14:05 file.10412
-rw-r--r--. 2 root root 0 May 18 14:05 file.10413
-rw-r--r--. 2 root root 0 May 18 14:05 file.10414
-rw-r--r--. 2 root root 0 May 18 14:05 file.10415
-rw-r--r--. 2 root root 0 May 18 14:05 file.10416
-rw-r--r--. 2 root root 0 May 18 14:05 file.10417
-rw-r--r--. 2 root root 0 May 18 14:05 file.10418
-rw-r--r--. 2 root root 0 May 18 14:05 file.10419
-rw-r--r--. 2 root root 0 May 18 14:05 file.10420
-rw-r--r--. 2 root root 0 May 18 14:05 file.10421
-rw-r--r--. 2 root root 0 May 18 14:05 file.10422
-rw-r--r--. 1 root root 0 May 18 14:05 file.10423  =====> extra file which is not seen on n2/n3

n2:
[root@dhcp35-130 glusterfs]# ll /rhs/brick6/cross3-6/dir1
total 0
-rw-r--r--. 2 root root 0 May 18 14:05 file.10388
-rw-r--r--. 2 root root 0 May 18 14:05 file.10389
-rw-r--r--. 2 root root 0 May 18 14:05 file.10390
-rw-r--r--. 2 root root 0 May 18 14:05 file.10391
-rw-r--r--. 2 root root 0 May 18 14:05 file.10392
-rw-r--r--. 2 root root 0 May 18 14:05 file.10393
-rw-r--r--. 2 root root 0 May 18 14:05 file.10394
-rw-r--r--. 2 root root 0 May 18 14:05 file.10395
-rw-r--r--. 2 root root 0 May 18 14:05 file.10396
-rw-r--r--. 2 root root 0 May 18 14:05 file.10397
-rw-r--r--. 2 root root 0 May 18 14:05 file.10398
-rw-r--r--. 2 root root 0 May 18 14:05 file.10399
-rw-r--r--. 2 root root 0 May 18 14:05 file.10400
-rw-r--r--. 2 root root 0 May 18 14:05 file.10401
-rw-r--r--. 2 root root 0 May 18 14:05 file.10402
-rw-r--r--. 2 root root 0 May 18 14:05 file.10403
-rw-r--r--. 2 root root 0 May 18 14:05 file.10404
-rw-r--r--. 2 root root 0 May 18 14:05 file.10405
-rw-r--r--. 2 root root 0 May 18 14:05 file.10406
-rw-r--r--. 2 root root 0 May 18 14:05 file.10407
-rw-r--r--. 2 root root 0 May 18 14:05 file.10408
-rw-r--r--. 2 root root 0 May 18 14:05 file.10409
-rw-r--r--. 2 root root 0 May 18 14:05 file.10410
-rw-r--r--. 2 root root 0 May 18 14:05 file.10411
-rw-r--r--. 2 root root 0 May 18 14:05 file.10412
-rw-r--r--. 2 root root 0 May 18 14:05 file.10413
-rw-r--r--. 2 root root 0 May 18 14:05 file.10414
-rw-r--r--. 2 root root 0 May 18 14:05 file.10415
-rw-r--r--. 2 root root 0 May 18 14:05 file.10416
-rw-r--r--. 2 root root 0 May 18 14:05 file.10417
-rw-r--r--. 2 root root 0 May 18 14:05 file.10418
-rw-r--r--. 2 root root 0 May 18 14:05 file.10419
-rw-r--r--. 2 root root 0 May 18 14:05 file.10420
-rw-r--r--. 2 root root 0 May 18 14:05 file.10421
-rw-r--r--. 2 root root 0 May 18 14:05 file.10422

n3:
[root@dhcp35-122 glusterfs]# ll /rhs/brick6/cross3-6/dir1
total 0
-rw-r--r--. 2 root root 0 May 18 14:05 file.10388
-rw-r--r--. 2 root root 0 May 18 14:05 file.10389
-rw-r--r--. 2 root root 0 May 18 14:05 file.10390
-rw-r--r--. 2 root root 0 May 18 14:05 file.10391
-rw-r--r--. 2 root root 0 May 18 14:05 file.10392
-rw-r--r--. 2 root root 0 May 18 14:05 file.10393
-rw-r--r--. 2 root root 0 May 18 14:05 file.10394
-rw-r--r--. 2 root root 0 May 18 14:05 file.10395
-rw-r--r--. 2 root root 0 May 18 14:05 file.10396
-rw-r--r--. 2 root root 0 May 18 14:05 file.10397
-rw-r--r--. 2 root root 0 May 18 14:05 file.10398
-rw-r--r--. 2 root root 0 May 18 14:05 file.10399
-rw-r--r--. 2 root root 0 May 18 14:05 file.10400
-rw-r--r--. 2 root root 0 May 18 14:05 file.10401
-rw-r--r--. 2 root root 0 May 18 14:05 file.10402
-rw-r--r--. 2 root root 0 May 18 14:05 file.10403
-rw-r--r--. 2 root root 0 May 18 14:05 file.10404
-rw-r--r--. 2 root root 0 May 18 14:05 file.10405
-rw-r--r--. 2 root root 0 May 18 14:05 file.10406
-rw-r--r--. 2 root root 0 May 18 14:05 file.10407
-rw-r--r--. 2 root root 0 May 18 14:05 file.10408
-rw-r--r--. 2 root root 0 May 18 14:05 file.10409
-rw-r--r--. 2 root root 0 May 18 14:05 file.10410
-rw-r--r--. 2 root root 0 May 18 14:05 file.10411
-rw-r--r--. 2 root root 0 May 18 14:05 file.10412
-rw-r--r--. 2 root root 0 May 18 14:05 file.10413
-rw-r--r--. 2 root root 0 May 18 14:05 file.10414
-rw-r--r--. 2 root root 0 May 18 14:05 file.10415
-rw-r--r--. 2 root root 0 May 18 14:05 file.10416
-rw-r--r--. 2 root root 0 May 18 14:05 file.10417
-rw-r--r--. 2 root root 0 May 18 14:05 file.10418
-rw-r--r--. 2 root root 0 May 18 14:05 file.10419
-rw-r--r--. 2 root root 0 May 18 14:05 file.10420
-rw-r--r--. 2 root root 0 May 18 14:05 file.10421
-rw-r--r--. 2 root root 0 May 18 14:05 file.10422
[root@dhcp35-122 glusterfs]#
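The trusted.afr.* values in the getfattr output above are AFR's pending-operation counters: three network-byte-order 32-bit integers counting pending data, metadata, and entry operations against the named client/brick. A small Python helper (hypothetical, just to decode the hex strings that `getfattr -e hex` prints) shows that only the entry counter is non-zero on n1, consistent with the pending entry selfheal that heal info reports for /dir1:

```python
import struct
import uuid

def decode_afr_pending(hex_value: str) -> dict:
    """Decode a trusted.afr.<client> xattr as printed by `getfattr -e hex`:
    three big-endian uint32 counters for pending data, metadata, and
    entry operations."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    data, metadata, entry = struct.unpack(">III", raw[:12])
    return {"data": data, "metadata": metadata, "entry": entry}

def decode_gfid(hex_value: str) -> str:
    """Render a trusted.gfid hex value in canonical UUID form."""
    return str(uuid.UUID(bytes=bytes.fromhex(hex_value.removeprefix("0x"))))

# Values taken from the n1 getfattr output above:
print(decode_afr_pending("0x000000000000000000000001"))
# -> {'data': 0, 'metadata': 0, 'entry': 1}
print(decode_gfid("0x1be0b7b006034ffcb0562683b4b25fca"))
# -> 1be0b7b0-0603-4ffc-b056-2683b4b25fca (the gfid of dir1 from the fuse log)
```

The decoded gfid matches the one in the entry-selfheal log line, and the entry=1 counters against client-1 and client-2 explain why n1's brick considers /dir1 a heal source while the other two bricks carry no afr xattrs.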
The needinfo has been pending for months now. Can we please get this addressed?
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days