+++ This bug was initially created as a clone of Bug #1662206 +++

Description of problem:
While verifying bz#1645480, hit an issue where a directory is pending heal.

Version-Release number of selected component (if applicable):
# rpm -qa | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-rdma-3.12.2-34.el7rhgs.x86_64
glusterfs-server-3.12.2-34.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-34.el7rhgs.x86_64
glusterfs-fuse-3.12.2-34.el7rhgs.x86_64
glusterfs-events-3.12.2-34.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create a 2 x (2+1) arbiter volume, brick{1..6}.
2. Start the volume and fuse-mount it.
3. Create a few files (~200) inside a few directories; create hardlinks and softlinks.
4. Bring brick 2 and brick 5 down.
5. Run continuous metadata operations (rename, chgrp, chown) on the mount.
6. Occasionally bring brick 1 and brick 4 down, and bring the bricks from step 4 back up.
7. A few "transport endpoint not connected" errors appear when the only good brick is down; this is expected.
8. Bring brick 2 and brick 5 down again, and brick 1 and brick 4 back up.
9. All of the above is done while the I/O from step 5 is ongoing.
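For reference, step 5's metadata churn can be sketched in Python. This is an illustrative stand-in, not the tester's actual script: chown/chgrp need privileges, so chmod is used here instead (it exercises the same metadata changelog path), and the sketch runs against a scratch directory so it is runnable anywhere.

```python
import os
import tempfile

def metadata_churn(root, nfiles=5, rounds=3):
    """Scaled-down model of steps 3 and 5: create files, then keep
    renaming them and re-stamping their metadata."""
    os.makedirs(root, exist_ok=True)
    for i in range(nfiles):                       # step 3, scaled down
        open(os.path.join(root, "file%d" % i), "w").close()
    for r in range(rounds):                       # step 5: metadata churn
        for name in sorted(os.listdir(root)):
            path = os.path.join(root, name)
            # chmod is a metadata op on the file; rename is an entry op
            # on the parent directory -- the two op types seen pending here.
            os.chmod(path, 0o640 if r % 2 else 0o600)
            base = name.split(".")[0]
            os.rename(path, os.path.join(root, "%s.r%d" % (base, r)))
    return sorted(os.listdir(root))

# Run against a scratch directory; point `root` at the fuse mount
# (a hypothetical path such as /mnt/test/level00) to drive a real volume.
with tempfile.TemporaryDirectory() as tmp:
    names = metadata_churn(os.path.join(tmp, "level00"))
print(names)  # -> ['file0.r2', 'file1.r2', 'file2.r2', 'file3.r2', 'file4.r2']
```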
Actual results:

Initial output:

# gluster v heal test info
Brick 10.70.46.55:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.47.184:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.46.193:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.47.67:/bricks/brick1/testing
<gfid:725ab1d7-bed9-4a25-b1a0-95e2c15605b7>
Status: Connected
Number of entries: 1

Brick 10.70.46.169:/bricks/brick1/testing
<gfid:725ab1d7-bed9-4a25-b1a0-95e2c15605b7>
Status: Connected
Number of entries: 1

Brick 10.70.47.122:/bricks/brick1/testing
Status: Connected
Number of entries: 0

After a few minutes, heal info started reporting the directory /level00 as pending heal:

~]# gluster v heal test info
Brick 10.70.46.55:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.47.184:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.46.193:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.47.67:/bricks/brick1/testing
/level00
Status: Connected
Number of entries: 1

Brick 10.70.46.169:/bricks/brick1/testing
/level00
Status: Connected
Number of entries: 1

Brick 10.70.47.122:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Expected results:
No files/directories should be pending heal.
Additional info:

Volume Info:

~]# gluster v info test
Volume Name: test
Type: Distributed-Replicate
Volume ID: 11ee3f35-f99d-49ce-95c7-bbee829bc6f1
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.46.55:/bricks/brick1/testing
Brick2: 10.70.47.184:/bricks/brick1/testing
Brick3: 10.70.46.193:/bricks/brick1/testing (arbiter)
Brick4: 10.70.47.67:/bricks/brick1/testing
Brick5: 10.70.46.169:/bricks/brick1/testing
Brick6: 10.70.47.122:/bricks/brick1/testing (arbiter)
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Also the getfattr output for the directory level00 on all six bricks; on the second subvolume, client-3 blames client-4 and vice versa:

On 10.70.46.55 (brick 1):
~]# getfattr -d -m . -e hex /bricks/brick1/testing/level00
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/testing/level00
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.test-client-1=0x000000000000000000000000
trusted.gfid=0x725ab1d7bed94a25b1a095e2c15605b7
trusted.glusterfs.dht=0x000000000000000000000000aaac18d5
trusted.glusterfs.dht.mds=0x00000000

On 10.70.47.184 (brick 2):
~]# getfattr -d -m . -e hex /bricks/brick1/testing/level00
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/testing/level00
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.test-client-0=0x000000000000000000000000
trusted.gfid=0x725ab1d7bed94a25b1a095e2c15605b7
trusted.glusterfs.dht=0x000000000000000000000000aaac18d5
trusted.glusterfs.dht.mds=0x00000000

On 10.70.46.193 (brick 3, arbiter):
~]# getfattr -d -m . -e hex /bricks/brick1/testing/level00
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/testing/level00
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.test-client-0=0x000000000000000000000000
trusted.afr.test-client-1=0x000000000000000000000000
trusted.gfid=0x725ab1d7bed94a25b1a095e2c15605b7
trusted.glusterfs.dht=0x000000000000000000000000aaac18d5
trusted.glusterfs.dht.mds=0x00000000

On 10.70.47.67 (brick 4):
~]# getfattr -d -m . -e hex /bricks/brick1/testing/level00
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/testing/level00
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.test-client-4=0x000000000000000000000192
trusted.gfid=0x725ab1d7bed94a25b1a095e2c15605b7
trusted.glusterfs.dht=0x0000000000000000aaac18d6ffffffff

On 10.70.46.169 (brick 5):
~]# getfattr -d -m . -e hex /bricks/brick1/testing/level00
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/testing/level00
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.test-client-3=0x0000000000000000000001a2
trusted.gfid=0x725ab1d7bed94a25b1a095e2c15605b7
trusted.glusterfs.dht=0x0000000000000000aaac18d6ffffffff

On 10.70.47.122 (brick 6, arbiter):
~]# getfattr -d -m . -e hex /bricks/brick1/testing/level00
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/testing/level00
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.test-client-3=0x000000000000000000000000
trusted.afr.test-client-4=0x000000000000000000000000
trusted.gfid=0x725ab1d7bed94a25b1a095e2c15605b7
trusted.glusterfs.dht=0x0000000000000000aaac18d6ffffffff

--- Additional comment from Anees Patel on 2018-12-27 04:07:00 UTC ---

sosreports are available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/anees/bz1662206/

--- Additional comment from Ravishankar N on 2018-12-27 04:53:01 UTC ---

Karthik, could you take a look at this please? From the state of the afr xattrs, a conservative merge should have occurred during self-heal of the directory.

--- Additional comment from Karthik U S on 2018-12-28 06:59:04 UTC ---

Hi Anees,

Can you provide the exact operations that you ran in steps 3 & 5 of the reproducer, along with the client logs? Did you try running the volume heal, or a lookup on the directory level00 from the mount, afterwards? If so, was there any change in the heal info output?

-Karthik

--- Additional comment from Karthik U S on 2018-12-28 14:34:39 UTC ---

Current state: when Anees restarted glusterd on all the nodes today, the directory got healed by itself.
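For readers decoding the dumps above: a trusted.afr.* value holds three big-endian 32-bit pending counters, in the order (data, metadata, entry), so the non-zero values on the second subvolume are entry-pending counts. A small decoder sketch (illustrative Python, not GlusterFS code):

```python
import struct

def decode_afr_xattr(hexval):
    """Split a trusted.afr.<vol>-client-N value into its three
    big-endian 32-bit pending counters: (data, metadata, entry)."""
    raw = bytes.fromhex(hexval[2:] if hexval.startswith("0x") else hexval)
    return struct.unpack(">III", raw[:12])

# The mutually-blaming values from bricks 4 and 5 above:
print(decode_afr_xattr("0x000000000000000000000192"))  # (0, 0, 402)
print(decode_afr_xattr("0x0000000000000000000001a2"))  # (0, 0, 418)
```

So brick 4 blames 402 pending entry operations on client-4 (brick 5), and brick 5 blames 418 on client-3 (brick 4). With mutual blame on the entry counters and no dirty flags set, a conservative merge of the directory is the expected self-heal outcome, as noted in the comments above.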
From the shd log:

[2018-12-28 07:46:05.439143] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-test-replicate-1: performing entry selfheal on 725ab1d7-bed9-4a25-b1a0-95e2c15605b7
[2018-12-28 07:46:05.439205] I [MSGID: 114047] [client-handshake.c:1161:client_setvolume_cbk] 0-testvol_distributed-replicated-client-4: Server and Client lk-version numbers are not same, reopening the fds
[2018-12-28 07:46:05.439397] I [MSGID: 108005] [afr-common.c:5037:__afr_handle_child_up_event] 0-testvol_distributed-replicated-replicate-1: Subvolume 'testvol_distributed-replicated-client-4' came back up; going online.
[2018-12-28 07:46:05.440653] I [MSGID: 114035] [client-handshake.c:121:client_set_lk_version_cbk] 0-testvol_distributed-replicated-client-4: Server lk version = 1
[2018-12-28 07:46:05.446328] I [MSGID: 114057] [client-handshake.c:1397:select_server_supported_programs] 0-testvol_distributed-replicated-client-5: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-12-28 07:46:05.449398] I [MSGID: 114046] [client-handshake.c:1150:client_setvolume_cbk] 0-testvol_distributed-replicated-client-5: Connected to testvol_distributed-replicated-client-5, attached to remote volume '/bricks/brick3/testvol_distributed-replicated_brick5'.
[2018-12-28 07:46:05.449474] I [MSGID: 114047] [client-handshake.c:1161:client_setvolume_cbk] 0-testvol_distributed-replicated-client-5: Server and Client lk-version numbers are not same, reopening the fds
[2018-12-28 07:46:05.449720] I [MSGID: 108002] [afr-common.c:5334:afr_notify] 0-testvol_distributed-replicated-replicate-1: Client-quorum is met
[2018-12-28 07:46:05.450589] I [MSGID: 114035] [client-handshake.c:121:client_set_lk_version_cbk] 0-testvol_distributed-replicated-client-5: Server lk version = 1
[2018-12-28 07:46:05.472093] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-arbitervol-replicate-1: performing entry selfheal on e82cba68-b0ec-4f0a-a6c3-113df7afcccf
[2018-12-28 07:46:05.505071] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-test-replicate-1: Completed entry selfheal on 725ab1d7-bed9-4a25-b1a0-95e2c15605b7. sources= sinks=0 1

[root@dhcp47-67 ~]# gluster v heal test info
Brick 10.70.46.55:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.47.184:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.46.193:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.47.67:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.46.169:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.47.122:/bricks/brick1/testing
Status: Connected
Number of entries: 0

@Anees, can you try to reproduce this bug, please? It looks like some connection issue between the bricks and the SHD.
--- Additional comment from Anees Patel on 2019-07-02 05:54:22 UTC ---

Hi Karthik,

Executed all the steps provided in comment#0. Ran I/O with deep directories and intensive metadata operations, and also did an add-brick and rebalance, on the latest build (glusterfs-fuse-6.0-7.el7rhgs.x86_64).

# gluster v info arb1
Volume Name: arb1
Type: Distributed-Replicate
Volume ID: c772ca13-2767-40dc-b6d8-d76e4dd644e0
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.50:/bricks/brick1/newt
Brick2: 10.70.46.216:/bricks/brick1/newt
Brick3: 10.70.46.132:/bricks/brick1/newt (arbiter)
Brick4: 10.70.35.50:/bricks/brick2/newt
Brick5: 10.70.46.216:/bricks/brick2/newt
Brick6: 10.70.46.132:/bricks/brick2/newt (arbiter)
Brick7: 10.70.35.50:/bricks/brick3/newt
Brick8: 10.70.46.216:/bricks/brick3/newt
Brick9: 10.70.46.132:/bricks/brick3/newt (arbiter)
Brick10: 10.70.35.50:/bricks/brick5/newt
Brick11: 10.70.46.132:/bricks/brick5/newt
Brick12: 10.70.46.216:/bricks/brick5/newt (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
cluster.shd-max-threads: 30

==========================================================================
Rebalance was completed:

# gluster v rebal arb1 status
        Node  Rebalanced-files     size  scanned  failures  skipped     status  run time in h:m:s
   ---------       -----------  -------  -------  --------  -------  ---------  -----------------
10.70.46.216             43281  125.5MB   366134         0     3278  completed            1:25:22
10.70.46.132             43160  131.8MB   366275         0     3213  completed            1:25:22
   localhost             43209  109.3MB   365106         0     3135  completed            1:25:22
volume rebalance: arb1: success
==========================================================================
No heals were pending.

--- Additional comment from Karthik U S on 2019-07-02 06:01:19 UTC ---

Anees, thanks for trying to reproduce this. Since we do not see the issue on the latest build, can we close this?
--- Additional comment from Anees Patel on 2019-07-02 06:29:53 UTC ---

Hi Karthik,

Many fixes have gone in between the build this issue was raised on (3.12.2-34.el7rhgs.x86_64) and the latest build. In particular, a fix went into the latest build for BZ#1715438, and I was unable to hit this issue while validating that bug. Also, in all the 3.5 testing to date, I haven't seen anything like this. We are good to close this; I will re-open it if the issue is seen in the future.

Thanks,

--- Additional comment from Karthik U S on 2019-07-02 07:07:54 UTC ---

The fix for BZ#1715438 handles the metadata-pending cases, whereas the issue we saw here was entry-pending heals that were not completing. As mentioned in comment #5, the issue here looked like a connection problem with the bricks. If you hit the issue again, please feel free to re-open this bug. Closing as WORKSFORME since it is not reproducible.

--- Additional comment from Bala Konda Reddy M on 2019-07-02 10:00:16 UTC ---

Hey Karthik,

I hit this issue: one file/directory heal is pending while doing an in-service upgrade from RHGS 3.4.4 (glusterfs-3.12.2-47.2.el7rhgs.x86_64) to glusterfs-6.0-7.el7rhgs.x86_64.

Scenario I tried:
1. On a three-node cluster, created 1000 volumes (1x3 replicate) and started them.
2. Only one volume is mounted, with I/O (linux untar) running.
3. Stopped glusterd on n1, pkill glusterfsd, pkill gluster.
4. Performed yum update and reboot.
5. Started glusterd on node n1.
6. Checked heal info for the volume testvol_-999 on n1:

# gluster vol heal testvol_-999 info
Brick rhs-gp-srv13.lab.eng.blr.redhat.com:/bricks/brick9/testvol_-999
<gfid:69c6bf1e-ab9e-4c74-88f2-241199a8ae56>
Status: Connected
Number of entries: 1

Brick rhs-gp-srv14.lab.eng.blr.redhat.com:/bricks/brick9/testvol_-999
Status: Connected
Number of entries: 0

Brick rhs-gp-srv16.lab.eng.blr.redhat.com:/bricks/brick9/testvol_-999
Status: Connected
Number of entries: 0

I tried triggering heal again, but the entry is still pending.
Waited for more than 10 minutes for the file to get healed.

Thanks,
Bala

--- Additional comment from Karthik U S on 2019-07-03 06:20:35 UTC ---

Initial analysis:
Only one file in the volume "testvol_-999" needs heal, and it has the dirty marking only on the upgraded node. Per the discussion with Bala, there were no pending heals on the volume before starting the upgrade. The dirty marking could have come from a write operation that succeeded on all the bricks, with the node taken down for the upgrade before the post-op could remove the marking, leaving the dirty marker behind on that brick. .glusterfs/indices/dirty has two base entries, and this file's gfid entry in that directory is linked to the old base entry, which confirms the above scenario. The md5sums of the file on all the bricks also match.

There were other entries needing heal after the upgrade, and they were healed successfully by the self-heal daemons on the other two nodes. Only this entry is not getting cleared from the heal info list, since the dirty marking is present only on the upgraded node, and whenever we try to run the index heal we see the following error message in the shd log of the upgraded node:

[2019-07-02 15:16:32.126379] E [glusterfsd-mgmt.c:857:glusterfs_handle_translator_op] 0-glusterfs: xlator testvol_-999-replicate-0 is not loaded

We suspect this is due to the recent shd-mux change, with the upgraded node having trouble loading the shd. Assigning this to Rafi for further analysis.

--- Additional comment from Mohammed Rafi KC on 2019-07-05 08:52:45 UTC ---

RCA: The attach request for that volume failed in the shd process because the process was not yet initialized.
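Karthik's scenario above — the pre-op sets the dirty marker, the write completes on every brick, and the node goes down before the post-op clears the marker — can be illustrated with a toy model. This is purely illustrative; the names and data structures are invented, not GlusterFS code.

```python
def transaction(bricks, kill_before_postop=None):
    """Toy model of an AFR write transaction's dirty marking."""
    for b in bricks:                 # pre-op: set the dirty marker everywhere
        b["dirty"] = True
    for b in bricks:                 # the write itself succeeds on all bricks
        b["data"] = "new"
    for b in bricks:                 # post-op: clear the dirty marker
        if b["name"] == kill_before_postop:
            continue                 # node taken down for upgrade at this point
        b["dirty"] = False
    return bricks

bricks = [{"name": n, "dirty": False, "data": "old"} for n in ("n1", "n2", "n3")]
state = [(b["name"], b["dirty"], b["data"])
         for b in transaction(bricks, kill_before_postop="n1")]
print(state)
# n1 keeps a stale dirty marker even though its data matches the others --
# exactly the state found on the upgraded node, where the md5sums all match.
```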
-    if (ctx->active) {
-        ret = mgmt_process_volfile(xlator_req.input.input_val,
-                                   xlator_req.input.input_len, xlator_req.name,
-                                   dict);
-    } else {
-        gf_msg(this->name, GF_LOG_WARNING, EINVAL, glusterfsd_msg_42,
-               "got attach for %s but no active graph", xlator_req.name);
-    }

This is possible when a process has established its management connection but not finished initialization; ctx->active is then still NULL. I will remove the check, since the shd process is capable of handling this situation.
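The effect of the fix can be sketched with a toy model (invented names; the real handler is C code in glusterfsd-mgmt.c): before the change, an attach arriving while ctx->active was still NULL was dropped with only a warning; after it, the volfile is processed regardless.

```python
class Ctx:
    """Toy stand-in for the process context."""
    def __init__(self):
        self.active = None      # graph not initialized yet
        self.attached = []

def handle_attach_old(ctx, volfile_id):
    # Pre-fix behaviour: refuse the attach when there is no active graph,
    # so a volume attaching during shd startup is silently lost.
    if ctx.active:
        ctx.attached.append(volfile_id)
        return "attached"
    return "dropped: got attach but no active graph"

def handle_attach_new(ctx, volfile_id):
    # Post-fix behaviour: process the volfile unconditionally; the shd
    # process can handle a graph that is not active yet.
    ctx.attached.append(volfile_id)
    return "attached"

ctx = Ctx()
print(handle_attach_old(ctx, "testvol_-999"))  # dropped: got attach but no active graph
print(handle_attach_new(ctx, "testvol_-999"))  # attached
```

This matches the observed symptom: the attach was dropped during startup, so index heal later failed with "xlator testvol_-999-replicate-0 is not loaded".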
REVIEW: https://review.gluster.org/23005 (graph/shd: attach volfile even if ctx->active is NULL) posted (#1) for review on master by mohammed rafi kc
REVIEW: https://review.gluster.org/23005 (graph/shd: attach volfile even if ctx->active is NULL) merged (#10) on master by Amar Tumballi