Description of problem:
When I/O is happening on the volume, heal info takes more than 2 minutes to return even though there are no entries that need to be healed.

volume heal info output when I/O is running:
=========================
[root@sulphur ~]# time gluster vol heal data info
Brick 10.70.35.53:/rhgs/brick2/data
Status: Connected
Number of entries: 0

Brick 10.70.35.222:/rhgs/brick2/data
Status: Connected
Number of entries: 0

Brick 10.70.35.181:/rhgs/brick2/data-new
Status: Connected
Number of entries: 0

real    2m10.066s
user    0m0.358s
sys     0m0.303s

volume heal info output when no I/O is running on the volume:
====================================
[root@sulphur ~]# time gluster vol heal data info
Brick 10.70.35.53:/rhgs/brick2/data
Status: Connected
Number of entries: 0

Brick 10.70.35.222:/rhgs/brick2/data
Status: Connected
Number of entries: 0

Brick 10.70.35.181:/rhgs/brick2/data-new
Status: Connected
Number of entries: 0

real    0m3.690s
user    0m0.147s
sys     0m0.126s

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-10.el7rhgs.x86_64

How reproducible:

Steps to Reproduce:
1. Start running I/O on the volume.
2. Run "gluster volume heal <vol_name> info".
3. Stop the I/O on the volume.
4. Run "gluster volume heal <vol_name> info" again.

Actual results:
After step 2, heal info takes much longer to return than it does when there is no I/O on the volume.

Expected results:
Heal info should not take longer to return while I/O is happening on the volume when there are no entries to report.

Additional info:
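For reference, a minimal timing sketch of the reproduction steps. The volume name "data", the server address, and the client mount point /mnt/data are assumptions; any sustained write workload on the mount should do:

[root@client ~]# mount -t glusterfs 10.70.35.53:/data /mnt/data
[root@client ~]# # step 1: generate I/O in the background
[root@client ~]# dd if=/dev/zero of=/mnt/data/io-test.img bs=1M count=8192 oflag=direct &
[root@client ~]# # step 2: time heal info while the writes are in flight
[root@client ~]# time gluster volume heal data info
[root@client ~]# # steps 3-4: wait for the I/O to finish, then time it again on the idle volume
[root@client ~]# wait
[root@client ~]# time gluster volume heal data info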
Can you provide info on the time taken for the command when the o-direct options are turned off?
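For anyone trying this, toggling the o-direct behaviour on this volume would look something like the sketch below. Treat it as an assumption about which knobs are meant, since the volume sets both performance.strict-o-direct and network.remote-dio:

[root@sulphur ~]# gluster volume set data performance.strict-o-direct off
[root@sulphur ~]# gluster volume set data network.remote-dio on
[root@sulphur ~]# # re-run the timing with I/O in flight
[root@sulphur ~]# time gluster volume heal data info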
I have set cluster.eager-lock to off as suggested by Pranith and ran "gluster volume heal <vol_name> info"; it still took more than 3 minutes. The output captured while heal info ran is below.

gluster volume info output:
==============================
Volume Name: data
Type: Replicate
Volume ID: 23bc0673-d57a-443f-bbba-2d732f0298ac
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.37.51:/rhgs/brick3/data
Brick2: 10.70.37.60:/rhgs/brick3/data
Brick3: 10.70.37.61:/rhgs/brick3/data
Options Reconfigured:
cluster.eager-lock: off
auth.allow: 10.70.37.60,10.70.37.61,10.70.37.51
nfs.disable: enable
user.cifs: off
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: off
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on

Heal info when I/O is running on the volume:
============================================
[root@rhsqa1 ~]# time gluster volume heal data info
Brick 10.70.37.51:/rhgs/brick3/data
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.7
/.shard/a1e351cc-983e-49a0-94a3-2afb68123056.6
/.shard/a5074af7-4cbf-487e-b4f1-7505b298dd30.7
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/3498983a-2956-40e2-8e49-8ec335048d99/0b57c93a-b2de-4522-a9c1-deea6f0fae00
/.shard/d58abaee-37d1-47e5-9ee8-bdc1ef720cf2.7
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/81660179-5d74-49ed-9f5b-771e513a9ff6/0eea9bac-b76b-49a4-9a6c-ebea0829209a
/.shard/d58abaee-37d1-47e5-9ee8-bdc1ef720cf2.5
/.shard/1d4b25c6-230b-4b4b-9cc0-63f1f1828d7a.8
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.8
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.8
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.9
/.shard/8c457176-009c-4fc9-9f43-44283108cd48.6
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.9
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.8
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.9
/.shard/1d4b25c6-230b-4b4b-9cc0-63f1f1828d7a.11
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.11
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.10
/.shard/d58abaee-37d1-47e5-9ee8-bdc1ef720cf2.9
/.shard/8c457176-009c-4fc9-9f43-44283108cd48.8
/.shard/a1e351cc-983e-49a0-94a3-2afb68123056.9
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.10
/.shard/ec224e5c-276b-4a4e-accd-a883bbd677f9.12
/.shard/a5074af7-4cbf-487e-b4f1-7505b298dd30.9
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.11
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.10
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.12
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.11
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.11
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/13d47bad-06d8-4efe-8541-037d9fe45188/8311cfec-7dce-44da-ab10-4c7fdb6dc0b4
/.shard/d58abaee-37d1-47e5-9ee8-bdc1ef720cf2.8
/.shard/a1e351cc-983e-49a0-94a3-2afb68123056.8
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/e64d08b7-dea9-4029-a70c-f9d321ff81e2/df320576-c427-461c-9015-56b56a98ea51
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/9c77afe1-f8b3-4b0c-a486-8b27aa75b639/49d50d2a-75f1-46da-b932-402682a047ac
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/9a638de2-83b8-416a-8c6a-1dc83d4c55f7/5617abc9-f7c5-463b-aae2-5b9a558c2e66
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/54e45cb6-82be-4565-9e95-3e60b2d1e319/9d058473-1657-4650-bf45-8bc008bdab11
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/13a9fdc7-8883-4977-8349-6d13d5ec0d54/7d7c35fb-df9c-45ad-95be-9f8f1f4db142
/.shard/1d4b25c6-230b-4b4b-9cc0-63f1f1828d7a.12
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/8e4d2a62-dc67-4cd8-9916-eb742dcbfd71/d7ab018a-c8b8-4efb-b1fd-9482b5c7665e
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/1a139294-3a7c-4d29-b9d3-490813779b4c/ef9089de-f34d-4cc3-840e-767af6b1efc9
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.12
/.shard/ec224e5c-276b-4a4e-accd-a883bbd677f9.11
/.shard/a5074af7-4cbf-487e-b4f1-7505b298dd30.8
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.12
/.shard/a1e351cc-983e-49a0-94a3-2afb68123056.10
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.10
/.shard/1d4b25c6-230b-4b4b-9cc0-63f1f1828d7a.13
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.12
/.shard/ec224e5c-276b-4a4e-accd-a883bbd677f9.13
/.shard/8c457176-009c-4fc9-9f43-44283108cd48.9
/.shard/ec224e5c-276b-4a4e-accd-a883bbd677f9.14
/.shard/1d4b25c6-230b-4b4b-9cc0-63f1f1828d7a.14
/.shard/8c457176-009c-4fc9-9f43-44283108cd48.10
/.shard/ec224e5c-276b-4a4e-accd-a883bbd677f9.15
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.13
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.13
/.shard/a1e351cc-983e-49a0-94a3-2afb68123056.7
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.14
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.14
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.13
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.15
/.shard/a1e351cc-983e-49a0-94a3-2afb68123056.20
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.13
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.14
/.shard/8c457176-009c-4fc9-9f43-44283108cd48.11
/.shard/1d4b25c6-230b-4b4b-9cc0-63f1f1828d7a.15
/.shard/8c457176-009c-4fc9-9f43-44283108cd48.12
/.shard/a5074af7-4cbf-487e-b4f1-7505b298dd30.10
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.14
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/32d92bc7-3c57-4377-86ec-ec4c80d03ce9/3157be36-ef04-49c4-8115-19e98b10b494
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/32d92bc7-3c57-4377-86ec-ec4c80d03ce9/3157be36-ef04-49c4-8115-19e98b10b494.meta
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/32d92bc7-3c57-4377-86ec-ec4c80d03ce9/3157be36-ef04-49c4-8115-19e98b10b494.lease
/.shard/d58abaee-37d1-47e5-9ee8-bdc1ef720cf2.10
/.shard/a5074af7-4cbf-487e-b4f1-7505b298dd30.11
/.shard/d58abaee-37d1-47e5-9ee8-bdc1ef720cf2.11
/.shard/a5074af7-4cbf-487e-b4f1-7505b298dd30.12
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.15
Status: Connected
Number of entries: 77

Brick 10.70.37.60:/rhgs/brick3/data
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/32d92bc7-3c57-4377-86ec-ec4c80d03ce9/3157be36-ef04-49c4-8115-19e98b10b494
Status: Connected
Number of entries: 1

Brick 10.70.37.61:/rhgs/brick3/data
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.7
/.shard/a1e351cc-983e-49a0-94a3-2afb68123056.6
/.shard/a5074af7-4cbf-487e-b4f1-7505b298dd30.7
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/3498983a-2956-40e2-8e49-8ec335048d99/0b57c93a-b2de-4522-a9c1-deea6f0fae00
/.shard/d58abaee-37d1-47e5-9ee8-bdc1ef720cf2.7
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/81660179-5d74-49ed-9f5b-771e513a9ff6/0eea9bac-b76b-49a4-9a6c-ebea0829209a
/.shard/d58abaee-37d1-47e5-9ee8-bdc1ef720cf2.5
/.shard/1d4b25c6-230b-4b4b-9cc0-63f1f1828d7a.8
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.8
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.8
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.9
/.shard/8c457176-009c-4fc9-9f43-44283108cd48.6
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.9
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.8
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.9
/.shard/1d4b25c6-230b-4b4b-9cc0-63f1f1828d7a.11
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.11
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.10
/.shard/d58abaee-37d1-47e5-9ee8-bdc1ef720cf2.9
/.shard/8c457176-009c-4fc9-9f43-44283108cd48.8
/.shard/a1e351cc-983e-49a0-94a3-2afb68123056.9
/.shard/a5074af7-4cbf-487e-b4f1-7505b298dd30.9
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.10
/.shard/ec224e5c-276b-4a4e-accd-a883bbd677f9.12
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.11
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.10
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.12
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.11
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.11
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/13d47bad-06d8-4efe-8541-037d9fe45188/8311cfec-7dce-44da-ab10-4c7fdb6dc0b4
/.shard/d58abaee-37d1-47e5-9ee8-bdc1ef720cf2.8
/.shard/a1e351cc-983e-49a0-94a3-2afb68123056.8
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/e64d08b7-dea9-4029-a70c-f9d321ff81e2/df320576-c427-461c-9015-56b56a98ea51
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/9c77afe1-f8b3-4b0c-a486-8b27aa75b639/49d50d2a-75f1-46da-b932-402682a047ac
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/9a638de2-83b8-416a-8c6a-1dc83d4c55f7/5617abc9-f7c5-463b-aae2-5b9a558c2e66
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/54e45cb6-82be-4565-9e95-3e60b2d1e319/9d058473-1657-4650-bf45-8bc008bdab11
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/13a9fdc7-8883-4977-8349-6d13d5ec0d54/7d7c35fb-df9c-45ad-95be-9f8f1f4db142
/.shard/1d4b25c6-230b-4b4b-9cc0-63f1f1828d7a.12
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/8e4d2a62-dc67-4cd8-9916-eb742dcbfd71/d7ab018a-c8b8-4efb-b1fd-9482b5c7665e
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/1a139294-3a7c-4d29-b9d3-490813779b4c/ef9089de-f34d-4cc3-840e-767af6b1efc9
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.12
/.shard/ec224e5c-276b-4a4e-accd-a883bbd677f9.11
/.shard/a5074af7-4cbf-487e-b4f1-7505b298dd30.8
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.12
/.shard/a1e351cc-983e-49a0-94a3-2afb68123056.10
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.10
/.shard/1d4b25c6-230b-4b4b-9cc0-63f1f1828d7a.13
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.12
/.shard/ec224e5c-276b-4a4e-accd-a883bbd677f9.13
/.shard/8c457176-009c-4fc9-9f43-44283108cd48.9
/.shard/ec224e5c-276b-4a4e-accd-a883bbd677f9.14
/.shard/1d4b25c6-230b-4b4b-9cc0-63f1f1828d7a.14
/.shard/8c457176-009c-4fc9-9f43-44283108cd48.10
/.shard/ec224e5c-276b-4a4e-accd-a883bbd677f9.15
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.13
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.13
/.shard/a1e351cc-983e-49a0-94a3-2afb68123056.7
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.14
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.14
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.13
/.shard/d1f489e9-32c3-4e33-8563-aa485e5c86f6.15
/.shard/a1e351cc-983e-49a0-94a3-2afb68123056.20
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.13
/.shard/150a00af-45a2-432f-9bfc-e962ed213f6a.14
/.shard/8c457176-009c-4fc9-9f43-44283108cd48.11
/.shard/1d4b25c6-230b-4b4b-9cc0-63f1f1828d7a.15
/.shard/8c457176-009c-4fc9-9f43-44283108cd48.12
/.shard/a5074af7-4cbf-487e-b4f1-7505b298dd30.10
/.shard/c3c3bd81-9fe0-4334-ac4d-87ec941f1832.14
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/32d92bc7-3c57-4377-86ec-ec4c80d03ce9/3157be36-ef04-49c4-8115-19e98b10b494
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/32d92bc7-3c57-4377-86ec-ec4c80d03ce9/3157be36-ef04-49c4-8115-19e98b10b494.meta
/09991046-8f6b-438f-a69c-4cbea3fa5597/images/32d92bc7-3c57-4377-86ec-ec4c80d03ce9/3157be36-ef04-49c4-8115-19e98b10b494.lease
/.shard/d58abaee-37d1-47e5-9ee8-bdc1ef720cf2.10
/.shard/a5074af7-4cbf-487e-b4f1-7505b298dd30.11
/.shard/d58abaee-37d1-47e5-9ee8-bdc1ef720cf2.11
/.shard/a5074af7-4cbf-487e-b4f1-7505b298dd30.12
/.shard/4c256d85-11ef-41bb-bfac-2a5549640023.15
Status: Connected
Number of entries: 77

real    3m2.249s
user    0m0.337s
sys     0m0.263s
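As a side note, with entry lists this long it is easier to count than to eyeball. The entry paths are the only output lines that start with "/", so something like the following works (a rough sketch, not part of the original run):

[root@rhsqa1 ~]# # total pending entries across all bricks
[root@rhsqa1 ~]# gluster volume heal data info | grep -c '^/'
[root@rhsqa1 ~]# # per-brick counts, taken from the summary lines
[root@rhsqa1 ~]# gluster volume heal data info | awk '/^Brick/{b=$2} /^Number of entries:/{print b": "$NF}'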
Hi Ravi,

I still see that when I/O is happening on the volume, heal info takes more than 3 minutes on the first run, and subsequent runs take almost as long. I ran the test three times; the results are below.

glusterfs version used:
===========================
glusterfs-3.8.4-42.el7rhgs.x86_64

Trial 1: time taken for the heal info command to return:
===============================================
[root@yarrow ~]# time gluster volume heal data info
Brick 10.70.36.78:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

Brick 10.70.36.77:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

Brick 10.70.36.76:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

real    3m9.958s
user    0m0.756s
sys     0m0.595s

Trial 2:
=================================
[root@yarrow ~]# time gluster volume heal data info
Brick 10.70.36.78:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

Brick 10.70.36.77:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

Brick 10.70.36.76:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

real    2m33.913s
user    0m0.727s
sys     0m0.516s

Trial 3:
============================
[root@yarrow ~]# time gluster volume heal data info
Brick 10.70.36.78:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

Brick 10.70.36.77:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

Brick 10.70.36.76:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

real    2m44.595s
user    0m0.695s
sys     0m0.528s

gluster volume info data:
===================================
[root@zod ~]# gluster volume info data
Volume Name: data
Type: Replicate
Volume ID: 0763bd54-ea1c-4707-8abb-3d80d62df20f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.78:/gluster_bricks/data/data
Brick2: 10.70.36.77:/gluster_bricks/data/data
Brick3: 10.70.36.76:/gluster_bricks/data/data
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
features.shard-block-size: 64MB
server.ssl: on
client.ssl: on
auth.ssl-allow: 10.70.36.78,10.70.36.77,10.70.36.76
Looked at Kasturi's setup and found that when fio was run, around 150-odd entries were getting created inside the indices/dirty folder. heal-info needs to process all of these entries, which is why it was taking time. Running a 'dd' inside the VM instead created only a couple of files, and heal-info completed quickly.
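A quick way to watch this churn while the I/O is running is to count the in-flight index entries directly on a brick. This is a sketch: the brick path is taken from the setup above, and it assumes the standard index location under .glusterfs/indices on the brick root:

[root@yarrow ~]# ls /gluster_bricks/data/data/.glusterfs/indices/dirty | wc -l
[root@yarrow ~]# # or refresh the count every second while fio runs
[root@yarrow ~]# watch -n 1 'ls /gluster_bricks/data/data/.glusterfs/indices/dirty | wc -l'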
(In reply to Ravishankar N from comment #9)
> Looked at Kasturi's setup and found that when fio was run, around 150 odd
> entries were getting created inside indices/dirty folder. heal-info needs to
> process all the entries which is why it was taking time. Tried running a
> 'dd' inside the VM which created only a couple of files, and heal-info
> completed quickly.

Hi Pranith, do you think it is reasonable to close this as not a bug? So around 150 entries x 3 bricks = 450 entries needed to be processed by glfsheal. IIRC this run took about 2 minutes when we tried.
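Doing the arithmetic on those numbers: 450 entries in roughly 120 seconds works out to 120 / 450 ≈ 0.27 seconds per entry, so presumably most of the wall-clock time is the per-entry lock-and-inspect round trips glfsheal makes, not the listing itself.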
(In reply to Ravishankar N from comment #10)
> (In reply to Ravishankar N from comment #9)
> > Looked at Kasturi's setup and found that when fio was run, around 150 odd
> > entries were getting created inside indices/dirty folder. heal-info needs to
> > process all the entries which is why it was taking time. Tried running a
> > 'dd' inside the VM which created only a couple of files, and heal-info
> > completed quickly.
>
> Hi Pranith, do you think it is reasonable to close this as not a bug? So
> around 150 entries x 3 bricks = 450 entries needed to be processed by
> glfsheal. IIRC this run took about 2 minutes when we tried.

I think it is a bug that needs to be fixed. Waiting 2 minutes to show 150 files is not good from a usability point of view.
Moving this out of 3.4.0 after discussing with Kasturi. While the current behaviour is in line with how heal-info is implemented, we should try to improve its response time, as comment #11 indicates.

Note to self: look at the changes made in EC to reduce heal-info times and see if AFR can do something similar:
https://review.gluster.org/#/c/15543/
https://review.gluster.org/#/c/16468/
https://review.gluster.org/#/c/17923/