Bug 1414287
| Summary: | repeated operation failed warnings in gluster mount logs with disperse volume | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Ashish Pandey <aspandey> |
| Component: | disperse | Assignee: | Ashish Pandey <aspandey> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | mainline | CC: | amukherj, asoman, bugs, mpillai, nchilaka, pkarampu, rcyriac, rgowdapp, rhs-bugs, rsussman, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.11.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1406322 | | |
| : | 1415082 1419824 (view as bug list) | Environment: | |
| Last Closed: | 2017-05-30 18:38:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1406322, 1415082, 1419824, 1426559 | | |
Comment 1
Ashish Pandey
2017-01-18 09:12:30 UTC
REVIEW: http://review.gluster.org/16435 (cluster/disperse: Do not log fop failed for lockless fops) posted (#1, #2) for review on master by Ashish Pandey (aspandey)

REVIEW: https://review.gluster.org/16468 (cluster/ec: Don't trigger heal on Lookups, later retitled "cluster/ec: Don't trigger data/metadata heal on Lookups") posted (#1 through #7) for review on master by Pranith Kumar Karampuri (pkarampu)

Pranith,

While working on one of my patches, I observed that when all the bricks are up and everything is fine, heal info was listing entries for one random brick while I/O was going on.

I removed the following patch, https://review.gluster.org/#/c/16377/, and then did not observe this issue, so I think that patch is causing the problem. Even with the latest patch you sent, https://review.gluster.org/16468, I am still seeing these entries.

Steps -

1 - Create a volume, with or without your patch.
2 - Mount the volume and start creating files on the mount point using the following command:

    for i in {1..10000}; do dd if=/dev/zero of=test-$i count=1 bs=1M; done

3 - Watch heal info from a different terminal (a consolidated sketch of these steps follows the output below):
```
watch gluster v heal vol info

[root@apandey glusterfs]# gluster v heal vol info
Brick apandey:/brick/gluster/vol-1
/test-379
/test-357
/test-350
/test-397
/test-333
/test-394
/test-371
/test-355
/test-339
/test-359
/test-336
/test-318
/test-348
/test-367
/test-356
/test-335
/test-384
/test-395
/test-389
/test-380
/test-330
/test-385
/test-329
/test-368
/test-337
/test-340
/test-341
/test-332
/test-402
/test-386
/test-353
/test-361
/test-382
/test-401
/test-362
/test-393
/test-372
/test-381
/test-390
/test-370
/test-369
/test-399
/test-375
/test-377
/test-343
/test-364
/test-351
/test-363
/test-354
/test-331
/test-346
/test-378
/test-342
/test-338
/test-396
/test-365
/test-376
/test-383
/test-360
/test-347
/test-373
/test-392
/test-400
/test-349
/test-352
/test-345
/test-334
/test-391
/test-366
/test-387
/test-344
/test-388
/test-358
/test-374
/test-398
Status: Connected
Number of entries: 75

Brick apandey:/brick/gluster/vol-2
Status: Connected
Number of entries: 0

Brick apandey:/brick/gluster/vol-3
Status: Connected
Number of entries: 0

Brick apandey:/brick/gluster/vol-4
Status: Connected
Number of entries: 0

Brick apandey:/brick/gluster/vol-5
Status: Connected
Number of entries: 0

Brick apandey:/brick/gluster/vol-6
Status: Connected
Number of entries: 0
```
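For convenience, the reproduction steps above can be consolidated into a single minimal sketch. The volume name, host, brick paths, mount point, and the 4+2 (disperse 6, redundancy 2) layout are illustrative assumptions to match the six bricks shown in the heal info output, not details confirmed by the report.

```bash
#!/bin/bash
# Reproduction sketch (assumptions: single host "apandey", six bricks under
# /brick/gluster/vol-{1..6}, mount point /mnt/vol, 4+2 disperse layout).
set -e

# 1 - Create and start a 6-brick disperse volume ("force" is needed when
#     placing multiple bricks on the same host).
gluster volume create vol disperse 6 redundancy 2 \
    apandey:/brick/gluster/vol-{1..6} force
gluster volume start vol

# 2 - Mount the volume and create many small files on the mount point.
mkdir -p /mnt/vol
mount -t glusterfs apandey:/vol /mnt/vol
cd /mnt/vol
for i in {1..10000}; do dd if=/dev/zero of=test-$i count=1 bs=1M; done
```

While the dd loop runs, `watch gluster volume heal vol info` in a second terminal should show transient entries against one brick if the regression described above is present, and no entries once it is fixed.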
(In reply to Ashish Pandey from comment #11)

Good catch. Let me try and fix this too.

REVIEW: https://review.gluster.org/16468 (cluster/ec: Don't trigger data/metadata heal on Lookups) posted (#8 through #11) for review on master by Pranith Kumar Karampuri (pkarampu)

COMMIT: https://review.gluster.org/16468 committed in master by Pranith Kumar Karampuri (pkarampu)

------

commit c1fc1fc9cb5a13e6ddf8c9270deb0c7609333540
Author: Pranith Kumar K <pkarampu>
Date: Wed Jan 25 15:31:44 2017 +0530

cluster/ec: Don't trigger data/metadata heal on Lookups

Problem-1:
If a Lookup which doesn't take any locks observes a version mismatch, it can't be trusted. If we launch a heal based on this information, it will lead to self-heals that hurt I/O performance in the cases where the Lookup was wrong. Considering that the self-heal daemon, and operations on the inode from clients which do take locks, can still trigger heal, we can choose not to attempt a heal on Lookup.

Problem-2:
Fixed the spurious failure of tests/bitrot/bug-1373520.t. For the issues above, what was happening was that ec_heal_inspect() was preventing 'name' heal from happening.

Problem-3:
tests/basic/ec/ec-background-heals.t: to be honest, I don't know what the problem was; while fixing the two problems above I made some changes to ec_heal_inspect() and ec_need_heal(), after which, when I tried to recreate the spurious failure, it just didn't happen even after a long time.
BUG: 1414287
Signed-off-by: Pranith Kumar K <pkarampu>
Change-Id: Ife2535e1d0b267712973673f6d474e288f3c6834
Reviewed-on: https://review.gluster.org/16468
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
Reviewed-by: Xavier Hernandez <xhernandez>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Ashish Pandey <aspandey>

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/
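As a hedged follow-up, a quick way to check whether an installed client carries the fix and whether the mount log still accumulates warnings might look like the sketch below. The volume name, mount point, and client log filename are assumptions (GlusterFS names the client log after the mount path), so adjust them to your setup.

```bash
#!/bin/bash
# Verification sketch (assumptions: volume "vol" mounted at /mnt/vol;
# client log under /var/log/glusterfs/, named after the mount point).

# The fix landed upstream in glusterfs-3.11.0, so the installed client
# should report 3.11.0 or newer.
glusterfs --version | head -n1

# Count warning-level messages in the client (mount) log; with the fix in
# place, warnings should no longer pile up during normal I/O on a healthy
# disperse volume. "|| true" keeps the script going when grep finds nothing.
grep -c ' W ' /var/log/glusterfs/mnt-vol.log || true

# Heal info should also stay empty while I/O runs and all bricks are up.
gluster volume heal vol info
```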