Bug 1464359
| Summary: | Self-heal daemon CPU consumption not reducing when IOs are going on and all redundant bricks are brought down one after another | |||
|---|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Ashish Pandey <aspandey> | |
| Component: | disperse | Assignee: | Ashish Pandey <aspandey> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | ||
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | mainline | CC: | bugs, nchilaka, pkarampu, rhs-bugs, sheggodu, storage-qa-internal | |
| Target Milestone: | --- | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | glusterfs-3.12.0 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | 1464336 | |||
| : | 1468457 | Environment: | ||
| Last Closed: | 2017-08-21 08:05:13 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1464336 | |||
| Bug Blocks: | 1468457 | |||
|
Comment 1
Ashish Pandey
2017-06-23 09:27:28 UTC
RCA: SHD already handles the scenario where a redundant number of bricks are down; it does not trigger heal in that case. Now consider a 4+2 volume. When continuous IO is going on and 2 bricks are down, an update fop will find that it did not succeed on 2 bricks, so it will immediately trigger a heal. This is what client-side heal does. Here too there is NO use in triggering a heal when the file cannot actually be healed because 2 bricks are down. This causes unnecessary CPU hogging.

Solution:
1 - While triggering a client-side heal, check whether more than 4 bricks are UP and trigger the heal accordingly.
OR
2 - Disable background heal as soon as 2 bricks go down and we cannot heal; enable it again as soon as more than 4 bricks are UP.

I think option (1) would be the better solution, and it can be improved further: what if the brick which requires heal is itself down? In that case too we should not trigger heal.

REVIEW: https://review.gluster.org/17692 (cluster/ec: Don't try to heal when no sink is UP) posted (#1) for review on master by Ashish Pandey (aspandey)

COMMIT: https://review.gluster.org/17692 committed in master by Xavier Hernandez (xhernandez)

------

commit 0ae38df6403942a2438404d46a6e05b503db3485
Author: Ashish Pandey <aspandey>
Date: Tue Jul 4 16:18:20 2017 +0530

cluster/ec: Don't try to heal when no sink is UP

Problem:
In a 4+2 EC volume configuration, if an untar of the Linux kernel is going on and we kill a brick, indices are created for the files/dirs which need to be healed. ec_shd_index_sweep spawns threads to scan these entries and start heal. If in the middle of this we kill one more brick, we end up in a situation where an entry cannot be healed because only "ec->fragment" bricks are UP. However, the scan continues and still triggers heal for those entries.

Solution:
When a heal is triggered for an entry, check whether it *CAN* be healed or not. If not, come out with ENOTCONN.

Change-Id: I305be7701c289f36bd7bde22491b71074771424f
BUG: 1464359
Signed-off-by: Ashish Pandey <aspandey>
Reviewed-on: https://review.gluster.org/17692
Smoke: Gluster Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
Reviewed-by: Sunil Kumar Acharya <sheggodu>
Reviewed-by: Xavier Hernandez <xhernandez>

This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/
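
For illustration only, here is a minimal C sketch of the kind of pre-heal guard the fix describes: before attempting a heal, verify that enough healthy bricks are UP to reconstruct the data and that at least one sink (a brick that needs healing) is reachable; otherwise bail out with ENOTCONN. The `ec_info_t` struct, `can_heal` function, and bitmask layout below are hypothetical and are not the actual GlusterFS code; the real change is in the review linked above.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant parts of the EC translator state. */
typedef struct {
    int      fragments;  /* data fragments needed to reconstruct, e.g. 4 in 4+2 */
    uint64_t up_mask;    /* bitmask of bricks currently UP                      */
} ec_info_t;

static int popcount64(uint64_t v) {
    return __builtin_popcountll(v);
}

/*
 * Return 0 if a heal may be attempted for this entry, -ENOTCONN otherwise.
 * heal_mask marks the bricks (sinks) that need healing for the entry.
 */
static int can_heal(const ec_info_t *ec, uint64_t heal_mask)
{
    uint64_t sources = ec->up_mask & ~heal_mask;  /* healthy, reachable bricks   */
    uint64_t sinks   = ec->up_mask & heal_mask;   /* reachable bricks to repair  */

    /* Need at least 'fragments' good copies to reconstruct the data,
     * and at least one reachable sink to write the repaired fragment to. */
    if (popcount64(sources) < ec->fragments || sinks == 0)
        return -ENOTCONN;

    return 0;
}
```

In the 4+2 scenario from the RCA with two bricks down, `sources` counts exactly 4 bricks but `sinks` is empty, so the guard returns -ENOTCONN instead of repeatedly spawning heal work that cannot make progress, which is what was hogging the CPU.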