Bug 1743988
| Summary: | Setting cluster.heal-timeout requires volume restart |
| --- | --- |
| Product: | [Community] GlusterFS |
| Reporter: | Glen K <glenk1973> |
| Component: | selfheal |
| Assignee: | Ravishankar N <ravishankar> |
| Status: | CLOSED NEXTRELEASE |
| Severity: | low |
| Priority: | unspecified |
| Version: | 6 |
| CC: | bugs, ravishankar |
| Target Milestone: | --- |
| Keywords: | Triaged |
| Target Release: | --- |
| Hardware: | x86_64 |
| OS: | Linux |
| Doc Type: | If docs needed, set a value |
| Story Points: | --- |
| : | 1744548 (view as bug list) |
| Last Closed: | 2019-09-06 08:13:38 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Bug Blocks: | 1744548, 1747301, 1807431 |
Description (Glen K, 2019-08-21 06:01:00 UTC)
I should add that `cluster.quorum-type` is set to `none` for the test.

Okay, so after some investigation, I don't think this is an issue. When you change the heal-timeout, it does get propagated to the self-heal daemon. But since the default value is 600 seconds, the threads that do the heal only wake up after that time. Once a thread wakes up, subsequent runs do seem to honour the new heal-timeout value.

On a glusterfs 6.5 setup:

```
# gluster v create testvol replica 2 127.0.0.2:/home/ravi/bricks/brick{1..2} force
# gluster v set testvol client-log-level DEBUG
# gluster v start testvol
# gluster v set testvol heal-timeout 5
# tail -f /var/log/glusterfs/glustershd.log | grep finished
```

You don't see anything in the log yet about the crawls. But once you manually launch a heal, the threads are woken up and further crawls happen every 5 seconds:

```
# gluster v heal testvol
```

Now in glustershd.log:

```
[2019-08-21 09:55:02.024160] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-0.
[2019-08-21 09:55:02.024271] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-1.
[2019-08-21 09:55:08.023252] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-1.
[2019-08-21 09:55:08.023358] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-0.
[2019-08-21 09:55:14.024438] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-1.
[2019-08-21 09:55:14.024546] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-0.
```

Glen, could you check if that works for you? i.e. after setting the heal-timeout, manually launch a heal via `gluster v heal testvol`.
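The behaviour described above can be illustrated with a small simulation. This is not GlusterFS code; it is a hedged Python sketch (class and attribute names are invented) of a healer thread that reads its timeout once per loop iteration, which is why a new value only applies after the current sleep expires:

```python
# Simulation: a periodic "healer" thread that sleeps heal_timeout seconds
# between crawls. A timeout changed mid-sleep is only seen on the next loop.
import threading
import time

class Healer:
    def __init__(self, heal_timeout):
        self.heal_timeout = heal_timeout   # seconds between index crawls
        self.crawls = []                   # timestamps of completed crawls
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.crawls.append(time.monotonic())  # "finished index sweep"
            # The timeout is read *here*; a value assigned while the thread
            # is already waiting does not shorten the wait in progress.
            self._stop.wait(self.heal_timeout)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

h = Healer(heal_timeout=0.5)
h.start()
time.sleep(0.1)
h.heal_timeout = 0.05              # new value: ignored until the 0.5 s sleep ends
time.sleep(0.3)
crawls_during_old_sleep = len(h.crawls)  # still only the initial crawl
time.sleep(0.4)                    # old sleep expires; fast crawls begin
h.stop()
print(crawls_during_old_sleep, len(h.crawls))
```

This mirrors why, in the test above, no crawls appear until the first (default 600 s) sleep is interrupted by a manual `gluster v heal`.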
In my steps above, I set the heal-timeout while the self-heal daemon is stopped:

...
4. Stop the self-heal daemon.
...
15. Set `cluster.heal-timeout` to `60`.
16. Start the self-heal daemon.
...

I would expect that the configuration would certainly take effect after a restart of the self-heal daemon. Yes, launching a heal manually causes the heal to happen right away, but the purpose of the test is to verify that the heal happens automatically. From a user perspective, the current behaviour of the heal-timeout setting appears to be at odds with the "configuration changes take effect without restart" feature; I think it is reasonable to request that changing the heal-timeout setting resets the thread sleeps to the new value.

(In reply to Glen K from comment #3)

> I would expect that the configuration would certainly take effect after a
> restart of the self-heal daemon.

In steps 4 and 16, I assume you toggled `cluster.self-heal-daemon` off and on respectively. This actually does not kill the shd process per se; it just disables/enables the heal crawls. In 6.5, a `volume start force` does restart shd, so changing the order of the steps should do the trick, i.e.:

13. Set `cluster.heal-timeout` to `60`.
14. Force start the volume.
15. Verify that "split-brain" appears in the output of the `gluster volume heal <volume> info` command.

> Yes, launching heal manually causes the heal to happen right away, but the
> purpose of the test is to verify the heal happens automatically. From a user
> perspective, the current behaviour of the heal-timeout setting appears to be
> at odds with the "configuration changes take effect without restart"
> feature; I think it is reasonable to request that changing the heal-timeout
> setting results in the thread sleeps being reset to the new setting.

Fair enough, I'll attempt a fix on master; let us see how the review goes.
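The fix that was eventually merged is titled "afr: wake up index healer threads" and lives in C inside glusterfs. As a hedged sketch only (invented names, Python analogue, not the actual patch), the idea is that reconfiguring the timeout also signals the sleeping healer thread, so it wakes immediately and re-reads the new interval:

```python
# Sketch of the "wake up healer threads on reconfigure" idea: the setter
# signals an event that the healer thread is waiting on, so a shorter
# heal-timeout takes effect without restarting anything.
import threading
import time

class WakeableHealer:
    def __init__(self, heal_timeout):
        self.heal_timeout = heal_timeout
        self.crawls = []
        self._wake = threading.Event()   # signalled on reconfigure or stop
        self._stopping = False
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stopping:
            self.crawls.append(time.monotonic())
            self._wake.wait(self.heal_timeout)  # returns early when woken
            self._wake.clear()

    def set_heal_timeout(self, seconds):
        self.heal_timeout = seconds
        self._wake.set()                 # wake the thread so it re-reads the value

    def start(self):
        self._thread.start()

    def stop(self):
        self._stopping = True
        self._wake.set()
        self._thread.join()

h = WakeableHealer(heal_timeout=10)      # would otherwise sleep 10 s after crawl 1
h.start()
time.sleep(0.1)
h.set_heal_timeout(0.05)                 # takes effect immediately
time.sleep(0.3)
h.stop()
fast_crawls = len(h.crawls)              # several crawls despite the 10 s initial timeout
print(fast_crawls)
```

Compared with the earlier simulation, the only change is the explicit wake-up on reconfigure, which is exactly the user-visible behaviour Glen requested.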
REVIEW: https://review.gluster.org/23332 (afr: wake up index healer threads) posted (#1) for review on release-6 by Ravishankar N

REVIEW: https://review.gluster.org/23332 (afr: wake up index healer threads) merged (#3) on release-6 by Ravishankar N