Bug 1403120
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | Files remain unhealed forever if shd is disabled and re-enabled while healing is in progress. | | |
| Product | [Red Hat Storage] Red Hat Gluster Storage | Reporter | Ravishankar N <ravishankar> |
| Component | replicate | Assignee | Ravishankar N <ravishankar> |
| Status | CLOSED ERRATA | QA Contact | Nag Pavan Chilakam <nchilaka> |
| Severity | unspecified | Docs Contact | |
| Priority | unspecified | | |
| Version | rhgs-3.2 | CC | amukherj, bugs, rcyriac, rhinduja, rhs-bugs, storage-qa-internal |
| Target Milestone | --- | | |
| Target Release | RHGS 3.2.0 | | |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | | | |
| Fixed In Version | glusterfs-3.8.4-9 | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | 1402841 | Environment | |
| Last Closed | 2017-03-23 05:55:23 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | 1402841, 1403187, 1403192 | | |
| Bug Blocks | 1351528 | | |
Description (Ravishankar N, 2016-12-09 06:11:30 UTC)
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/92588/

Validation: I have run the test on 3.8.4-10 and the fix is working.

1. Create a 1x2 replica volume using a 2-node cluster.
2. Fuse-mount the volume and create 2000 files.
3. Bring one brick down and write to those files, leading to 2000 pending data heals.
4. Bring the brick back up and launch index heal.
5. The shd log on the source brick prints completed heals for the processed files.
6. Before the heal completes, run `gluster vol set volname self-heal-daemon off`.
7. The heal stops as expected.
8. Re-enable the shd: `gluster vol set volname self-heal-daemon on`.
9. Observe in the shd log that the heal resumes and the log gets populated with heal information.

Moving to verified.

While verifying, I hit bz 1409084 (heal enable/disable is restarting the self-heal daemon); we don't see any files getting healed.

```
[root@dhcp46-131 ~]# gluster v status nagtest
Status of volume: nagtest
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.111:/bricks/brick5/nagtest   49157     0          Y       24148
Brick 10.70.46.115:/bricks/brick5/nagtest   49157     0          Y       22323
Brick 10.70.46.139:/bricks/brick5/nagtest   49157     0          Y       29066
Brick 10.70.46.124:/bricks/brick5/nagtest   49152     0          Y       21470
Self-heal Daemon on localhost               N/A       N/A        Y       25456
Self-heal Daemon on dhcp46-152.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       20233
Self-heal Daemon on dhcp46-124.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       21505
Self-heal Daemon on dhcp46-139.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       7467
Self-heal Daemon on dhcp46-111.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       24181
Self-heal Daemon on dhcp46-115.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       32083

Task Status of Volume nagtest
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp46-131 ~]# gluster v info nagtest
Volume Name: nagtest
Type: Distributed-Replicate
Volume ID: df313590-6db1-47ff-ab4e-6167d681ee80
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.46.111:/bricks/brick5/nagtest
Brick2: 10.70.46.115:/bricks/brick5/nagtest
Brick3: 10.70.46.139:/bricks/brick5/nagtest
Brick4: 10.70.46.124:/bricks/brick5/nagtest
Options Reconfigured:
ganesha.enable: on
cluster.self-heal-daemon: enable
cluster.use-compound-fops: on
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
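The verification steps above can be sketched as a shell script. This is a hypothetical dry-run sketch, not the QA harness: the volume name, brick paths, and host names are assumptions, and `run` only echoes each command instead of executing it, since the real commands require a live two-node Gluster cluster.

```shell
#!/bin/sh
# Dry-run sketch of the verification steps; volume, brick, and host names
# are assumptions. 'run' echoes each command rather than executing it,
# because the real commands need a live two-node Gluster cluster.
VOL=testvol
run() { echo "+ $*"; }

# Steps 1-2: create a 1x2 replica volume and fuse-mount it
run gluster volume create "$VOL" replica 2 node1:/bricks/b1 node2:/bricks/b1
run gluster volume start "$VOL"
run mount -t glusterfs node1:/"$VOL" /mnt/"$VOL"

# Step 3: kill one brick process, then write to the files so that
# pending data heals accumulate on the surviving brick
run kill -TERM '<pid-of-brick-process>'

# Step 4: bring the brick back and launch index heal
run gluster volume start "$VOL" force
run gluster volume heal "$VOL"

# Steps 6-8: disable shd mid-heal, then re-enable it
run gluster volume set "$VOL" self-heal-daemon off
run gluster volume set "$VOL" self-heal-daemon on

# Step 9: check that the heal resumed and eventually completes
run gluster volume heal "$VOL" info
```

With the fix, re-enabling `self-heal-daemon` makes the shd pick up the remaining entries from the index; without it, the interrupted files stayed unhealed indefinitely.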