| Summary: | [Bitrot]: Document the change in steps to heal a detected corruption | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Sweta Anandpara <sanandpa> |
| Component: | doc-Administration_Guide | Assignee: | Laura Bailey <lbailey> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Sweta Anandpara <sanandpa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.2 | CC: | asriram, khiremat, lbailey, pgurusid, rhs-bugs, rwheeler, sanandpa, sankarshan, storage-doc, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-03-24 01:09:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 1351553 | | |
Description (Sweta Anandpara, 2016-11-04 06:58:37 UTC)
The overall procedure that is mentioned above looks okay, Laura. Regarding the questions:

>> - Can you give me a brief overview of why these options need to be set this way during file recovery?

The recovery steps require changes because of the changes introduced by enabling md-cache.

>> - Is it okay for customers to use the same mount point, or should they create a separate recovery mount point? I just want to check, since the description specifically mentions a "new" mount point.

It is safe to create a separate mount point with the options mentioned, as the existing mount point might not have the 'entry-timeout' and 'attribute-timeout' options set to '0'.

>> - Do the entry and attribute timeouts also need to be reset following heal?

The entry and attribute timeouts are just mount options, and we set them specifically by creating a new mount point only when we want to perform a recovery. The old mount points can continue with their workload as is.

Kotresh, would you please confirm that all of the above is right, or point out anything that is missing?

The change in steps is not related to the 3.2 md-cache enhancements; it is applicable in general to releases earlier than 3.2 as well.

Reason: The steps given in the admin guide work in most cases, except for a few corner cases where the lookup does not reach the server and is instead served from the performance xlators. Bitrot relies on lookup to clear the bad-file attribute in the inode context. With md-cache enabled and the default entry-timeout and attribute-timeout values, the lookup might not always reach the server. Bitrot would then fail to clear the bad-file attribute, causing the subsequent recovery steps to fail. The additional steps performed from a new mount guarantee that the lookup reaches the server and that the bad-file attribute in the inode context is cleared.

So the new additional steps are:
1. Disable md-cache.
2. Do a fresh mount (so as not to disturb existing mounts) with entry-timeout and attribute-timeout set to zero.
3. Access the corrupted files from the fresh mount, for example by running stat on each corrupted file.
4. Enable md-cache.
5. Unmount the fresh mount.

The above steps by themselves will heal the corrupted files if client-side self-heal is enabled. If it is not enabled, an explicit heal is required, as mentioned in the admin guide.

The new steps we are talking about are under step 5, 'Heal the file'.
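As background for the bad-file attribute discussed above, here is a minimal sketch of how a corrupted object can be identified and how its on-disk marker can be inspected. It assumes the trusted.bit-rot.bad-file extended attribute used by the bit-rot stub, and /bricks/brick1/file1 is a hypothetical brick path:

List the objects the scrubber has reported as corrupted:
# gluster volume bitrot VOLNAME scrub status

On the brick that holds the corrupted copy, check whether the file is marked bad:
# getfattr -n trusted.bit-rot.bad-file -e hex /bricks/brick1/file1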
Let me give the complete steps to avoid confusion; a consolidated command sketch of step 5 follows the procedure below.
Procedure 20.1. Restoring a bad file from a replicate volume
1. Note the identifiers of bad files
2. Determine the path of each corrupted object
3. Delete the corrupted files
4. Delete the GFID file
5. Heal the file
5.1 If enabled, disable md-cache.
5.2 Create a mount point to use for this recovery process.
For example, mkdir /mnt/recovery
5.3 Mount the volume with --entry-timeout and --attribute-timeout
set to 0.
5.4 Access all the corrupted files from the /mnt/recovery mountpoint.
If hardlinks are present, access each hardlink.
For example: stat /mnt/recovery/<corrupted file>
5.5 Step 5.4 heals the corrupted file if client self-heal is enabled.
    If the file is healed, ignore this step. If it is not, heal the files
    as below:
# gluster volume heal VOLNAME
5.6 Enable md-cache if it was disabled in step 5.1.
5.7 Unmount /mnt/recovery.
And I think disabling stat-prefetch is sufficient to disable md-cache; the rest of the options will have no effect. Setting needinfo on Poornima to confirm the same.
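To make step 5 concrete, here is a minimal command sketch of the sequence above. It assumes HOSTNAME:/VOLNAME is the affected volume, /mnt/recovery is the temporary recovery mount point, and md-cache is toggled through the performance.stat-prefetch volume option discussed above:

5.1 Disable md-cache (stat-prefetch) for the volume:
# gluster volume set VOLNAME performance.stat-prefetch off

5.2 and 5.3 Create the recovery mount point and mount with zero timeouts:
# mkdir /mnt/recovery
# mount -t glusterfs -o entry-timeout=0,attribute-timeout=0 HOSTNAME:/VOLNAME /mnt/recovery

5.4 Access every corrupted file and each of its hard links:
# stat /mnt/recovery/<corrupted file>

5.5 If client self-heal is not enabled, trigger an explicit heal:
# gluster volume heal VOLNAME

5.6 and 5.7 Re-enable md-cache and remove the recovery mount:
# gluster volume set VOLNAME performance.stat-prefetch on
# umount /mnt/recovery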
(In reply to Kotresh HR from comment #8)
> And I think disabling stat-prefetch is sufficient to disable md-cache. Rest of the options will have no effect.

Yes, that should be sufficient.

Steps 5.1 and 5.6 can be elaborated to specify the command that disables md-cache:
# gluster volume set <VOLNAME> stat-prefetch on/off

The section looks good, with all the necessary steps mentioned. I would like to propose a couple of minor amendments:
* Could you please interchange points 'e' and 'f' under 'Step 5: Restore the file'? It is good to get the system back to its original state in the (reverse) order in which it was changed.
* Point d: if client heal is enabled, the user is not required to do any of points a, b, c, e, f, so please put that at the beginning. Something like:
  Step 5: Restore bad file
  If client self-heal is enabled, it will automatically heal ....
  If it is not enabled, follow the steps below:
  Point a:
  Point b:
  ...
  Point f

Hi Laura, I think we'll have to stick to the changes that you had done in comment 15. Sorry for the confusion. I had another discussion with Kotresh, and I'll be raising a new BZ for the new changes after a decision is taken. Could you please revert the changes to how they were before?

Will revert today or tomorrow; leaving NI in place until I do so. (Working under another deadline, apologies for the delay.)

Looks good, Laura. Final two comments:
* Please interchange steps 'e' and 'f' as mentioned in comment 16.
* Minor change in step 'd', mentioned below:

Presently:
If you have client self-heal enabled, access files and hard links to heal them. For example, run the stat command on the files and hard links you need to heal.
$ stat /mnt/recovery/corrupt-file
If you do not have client self-heal enabled, you must manually heal the volume with the following command.
# gluster volume heal VOLNAME

Expected:
Access files and hard links to heal them. For example, run the stat command on the files and hard links you need to heal.
$ stat /mnt/recovery/corrupt-file
If you do not have client self-heal enabled, you must manually heal the volume with the following additional command.
# gluster volume heal VOLNAME

We can move this BZ to its closure once this is addressed.

Perfect! Thanks for persisting with this, Laura. Moving this to verified in 3.2.

Moving to CLOSED CURRENTRELEASE since RHGS 3.2 GA was yesterday.
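Once the documented steps have been applied, the outcome can be checked from the cluster. This is a minimal sketch, assuming the standard heal-info and scrub-status reports; the scrubber output may only reflect the repair after its next run:

Confirm that no entries are still pending heal:
# gluster volume heal VOLNAME info

Confirm that the scrubber no longer reports the object as corrupted:
# gluster volume bitrot VOLNAME scrub status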
All documentation is available from https://access.redhat.com/documentation/en/red-hat-gluster-storage/.