Bug 1206149
| Summary: | fsck.gfs2: Duplicate refs inside one dinode are not fixed | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Robert Peterson <rpeterso> | |
| Component: | cluster | Assignee: | Robert Peterson <rpeterso> | |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 6.7 | CC: | ccaulfie, cluster-maint, dwysocha, gfs2-maint, jpayne, phracek, rpeterso, sbradley, teigland | |
| Target Milestone: | rc | Keywords: | Patch, ZStream | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | cluster-3.0.12.1-74.el6 | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1236669 1277030 (view as bug list) | Environment: | ||
| Last Closed: | 2016-05-10 19:06:16 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1172231, 1236669, 1268411, 1277030 | |||
| Attachments: | ||||
Description
Robert Peterson
2015-03-26 12:51:05 UTC
This section of fsck.gfs2, pass1b, is very delicate and prone to breaking. For now I'm bumping it to 6.8 to give us more time for testing and such. We do need to fix this, so I'm setting it to assigned and devel_ack+.

Did you have time to look at the problem? Is there any progress? Do we have a patch for this problem?

I know what the problem is, but it's not easy to fix without severely impacting fsck.gfs2 performance. Progress has been slow because of other high-priority problems. The fsck issue is serious, but this condition should be extremely rare. This is the first time I've seen it in 9 years, and I have other issues that are impacting customers on a daily basis. Sorry, I don't have a patch for the problem at this time.

Created attachment 1044413 [details]
Patch 1 of 2: fsck.gfs2: Change duptree structure to have generic flags
This is the first of two patches for bug #1206149. It does not change functionality. It merely changes the duptree structure to use a generic flag (dup_flags) rather than a highly specialized flag (first_ref_found). This will allow for a future patch that adds a new flag for marking situations where duplicates are found within one dinode, without adding more memory to the structure.

Created attachment 1044414 [details]
Patch 2 of 2: fsck.gfs2: Detect, fix and clone duplicate block refs within a dinode
Prior to this patch, fsck.gfs2 was unable to detect and fix duplicate
block references within the same file. This patch detects when data
blocks are duplicated within a dinode, then tries to clone the data
to a new block.
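
As a rough illustration of the idea behind the two patches, here is a minimal sketch of a duplicate-tracking entry that uses one generic flags word. It is not the actual fsck.gfs2 code: only the names dup_flags and first_ref_found come from the patch descriptions. The struct layout, the flag constants and the helper note_ref() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits stored in dup_flags. The real flag names and
 * values in gfs2-utils may differ. */
#define DUPFLAG_FIRST_REF_FOUND 0x1u /* takes over the role of first_ref_found */
#define DUPFLAG_SAME_INODE      0x2u /* block referenced twice by one dinode */

/* Simplified stand-in for one entry in the duplicate tree. */
struct dup_entry {
	uint64_t block;         /* block number being tracked */
	uint64_t first_inode;   /* dinode that referenced the block first */
	unsigned int refs;      /* total references seen so far */
	unsigned int dup_flags; /* generic flags, same size as a single flag */
};

/* Record that dinode "inode" references the tracked block. If the first
 * referencing dinode references it again, mark the duplicate as being
 * internal to that dinode. */
static void note_ref(struct dup_entry *d, uint64_t inode)
{
	assert(d != NULL);
	if (!(d->dup_flags & DUPFLAG_FIRST_REF_FOUND)) {
		d->dup_flags |= DUPFLAG_FIRST_REF_FOUND;
		d->first_inode = inode;
	} else if (d->first_inode == inode) {
		d->dup_flags |= DUPFLAG_SAME_INODE;
	}
	d->refs++;
}
```

In this sketch, calling note_ref() twice with the same dinode number sets DUPFLAG_SAME_INODE, which is the point at which a checker could decide to clone the data to a new block. Adding the second flag does not grow the structure, because both bits share the one dup_flags word.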
Created attachment 1081380 [details]
Patch 2 of 2: fsck.gfs2: Detect, fix and clone duplicate block refs within a dinode
This version fixes some obvious mistakes in the first version,
but they're related to differences between rhel7 and rhel6
gfs2-utils. This version is tested and fixes the failing case
as well as several other metadata sets.
I have no problem shipping these latest two patches for RHEL6.8. They are tested and fix the failing test case plus some other tough metadata sets, as I mentioned in comment #8. However, the resulting fsck.gfs2 will not pass my latest version of the fsck.gfs2.nightmare2.sh test because I've added several metadata sets that have unique corruption. Those new cases of metadata corruption are fixed by the 20 new patches to fsck.gfs2 in RHEL7 bug #1257625. I'm debating whether I want to clone that bug to RHEL6 so I can port those 20 as well.

The two latest patches were pushed to the rhel6 branch of the cluster.git repo for inclusion into rhel6.8. They are already in the upstream gfs2-utils repo. They were tested on system gfs-a16c-04. I tested the failing case and also several of the toughest metadata sets from my fsck.gfs2.nightmare2.sh test. I was unable to complete the entire test as explained in comment #9. Changing status to POST until the patches are built into a new release of the cluster rpms.

Verified in gfs2-utils-3.0.12.1-77.el6

[root@host-002 ~]# rpm -q gfs2-utils
gfs2-utils-3.0.12.1-77.el6.x86_64
[root@host-002 ~]# gfs2_edit restoremeta T9479dsb01a_mqm_data_metadata /dev/sda
Savemeta file format 1
Created Wed Mar 25 16:14:54 2015
File system size 999.1023G
There are 314572800 blocks of 4096 bytes in the destination device.
314572800 inodes processed, 152638 blocks saved (100%) processed,
File T9479dsb01a_mqm_data_metadata restore successful.
[root@host-002 ~]# fsck.gfs2 -y /dev/sda &> /tmp/fsck.out
[root@host-002 ~]# fsck.gfs2 /dev/sda
Initializing fsck
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2016-0729.html