
Bug 1348703

Summary: fsck.gfs2: "undo" functions can stop too early on duplicates
Product: Red Hat Enterprise Linux 7
Reporter: Robert Peterson <rpeterso>
Component: gfs2-utils
Assignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.3
CC: cluster-maint, djansa, gfs2-maint, jpayne
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: gfs2-utils-3.1.9-2.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 06:31:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                               Flags
Proposed RHEL7 patch                      none
Additional non-metadata blocks required   none

Description Robert Peterson 2016-06-21 20:11:20 UTC
Description of problem:
I ran into this while testing some recent performance
changes to fsck.gfs2, but the older versions have it as well,
so I'm not sure why I haven't seen it before. The problem is
that the "undo" functions in pass1 can stop early rather than
undoing everything they should. This leaves blocks that are
not properly freed until fsck.gfs2 is run a second time.
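
A toy model may help picture the failure mode. This is only an illustrative sketch, not the fsck.gfs2 code or the attached patch: the function names, the reference-count dictionary, and the block numbers are all hypothetical, and the real pass1 data structures are different. It assumes an undo walk that visits a list of blocks and treats a block with more than one reference as a duplicate.

```python
def undo_stops_early(blocks, refcount, allocated):
    """Toy model of the failure mode: give up at the first duplicate block."""
    for block in blocks:
        if refcount.get(block, 0) > 1:
            refcount[block] -= 1   # drop only this reference
            return                 # remaining blocks are never undone
        allocated.discard(block)


def undo_everything(blocks, refcount, allocated):
    """Same walk, but a duplicate is skipped instead of ending the walk."""
    for block in blocks:
        if refcount.get(block, 0) > 1:
            refcount[block] -= 1
            continue
        allocated.discard(block)


def leftover(undo):
    """Run an undo variant on a three-block example; return blocks still allocated."""
    refcount = {10: 1, 11: 2, 12: 1}   # hypothetical: block 11 is referenced twice
    allocated = {10, 11, 12}
    undo([10, 11, 12], refcount, allocated)
    return sorted(allocated)


if __name__ == "__main__":
    print(leftover(undo_stops_early))
    print(leftover(undo_everything))
```

In this model the early-stopping variant leaves block 12 allocated even though nothing else references it, which is the kind of leftover a second run would then have to clean up.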

Version-Release number of selected component (if applicable):
All

How reproducible:
Not sure. Seems pretty reliable for me.

Steps to Reproduce:
1. gfs2_edit restoremeta /home/bob/metadata/gfs/dirty/Case_00434286_metadata_vg3 /dev/emc/scratch
2. fsck.gfs2 -y /dev/emc/scratch &> /tmp/fsck.out
3. fsck.gfs2 /dev/emc/scratch
   (Run fsck.gfs2 a second time)

Actual results:
Starting pass1
Large file at 196 (0xc4) - 100 percent complete.
Reconciling bitmaps.
Block 432864494 (0x19ccfcee) bitmap says 1 (data) but FSCK saw 0 (free)
Fix bitmap for block 432864494 (0x19ccfcee) ? (y/n) 

Expected results:
Block 432864494 (0x19ccfcee) should have been freed during the
first run of fsck.gfs2, and the second run should have come up
clean.
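
One way to check a captured second run against this expectation is to scan it for bitmap mismatch lines. The sketch below is an assumption-laden helper, not part of gfs2-utils: the function name is hypothetical, and the regular expression is modeled only on the single message format quoted in this report, so other fsck.gfs2 messages may not match it.

```python
import re

# Pattern modeled on the message quoted in this report, e.g.
# "Block 432864494 (0x19ccfcee) bitmap says 1 (data) but FSCK saw 0 (free)"
MISMATCH = re.compile(
    r"Block (\d+) \(0x([0-9a-fA-F]+)\) bitmap says (\d+) \(([^)]*)\) "
    r"but FSCK saw (\d+) \(([^)]*)\)"
)


def bitmap_mismatches(output):
    """Return (block, bitmap_state, fsck_state) for each mismatch line found."""
    found = []
    for line in output.splitlines():
        match = MISMATCH.search(line)
        if match:
            found.append((int(match.group(1)), match.group(4), match.group(6)))
    return found


def second_run_is_clean(output):
    """True when the captured output contains no bitmap mismatch lines."""
    return not bitmap_mismatches(output)
```

Fed the "Actual results" text above, `bitmap_mismatches` reports block 432864494 with states `data` and `free`; a run with no such lines counts as clean under this check.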

Additional info:
I've got a patch for this. See the patch description for more details.

Comment 1 Robert Peterson 2016-06-21 20:12:09 UTC
Created attachment 1170387 [details]
Proposed RHEL7 patch

Comment 2 Robert Peterson 2016-07-05 14:41:17 UTC
Created attachment 1176465 [details]
Additional non-metadata blocks required

Notes on recreation for QE:

Like so many fsck.gfs2 problems I've found, this one requires
some special non-saved metadata blocks to recreate it (see
the attached file). Without this additional bit, it won't recreate.
So to reliably recreate the problem, do this:

gfs2_edit restoremeta /home/bob/metadata/gfs/dirty/Case_00434286_metadata_vg3 <device>
dd if=/home/bob/bz1348703.blk.0x19ccfd10.to.4f of=<device> bs=4096 seek=432864528
fsck.gfs2 -y <device> &> /tmp/fsck.out
fsck.gfs2 <device>
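
The dd parameters can be cross-checked with a little arithmetic. The sketch below assumes the attachment's file name encodes an inclusive block range, 0x19ccfd10 through 0x19ccfd4f (reading ".to.4f" as the low byte of the last block); that reading is an interpretation, not something the report states. The helper name is hypothetical.

```python
BLOCK_SIZE = 4096            # matches bs=4096 in the dd command
FIRST_BLOCK = 0x19ccfd10     # from the file name
LAST_BLOCK = 0x19ccfd4f      # assumed from ".to.4f" in the file name
REPORTED_BLOCK = 0x19ccfcee  # block named in the fsck.gfs2 message


def dd_span(first, last, block_size):
    """Describe the region dd writes for an inclusive block range."""
    count = last - first + 1
    return {
        "seek": first,                       # value for dd's seek= (in blocks)
        "count": count,                      # number of blocks written
        "bytes": count * block_size,         # total bytes copied
        "byte_offset": first * block_size,   # where the write starts on the device
    }


if __name__ == "__main__":
    span = dd_span(FIRST_BLOCK, LAST_BLOCK, BLOCK_SIZE)
    print(span)
    print("reported block precedes the range by", FIRST_BLOCK - REPORTED_BLOCK, "blocks")
```

Under that assumption the range is 64 blocks (262144 bytes) starting at block 432864528, which agrees with the seek value used here and with the record and byte counts dd prints in the transcript. The block that fsck.gfs2 later complains about, 432864494, sits 34 blocks before the restored range.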

Actual output:

[root@gfs-i24c-01 /home/bob]# gfs2_edit restoremeta /home/bob/metadata/gfs/dirty/Case_00434286_metadata_vg3 /dev/monsta/scratch1 
No valid file header found. Falling back to old format...
Block size is 4096B
This is gfs1 metadata.
There are 536870912 free blocks on the destination device.
Highest saved block is 438957066 (0x1a29f40a)
438957067 blocks processed, 1290843 saved (100%)
File /home/bob/metadata/gfs/dirty/Case_00434286_metadata_vg3 restore successful.
You have new mail in /var/spool/mail/root
[root@gfs-i24c-01 /home/bob]# dd if=/home/bob/bz1348703.blk.0x19ccfd10.to.4f of=/dev/monsta/scratch1 bs=4096 seek=432864528
64+0 records in
64+0 records out
262144 bytes (262 kB) copied, 0.00216448 s, 121 MB/s
[root@gfs-i24c-01 /home/bob]# sync
[root@gfs-i24c-01 /home/bob]# fsck.gfs2 -y /dev/monsta/scratch1 &> /tmp/fsck.1348703.out
[root@gfs-i24c-01 /home/bob]# echo $?
1
[root@gfs-i24c-01 /home/bob]# fsck.gfs2 /dev/monsta/scratch1
Initializing fsck
Validating resource group index.
Level 1 resource group check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Clearing GFS journals (this may take a while)
..
Journals cleared.

Journal recovery complete.
Starting pass1
Large file at 196 (0xc4) - 100 percent complete.                                    
Reconciling bitmaps.
Block 432864494 (0x19ccfcee) bitmap says 1 (data) but FSCK saw 0 (free)
Fix bitmap for block 432864494 (0x19ccfcee) ? (y/n)
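
The first run in this transcript exits with status 1. The sketch below decodes such a status, assuming fsck.gfs2 follows the conventional fsck(8) exit-status bits; the table and the function name are written for illustration here and are not taken from the gfs2-utils sources.

```python
# Conventional fsck(8) exit-status bits (assumed to apply to fsck.gfs2).
FSCK_STATUS_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Return the meanings of every bit set in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_STATUS_BITS.items() if status & bit]


if __name__ == "__main__":
    print(decode_fsck_status(1))
```

Read this way, status 1 means the `-y` run found and corrected errors, so the prompt in the second run points to something the first run left behind.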

Comment 3 Robert Peterson 2016-07-06 13:05:18 UTC
The upstream patch is now pushed here:
https://git.fedorahosted.org/cgit/gfs2-utils.git/commit/?id=6b40deabbbbd59a1cbddbe633984214b6752d25a

The RHEL7 patch is now pushed here:
https://git.fedorahosted.org/cgit/gfs2-utils.git/commit/?h=RHEL7&id=0a0d4af99c9be8abc863e272d6e7f46bb6ad4b41

It was tested on gfs-i24c-01.mpc.lab.eng.bos.redhat.com.
Changing status to POST until it gets built into a new
gfs2-utils for RHEL7.3.

Comment 6 Justin Payne 2016-08-17 13:55:45 UTC
Verified in gfs2-utils-3.1.9-3.el7:

[root@south-16 ~]# rpm -q gfs2-utils
gfs2-utils-3.1.9-3.el7.x86_64
[root@south-16 ~]# gfs2_edit restoremeta Case_00434286_metadata_vg3 /dev/sdb1
No valid file header found. Falling back to old format...
Block size is 4096B
This is gfs1 metadata.
There are 469649399 free blocks on the destination device.
Highest saved block is 438957066 (0x1a29f40a)
438957067 blocks processed, 1290843 saved (100%)
File Case_00434286_metadata_vg3 restore successful.
[root@south-16 ~]# fsck.gfs2 -y /dev/sdb1 &> /tmp/fsck.out
[root@south-16 ~]# fsck.gfs2 /dev/sdb1
Initializing fsck
Validating resource group index.
Level 1 resource group check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Clearing GFS journals (this may take a while)
..
Journals cleared.

Journal recovery complete.
Starting pass1
Large file at 196 (0xc4) - 100 percent complete.                                    
Reconciling bitmaps.
reconcile_bitmaps completed in 8.286s
pass1 completed in 2m4.530s
Starting pass1b
pass1b completed in 0.000s
Starting pass2
pass2 completed in 55.084s
Starting pass3
pass3 completed in 0.000s
Starting pass4
pass4 completed in 2.115s
Starting check_statfs
check_statfs completed in 0.000s
gfs2_fsck complete

Comment 8 errata-xmlrpc 2016-11-04 06:31:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2438.html