Bug 1332728

Summary: fsck.gfs2: Wrong value used for GFS1's Used metadata value in rgrp
Product: Red Hat Enterprise Linux 7
Reporter: Robert Peterson <rpeterso>
Component: gfs2-utils
Assignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.3
CC: cluster-maint, gfs2-maint, jpayne
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: gfs2-utils-3.1.9-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 06:31:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1271674
Bug Blocks:
Attachments:
  Script to count the data, metadata, free, etc., in a gfs1 file system (flags: none)
  Upstream post-memsave patch to fix the problem (flags: none)
  Patch 1 of 2 - fsck.gfs2: Use BLKST constants to make pass5 more clear (flags: none)
  Patch 2 of 2 - fsck.gfs2: Fix GFS1 "used meta" accounting bug (flags: none)

Description Robert Peterson 2016-05-03 23:01:50 UTC
Description of problem:
If you run fsck.gfs2 on a GFS1 file system, it will almost
certainly screw up the value for "used metadata" because it
confuses "metadata" and "data" in some cases. This only affects
GFS1 file systems.

Version-Release number of selected component (if applicable):
All

How reproducible:
Always

Steps to Reproduce:
1. gfs2_edit restoremeta /home/bob/metadata/gfs/dirty/lvol3.meta.before /dev/emc/scratch
2. gfs2_edit -p rgs /dev/emc/scratch | grep -A11 "RG #0" | grep rg_usedmeta
3. fsck.gfs2 -y /dev/emc/scratch &> /tmp/fsck.out
4. gfs2_edit -p rgs /dev/emc/scratch | grep -A11 "RG #0" | grep rg_usedmeta

Actual results:
Before fsck:
  rg_usedmeta           24                  0x18
After fsck:
  rg_usedmeta           8633                0x21b9

Expected results:
  rg_usedmeta           577                 0x241

Additional info:
I have a patch that fixes the problem, plus a little tool script
I wrote to give better numbers using gfs2_edit. Here is the output
from the tool:

[root@gfs-i24c-01 ~]# /home/bob/tools/count_rg_dinodes.sh -r 17 -d /dev/emc/scratch
Searching for dinodes in rgrp 17
17 is a GFS2 Resource Group Header
18 is a GFS2 bitmap
19 is a GFS2 bitmap
20 is a GFS2 bitmap
21 is a GFS2 bitmap
Block: 129 , free:  0 , data: 0 , meta: 108 , inodes: 100
Block: 255 , free:  0 , data: 26 , meta: 208 , inodes: 200
Block: 676 , free:  0 , data: 342 , meta: 313 , inodes: 300
Block: 7858 , free:  4 , data: 7416 , meta: 417 , inodes: 400
Block: 8150 , free:  6 , data: 7603 , meta: 520 , inodes: 500
Block: 65562 , free:  55878 , data: 9080 , meta: 582 , inodes: 558

So the tool tells me there should really be about 582 "used meta"
blocks listed in the gfs1 rgrp. The fsck screwed it up.
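For anyone who wants the raw bitmap tallies without the slow script,
here is a minimal C sketch of the counting step. It assumes the usual
GFS/GFS2 bitmap packing (two bits per block, four blocks per byte,
lowest-order bits first) and the GFS1 state values 0 = free,
1 = used data, 2 = free metadata, 3 = used metadata. The names
blk_counts and tally_bitmap are hypothetical, not gfs2-utils code.
It only counts bitmap states; telling dinodes apart from other
metadata requires reading the block headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed GFS1 bitmap states, 2 bits per block. */
enum { ST_FREE = 0, ST_USED_DATA = 1, ST_FREE_META = 2, ST_USED_META = 3 };

/* Hypothetical per-rgrp tally, not a gfs2-utils structure. */
struct blk_counts {
    unsigned long free, data, free_meta, used_meta;
};

/* Count the states of the first nblocks blocks described by bitmap buf.
 * Block i lives in byte i / 4, at bit offset (i % 4) * 2. */
void tally_bitmap(const uint8_t *buf, size_t nblocks, struct blk_counts *c)
{
    assert(buf != NULL && c != NULL);
    for (size_t i = 0; i < nblocks; i++) {
        unsigned state = (buf[i / 4] >> ((i % 4) * 2)) & 0x3;
        switch (state) {
        case ST_FREE:      c->free++;      break;
        case ST_USED_DATA: c->data++;      break;
        case ST_FREE_META: c->free_meta++; break;
        case ST_USED_META: c->used_meta++; break;
        }
    }
}
```

Note that in GFS1 the used-metadata count from the bitmap includes
dinodes, so it is not directly comparable to rg_usedmeta.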

Comment 1 Robert Peterson 2016-05-03 23:03:37 UTC
Created attachment 1153595 [details]
Script to count the data, metadata, free, etc., in a gfs1 file system

This little script tells you the real numbers for an rgrp.
But it's slow.

Comment 2 Robert Peterson 2016-05-03 23:08:02 UTC
Also note: The script was run before the fsck and found 582
"used meta" blocks. The correct value of 577 blocks applies after
fsck has run, because fsck fixed some problems and deleted some
corrupt dinodes.

To recreate the problem, you don't need this set of metadata.
Almost any gfs1 metadata will work: You just need some data
blocks and metadata blocks in the same rgrp. I used this one
because it was small and fast.

Comment 3 Robert Peterson 2016-05-04 18:17:57 UTC
Created attachment 1153978 [details]
Upstream post-memsave patch to fix the problem

Here is my patch to fix the problem. It was built on top of
my "memsave10" version of fsck.gfs2, so it depends on many
preceding patches. However, it should generally work if applied
to an older fsck.gfs2.

Comment 4 Robert Peterson 2016-05-06 13:19:10 UTC
It turns out I was misremembering how GFS1 counts its metadata.
After careful examination, "used meta" and "used dinodes" are two
separate and distinct numbers in GFS1. "Used dinodes" is just what
it says it is. "Used meta" in GFS1 means "used 'other' metadata
that is NOT a dinode". For example, the metadata set "gfs.clean"
was taken right after a clean gfs_mkfs, and it shows:

Block #17    (0x11) of 4294966272 (0xfffffc00) (rsrc grp hdr)

Resource Group Header:
  mh_magic              0x01161970                (hex)
  mh_type               2                         0x2
  mh_format             200                       0xc8
  rg_flags              0                         0x0
  rg_free               65592                     0x10038
  rg_useddi             5                         0x5
  rg_freedi             0                         0x0
  no_formal_ino         0                         0x0
  no_addr               0                         0x0
  rg_usedmeta           23                        0x17
  rg_freemeta           0                         0x0

The 23 (0x17) "usedmeta" blocks are really the 23 blocks of rindex
data; they do not include the 5 system dinodes, which are counted
separately in rg_useddi.
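Since GFS1 gives dinodes and other metadata the same bitmap state,
a checker has to look at each used-metadata block's header to decide
which counter it belongs in. Below is a minimal C sketch of that
decision. It assumes the GFS1 metadata header starts with big-endian
mh_magic (0x01161970, as in the dump above) followed by big-endian
mh_type, and that the dinode type value is 4; gfs1_meta_kind and the
enum names are hypothetical helpers, not gfs2-utils functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GFS_MAGIC    0x01161970u /* mh_magic, as shown by gfs2_edit */
#define META_TYPE_DI 4u          /* assumed mh_type value for a dinode */

enum meta_kind { META_NOT_META, META_DINODE, META_OTHER };

/* Read a big-endian 32-bit value; on-disk GFS headers are big-endian. */
static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decide whether a block would count toward rg_useddi (a dinode)
 * or rg_usedmeta (any other metadata) in a GFS1 rgrp. */
enum meta_kind gfs1_meta_kind(const uint8_t *blk, size_t len)
{
    assert(blk != NULL);
    if (len < 8 || be32_at(blk) != GFS_MAGIC)
        return META_NOT_META;
    return be32_at(blk + 4) == META_TYPE_DI ? META_DINODE : META_OTHER;
}
```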

Comment 5 Robert Peterson 2016-05-09 17:15:18 UTC
Created attachment 1155404 [details]
Patch 1 of 2 - fsck.gfs2: Use BLKST constants to make pass5 more clear

This patch just replaces some numeric constants with the BLKST
constants to make pass5's accounting easier to follow.

Comment 6 Robert Peterson 2016-05-09 17:18:44 UTC
Created attachment 1155405 [details]
Patch 2 of 2 - fsck.gfs2: Fix GFS1 "used meta" accounting bug

This is a replacement upstream patch for this problem. The previous
version had a problem: function check_block_status needs an else
in the accounting of "used meta", because in GFS1 a block marked
"metadata" (type 3) may be counted either as a "dinode" or as "used
metadata", but not as both. In other words, it's either/or.
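The either/or rule can be illustrated with a single pass over an
rgrp's blocks. This is a minimal C sketch, not the actual
check_block_status code: it assumes each block's bitmap state and a
precomputed is-dinode flag are already available, and the names
gfs1_rg_counts and gfs1_count_rgrp are hypothetical. The else branch
is what matters: a used-metadata block bumps exactly one counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed GFS1 bitmap state values. */
#define BLKST_FREE     0
#define BLKST_USED     1
#define BLKST_FREEMETA 2
#define BLKST_USEDMETA 3

/* Hypothetical subset of the GFS1 rgrp counters. */
struct gfs1_rg_counts {
    unsigned long free, useddi, usedmeta, freemeta;
};

void gfs1_count_rgrp(const unsigned char *state, const bool *is_dinode,
                     size_t nblocks, struct gfs1_rg_counts *rg)
{
    assert(state != NULL && is_dinode != NULL && rg != NULL);
    for (size_t i = 0; i < nblocks; i++) {
        switch (state[i]) {
        case BLKST_FREE:
            rg->free++;
            break;
        case BLKST_FREEMETA:
            rg->freemeta++;
            break;
        case BLKST_USEDMETA:
            if (is_dinode[i])
                rg->useddi++;   /* counted as a dinode ... */
            else
                rg->usedmeta++; /* ... or as other metadata, never both */
            break;
        default:
            break; /* used data blocks have no counter in this sketch */
        }
    }
}
```

With 5 dinodes and 23 other metadata blocks, as in the gfs.clean
dump, this gives useddi = 5 and usedmeta = 23; dropping the else and
always incrementing usedmeta would give usedmeta = 28.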

Comment 7 Robert Peterson 2016-05-09 17:22:12 UTC
My plan is to push the 2 patches to the upstream gfs2-utils
git tree, and rely upon the fact that bug #1271674 will pull
in those changes.

Comment 8 Robert Peterson 2016-05-13 13:19:37 UTC
These patches are now in the upstream gfs2-utils master branch.
Changing status to POST. They should be picked up automatically
as noted in comment #7.

Comment 11 Justin Payne 2016-08-17 12:04:54 UTC
Verified in gfs2-utils-3.1.9-3.el7:

[root@south-16 ~]# rpm -q gfs2-utils
gfs2-utils-3.1.9-3.el7.x86_64
[root@south-16 ~]# gfs2_edit restoremeta lvol3.meta.before /dev/sdb1
No valid file header found. Falling back to old format...
Block size is 4096B
This is gfs1 metadata.
There are 469649399 free blocks on the destination device.
Highest saved block is 6749199 (0x66fc0f)
6749200 blocks processed, 82461 saved (100%)
File lvol3.meta.before restore successful.
[root@south-16 ~]# gfs2_edit -p rgs /dev/sdb1 |grep -A11 "RG #0"|grep rg_usedmeta
  rg_usedmeta           24                  0x18
[root@south-16 ~]# fsck.gfs2 -y /dev/sdb1 &> /tmp/fsck.out
[root@south-16 ~]# gfs2_edit -p rgs /dev/sdb1 |grep -A11 "RG #0"|grep rg_usedmeta
  rg_usedmeta           21                  0x15

Comment 13 errata-xmlrpc 2016-11-04 06:31:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2438.html