Cause:
When fsck.gfs2 repairs the superblock, it tries to look up the locking configuration fields from cluster.conf using a naive parsing strategy, and it assumes that it is being run on a cluster node.
Consequence:
When the superblock is repaired, its lockproto and locktable fields can be set incorrectly.
Fix:
The lockproto and locktable fields are now set to sensible default values, and the user is informed at the end of the fsck.gfs2 run that they should set these fields with tunegfs2.
Result:
fsck.gfs2 no longer looks at cluster.conf to rebuild the superblock.
Description of problem:
When fsck.gfs2 tries to rebuild a broken superblock, it looks at cluster.conf on the local machine to find a locktable and lockproto to use. This makes some bad assumptions: that the device is being fscked from a cluster node, and that if cluster.conf isn't present then lock_nolock should be used. The code that does this is also very sensitive to the layout of cluster.conf, and perfectly valid formatting will confuse it.
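For context, cluster.conf is cluster-configuration XML along these lines (the names here are illustrative, not from this bug). A naive line-oriented scan for the cluster name can break on equally valid layouts of the same document, e.g. when the name attribute is wrapped onto its own line:

```xml
<?xml version="1.0"?>
<cluster name="mycluster" config_version="1">
  <clusternodes>
    <clusternode name="dash-01" nodeid="1"/>
  </clusternodes>
</cluster>
```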
Steps to Reproduce:
1. mkfs.gfs2 -t mycluster:mygfs2 /dev/foo
2. dd if=/dev/zero of=/dev/foo bs=1 seek=65540 count=4 conv=notrunc # Sets sb.mh_type to 0 to break the superblock
3. fsck.gfs2 -y /dev/foo
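The seek=65540 in step 2 follows from the on-disk layout: the GFS2 superblock lives at a fixed byte offset, and mh_type is the second 32-bit field of its metadata header. A minimal sketch of the arithmetic (constant names as in the gfs2_ondisk.h kernel header):

```python
# Offset arithmetic behind the dd command in step 2.
GFS2_BASIC_BLOCK = 512   # size of a basic disk block, in bytes
GFS2_SB_ADDR = 128       # superblock location, in basic blocks

sb_offset = GFS2_SB_ADDR * GFS2_BASIC_BLOCK  # 65536: start of the superblock
# struct gfs2_meta_header starts with mh_magic (4 bytes), then mh_type,
# so zeroing 4 bytes at sb_offset + 4 clobbers mh_type.
mh_type_offset = sb_offset + 4

print(mh_type_offset)  # 65540, the seek value used above
```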
Actual results:
As I'm not on a cluster node, the lockproto is set to lock_nolock and the locktable is left blank. The warning about the lockproto and locktable scrolls off the screen.
Expected results:
fsck.gfs2 should set the fields to sane defaults (lockproto "lock_dlm", locktable "unknown") and then warn the user at the end of the fsck that they should set the locktable with tunegfs2 before mounting the fs. This is more user-friendly, less likely to fail, and doesn't depend on any particular cluster infrastructure.
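The follow-up step the user is pointed at is a tunegfs2 invocation of the form `tunegfs2 -o locktable=clustername:fsname device` (see tunegfs2(8) for the authoritative syntax). A small illustrative sketch of building that command line, with the basic validity check that neither name part may itself contain a colon; the helper name is hypothetical:

```python
# Hypothetical helper: build the tunegfs2 command line that sets the lock
# table on a repaired filesystem. Illustrative only; consult tunegfs2(8).
def tunegfs2_cmd(cluster, fsname, device):
    """Return the argv for setting locktable to cluster:fsname on device."""
    for part in (cluster, fsname):
        if not part or ":" in part:
            raise ValueError("cluster and fs names must be non-empty and colon-free")
    return ["tunegfs2", "-o", f"locktable={cluster}:{fsname}", device]

# e.g. for the reproduction steps above:
print(tunegfs2_cmd("mycluster", "mygfs2", "/dev/foo"))
```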
Additional info:
Upstream patch available: https://git.fedorahosted.org/cgit/gfs2-utils.git/commit/?id=49361e27628cf26554c1f690440dfca021713658
Verified in gfs2-utils-3.0.12.1-59.el6.x86_64
[root@dash-01 ~]# rpm -q gfs2-utils
gfs2-utils-3.0.12.1-49.el6.x86_64
[root@dash-01 ~]# mkfs.gfs2 -t mycluster:mygfs2 /dev/sda1
This will destroy any data on /dev/sda1.
It appears to contain: data
Are you sure you want to proceed? [y/n] y
Device: /dev/sda1
Blocksize: 4096
Device Size 50.01 GB (13109032 blocks)
Filesystem Size: 50.01 GB (13109031 blocks)
Journals: 1
Resource Groups: 201
Locking Protocol: "lock_dlm"
Lock Table: "mycluster:mygfs2"
UUID: a50dd396-b655-e90c-ccb6-0e4521dcba9b
[root@dash-01 ~]# dd if=/dev/zero of=/dev/sda1 bs=1 seek=65540 count=4 conv=notrunc
4+0 records in
4+0 records out
4 bytes (4 B) copied, 0.00541332 s, 0.7 kB/s
[root@dash-01 ~]# fsck.gfs2 -y /dev/sda1
Initializing fsck
Either the super block is corrupted, or this is not a GFS2 filesystem
Gathering information to repair the gfs2 superblock. This may take some time.
Block size determined to be: 4096
Found system jindex file at: 0x18
Found system per_node directory at: 0x805b
From per_node's '..' I backtracked the master directory to: 0x17
Found system statfs file at: 0x805d
Found system inum file at: 0x815f
Found system rindex file at: 0x8161
Found system quota file at: 0x8167
Lock protocol assumed to be: lock_dlm
Lock table determined to be: dash:sda1
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
Writing changes to disk
gfs2_fsck complete
[root@dash-01 ~]# rpm -q gfs2-utils
gfs2-utils-3.0.12.1-59.el6.x86_64
[root@dash-01 ~]# mkfs.gfs2 -t mycluster:mygfs2 /dev/sda1
This will destroy any data on /dev/sda1.
It appears to contain: data
Are you sure you want to proceed? [y/n] y
Device: /dev/sda1
Blocksize: 4096
Device Size 50.01 GB (13109032 blocks)
Filesystem Size: 50.01 GB (13109031 blocks)
Journals: 1
Resource Groups: 201
Locking Protocol: "lock_dlm"
Lock Table: "mycluster:mygfs2"
UUID: 4be91eb9-d2f0-8b53-84b8-e85c0c6f34ef
[root@dash-01 ~]# dd if=/dev/zero of=/dev/sda1 bs=1 seek=65540 count=4 conv=notrunc
4+0 records in
4+0 records out
4 bytes (4 B) copied, 0.00857313 s, 0.5 kB/s
[root@dash-01 ~]# fsck.gfs2 -y /dev/sda1
Initializing fsck
Either the super block is corrupted, or this is not a GFS2 filesystem
Gathering information to repair the gfs2 superblock. This may take some time.
Block size determined to be: 4096
Found system jindex file at: 0x18
Found system per_node directory at: 0x805b
From per_node's '..' I backtracked the master directory to: 0x17
Found system statfs file at: 0x805d
Found system inum file at: 0x815f
Found system rindex file at: 0x8161
Found system quota file at: 0x8167
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
Writing changes to disk
Superblock was reset. Use tunegfs2 to manually set lock table before mounting.
gfs2_fsck complete
[root@dash-01 ~]#
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-1617.html