Bug 985796 - fsck.gfs2: Locktable and lockproto guessing on sb rebuild is broken
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Andrew Price
QA Contact: Cluster QE
Depends On:
Blocks:
Reported: 2013-07-18 05:26 EDT by Andrew Price
Modified: 2013-11-21 06:25 EST
CC: 6 users

See Also:
Fixed In Version: cluster-3.0.12.1-52.el6
Doc Type: Bug Fix
Doc Text:
Cause: When fsck.gfs2 repairs the superblock, it tries to look up the locking configuration fields from cluster.conf using a simple strategy and assumes that it is being run from a cluster node.
Consequence: When the superblock is repaired, it can have its lockproto and locktable fields set wrongly.
Fix: The lockproto and locktable fields are now set to sensible default values and the user is informed that they should set these fields with tunegfs2 at the end of the fsck.gfs2 run.
Result: fsck.gfs2 no longer looks at cluster.conf to rebuild the superblock.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-21 06:25:31 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---


Attachments: None
Description Andrew Price 2013-07-18 05:26:29 EDT
Description of problem:

When fsck.gfs2 tries to rebuild a broken superblock, it looks at cluster.conf on the local machine to find a locktable and lockproto to use. This makes some bad assumptions: that the device is being fscked from a cluster node, and that lock_nolock should be used if cluster.conf isn't present. The code that does this is also very sensitive to the layout of cluster.conf, and perfectly valid formatting can confuse it.

Steps to Reproduce:

1. mkfs.gfs2 -t mycluster:mygfs2 /dev/foo
2. dd if=/dev/zero of=/dev/foo bs=1 seek=65540 count=4 conv=notrunc # Sets sb.mh_type to 0 to break the superblock
3. fsck.gfs2 -y /dev/foo
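For context (not from the report itself): the seek offset in step 2 isn't arbitrary. Assuming the standard GFS2 on-disk layout, the superblock sits at a fixed byte offset of 64 KiB into the device, and its metadata header starts with a 4-byte mh_magic field followed by the 4-byte mh_type field, so zeroing 4 bytes at offset 65540 clobbers exactly sb.mh_type:

```python
# Sketch of the offset arithmetic behind the dd command above.
SB_BYTE_OFFSET = 64 * 1024   # GFS2 superblock location, independent of block size
MH_MAGIC_SIZE = 4            # the 4-byte mh_magic field precedes mh_type
mh_type_offset = SB_BYTE_OFFSET + MH_MAGIC_SIZE
print(mh_type_offset)  # 65540, the seek= value used with bs=1
```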

Actual results:

As I'm not on a cluster node, the lockproto is set to lock_nolock and the locktable is left blank. The warning about the lockproto and lock table scrolls off the screen.

Expected results:

fsck.gfs2 should set the fields to sane defaults ("unknown", "lock_dlm") and then warn the user at the end of the fsck that they should set the locktable with tunegfs2 before mounting the fs. This is more user-friendly, less likely to fail, and doesn't depend on any particular cluster infrastructure.
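A sketch of that suggested follow-up step, using tunegfs2's -o options; the cluster name, filesystem name and device below are placeholders, not values from this report:

```shell
# After fsck.gfs2 resets the superblock, restore the locking fields by hand.
# Substitute your own cluster name, filesystem name and device.
tunegfs2 -o lockproto=lock_dlm /dev/foo
tunegfs2 -o locktable=mycluster:mygfs2 /dev/foo
```

The locktable value must match the clustername:fsname pair the filesystem was created with, or cluster nodes will refuse to mount it.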

Additional info:

Upstream patch available: https://git.fedorahosted.org/cgit/gfs2-utils.git/commit/?id=49361e27628cf26554c1f690440dfca021713658
Comment 1 Andrew Price 2013-07-18 11:45:27 EDT
Upstream patch pushed to cluster.git/RHEL6 without modification.
Comment 4 Justin Payne 2013-10-09 19:11:44 EDT
Verified in gfs2-utils-3.0.12.1-59.el6.x86_64

[root@dash-01 ~]# rpm -q gfs2-utils
gfs2-utils-3.0.12.1-49.el6.x86_64
[root@dash-01 ~]# mkfs.gfs2 -t mycluster:mygfs2 /dev/sda1
This will destroy any data on /dev/sda1.
It appears to contain: data

Are you sure you want to proceed? [y/n] y

Device:                    /dev/sda1
Blocksize:                 4096
Device Size                50.01 GB (13109032 blocks)
Filesystem Size:           50.01 GB (13109031 blocks)
Journals:                  1
Resource Groups:           201
Locking Protocol:          "lock_dlm"
Lock Table:                "mycluster:mygfs2"
UUID:                      a50dd396-b655-e90c-ccb6-0e4521dcba9b

[root@dash-01 ~]# dd if=/dev/zero of=/dev/sda1 bs=1 seek=65540 count=4 conv=notrunc
4+0 records in
4+0 records out
4 bytes (4 B) copied, 0.00541332 s, 0.7 kB/s

[root@dash-01 ~]# fsck.gfs2 -y /dev/sda1
Initializing fsck
Either the super block is corrupted, or this is not a GFS2 filesystem
Gathering information to repair the gfs2 superblock.  This may take some time.
Block size determined to be: 4096
Found system jindex file at: 0x18
Found system per_node directory at: 0x805b
From per_node's '..' I backtracked the master directory to: 0x17
Found system statfs file at: 0x805d
Found system inum file at: 0x815f
Found system rindex file at: 0x8161
Found system quota file at: 0x8167
Lock protocol assumed to be: lock_dlm
Lock table determined to be: dash:sda1
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete      
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete      
Starting pass3
Pass3 complete      
Starting pass4
Pass4 complete      
Starting pass5
Pass5 complete      
Writing changes to disk
gfs2_fsck complete

[root@dash-01 ~]# rpm -q gfs2-utils
gfs2-utils-3.0.12.1-59.el6.x86_64
[root@dash-01 ~]# mkfs.gfs2 -t mycluster:mygfs2 /dev/sda1
This will destroy any data on /dev/sda1.
It appears to contain: data

Are you sure you want to proceed? [y/n] y

Device:                    /dev/sda1
Blocksize:                 4096
Device Size                50.01 GB (13109032 blocks)
Filesystem Size:           50.01 GB (13109031 blocks)
Journals:                  1
Resource Groups:           201
Locking Protocol:          "lock_dlm"
Lock Table:                "mycluster:mygfs2"
UUID:                      4be91eb9-d2f0-8b53-84b8-e85c0c6f34ef

[root@dash-01 ~]# dd if=/dev/zero of=/dev/sda1 bs=1 seek=65540 count=4 conv=notrunc
4+0 records in
4+0 records out
4 bytes (4 B) copied, 0.00857313 s, 0.5 kB/s

[root@dash-01 ~]# fsck.gfs2 -y /dev/sda1
Initializing fsck
Either the super block is corrupted, or this is not a GFS2 filesystem
Gathering information to repair the gfs2 superblock.  This may take some time.
Block size determined to be: 4096
Found system jindex file at: 0x18
Found system per_node directory at: 0x805b
From per_node's '..' I backtracked the master directory to: 0x17
Found system statfs file at: 0x805d
Found system inum file at: 0x815f
Found system rindex file at: 0x8161
Found system quota file at: 0x8167
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete      
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete      
Starting pass3
Pass3 complete      
Starting pass4
Pass4 complete      
Starting pass5
Pass5 complete      
Writing changes to disk
Superblock was reset. Use tunegfs2 to manually set lock table before mounting.
gfs2_fsck complete
[root@dash-01 ~]#
Comment 5 errata-xmlrpc 2013-11-21 06:25:31 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1617.html
