Bug 985796
| Summary: | fsck.gfs2: Locktable and lockproto guessing on sb rebuild is broken | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Andrew Price <anprice> |
| Component: | cluster | Assignee: | Andrew Price <anprice> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.4 | CC: | ccaulfie, cluster-maint, jpayne, rpeterso, swhiteho, teigland |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | cluster-3.0.12.1-52.el6 | Doc Type: | Bug Fix |
| Doc Text: |
Cause:
When fsck.gfs2 repairs the superblock, it tries to look up the locking configuration fields in cluster.conf using a simple strategy, and it assumes that it is being run from a cluster node.
Consequence:
A repaired superblock can have its lockproto and locktable fields set incorrectly.
Fix:
The lockproto and locktable fields are now set to sensible default values, and at the end of the fsck.gfs2 run the user is told to set these fields with tunegfs2.
Result:
fsck.gfs2 no longer reads cluster.conf when rebuilding the superblock.
|
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-11-21 11:25:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Upstream patch pushed to cluster.git/RHEL6 without modification.

Verified in gfs2-utils-3.0.12.1-59.el6.x86_64.

Before the fix (gfs2-utils-3.0.12.1-49.el6), fsck.gfs2 guessed the lock table from the local node:

```
[root@dash-01 ~]# rpm -q gfs2-utils
gfs2-utils-3.0.12.1-49.el6.x86_64
[root@dash-01 ~]# mkfs.gfs2 -t mycluster:mygfs2 /dev/sda1
This will destroy any data on /dev/sda1.
It appears to contain: data
Are you sure you want to proceed? [y/n] y
Device:            /dev/sda1
Blocksize:         4096
Device Size        50.01 GB (13109032 blocks)
Filesystem Size:   50.01 GB (13109031 blocks)
Journals:          1
Resource Groups:   201
Locking Protocol:  "lock_dlm"
Lock Table:        "mycluster:mygfs2"
UUID:              a50dd396-b655-e90c-ccb6-0e4521dcba9b
[root@dash-01 ~]# dd if=/dev/zero of=/dev/sda1 bs=1 seek=65540 count=4 conv=notrunc
4+0 records in
4+0 records out
4 bytes (4 B) copied, 0.00541332 s, 0.7 kB/s
[root@dash-01 ~]# fsck.gfs2 -y /dev/sda1
Initializing fsck
Either the super block is corrupted, or this is not a GFS2 filesystem
Gathering information to repair the gfs2 superblock. This may take some time.
Block size determined to be: 4096
Found system jindex file at: 0x18
Found system per_node directory at: 0x805b
From per_node's '..' I backtracked the master directory to: 0x17
Found system statfs file at: 0x805d
Found system inum file at: 0x815f
Found system rindex file at: 0x8161
Found system quota file at: 0x8167
Lock protocol assumed to be: lock_dlm
Lock table determined to be: dash:sda1
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
Writing changes to disk
gfs2_fsck complete
```

After the fix (gfs2-utils-3.0.12.1-59.el6), the lock table is no longer guessed and the user is told to set it with tunegfs2:

```
[root@dash-01 ~]# rpm -q gfs2-utils
gfs2-utils-3.0.12.1-59.el6.x86_64
[root@dash-01 ~]# mkfs.gfs2 -t mycluster:mygfs2 /dev/sda1
This will destroy any data on /dev/sda1.
It appears to contain: data
Are you sure you want to proceed? [y/n] y
Device:            /dev/sda1
Blocksize:         4096
Device Size        50.01 GB (13109032 blocks)
Filesystem Size:   50.01 GB (13109031 blocks)
Journals:          1
Resource Groups:   201
Locking Protocol:  "lock_dlm"
Lock Table:        "mycluster:mygfs2"
UUID:              4be91eb9-d2f0-8b53-84b8-e85c0c6f34ef
[root@dash-01 ~]# dd if=/dev/zero of=/dev/sda1 bs=1 seek=65540 count=4 conv=notrunc
4+0 records in
4+0 records out
4 bytes (4 B) copied, 0.00857313 s, 0.5 kB/s
[root@dash-01 ~]# fsck.gfs2 -y /dev/sda1
Initializing fsck
Either the super block is corrupted, or this is not a GFS2 filesystem
Gathering information to repair the gfs2 superblock. This may take some time.
Block size determined to be: 4096
Found system jindex file at: 0x18
Found system per_node directory at: 0x805b
From per_node's '..' I backtracked the master directory to: 0x17
Found system statfs file at: 0x805d
Found system inum file at: 0x815f
Found system rindex file at: 0x8161
Found system quota file at: 0x8167
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
Writing changes to disk
Superblock was reset. Use tunegfs2 to manually set lock table before mounting.
gfs2_fsck complete
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1617.html
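After the fixed fsck.gfs2 run, the repaired superblock carries placeholder locking values, so the lock table must be set by hand before the filesystem is mounted. A minimal sketch of that follow-up step, assuming a cluster named mycluster, a filesystem named mygfs2 and device /dev/sda1 (all hypothetical; substitute your own, and see tunegfs2(8) for the exact option syntax):

```shell
# Hypothetical names: replace mycluster:mygfs2 and /dev/sda1 with your own.
tunegfs2 -o locktable=mycluster:mygfs2 /dev/sda1
tunegfs2 -o lockproto=lock_dlm /dev/sda1

# List the superblock fields to confirm the new values before mounting.
tunegfs2 -l /dev/sda1
```

The lock table must match the `cluster:fsname` pair the filesystem was created with, otherwise cluster nodes will not agree on the DLM lockspace.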
Description of problem:
When fsck.gfs2 tries to rebuild a broken superblock, it looks at cluster.conf on the local machine to find a locktable and lockproto to use. This makes some bad assumptions: that the device is being fsck'd from a cluster node, and that lock_nolock should be used if cluster.conf is not present. The code that does this is also very sensitive to the layout of cluster.conf, and perfectly valid formatting will confuse it.

Steps to Reproduce:
1. mkfs.gfs2 -t mycluster:mygfs2 /dev/foo
2. dd if=/dev/zero of=/dev/foo bs=1 seek=65540 count=4 conv=notrunc  # Sets sb.mh_type to 0 to break the superblock
3. fsck.gfs2 -y /dev/foo

Actual results:
As I'm not on a cluster node, the lockproto is set to lock_nolock and the locktable is left blank. The warning about the lockproto and locktable scrolls off the screen.

Expected results:
fsck.gfs2 should set the fields to sane defaults ("unknown", "lock_dlm") and then warn the user at the end of the fsck that they should set the locktable with tunegfs2 before mounting the filesystem. This is more user-friendly, less likely to fail, and doesn't depend on any particular cluster infrastructure.

Additional info:
Upstream patch available: https://git.fedorahosted.org/cgit/gfs2-utils.git/commit/?id=49361e27628cf26554c1f690440dfca021713658
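For context on why the reproduction uses `seek=65540`: the GFS2 superblock sits at a fixed byte offset on the device, and the dd command zeroes the 4-byte mh_type field of its metadata header, which is exactly the corruption fsck.gfs2's rebuild path looks for. A sketch of the arithmetic, using the constants as I understand them from the on-disk format header gfs2_ondisk.h:

```python
# GFS2 places the superblock at a fixed "basic block" address,
# independent of the filesystem block size chosen at mkfs time.
GFS2_BASIC_BLOCK = 512   # bytes; sector-sized unit used for the fixed address
GFS2_SB_ADDR = 128       # superblock lives at basic block 128

# Byte offset of the superblock on the device: 128 * 512 = 65536.
sb_offset = GFS2_SB_ADDR * GFS2_BASIC_BLOCK

# The metadata header starts the superblock: mh_magic is the first
# 32-bit field, mh_type the second, so mh_type sits 4 bytes in.
mh_type_offset = 4

# This is the seek= value the reproduction feeds to dd.
print(sb_offset + mh_type_offset)  # → 65540
```

Zeroing those 4 bytes sets sb.mh_type to 0, so the superblock fails validation and fsck.gfs2 takes its rebuild path.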