Bug 985796
| Summary: | fsck.gfs2: Locktable and lockproto guessing on sb rebuild is broken | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Andrew Price <anprice> |
| Component: | cluster | Assignee: | Andrew Price <anprice> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.4 | CC: | ccaulfie, cluster-maint, jpayne, rpeterso, swhiteho, teigland |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | cluster-3.0.12.1-52.el6 | Doc Type: | Bug Fix |
| Doc Text: |
Cause:
When fsck.gfs2 repairs the superblock, it tries to look up the locking configuration fields in cluster.conf using a simple strategy, and it assumes that it is being run from a cluster node.
Consequence:
A repaired superblock can have its lockproto and locktable fields set incorrectly.
Fix:
The lockproto and locktable fields are now set to sensible default values, and at the end of the fsck.gfs2 run the user is told to set these fields with tunegfs2.
Result:
fsck.gfs2 no longer reads cluster.conf when rebuilding the superblock.
|
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-11-21 11:25:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Upstream patch pushed to cluster.git/RHEL6 without modification.

Verified in gfs2-utils-3.0.12.1-59.el6.x86_64.

Before the fix (gfs2-utils-3.0.12.1-49.el6), fsck.gfs2 guessed the lock table from the local node:

```
[root@dash-01 ~]# rpm -q gfs2-utils
gfs2-utils-3.0.12.1-49.el6.x86_64
[root@dash-01 ~]# mkfs.gfs2 -t mycluster:mygfs2 /dev/sda1
This will destroy any data on /dev/sda1.
It appears to contain: data
Are you sure you want to proceed? [y/n] y
Device:            /dev/sda1
Blocksize:         4096
Device Size        50.01 GB (13109032 blocks)
Filesystem Size:   50.01 GB (13109031 blocks)
Journals:          1
Resource Groups:   201
Locking Protocol:  "lock_dlm"
Lock Table:        "mycluster:mygfs2"
UUID:              a50dd396-b655-e90c-ccb6-0e4521dcba9b
[root@dash-01 ~]# dd if=/dev/zero of=/dev/sda1 bs=1 seek=65540 count=4 conv=notrunc
4+0 records in
4+0 records out
4 bytes (4 B) copied, 0.00541332 s, 0.7 kB/s
[root@dash-01 ~]# fsck.gfs2 -y /dev/sda1
Initializing fsck
Either the super block is corrupted, or this is not a GFS2 filesystem
Gathering information to repair the gfs2 superblock. This may take some time.
Block size determined to be: 4096
Found system jindex file at: 0x18
Found system per_node directory at: 0x805b
From per_node's '..' I backtracked the master directory to: 0x17
Found system statfs file at: 0x805d
Found system inum file at: 0x815f
Found system rindex file at: 0x8161
Found system quota file at: 0x8167
Lock protocol assumed to be: lock_dlm
Lock table determined to be: dash:sda1
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
Writing changes to disk
gfs2_fsck complete
```

After the fix (gfs2-utils-3.0.12.1-59.el6), the lock table is no longer guessed and the user is told to set it with tunegfs2:

```
[root@dash-01 ~]# rpm -q gfs2-utils
gfs2-utils-3.0.12.1-59.el6.x86_64
[root@dash-01 ~]# mkfs.gfs2 -t mycluster:mygfs2 /dev/sda1
This will destroy any data on /dev/sda1.
It appears to contain: data
Are you sure you want to proceed? [y/n] y
Device:            /dev/sda1
Blocksize:         4096
Device Size        50.01 GB (13109032 blocks)
Filesystem Size:   50.01 GB (13109031 blocks)
Journals:          1
Resource Groups:   201
Locking Protocol:  "lock_dlm"
Lock Table:        "mycluster:mygfs2"
UUID:              4be91eb9-d2f0-8b53-84b8-e85c0c6f34ef
[root@dash-01 ~]# dd if=/dev/zero of=/dev/sda1 bs=1 seek=65540 count=4 conv=notrunc
4+0 records in
4+0 records out
4 bytes (4 B) copied, 0.00857313 s, 0.5 kB/s
[root@dash-01 ~]# fsck.gfs2 -y /dev/sda1
Initializing fsck
Either the super block is corrupted, or this is not a GFS2 filesystem
Gathering information to repair the gfs2 superblock. This may take some time.
Block size determined to be: 4096
Found system jindex file at: 0x18
Found system per_node directory at: 0x805b
From per_node's '..' I backtracked the master directory to: 0x17
Found system statfs file at: 0x805d
Found system inum file at: 0x815f
Found system rindex file at: 0x8161
Found system quota file at: 0x8167
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
Writing changes to disk
Superblock was reset. Use tunegfs2 to manually set lock table before mounting.
gfs2_fsck complete
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1617.html
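After the fixed fsck.gfs2 run, the repaired superblock carries placeholder locking values, so the lock table must be set by hand before the filesystem is mounted. A minimal sketch of that follow-up step, assuming a cluster named mycluster, a filesystem named mygfs2 and device /dev/sda1 (all hypothetical; substitute your own, and see tunegfs2(8) for the exact option syntax):

```shell
# Hypothetical names: replace mycluster:mygfs2 and /dev/sda1 with your own.
tunegfs2 -o locktable=mycluster:mygfs2 /dev/sda1
tunegfs2 -o lockproto=lock_dlm /dev/sda1

# List the superblock fields to confirm the new values before mounting.
tunegfs2 -l /dev/sda1
```

The lock table must match the `cluster:fsname` pair the filesystem was created with, otherwise cluster nodes will not agree on the DLM lockspace.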
Description of problem:
When fsck.gfs2 tries to rebuild a broken superblock, it looks at cluster.conf on the local machine to find a locktable and lockproto to use. This makes some bad assumptions: that the device is being fsck'd from a cluster node, and that lock_nolock should be used if cluster.conf is not present. The code that does this is also very sensitive to the layout of cluster.conf, and perfectly valid formatting will confuse it.

Steps to Reproduce:
1. mkfs.gfs2 -t mycluster:mygfs2 /dev/foo
2. dd if=/dev/zero of=/dev/foo bs=1 seek=65540 count=4 conv=notrunc  # Sets sb.mh_type to 0 to break the superblock
3. fsck.gfs2 -y /dev/foo

Actual results:
As I'm not on a cluster node, the lockproto is set to lock_nolock and the locktable is left blank. The warning about the lockproto and locktable scrolls off the screen.

Expected results:
fsck.gfs2 should set the fields to sane defaults ("unknown", "lock_dlm") and then warn the user at the end of the fsck that they should set the locktable with tunegfs2 before mounting the filesystem. This is more user-friendly, less likely to fail, and doesn't depend on any particular cluster infrastructure.

Additional info:
Upstream patch available: https://git.fedorahosted.org/cgit/gfs2-utils.git/commit/?id=49361e27628cf26554c1f690440dfca021713658
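For context on why the reproduction uses `seek=65540`: the GFS2 superblock sits at a fixed byte offset on the device, and the dd command zeroes the 4-byte mh_type field of its metadata header, which is exactly the corruption fsck.gfs2's rebuild path looks for. A sketch of the arithmetic, using the constants as I understand them from the on-disk format header gfs2_ondisk.h:

```python
# GFS2 places the superblock at a fixed "basic block" address,
# independent of the filesystem block size chosen at mkfs time.
GFS2_BASIC_BLOCK = 512   # bytes; sector-sized unit used for the fixed address
GFS2_SB_ADDR = 128       # superblock lives at basic block 128

# Byte offset of the superblock on the device: 128 * 512 = 65536.
sb_offset = GFS2_SB_ADDR * GFS2_BASIC_BLOCK

# The metadata header starts the superblock: mh_magic is the first
# 32-bit field, mh_type the second, so mh_type sits 4 bytes in.
mh_type_offset = 4

# This is the seek= value the reproduction feeds to dd.
print(sb_offset + mh_type_offset)  # → 65540
```

Zeroing those 4 bytes sets sb.mh_type to 0, so the superblock fails validation and fsck.gfs2 takes its rebuild path.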