Bug 622576
Summary: fsck.gfs2 segfaults if journals are missing
Product: Red Hat Enterprise Linux 6
Reporter: Robert Peterson <rpeterso>
Component: cluster
Assignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 6.1
CC: ccaulfie, cluster-maint, djansa, edamato, lhh, marcobillpeter, rpeterso, ssaha, syeghiay, teigland, theophanis_kontogiannis
Target Milestone: rc
Keywords: ZStream
Target Release: 6.0
Hardware: x86_64
OS: Linux
Fixed In Version: cluster-3.0.12-24.el6
Doc Type: Bug Fix
Clone Of: 620384
Clones: 683104
Last Closed: 2011-05-19 12:53:30 UTC
Bug Depends On: 575968, 620384, 624689, 624691
Bug Blocks: 637699
Description
Robert Peterson
2010-08-09 19:20:38 UTC
Created attachment 439456
Final patch for 6.1
Here is the final patch I'm hoping to push to the git repo.
It was tested on system roth-08.
The patch was pushed to the master branch of the gfs2-utils git repo, and to the STABLE3 and RHEL6 branches of the cluster git repo, for inclusion in 6.1. Changing status to POST until we start doing 6.1 builds.

Upping the priority/severity on this. This is ugly. We don't need this going out the door on 6.0.

I came up with two scenarios I would like clarification on:

* Removing multiple journals in the middle: If I have a file system with five journals and I remove the middle three, fsck.gfs2 will only recreate one journal at a time. I have to run fsck.gfs2 three times to get all the journals back. This seems like a bug that should be fixed.
* Removing the last journal: If I remove the last journal, fsck.gfs2 will not try to recreate it. Does GFS2 know how many journals should be in the file system, other than from the number of entries in the jindex directory?

In answer to comment #11: fsck.gfs2 should probably recover multiple journals. Do you have output I can look at from the scenario where it didn't? I'd just like to double-check that it didn't act up.

Regarding the removal of the last journal: it entirely depends on how and why the journals went missing. When fsck.gfs2 checks the journals, it goes by the jindex directory and how many dirents it has, taking "." and ".." into account. If a journal is removed by manually deleting it through the metafs, the jindex is adjusted properly, so fsck.gfs2 has no way of knowing the journal was ever there. We have a gfs2_jadd command to add journals. If there were a gfs2_jdel command, it would do exactly that kind of removal, and we wouldn't want fsck.gfs2 to assume the deleted journal still existed. In other words, fsck.gfs2 doesn't "keep count" in the superblock or anywhere else; it only goes by the jindex directory. This patch was primarily created to recover from situations where journals disappear abnormally, not where they were unlinked through the metafs.
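As a rough illustration of the counting rule described above, here is a minimal C sketch. The rule is that journals are inferred only from jindex directory entries, with "." and ".." excluded. The sketch works on a hypothetical in-memory array of entry names. It is not the actual fsck.gfs2 code, and the function name `count_journals` is invented for illustration.

```c
#include <string.h>

/* Hypothetical sketch (not the real fsck.gfs2 code): infer the journal
 * count purely from the names of the jindex directory entries.
 * The "." and ".." entries are not journals, so they are skipped.
 * Nothing else (e.g. a superblock field) is consulted. */
static int count_journals(const char *const names[], int n)
{
    int count = 0;

    for (int i = 0; i < n; i++) {
        if (strcmp(names[i], ".") == 0 || strcmp(names[i], "..") == 0)
            continue;
        count++;
    }
    return count;
}
```

For a jindex holding ".", "..", "journal0", "journal1" and "journal2", this returns 3. If "journal3" had been unlinked through the metafs, nothing in that list records that it ever existed. That is why a removed last journal cannot be recreated under this rule.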
In the original scenario, the journal was missing because a much older version of fsck.gfs2 had detected corruption and mistakenly tossed the journal into lost+found. The need to recover journals this way should be rare, so I don't think we should hold up the release of 6.1 because of it. If you want, you can open a new bugzilla and we could add new code to fsck.gfs2 that analyzes journal file names. For example, if it finds "journal0" and "journal5", we could add the smarts to fill in the missing journals in between.

Created attachment 482716
fsck.gfs2 log while rebuilding journals
Attached is the complete output of fsck.gfs2 while I run it until all of the journals are rebuilt.
The interesting parts are probably these lines:
Initializing fsck
File system journal "journal1" is missing: pass1 will try to recreate it.
Journal recovery complete.
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Error: resource group 17 (0x11): free space (0) does not match bitmap (3)
(3 blocks were reclaimed)
The rgrp was fixed.
RGs: Consistent: 799 Inconsistent: 1 Fixed: 1 Total: 800
Starting pass1
Invalid or missing journal1 system inode (should be 4, is 0).
Rebuilding system file "journal1"
Pass1 complete
...
Initializing fsck
File system journal "journal2" is missing: pass1 will try to recreate it.
Journal recovery complete.
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Invalid or missing journal2 system inode (should be 4, is 0).
Rebuilding system file "journal2"
Pass1 complete
...
Initializing fsck
File system journal "journal3" is missing: pass1 will try to recreate it.
Journal recovery complete.
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Invalid or missing journal3 system inode (should be 4, is 0).
Rebuilding system file "journal3"
Pass1 complete
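The log above shows fsck.gfs2 rebuilding one journal per run: journal1, then journal2, then journal3. The name-analysis idea raised earlier in the thread was to notice names like "journal0" and "journal5" and fill the gap. That approach could identify every gap in a single pass. Here is a minimal C sketch of it, assuming journals are named "journalN" and numbered from 0. The helper names and the MAX_JOURNALS bound are hypothetical and not part of gfs2-utils.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed upper bound on journal numbers, chosen only for this sketch. */
#define MAX_JOURNALS 1024

/* Hypothetical helper: return N for an entry named "journalN",
 * or -1 for anything else (including "." and ".."). */
static int journal_index(const char *name)
{
    static const char prefix[] = "journal";
    const size_t plen = sizeof(prefix) - 1;
    char *end;
    long v;

    if (strncmp(name, prefix, plen) != 0 ||
        !isdigit((unsigned char)name[plen]))
        return -1;
    v = strtol(name + plen, &end, 10);
    if (*end != '\0' || v >= MAX_JOURNALS)
        return -1;
    return (int)v;
}

/* Hypothetical helper: store in missing[] every journal number between
 * 0 and the highest "journalN" present that has no jindex entry.
 * Returns how many gaps were stored (at most max_missing).
 * A missing *last* journal cannot be detected: nothing records it. */
static int find_missing_journals(const char *const names[], int n,
                                 int missing[], int max_missing)
{
    unsigned char seen[MAX_JOURNALS] = {0};
    int highest = -1;
    int count = 0;

    for (int i = 0; i < n; i++) {
        int j = journal_index(names[i]);
        if (j < 0)
            continue;
        seen[j] = 1;
        if (j > highest)
            highest = j;
    }
    for (int j = 0; j <= highest && count < max_missing; j++) {
        if (!seen[j])
            missing[count++] = j;
    }
    return count;
}
```

Consider the jindex entries ".", "..", "journal0" and "journal4", which match the five-journal scenario with the middle three removed. For these, the sketch reports journals 1, 2 and 3 as missing in one call. As the thread notes, a missing last journal still cannot be detected this way.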
I split the multiple journal rebuild issue into bug 683104 and am marking this bug as VERIFIED.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0537.html