Bug 89281

Summary: ext3 filesystem won't fsck after power loss
Product: [Retired] Red Hat Linux
Reporter: Drew Vogel <andrew.vogel>
Component: e2fsprogs
Assignee: Florian La Roche <laroche>
Status: CLOSED NOTABUG
QA Contact: Jay Turner <jturner>
Severity: high
Docs Contact:
Priority: high
Version: 8.0
CC: barryn, menscher, srevivo
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2003-06-03 12:51:58 UTC Type: ---

Description Drew Vogel 2003-04-22 03:44:21 UTC
Description of problem:
A power outage/brownout seems to have corrupted several of my ext3 partitions. 
When running fsck on them, it returns "short read; bad superblock".

During bootup, RH stops with "Can't find matching filesystem: LABEL=/home" and 
offers the console.

All attempts to fix have failed.

Version-Release number of selected component (if applicable):
RedHat 8.0, kept up2date.


How reproducible:
Hard to reproduce.

Steps to Reproduce:
1.
2.
3.
    
Actual results:


Expected results:


Additional info:

Comment 1 Drew Vogel 2003-04-22 03:51:23 UTC
The alternate superblocks can be listed with mke2fs -n <device>, but using 
e2fsck -b <superblock> <device> returns an error ("Invalid argument while 
reading block -2147450360, invalid argument reading journal superblock, 
invalid argument while checking ext3 journal for /var").
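
As a rough aid to choosing values for e2fsck -b, the following Python sketch
lists where backup superblocks would normally sit for each common block size.
It assumes mke2fs defaults (8 * block_size blocks per group, the sparse_super
feature, and a first data block of 1 only for 1 KiB blocks). A filesystem
created with other options will differ, and the function names are made up
for illustration.

```python
def is_power_of(n, base):
    """Return True if n equals base**k for some integer k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def backup_superblock_candidates(block_size, max_group=30):
    """List block numbers where backup superblocks would normally sit.

    Assumptions (mke2fs defaults, not read from any real disk):
    blocks per group = 8 * block_size, first data block is 1 for
    1 KiB blocks and 0 otherwise, and sparse_super is enabled, so
    backups live only in group 1 and in groups that are powers of
    3, 5 or 7.
    """
    blocks_per_group = 8 * block_size
    first_data_block = 1 if block_size == 1024 else 0
    groups = [g for g in range(1, max_group + 1)
              if g == 1 or any(is_power_of(g, b) for b in (3, 5, 7))]
    return [first_data_block + g * blocks_per_group for g in groups]


for size in (1024, 2048, 4096):
    print(size, backup_superblock_candidates(size)[:4])
```

Under these assumptions the first candidates are 8193, 16384 and 32768 for
1 KiB, 2 KiB and 4 KiB blocks respectively. The output of mke2fs -n on the
actual device remains the better source when it is available.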


Comment 2 Drew Vogel 2003-04-22 13:04:46 UTC
Additional information: my /etc/fstab, a listing of what we've determined to 
be the partitioning scheme on the drives, and a log from an IRC chat showing 
what we've tried.

/etc/fstab:
===========
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot	/boot	ext3	 exec,dev,suid,rw 1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
LABEL=/home	/home	ext3	 exec,dev,suid,rw 1 2
none                    /proc                   proc    defaults        0 0
none	/dev/shm	tmpfs	 exec,dev,suid,rw 0 0
LABEL=/tmp	/tmp	ext3	 exec,dev,suid,rw 1 2
LABEL=/var	/var	ext3	 exec,dev,suid,rw 1 2
/dev/hda3               swap                    swap    defaults        0 0
/dev/cdrom              /mnt/cdrom              iso9660 noauto,owner,kudzu,rw 0 0
/dev/hdd4               /mnt/zip100.0           vfat    noauto,owner,kudzu,rw 0 0
/dev/fd0                /mnt/floppy             auto    noauto,owner,kudzu 0 0

PARTITIONS:
===========
HDA1 = /boot
HDA2 = /
HDA3 = SWAP
HDA4 = ??? (short read; bad superblock)
HDA5 = /tmp

HDB1 = /var
HDB2 = /home

IRC LOG
=======
[21:48:26] <drew> I'm wondering if anyone can help with a bootup problem I'm 
having with my RH8.0 server...  
[21:49:24] <drew> Anyone? 
[21:49:54] <vexas-z> state the problem and pray...
[21:50:03] <drew> hehehe. Thanks. 
[21:51:35] <drew> Last night at 8:30pm, I was on my machine from my parent's 
house, and it was working fine. Got home at 9:30, and it was crashed. Rebooted, 
and when I reboot, it locates and mounts my FIRST physical drive (master, IDE 
channel 1), but then when it tries to hit the SECOND drive (slave, IDE channel 
1), it gets an error.  
[21:52:12] <vexas-z> whats the error say?
[21:52:12] <drew> "Couldn't find matching filesystem: LABEL=/home" 
[21:52:38] <vexas-z> is the bios at boot seeing both drives?
[21:53:15] <drew> Yes. Bios sees both physical drives. It's NOT seeing my ZIP 
drive (second IDE channel, slave). The CD burner, IDE channel 2, master, shows 
in bios. 
[21:54:15] <vexas-z> did you check the /etc/fstab file?
[21:54:32] <drew> Yes, it's there, but honestly, I don't know what I'm looking 
at there. I can copy it for you. 
[21:55:03] <vexas-z> usually....I believe this is where AUTO mounting occurs of 
drives.
[21:55:45] <drew>
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot	/boot	ext3	 exec,dev,suid,rw 1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
LABEL=/home	/home	ext3	 exec,dev,suid,rw 1 2
none                    /proc                   proc    defaults        0 0
none	/dev/shm	tmpfs	 exec,dev,suid,rw 0 0
LABEL=/tmp	/tmp	ext3	 exec,dev,suid,rw 1 2
LABEL=/var	/var	ext3	 exec,dev,suid,rw 1 2
/dev/hda3               swap                    swap    defaults        0 0
/dev/cdrom              /mnt/cdrom              iso9660 noauto,owner,kudzu,rw 0 0
/dev/hdd4               /mnt/zip100.0           vfat    noauto,owner,kudzu,rw 0 0
/dev/fd0                /mnt/floppy             auto    noauto,owner,kudzu 0 0

[21:58:33] <vexas-z> did you you try to manually mount your second drive?
[21:59:33] <drew> How? 
[22:00:55] <vexas-z> mount /dev/hdb /mnt/whereyouwant_it_mounted
[22:01:20] <drew> Must specify FS type... The drives are Ext3. 
[22:02:02] <drew> Is that "mount -t ext3 /dev/hdb /home"? 
[22:02:49] <vexas-z> dont mount it at /home.
[22:03:20] <vexas-z> was home lcated on your second ide drive?
[22:03:42] <drew> I _THINK_ so... I think it had /var and /home on it. I don't 
know how to check that, though... 
[22:04:25] <vexas-z> you can just type "mount" without anything else -- it will 
display whats going on.
[22:06:03] <drew> Three lines... 1. /dev/hda2 on / type ext3 (rw) 
[22:06:20] <drew> 2. none on /proc type proc (rw) 
[22:06:46] <drew> 3. usbdevfs on /proc/bus/usb type usbdevfs (rw) 
[22:06:48] <drew> That's it. 
[22:08:51] <vexas-z> In think your are over my head.
[22:10:40] <drew> Thanks for working on it with me, Vexas! I appreciate it. 
[22:10:51] <vexas-z> i try..sorry I cant help more.
[22:11:14] <drew> Nononono... You know where to look, which is further than 
_I've_ gotten on my own, so I appreciate it! 
[22:12:17] <drew> The closest I've gotten is that it might be a zero-sector 
partition, which it's not. 
[22:12:44] <drew> This is NOT a new install; it's been running reliably for 
months and months before last night. We had a thunderstorm, and I expect that 
the house browned-out during a write or something. 
[22:12:52] <drew> It's also said something about a bad superblock... 
[22:13:28] <drew> Anyone else have any ideas? 
[22:13:47] <Hydrogenum> drew: wasn't really paying attention, but sounds like 
you can't fsck?
[22:14:50] <drew> True... Hi Hydro... It's getting part way through system 
reboot -- finds my first physical drive just fine. Gets to the second and 
says "Couldn't find matching filesystem: LABEL=/home". 
[22:14:56] <drew> Gives me a console. 
[22:15:32] <Hydrogenum> what happens if you try to fsck it?
[22:16:23] <drew> What should the command line be? 
[22:16:28] <drew> fsck /dev/hdb ??? 
[22:16:36] <|Jef|> drew: sounds like a job for........rescue mode
[22:16:45] <Hydrogenum> probably fsck /dev/hdb1
[22:16:53] <drew> Lemme try it, Hydro. 
[22:16:54] <Hydrogenum> depends on your partition layout
[22:17:01] <Hydrogenum> I dont' know which partition /home is for you
[22:17:08] <|Jef|> Hydrogenum: silly labels
[22:17:22] <|Jef|> Hydrogenum: will fdisk show you the labels?
[22:17:28] <Hydrogenum> doubtful
[22:17:36] <drew> I think it does... 
[22:17:41] <Hydrogenum> drew: what's mounted now?
[22:17:54] <|Jef|> Hydrogenum: i thought the labels as used from fstab were the 
volume labels on the partitions
[22:17:57] <drew> Err... Fdisk does NOT show labels. 
[22:18:14] <drew> Typing mount shows me three lines: 
[22:18:18] <drew> 1. /dev/hda2 on / type ext3 (rw)  
[22:18:25] <drew> 2. none on /proc type proc (rw)  
[22:18:32] <drew> 3. usbdevfs on /proc/bus/usb type usbdevfs (rw)  
[22:18:35] <drew> That's it. 
[22:18:54] <Hydrogenum> and you're sure /home is /dev/hda4 ?
[22:18:56] <drew> Using fsconf, I can manually mount all EXCEPT for /home 
and /var 
[22:19:13] <drew> Hydro... I THOUGHT something else showed me that. fsconf? How 
would I find out? 
[22:19:31] <drew> RH8.0... Something called fsconf 
[22:19:44] <strawman> try fdisk -l /dev/hda
[22:21:36] <drew> Strawman... It's got /dev/hda1 through /dev/hda5. hda3 is 
SWAP. hda4 is EXTENDED, and has the same start/end sectors as hda5, though hda5 
has slightly fewer BLOCKS than hda4. 
[22:22:42] <drew> Doing 'fdisk -l /dev/hdb' shows /dev/hdb1 & /dev/hdb2, both 
system type of LINUX. 
[22:22:57] <Hydrogenum> drew: FYI, there can only be four primary partitions.  
They get around that by allowing them to be "extended" partitions, which can be 
subdivided
[22:23:09] <Hydrogenum> so hda5 is really a subset of hda4
[22:23:37] <drew> Gotcha. I don't remember how, but I think I saw somewhere in 
this process that /home lived in /dev/hda4. 
[22:23:53] <|Jef|> Hydrogenum: so the trick is...how do you go about figureing 
which parition is /home since fstab is using labels instead of actually device 
listings
[22:24:09] <drew> That's the trick. 
[22:24:15] <Hydrogenum> |Jef|: no idea... labels scare me
[22:24:30] <drew> RH8 musta set them up automagically... I wouldn't do that. 
[22:24:33] <strawman> fsck each partition manually, reboot :)
[22:24:50] <drew> And I've heard from others that "labels are scary"... Dunno 
WHY, but I believe it! 
[22:25:09] <Hydrogenum> drew: now you know why  ;)
[22:25:44] <drew> Straw... If I do "fsck /dev/hda4",  I get "Couldn't find 
matching filesystem: LABEL=/home". 
[22:25:50] <drew> True, Hydro. 
[22:26:17] <Hydrogenum> seriously, labels are probably a good thing.  It's just 
that nobody is used to them.
[22:26:23] <drew> OIC. 
[22:26:49] <drew> Any ideas how to figure out which partion is /home? 
[22:26:54] <drew> partion ==partition 
[22:27:19] <Hydrogenum> what is the partition type for hda4, etc/ ?
[22:27:31] <Hydrogenum> fdisk should at least know that...
[22:27:36] <drew> hda4 == EXTENDED. 
[22:27:51] <drew> hda3 == SWAP 
[22:27:55] <Hydrogenum> and hdb1 and hdb2 were LINUX, right?
[22:28:02] <drew> hda1,2,5==LINUX 
[22:28:08] <strawman> i'll bet /home lives on hda5
[22:28:22] <Hydrogenum> ok... try to fsck /dev/hda5
[22:28:26] <drew> Remember, though, I've got two partitions on /dev/hdb, too... 
[22:28:50] <drew> "fsck /dev/hda5" gives me "Couldn't find matching filesystem: 
LABEL=/home".  
[22:28:51] <drew> Hrm. 
[22:29:02] <|Jef|> Hydrogenum: ah it looks like tune2fs is where you set the 
label
[22:29:09] <drew> So does fsck /dev/hda3 
[22:29:18] <drew> tune2fs? 
[22:29:27] <Hydrogenum> do *not* fsck /dev/hda3
[22:29:35] <haji> how easy is it to network with windows on RH?
[22:30:02] <drew> OK, Hydro. hda2 complained about active fs, so I said "NO". 
[22:30:23] <Hydrogenum> drew: try the hdb? partitions?
[22:30:38] <|Jef|> /sbin/tune2fs -l /dev/hda3 for example should list the info 
including the volume label
[22:30:58] <Hydrogenum> |Jef|: even if not mounted?
[22:31:04] <|Jef|> Hydrogenum: its just a list
[22:31:05] <drew> Both of the hdb? partitions gave me "Couldn't find matching 
filesystem: LABEL=/home".  
[22:31:19] <Hydrogenum> why are these all saying /home ???
[22:31:21] <|Jef|> Hydrogenum: you can actually use tune2fs on mounted filesystems
[22:31:48] <|Jef|> Hydrogenum: maybe because hes not in rescue mode yet
[22:31:59] <|Jef|> Hydrogenum: and the running system is expecting a /home 
directory to be mounted
[22:32:03] <drew> Would it be beneficial to boot from RH8 CD into RESCUE mode? 
[22:32:11] <Hydrogenum> drew: yes, do that -- might help
[22:32:27] <drew> Okay. Working. 
[22:33:27] <drew> Rebooting from RH8 Cd... 
[22:33:51] <|Jef|> drew: once a boot process craps out with a filesystem error 
that a journal recovery cant get around...its probably wisest to go into rescue 
mode so you can work out of the ramdisk from the cdrom and leave the harddrive 
partitions unmounted...so you can fix them
[22:34:27] <drew> Thanks, Jef! I'm booting into rescue mode now -- typed "linux 
rescue". 
[22:35:43] <drew> Do I want to mount file systems READ-ONLY or 
into /mnt/sysimage? 
[22:36:07] <drew> Or I can SKIP and go right to a command prompt... Is that 
what I want, Jef? 
[22:36:12] <|Jef|> drew: you want to leave the harddrtive partitions unmounted 
so you can fsck them
[22:36:25] <drew> So, SKIP or READ-ONLY? 
[22:36:38] <|Jef|> drew: skip
[22:36:43] <drew> K. SKIPPING. 
[22:36:56] <drew> At command shell. 
[22:37:45] <|Jef|> drew: tune2fs -l /dev/hda3   for example
[22:37:52] <|Jef|> drew: to see what the label for that partition is
[22:37:55] <drew> Ok. 
[22:38:37] <|Jef|> drew: it spews a lot of info; the volume name is at the top
[22:39:00] <|Jef|> drew: so now the trick is...to find the /home partition and 
fsck it
[22:40:49] <Hydrogenum> keep in mind your /dev/hda3 is swap, so you shouldn't 
expect that one to have a volume label
[22:40:51] <drew> Here's the report from /dev/hda?:
[22:40:58] <drew> hda1 == /boot 
[22:41:03] <drew> hda2 == / 
[22:41:16] <drew> hda3 == short read; bad superblock 
[22:41:21] <drew> hda4 = short read; bad superblock 
[22:41:25] <drew> hda5 == /tmp 
[22:41:28] <drew> Working on hdb? now. 
[22:41:39] <Hydrogenum> ooh!  progress!
[22:42:00] <|Jef|> Hydrogenum: so is hda5 inside hda3 or hda4
[22:42:06] <Hydrogenum> it's inside hda4
[22:42:06] <drew> Both hdb1 & hdb2 == short read; bad superblock 
[22:42:11] <Hydrogenum> hda3 is his swap partition
[22:42:18] <|Jef|> Hydrogenum: ah well...
[22:42:24] <Hydrogenum> and hdb? sounds screwed...  :-(
[22:42:31] <drew> My guess is that /var & /home are on hdb. 
[22:42:32] <|Jef|> drew: fsck the hdb* paritions
[22:42:46] <drew> Jef... "fsck /dev/hdb1"??? 
[22:43:06] <Hydrogenum> yes
[22:43:33] <drew> Did "fsck /dev/hdb1": Got a LONG error... Want it verbatum? 
[22:43:55] <Hydrogenum> just the first line
[22:44:15] <drew> fsck.ext2: Filesystem revision too high while trying to 
open /dev/hdb1. 
[22:44:36] <Hydrogenum> you're running RH9, right?
[22:44:41] <Hydrogenum> but that was a RH8 bootdisk?
[22:44:52] <drew> RH8 == installed, rescue == RH8 
[22:45:29] <drew> Shall I try fsck /dev/hdb2? 
[22:45:31] <|Jef|> Hydrogenum: what the hell does that revision thing mean
[22:45:50] <drew> Same error for "fsck /dev/hdb2". 
[22:45:59] <Hydrogenum> |Jef|: I was guessing it has a later version of ext2 
than the rescue disk can handle
[22:46:30] <drew> At the bottom of the error, it says "The superblock could not 
be read or does not describe a correct ext2 filesystem..." 
[22:46:37] <|Jef|> drew: and hdb paritions are not mounted?
[22:46:47] <drew> Suggests that the superblock is corrupt... 
[22:46:53] <Hydrogenum> drew: I think we need to have it read an alternate 
superblock
[22:47:01] <drew> Jef: Nope. Not mounted. 
[22:47:31] <drew> It suggests "e2fsck -b 8193 <device>". Shall I try that? 
[22:47:41] <Hydrogenum> yes, that's the command I was looking for, actually
[22:48:50] <|Jef|> Hydrogenum: looks like the revision error is superblock 
related too
[22:48:52] <drew> Tried "e2fsck -b 8193 /dev/hdb1" and "... /dev/hdb2". 
Got "bad magic number in super-block while trying to open /dev/hdb?". 
[22:49:13] <drew> I think I've got a RH7 CD around here. Run RESCUE from that? 
[22:49:14] <Hydrogenum> drew: try again, but with 16384 instead of 8193
[22:49:33] <Hydrogenum> and if that doesn't work, try 32768
[22:50:07] <drew> 16384 returns "Filesystem has unexpected block size while 
trying to open..." 
[22:50:25] <Hydrogenum> basically, we're looking for the alternate superblock 
here
[22:50:37] <drew> 32768 gives a different looking error... Here: 
[22:50:40] <Hydrogenum> just so you know what's going on
[22:51:28] <drew> on HDB1 it says "/var: Invalid argument while reading block 
-2147450360". So we know that HDB1 == /var 
[22:51:49] <drew> On hdb2 it's the same error, except it's "/home" instead 
of "/var". 
[22:51:51] <|Jef|> Hydrogenum: grrr....googles giving me some similar power 
outage spawned error reports...but no solutions yet
[22:52:05] <drew> And the block is -2147450361 
[22:52:23] <drew> Jef: So we're thinking it was a power outage/blip? 
[22:52:33] <|Jef|> drew: didn't you say you had a power blip?
[22:52:42] <drew> Yes, Jef. 
[22:53:05] <Hydrogenum> hrmm... those numbers are close to 2^31
[22:53:06] <drew> I've got a good surge protector, but it mighta been a 
brownout because no clocks were blinking, etc. 
[22:53:08] <|Jef|> grrrr...all the most relavent google responses are not in 
english....
[22:53:36] <strawman> yes :) farking German
[22:53:37] <drew> Do we think a RH_7_ rescue CD would be useful? 
[22:53:41] <drew> babelfish? 
[22:54:06] <|Jef|> Hydrogenum: do we have to roll out the journal?
[22:54:09] <Hydrogenum> drew: doubt RH7 CD would be any better
[22:54:14] <|Jef|> drew: was this a fresh install?
[22:54:21] <Hydrogenum> |Jef|: define "roll out the journal" ?
[22:54:30] <drew> This was a fresh 8.0 install way back when, kept up2date. 
[22:54:34] <|Jef|> Hydrogenum: force the removal of the journal...
[22:54:44] <Hydrogenum> |Jef|: shouldn't matter at all
[22:54:46] <drew> Hydro... <<gulp> 
[22:55:30] <drew> Dunno what "remove the journal" means, but it sounds like I 
should get some rubber gloves. 
[22:55:53] <Hydrogenum> drew: don't worry about it... the journal is irrelevant 
here
[22:56:12] <|Jef|> drew: at this point im just groping
[22:56:22] <|Jef|> drew: im even on google...
[22:56:25] <drew> I am, of course, worried about data loss... 
[22:56:32] <Hydrogenum> drew: if you're not in a rush, you might want to wait 
until later, when others can help
[22:56:52] <Hydrogenum> as |Jef| said, we're at the limit of our knowledge
[22:56:58] <drew> I'm either in a rush, or not... I'm leaving Wednesday morning 
and will be AFK until Sun evening... 
[22:57:25] <Hydrogenum> well, I'm just saying that maybe in an hour some expert 
will sign on
[22:57:31] <drew> Tomorrow, I gotta work (AF server), then packing that 
evening... Can work on it after ~9:00pm EST tomorrow... 
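
Hydrogenum's remark that the reported block numbers are close to 2^31 can be
checked with a little arithmetic. The Python sketch below reinterprets the two
negative values from the log as unsigned 32-bit numbers. That the negative
sign comes from signed 32-bit overflow is an assumption, not something the
log confirms, and as_unsigned32 is a name invented for this example.

```python
def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned (two's complement)."""
    return value & 0xFFFFFFFF


# The two block numbers quoted in the log for /var and /home.
for reported in (-2147450360, -2147450361):
    unsigned = as_unsigned32(reported)
    print(reported, "->", unsigned, "= 2**31 +", unsigned - 2**31)
```

Read this way, the values are 2^31 + 33288 and 2^31 + 33287, that is, block
numbers a few hundred past the 32768 given to -b but with the top bit set.
What put that bit there is not something the arithmetic can show.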


Comment 3 Barry K. Nathan 2003-05-16 09:30:34 UTC
Drew, what kind of hard drive experienced this problem?

Some (if not most?) IDE drives can render sectors completely unreadable if the
power goes out in the middle of a disk write. To the best of my knowledge, few
(if any) SCSI drives are affected by this problem (I guess they can ensure that
the entire sector gets written out before the drive completely loses power).

If the drive did a partial write, then when it tries to read it back it will
seem like a bad sector (with the same type of error reported back to the Linux
kernel). That could be the cause of the "short read" errors.
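
The difference between the "short read" and "bad magic number" messages seen
in this report can be illustrated with a small Python sketch. It is a
simplified model, not the logic e2fsprogs actually uses. The field offsets
(magic at byte 56, log block size at byte 24, volume label at bytes 120-135)
follow the ext2 on-disk superblock layout, while the function names and the
synthetic test data are invented for the example.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1 KiB into the device
SUPERBLOCK_SIZE = 1024
EXT2_MAGIC = 0xEF53        # shared by ext2 and ext3


def inspect_superblock(raw):
    """Classify the bytes returned by a read of the superblock area.

    `raw` may be shorter than SUPERBLOCK_SIZE if the device ended early.
    """
    if len(raw) < SUPERBLOCK_SIZE:
        return {"status": "short read"}
    (magic,) = struct.unpack_from("<H", raw, 56)            # s_magic
    if magic != EXT2_MAGIC:
        return {"status": "bad magic"}
    (log_block_size,) = struct.unpack_from("<I", raw, 24)   # s_log_block_size
    label = raw[120:136].split(b"\0", 1)[0].decode("ascii", "replace")
    return {"status": "ok", "label": label,
            "block_size": 1024 << log_block_size}


def read_superblock(path):
    """Read the primary superblock area from a device or image file.

    An unreadable sector would normally surface here as an OSError
    rather than as a short result.
    """
    with open(path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        return dev.read(SUPERBLOCK_SIZE)


# Synthetic data so the sketch runs without touching a real disk.
fake = bytearray(SUPERBLOCK_SIZE)
struct.pack_into("<H", fake, 56, EXT2_MAGIC)
struct.pack_into("<I", fake, 24, 2)      # 1024 << 2 = 4096-byte blocks
fake[120:125] = b"/home"
print(inspect_superblock(bytes(fake)))
print(inspect_superblock(b""))
```

On a real system, inspect_superblock(read_superblock("/dev/hdb2")) would be
the read-only equivalent of looking at the first lines of tune2fs -l output;
the device path here is only an example.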

Comment 4 Florian La Roche 2003-06-03 12:51:58 UTC
You need to guess alternative superblocks to get access to this filesystem again.
I think there are tools that try to find partitions on a disk with a lost
partition table, but I have not used them so far and have not yet included them
in Red Hat Linux.

greetings and hope you have recovered your data,

Florian La Roche