Bug 442106

Summary:

on initial reboot, filesystem had errors

Product:

[Fedora] Fedora

Reporter:

Bill Nottingham <notting>

Component:

anaconda

Assignee:

Anaconda Maintenance Team <anaconda-maint-list>

Status:

CLOSED RAWHIDE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

low

Docs Contact:

Priority:

low

Version:

rawhide

CC:

cpanceac, esandeen, rvokal

Target Milestone:

---

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2008-04-21 21:38:52 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

235706

Attachments:

Description	Flags
dumpe2fs output	none
dmesg	none

Description Bill Nottingham 2008-04-11 19:03:44 UTC

Description of problem:

So, it fscked and then rebooted.

Have not tried to reproduce yet, will do shortly.

Version-Release number of selected component (if applicable):

F9 PR (x86_64)

Comment 1 Bill Nottingham 2008-04-11 19:31:56 UTC

Created attachment 302162 [details]
dumpe2fs output

Here's the dumpe2fs output.

Comment 2 Bill Nottingham 2008-04-11 19:48:00 UTC

[root@localhost ~]# e2fsck -n -v /dev/mapper/moofoo 
e2fsck 1.40.8 (13-Mar-2008)
/ contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(577587--577611)
Fix? no


/: ********** WARNING: Filesystem still has errors **********


   87480 inodes used (0.87%)
     517 non-contiguous inodes (0.6%)
         # of inodes with ind/dind/tind blocks: 6144/30/0
 1200262 blocks used (2.99%)
       0 bad blocks
       1 large file

   65955 regular files
    7776 directories
       8 character device files
       0 block device files
       0 fifos
    3499 links
   13732 symbolic links (13697 fast symbolic links)
       0 sockets
--------
   90970 files

Comment 3 Bill Nottingham 2008-04-11 19:52:38 UTC

I have in the log:

EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks not in datazone -
block = 202366355, count = 1
EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks not in datazone -
block = 1673128691, count = 1
EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks not in datazone -
block = 1148093888, count = 1
EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks not in datazone -
block = 1338544658, count = 1
EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks not in datazone -
block = 3927257361, count = 1
EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks not in datazone -
block = 1674848470, count = 1
EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks not in datazone -
block = 2292416204, count = 1
EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks not in datazone -
block = 2231364692, count = 1
EXT3-fs error (device dm-2): ext3_free_blocks_sb: bit already cleared for block
28802063
....

Comment 4 Bill Nottingham 2008-04-11 19:57:56 UTC

Created attachment 302164 [details]
dmesg

Comment 5 Bill Nottingham 2008-04-11 20:20:52 UTC

This appears to be happening (at least the errors appear) when running mkinitrd
in the post-install step. No, that doesn't make sense to me either.

Comment 6 Eric Sandeen 2008-04-11 21:28:48 UTC

hrm, can't do an x86_64 install right now, so, I tried:

get the livecd iso notting used.
jump through the hoops to get ext3fs.img extracted from it.
under the 2.6.25-0.218.rc8.git7.fc9.x86_64 kernel...
truncate the image to 164587324416 (the size of notting's lvm volume)
mount the image
grow the image
mount up /sys /boot /proc /dev and whatnot, and chroot in
try mkinitrd
... no problems.

Also, the ext3fs.img seems clean before & after the resize.

when I can try a livecd i'll either try with errors=panic (assuming ext3 can be
remounted that way... need to check...) or try replacing ext3 to get more info
on how it got to that error point.

-Eric

Comment 7 Eric Sandeen 2008-04-11 21:30:51 UTC

notting, if you get to it before me, see if you can pop over to a shell and do
mount -o remount,errors=panic on the dm-2 fs early in the install process...

Comment 8 Bill Nottingham 2008-04-14 22:07:34 UTC

Initial error is:

EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks not in datazone -
block = 202366355, count = 1
Kernel panic - not syncing: EXT3-fs (device dm-2): panic forced after error

Comment 9 Bill Nottingham 2008-04-14 22:08:20 UTC

panic isn't doing a stack trace. Haven't tried to rebuild the kernel for the livecd.

Comment 10 Eric Sandeen 2008-04-14 22:14:46 UTC

yep... yay, ext3_error() :/

rebuilding ext3.ko to provide more info would maybe help.  I could provide that
if you want... put it in the initrd... rebuild/reburn... yuck.

FWIW I can hit this too.  :)

Trying to cook up a way to hit it outside of anaconda so I can debug it more
easily...

so far, no luck.

-Eric

Comment 11 Eric Sandeen 2008-04-15 22:10:14 UTC

What does anaconda actually copy when it copies the fs image to the system root
disk?

I put a pre-resize early return in livecd.py and the un-resized image has the
corruption that notting and I both saw...

however:

[root@localhost]# e2fsck -fn /dev/mapper/live-osimg-min 

this checks clean...

is this what anaconda is copying?

Thanks,
-Eric

Comment 12 Bill Nottingham 2008-04-15 22:15:48 UTC

Believe so, yes.

Comment 13 Eric Sandeen 2008-04-15 22:26:43 UTC

hrm.

Ok will keep trying to narrow it down....

The inode for the corrupted file itself is intact, but the first indirect block
that it points to is full of garbage, even on my not-resized-but-just-copied LV.

The inode metadata matches the original fs image; i.e. atime, ctime etc are not
changed; whatever corruption happened doesn't seem related to operations on this
inode, it was just that an indirect block seems clobbered.

-Eric

Comment 14 Eric Sandeen 2008-04-15 22:28:29 UTC

in debugfs-speak:

[root@localhost foo]# debugfs /dev/mapper/live-osimg-min
debugfs 1.40.8 (13-Mar-2008)
debugfs:  stat <86555>
Inode: 86555   Type: regular    Mode:  0644   Flags: 0x0   Generation:
2462646136
User:     0   Group:     0   Size: 150359
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 304
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x47ff7b37 -- Fri Apr 11 09:52:39 2008
atime: 0x47ff78ca -- Fri Apr 11 09:42:18 2008
mtime: 0x47ff78ca -- Fri Apr 11 09:42:18 2008
Size of extra inode fields: 4
Extended attributes stored in inode body:
  selinux = "system_u:object_r:modules_dep_t:s0\000" (35)
BLOCKS:
(0-11):577574-577585, (IND):577586, (12-36):577587-577611
TOTAL: 38

debugfs:  quit
[root@localhost foo]# debugfs /dev/mapper/VolGroup00-LogVol00
debugfs 1.40.8 (13-Mar-2008)
debugfs:  stat <86555>
Inode: 86555   Type: regular    Mode:  0644   Flags: 0x0   Generation:
2462646136
User:     0   Group:     0   Size: 150359
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 304
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x47ff7b37 -- Fri Apr 11 09:52:39 2008
atime: 0x47ff78ca -- Fri Apr 11 09:42:18 2008
mtime: 0x47ff78ca -- Fri Apr 11 09:42:18 2008
Size of extra inode fields: 4
Extended attributes stored in inode body:
  selinux = "system_u:object_r:modules_dep_t:s0\000" (35)
BLOCKS:
(0-11):577574-577585, (IND):577586, (12):2943334241, (13):1038652331,

so the indirect block (577586) is just full of junk at this point.

Comment 15 Bill Nottingham 2008-04-15 22:34:39 UTC

And this block is correct on the 'live' filesystem, theoretically?

Comment 16 Eric Sandeen 2008-04-16 00:07:44 UTC

Because of the way the copy loop gets sizes:

        readamt = 1024 * 1024 * 8 # 8 megs at a time
        size = float(self._getLiveSizeMB() * 1024 * 1024)
        copied = 0
        while copied < size:
            buf = os.read(osfd, readamt)
            written = os.write(rootfd, buf)
            if (written < readamt) and (written < len(buf)):
                raise RuntimeError, "error copying filesystem!"
            copied += written
            progress.set_fraction(pct = copied / size)
            progress.processEvents()

and _getLiveSizeMB does:

        return blkcnt * blksize / 1024 / 1024

This is going to round the copied size down to the nearest megabyte, no?  And
miss the last part of the filesystem... where this corrupt block we saw just
happens to live...

-Eric

Comment 17 Bill Nottingham 2008-04-16 02:03:36 UTC

Thanks for the debugging!

Fixed in
http://git.fedorahosted.org/git/?p=anaconda.git;a=commit;h=9083f70668bfeb72b5dfea73d5cc68685e057e8b

Comment 18 Eric Sandeen 2008-04-16 03:16:27 UTC

*** Bug 431647 has been marked as a duplicate of this bug. ***

Comment 19 Bill Nottingham 2008-04-21 21:38:52 UTC

Closing, did a test of this patch.