Bug 431647 - efsprogs seems broken
Summary: efsprogs seems broken
Keywords:
Status: CLOSED DUPLICATE of bug 442106
Alias: None
Product: Fedora
Classification: Fedora
Component: e2fsprogs
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: urgent
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-02-06 05:21 UTC by cornel panceac
Modified: 2008-04-16 03:16 UTC (History)

Fixed In Version: f9beta
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-03-28 19:35:08 UTC
Type: ---
Embargoed:


Attachments
dump output before the update (22.79 KB, text/plain)
2008-02-07 17:30 UTC, cornel panceac
fsck -n before update and on mounted fs (51.17 KB, text/plain)
2008-02-07 17:30 UTC, cornel panceac
partitions as seen from the installed f9a before the update (296 bytes, application/octet-stream)
2008-02-07 17:31 UTC, cornel panceac
fsck output after partial update (10.90 MB, text/plain)
2008-02-08 19:13 UTC, cornel panceac
fsck after partial update 9 february 2008 8:20 (60.73 KB, text/plain)
2008-02-09 06:28 UTC, cornel panceac
e2i after partial update 9 february 2008 8:15 (14 bytes, application/x-bzip2)
2008-02-09 06:29 UTC, cornel panceac

Description cornel panceac 2008-02-06 05:21:08 UTC
Description of problem:
f9 alpha x86_64 live cd: after the install completes, it seems that e2fsprogs
is broken. an attempt to update using "software updater" ends with a broken
filesystem and an unusable grub. however, if i first update e2fsprogs and then
yum (and nothing else) from the yum command line, i end up with a bootable
grub. i have yet to check the state of f9 alpha though.

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. install f9 alpha from x86_64 live cd
2. try to update the system using "software updater"
  
Actual results:
software updater disappears and filesystem is broken

Expected results:
update completes successfully

Additional info:

Comment 1 Eric Sandeen 2008-02-06 14:08:34 UTC
I'm going to need more info than that.

"it seems that e2fsprogs are broken" -- broken how, what doesn't work?
"broken filesystem" -- what makes you say that? Broken how?
"unusable grub" -- how is it unusable?  What works, or doesn't; what messages.

Please include actual information about what is broken and how, as well as any
relevant messages from the kernel (dmesg / console), grub, e2fsck, or whatever.

I can try this myself, too.

Comment 2 cornel panceac 2008-02-06 15:28:37 UTC
ok, i'll reinstall now.

Comment 3 cornel panceac 2008-02-06 16:59:06 UTC
ok, after first boot, i downloaded the available updates and, after updating
about half of them, software updater apparently froze. ctrl-alt-f1 shows the
following errors:

EXT3-fs error (device sda 6): ext3_get_inode_loc: unable to read inode block -
inode 655421, block=1201656879
(same message again)
EXT3-fs error (device sda6)in ext3_reserve_inode_write: IO failure

after this, i logout and X doesn't restart and shutdown could not be run due to
the ext3 errors, so the only option i was aware of was cold reboot. after
reboot, grub prompt shows up so i reinstalled f9 alpha and then without updating
anything just added f8 to grub and started it to report this. 

Comment 4 Eric Sandeen 2008-02-06 20:52:56 UTC
ok, good info, thanks.

can you show me what "grep sda6 /proc/partitions" looks like?
(just to make sure your disk really isn't in the terabyte range...!)
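For context on why the disk size matters here: the block number in the error from comment 3 can be turned into a byte offset. This is only a rough sketch, assuming a 4 KiB ext3 block size (the kernel message does not state it); the function name is made up for illustration.

```python
# Assumption: the filesystem uses 4 KiB blocks; the error message does not say.
BLOCK_SIZE = 4096

# Block number reported by ext3_get_inode_loc in comment 3.
bad_block = 1201656879


def block_offset_bytes(block, block_size=BLOCK_SIZE):
    """Return the byte offset at which a given filesystem block would start."""
    return block * block_size


offset = block_offset_bytes(bad_block)
print(f"block {bad_block} starts at byte {offset} ({offset / 2**40:.2f} TiB)")
```

Under that assumption the block would sit several terabytes into the device, so unless the partition really is that large, the block number itself is probably garbage rather than a failed read of a real block.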

Were there any other messages prior to those that might indicate storage problems?

can you also boot the rescue disk, and do:

# dumpe2fs /dev/sda6
# e2fsck -n /dev/sda6
and maybe even
# e2image -r /dev/sda6 - | bzip2 > sda6.e2i.bz2

and then put that sda6.e2i.bz2 somewhere I can get to it?

FWIW I did a live install on x86_64, to a pre-existing /dev/sda11, and a
pre-existing /boot (which I did not format...) then did an update, and
everything went fine for me :(

Thanks,

-Eric

Comment 5 Eric Sandeen 2008-02-06 20:57:45 UTC
Oh.  You reinstalled f9 alpha after the problems... so we don't know if your fs
is corrupt now, or not.

If you would be so kind... re-running your install/update steps to get a
corrupted filesystem, and then following the steps I requested in comment #4,
would give me something to look at.

Thanks!

-Eric

Comment 6 cornel panceac 2008-02-06 21:01:54 UTC
sure, but not now, 20 hours later. now is not possible. where can i get the
rescue cd from? i see no torrent for it.

Comment 7 cornel panceac 2008-02-06 21:03:35 UTC
also, 
grep sda6 /proc/partitions
should be run from rescue cd also? can another live cd be useful? (like system
rescue cd, sysresccd.org )

Comment 8 Eric Sandeen 2008-02-06 21:06:18 UTC
actually just running the live cd to do those steps will be fine; just open a
terminal and do those things.  (rather than grepping, though, perhaps give me
all of /proc/partitions just to be sure)

(FWIW: disk 1 of the install can be used in rescue mode; I'm not certain about
the live cd, but simply running the live CD and using it to analyze your
problematic filesystem should be fine)

Thanks!

-Eric

Comment 9 cornel panceac 2008-02-06 21:14:35 UTC
ok, it will be done 20 hours from now.

Comment 10 Eric Sandeen 2008-02-06 21:17:02 UTC
Great, thanks for your help tracking this down!

-Eric

Comment 11 cornel panceac 2008-02-07 17:30:00 UTC
Created attachment 294230
dump output before the update

Comment 12 cornel panceac 2008-02-07 17:30:54 UTC
Created attachment 294232
fsck -n before update and on mounted fs

Comment 13 cornel panceac 2008-02-07 17:31:43 UTC
Created attachment 294233
partitions as seen from the installed f9a before the update

Comment 14 cornel panceac 2008-02-07 17:35:31 UTC
e2image output is here:
http://www.sendspace.com/file/y5s6yp

i wanna mention that before login to the newly installed system (that's before
update) i go to c-a-f1 and create an unprivileged user, then i login in this new
account.
also, i've noticed that before creating any user, ls /home says
/home/lmacken
.

Comment 15 cornel panceac 2008-02-07 17:36:31 UTC
this time i've left the broken f9 intact and got grub back using puppy.

Comment 16 Eric Sandeen 2008-02-07 17:51:06 UTC
Thanks for all the info!  I suppose I should have also asked for dmesg output
after the original anaconda install, to see if there were any other errors
reported.  I'll look over the image to see if I can work backwards to what went
wrong...

Has this system been happy & stable with other OSes?

Comment 17 cornel panceac 2008-02-07 18:07:00 UTC
it worked fine with rawhide until i've replaced it with f9 alpha. but i was not
working full time with rawhide, only checking from time to time. however, all
updates completed successfully on it :) 
the only other OS installed is f8 and i use it daily without any significant issue.


Comment 18 Eric Sandeen 2008-02-07 18:37:38 UTC
Just to be sure - was the e2image created before the updates were applied, or
after?  And was it created while the fs was mounted?

From the fsck -n output pre-upgrade, it looks like it was corrupted from the
start, but since it's mounted sometimes odd things show up...  but it looks like
either the original image was corrupted, something went wrong in the transfer,
or something went wrong in the growfs stage, I think...

Overall it'd be best to get fsck & e2image info from the fs while it's
unmounted, both before & after you apply the updates, i.e.:

boot livecd
install to HD
boot into f8, gather e2fsck -n and e2image from /dev/sda6
boot into f9alpha, apply updates (I assume this is how you updated?)
boot back into f8, gather e2fsck -n and e2image from /dev/sda6 again

that'd tell us for sure if it was corrupted prior to the updates or not.
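To keep the "before" and "after" passes identical, the collection commands could be generated from one place. This is a minimal sketch that only prints the commands and does not run them; the helper name and the output file names are hypothetical, while `e2fsck -n` and `e2image -r ... | bzip2` are the invocations from comment #4. It assumes the filesystem is unmounted when the printed commands are run.

```python
import shlex


def collection_commands(device, label):
    """Return the shell commands for one read-only collection pass.

    `label` only names the output files (hypothetical naming scheme).
    """
    dev = shlex.quote(device)
    tag = shlex.quote(label)
    return [
        # -n: open read-only and answer "no" to every repair prompt
        f"e2fsck -n {dev} > fsck-{tag}.txt 2>&1",
        # -r: raw image written to stdout, compressed on the fly
        f"e2image -r {dev} - | bzip2 > {tag}.e2i.bz2",
    ]


for stage in ("before-update", "after-update"):
    for cmd in collection_commands("/dev/sda6", stage):
        print(cmd)
```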

But, I know that's more work; I can see what I can find from what you've given
me, too.

Thanks,
-Eric

Comment 19 cornel panceac 2008-02-07 19:00:55 UTC
all data was gathered before the update and while the filesystem was mounted. i
also have some data from after the update with the filesystem unmounted, but
basically it says the file system is dead :) let me know if you wanna see it
anyway.
20 hours from now i'll try the new test plan and report back. (now i'm tired at
the end of the day and i definitely don't wanna lose my data just because i'm
tired :) )

Comment 20 Eric Sandeen 2008-02-07 19:19:57 UTC
So, I checked the original ext3 image in the livecd; it's clean:

[root@localhost ~]# e2fsck -nf ext3fs.img 
e2fsck 1.40.5 (27-Jan-2008)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
F9-Alpha-x86_64-: 87955/524288 files (1.1% non-contiguous), 615559/1048576 blocks

-Eric

Comment 21 Eric Sandeen 2008-02-07 20:19:11 UTC
I also extracted the original ext3 image, grew it to the size of your block
device, and resized it just as the installer would.  This was also fine.

if I stat the problematic inode in my fs...

BLOCKS:
(0-11):608280-608291, (IND):608292, (12-61):608293-608342

and in yours:
BLOCKS:
(0-11):608280-608291, (IND):608292, (12):262144, (13):878903296, (14):79927,
(15):65536, (16):65536, (17):196608, (18):878903296, (19):312, (20):256,
(21):256, (22):1024, (23):942957312, (24):312, (25):256, (26):256, (27):1024,
(28):959734528, (29):312,
....

odd.

So it appears the indirect block is corrupted (IND).  If we dump it out:

[root@neon src2]# dd if=bad_sda6.img bs=4096 skip=608292 count=1 | hexdump -C
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 3.025e-05 s, 135 MB/s
00000000  00 00 04 00 00 00 63 34  37 38 01 00 00 00 01 00  |......c478......|
00000010  00 00 01 00 00 00 03 00  00 00 63 34 38 01 00 00  |..........c48...|
00000020  00 01 00 00 00 01 00 00  00 04 00 00 00 63 34 38  |.............c48|
00000030  38 01 00 00 00 01 00 00  00 01 00 00 00 04 00 00  |8...............|
00000040  00 63 34 39 38 01 00 00  00 01 00 00 00 01 00 00  |.c498...........|
00000050  00 04 00 00 00 63 35 30  38 01 00 00 00 01 00 00  |.....c508.......|
00000060  00 01 00 00 00 04 00 00  00 63 35 31 38 01 00 00  |.........c518...|
00000070  00 01 00 00 00 01 00 00  00 04 00 00 00 63 35 32  |.............c52|
00000080  38 01 00 00 00 01 00 00  00 01 00 00 00 04 00 00  |8...............|
....

ok now what could that be....



Comment 22 Eric Sandeen 2008-02-07 20:41:19 UTC
The rest of that block contains things like:

# Directory patterns (dir)
# Parameters:
# 1. domain type
# 2. container (directory) type
# 3. directory type
# Regular file patterns (file)
# Parameters:
# 1. domain type
# 2. container (directory) type
# 3. file type

this is from selinux, those strings can be found in selinux-policy-targeted
files for example.

If I look at the image I extracted & grew:

[root@localhost ~]# dd if=ext3fs.img bs=4096 skip=608292 count=1 | hexdump -C
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.0267473 s, 153 kB/s
00000000  25 48 09 00 26 48 09 00  27 48 09 00 28 48 09 00  |%H..&H..'H..(H..|
00000010  29 48 09 00 2a 48 09 00  2b 48 09 00 2c 48 09 00  |)H..*H..+H..,H..|
00000020  2d 48 09 00 2e 48 09 00  2f 48 09 00 30 48 09 00  |-H...H../H..0H..|
00000030  31 48 09 00 32 48 09 00  33 48 09 00 34 48 09 00  |1H..2H..3H..4H..|
00000040  35 48 09 00 36 48 09 00  37 48 09 00 38 48 09 00  |5H..6H..7H..8H..|
00000050  39 48 09 00 3a 48 09 00  3b 48 09 00 3c 48 09 00  |9H..:H..;H..<H..|
....

we get the proper indirect blocks.

So in your case either this block was not copied properly, or something else
copied over it afterwards.
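The link between the two hexdumps and the block lists from comment #21 can be checked by hand: an ext3 indirect block is an array of little-endian 32-bit block numbers. A minimal sketch, using only the first 16 bytes of each dump as shown above (the function name is made up for illustration):

```python
import struct


def decode_indirect(raw):
    """Interpret raw bytes as little-endian 32-bit block numbers."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}I", raw[: count * 4]))


# First 16 bytes of block 608292 in the freshly extracted and grown image.
good = bytes.fromhex("25480900 26480900 27480900 28480900")
# First 16 bytes of the same block in the corrupted sda6 image.
bad = bytes.fromhex("00000400 00006334 37380100 00000100")

print(decode_indirect(good))  # [608293, 608294, 608295, 608296]
print(decode_indirect(bad))   # [262144, 878903296, 79927, 65536]
```

The good block decodes to the consecutive run starting at 608293, and the bad block decodes to exactly the odd values listed for entries (12) through (15), which is consistent with file data having landed where the block map should be.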

BTW:  in comment #18, if after these steps:

boot livecd
install to HD
boot into f8, gather e2fsck -n and e2image from /dev/sda6

e2fsck finds corruption, no need to try the updates.

Comment 23 Eric Sandeen 2008-02-07 20:59:01 UTC
also, just to be certain, can you run memtestx86 on the box for a while?



Comment 24 cornel panceac 2008-02-08 15:56:00 UTC
memtest ran successfully for more than two passes. however, smartctl said that
the drive that holds the / partition overheated once in the past. now i'll
proceed with the reinstall and then i'll report back.

Comment 25 cornel panceac 2008-02-08 16:44:54 UTC
before first f9a boot, fsck says clean! however there's one little change, i
didn't install grub this time.

Comment 26 cornel panceac 2008-02-08 16:48:28 UTC
e2i before first boot.

http://www.sendspace.com/file/7lh6iv

Comment 27 cornel panceac 2008-02-08 17:39:05 UTC
after firstboot; c-a-del, filesystem checked from f8 is still clean! no login
yet on f9.

Comment 28 Eric Sandeen 2008-02-08 17:44:32 UTC
Could you please describe the exact steps you took initially when you got the
corrupted filesystem, including each boot/reboot, what you booted into, what you
clicked/ran/mounted/updated, etc?

Thanks,
-Eric

Comment 29 cornel panceac 2008-02-08 17:47:32 UTC
e2i after firstboot

http://www.sendspace.com/file/ebqctf

until now, the only difference is that i no longer installed grub. i'll go now
and login as root in console and adduser ... passwd ... . then i'll go offline
to check the filesystem.

Comment 30 cornel panceac 2008-02-08 18:06:09 UTC
after i login as root in console, first thing i've run was e2fsck -n /dev/sda6.
and it reported errors! then c-a-del, back into f8, e2fsck says clean. the dump:
http://www.sendspace.com/file/sgzl2g

Comment 31 cornel panceac 2008-02-08 18:25:45 UTC
after useradd in console and reboot, f8 says the filesystem is clean, dump is:
http://www.sendspace.com/file/jj2b0j

update time :)

Comment 32 cornel panceac 2008-02-08 19:03:28 UTC
well, as usual, while updates were being installed and i was browsing the
internet with firefox, my mouse froze, my keyboard froze, so i cold rebooted the
machine, and then i did it: instead of letting f8 start, i selected f9 :(
i pressed ctrl-alt-del while the filesystem was about to recover the journal.
so, these results are not pure, maybe :)
files will follow soon.

Comment 33 cornel panceac 2008-02-08 19:11:41 UTC
e2i after partial update and maybe after partial recovery of journal (?!?)
http://www.sendspace.com/file/0w4p2s

Comment 34 cornel panceac 2008-02-08 19:13:41 UTC
Created attachment 294399
fsck output after partial update

Comment 35 Eric Sandeen 2008-02-08 22:30:19 UTC
re: comment #30, if you are doing e2fsck on a mounted filesystem you can expect
to see errors.

I can't see any rhyme or reason to your corruption; I'm very tempted to blame
hardware at this point.

If you can completely and accurately describe the simplest set of steps you can
take to get from live CD boot to the corrupted filesystem, it would help.

Comment 36 cornel panceac 2008-02-09 04:16:28 UTC
ok. i'll do it once again and write everything on paper. ( but first i'll test
the hard disk drives )

Comment 37 cornel panceac 2008-02-09 06:25:38 UTC
hard disk long test completed successfully.

steps to reproduce the error:

installing

f9a x86_64
during install, i choose
bucharest, not utc
custom layout
edit sdb6 , / , format
format,continue,ignore
no grub
logout
c-a-f1,c-a-del

at first boot:

complete firstboot
c-a-f1, login as root
useradd guzu, passwd, ctrl-d, c-a-f7
login as guzu
authenticate (pulseaudio)
view updates
apply updates
----downloading
----updating
after about 40% of updates, the progress bar moves very fast till the middle
(as if updating very fast) and then software updater disappears. mouse is still
moving, c-a-f1,
c-a-del does nothing because shutdown is unable to run, so i cold reboot.
files will follow.

Comment 38 cornel panceac 2008-02-09 06:28:16 UTC
Created attachment 294451
fsck after partial update 9 february 2008 8:20

Comment 39 cornel panceac 2008-02-09 06:29:00 UTC
Created attachment 294452
e2i after partial update 9 february 2008 8:15

Comment 40 cornel panceac 2008-02-09 06:37:56 UTC
my smolt profile:

http://rafb.net/p/G45YO713.html

Comment 41 Eric Sandeen 2008-03-02 15:07:47 UTC
Ok, where are you at with this one; we never seemed to get to the bottom of it.
 Do you have any new info, does the problem persist?  I was never able to
reproduce... though I should still try to follow your exact steps, above.

Thanks,
-Eric

Comment 42 cornel panceac 2008-03-03 13:56:09 UTC
no new info, i am unable to boot the system. i may try to repair it from a live
cd but, i wanted to see if you have a better idea. i may also try to install the
i386 version or download and burn again the x86_64 version. or  i can just wait
for the next rawhide livecd.

Comment 43 cornel panceac 2008-03-07 19:07:00 UTC
i've overwritten that partition with f9alpha installed from i686 live kde (two
times) and everything just works.

Comment 44 cornel panceac 2008-03-28 19:35:08 UTC
it's no longer present in f9 beta x86_64. thnx again for your help.

Comment 45 Eric Sandeen 2008-04-16 03:16:26 UTC
For what it's worth, I think we finally had a good reproducer and a resolution
for this one; see bug #442106 which is probably what you were hitting...

Basically what was happening was that the livecd image was not completely copied
to the system, resulting in some stale data on the disk, and corrupted fs. 
However, the behavior would differ depending on the particular incarnation of
the filesystem in the snapshot.

Based on the fsck output in this bug, pretty sure it's what you hit, so I'll dup it.

Sorry I didn't get to the bottom of it earlier!

*** This bug has been marked as a duplicate of 442106 ***

