Bug 431647
Summary: | efsprogs seems broken | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | cornel panceac <cpanceac> |
Component: | e2fsprogs | Assignee: | Eric Sandeen <esandeen> |
Status: | CLOSED DUPLICATE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | urgent | Docs Contact: | |
Priority: | low | ||
Version: | rawhide | CC: | kzak, oliver |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | f9beta | Doc Type: | Bug Fix |
Last Closed: | 2008-03-28 19:35:08 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
cornel panceac
2008-02-06 05:21:08 UTC
I'm going to need more info than that. "it seems that e2fsprogs are broken" -- broken how, what doesn't work? "broken filesystem" -- what makes you say that? Broken how? "unusable grub" -- how is it unusable? What works, or doesn't; what messages. Please include actual information about what is broken and how, as well as any relevant messages from the kernel (dmesg / console), grub, e2fsck, or whatever. I can try this myself, too.

ok, i'll reinstall now.

ok, after first boot, i've downloaded the available updates and, after updating about half of them, software updater apparently froze. ctrl-alt-f1 shows the following errors:

    EXT3-fs error (device sda6): ext3_get_inode_loc: unable to read inode block - inode 655421, block=1201656879
    (same message again)
    EXT3-fs error (device sda6) in ext3_reserve_inode_write: IO failure

after this, i logout and X doesn't restart and shutdown could not be run due to the ext3 errors, so the only option i was aware of was cold reboot. after reboot, grub prompt shows up so i reinstalled f9 alpha and then without updating anything just added f8 to grub and started it to report this.

ok, good info, thanks. can you show me what "grep sda6 /proc/partitions" looks like? (just to make sure your disk really isn't in the terabyte range...!) Were there any other messages prior to those that might indicate storage problems? can you also boot the rescue disk, and do:

    # dumpe2fs /dev/sda6
    # e2fsck -n /dev/sda6

and maybe even

    # e2image -r /dev/sda6 - | bzip2 > sda6.e2i.bz2

and then put that sda6.e2i.bz2 somewhere I can get to it? FWIW I did a live install on x86_64, to a pre-existing /dev/sda11, and a pre-existing /boot (which I did not format...) then did an update, and everything went fine for me :( Thanks, -Eric

Oh. You reinstalled f9 alpha after the problems... so we don't know if your fs is corrupt now, or not. If you would be so kind... re-running your install/update steps to get a corrupted filesystem, and then following the steps I requested in comment #4, would give me something to look at. Thanks! -Eric

sure, but not now, 20 hours later. now is not possible. where can i get the rescue cd from? i see no torrent for it. also, grep sda6 /proc/partitions should be run from rescue cd also? can another live cd be useful? (like system rescue cd, sysresccd.org )

actually just running the live cd to do those steps will be fine; just open a terminal and do those things. (rather than grepping, though, perhaps give me all of /proc/partitions just to be sure) (FWIW: disk 1 of the install can be used in rescue mode; I'm not certain about the live cd, but simply running the live CD and using it to analyze your problematic filesystem should be fine) Thanks! -Eric

ok, it will be done 20 hours from now.

Great, thanks for your help tracking this down! -Eric

Created attachment 294230 [details]
dump output before the update
Created attachment 294232 [details]
fsck -n before update and on mounted fs
Created attachment 294233 [details]
partitions as seen from the installed f9a before the update
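The partition listing was requested to rule out a terabyte-sized disk, since the kernel message reported block=1201656879. The arithmetic behind that check can be sketched as below. This is only an illustration: the 4096-byte block size is an assumption taken from the bs=4096 used with dd elsewhere in this report, the function names are made up for the example, and the 20 GiB partition size is a hypothetical value, not the real size of sda6.

```python
BLOCK_SIZE = 4096  # assumption: filesystem block size, as in the dd commands in this report


def min_device_bytes(block_number: int, block_size: int = BLOCK_SIZE) -> int:
    """Smallest device size in bytes on which this block number could exist."""
    return (block_number + 1) * block_size


def to_tib(num_bytes: int) -> float:
    """Convert a byte count to TiB."""
    return num_bytes / 2**40


def fits_on_partition(block_number: int, partition_kib: int) -> bool:
    """Check a block number against a partition size given in 1 KiB units,
    the unit used by the #blocks column of /proc/partitions."""
    return min_device_bytes(block_number) <= partition_kib * 1024


if __name__ == "__main__":
    reported_block = 1201656879          # from the EXT3-fs error message
    hypothetical_kib = 20 * 1024 * 1024  # hypothetical 20 GiB partition
    size = min_device_bytes(reported_block)
    print(f"block {reported_block} implies a device of at least {to_tib(size):.2f} TiB")
    print("plausible on a 20 GiB partition:", fits_on_partition(reported_block, hypothetical_kib))
```

Under these assumptions the reported block would need a device of roughly 4.5 TiB, so on any ordinary desktop partition the number itself is a sign of corrupted metadata rather than a real block address.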
e2image output is here: http://www.sendspace.com/file/y5s6yp i wanna mention that before login to the newly installed system (that's before update) i go to c-a-f1 and create an unprivileged user, then i login in this new account. also, i've noticed that before creating any user, ls /home says /home/lmacken . this time i've left the broken f9 intact and get grub back using puppy.

Thanks for all the info! I suppose I should have also asked for dmesg output after the original anaconda install, to see if there were any other errors reported. I'll look over the image to see if I can work backwards to what went wrong... Has this system been happy & stable with other OSes?

it worked fine with rawhide until i've replaced it with f9 alpha. but i was not working full time with rawhide, only checking from time to time. however, all updates completed successfully on it :) the only other OS installed is f8 and i use it daily without any significant issue.

Just to be sure - was the e2image created before the updates were applied, or after? And was it created while the fs was mounted?

From the fsck -n output pre-upgrade, it looks like it was corrupted from the start, but since it's mounted sometimes odd things show up... but it looks like either the original image was corrupted, something went wrong in the transfer, or something went wrong in the growfs stage, I think... Overall it'd be best to get fsck & e2image info from the fs while it's unmounted, both before & after you apply the updates, i.e.:

    boot livecd
    install to HD
    boot into f8, gather e2fsck -n and e2image from /dev/sda6
    boot into f9alpha, apply updates (I assume this is how you updated?)
    boot back into f8, gather e2fsck -n and e2image from /dev/sda6 again

that'd tell us for sure if it was corrupted prior to the updates or not. But, I know that's more work; I can see what I can find from what you've given me, too. Thanks, -Eric

all data was gathered before the update and while the filesystem was mounted. i have also some data after the update with filesystem umounted but basically it says the file system is dead :) let me know if you wanna see it anyway. 20 hours from now i'll try the new test plan and report back. (now i'm tired at the end of the day and i definitely don't wanna lose my data just because i'm tired :) )

So, I checked the original ext3 image in the livecd; it's clean:

    [root@localhost ~]# e2fsck -nf ext3fs.img
    e2fsck 1.40.5 (27-Jan-2008)
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    F9-Alpha-x86_64-: 87955/524288 files (1.1% non-contiguous), 615559/1048576 blocks

-Eric

I also extracted the original ext3 image, grew it to the size of your block device, and resized it just as the installer would. This was also fine. if I stat the problematic inode in my fs...

    BLOCKS: (0-11):608280-608291, (IND):608292, (12-61):608293-608342

and in yours:

    BLOCKS: (0-11):608280-608291, (IND):608292, (12):262144, (13):878903296, (14):79927, (15):65536, (16):65536, (17):196608, (18):878903296, (19):312, (20):256, (21):256, (22):1024, (23):942957312, (24):312, (25):256, (26):256, (27):1024, (28):959734528, (29):312, ....

odd. So it appears the indirect block is corrupted (IND).
If we dump it out:

    [root@neon src2]# dd if=bad_sda6.img bs=4096 skip=608292 count=1 | hexdump -C
    1+0 records in
    1+0 records out
    4096 bytes (4.1 kB) copied, 3.025e-05 s, 135 MB/s
    00000000 00 00 04 00 00 00 63 34 37 38 01 00 00 00 01 00 |......c478......|
    00000010 00 00 01 00 00 00 03 00 00 00 63 34 38 01 00 00 |..........c48...|
    00000020 00 01 00 00 00 01 00 00 00 04 00 00 00 63 34 38 |.............c48|
    00000030 38 01 00 00 00 01 00 00 00 01 00 00 00 04 00 00 |8...............|
    00000040 00 63 34 39 38 01 00 00 00 01 00 00 00 01 00 00 |.c498...........|
    00000050 00 04 00 00 00 63 35 30 38 01 00 00 00 01 00 00 |.....c508.......|
    00000060 00 01 00 00 00 04 00 00 00 63 35 31 38 01 00 00 |.........c518...|
    00000070 00 01 00 00 00 01 00 00 00 04 00 00 00 63 35 32 |.............c52|
    00000080 38 01 00 00 00 01 00 00 00 01 00 00 00 04 00 00 |8...............|
    ....

ok now what could that be.... The rest of that block contains things like:

    # Directory patterns (dir)
    # Parameters:
    # 1. domain type
    # 2. container (directory) type
    # 3. directory type
    # Regular file patterns (file)
    # Parameters:
    # 1. domain type
    # 2. container (directory) type
    # 3. file type

this is from selinux, those strings can be found in selinux-policy-targeted files for example. If I look at the image I extracted & grew:

    [root@localhost ~]# dd if=ext3fs.img bs=4096 skip=608292 count=1 | hexdump -C
    1+0 records in
    1+0 records out
    4096 bytes (4.1 kB) copied, 0.0267473 s, 153 kB/s
    00000000 25 48 09 00 26 48 09 00 27 48 09 00 28 48 09 00 |%H..&H..'H..(H..|
    00000010 29 48 09 00 2a 48 09 00 2b 48 09 00 2c 48 09 00 |)H..*H..+H..,H..|
    00000020 2d 48 09 00 2e 48 09 00 2f 48 09 00 30 48 09 00 |-H...H../H..0H..|
    00000030 31 48 09 00 32 48 09 00 33 48 09 00 34 48 09 00 |1H..2H..3H..4H..|
    00000040 35 48 09 00 36 48 09 00 37 48 09 00 38 48 09 00 |5H..6H..7H..8H..|
    00000050 39 48 09 00 3a 48 09 00 3b 48 09 00 3c 48 09 00 |9H..:H..;H..<H..|
    ....

we get the proper indirect blocks.
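The link between the two hexdumps and the BLOCKS listings from stat can be checked by hand: an ext3 indirect block is an array of 32-bit little-endian block numbers. A minimal sketch of that decoding follows, using only the first 16 bytes of each dump as quoted in this report. The function name decode_indirect_block is made up for the example; it is not part of e2fsprogs.

```python
from __future__ import annotations

import struct


def decode_indirect_block(raw: bytes) -> list[int]:
    """Interpret raw bytes as indirect-block entries (little-endian unsigned 32-bit)."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}I", raw[: count * 4]))


# First 16 bytes of block 608292 in each image, copied from the hexdumps in this report.
GOOD = bytes.fromhex("25 48 09 00 26 48 09 00 27 48 09 00 28 48 09 00")
BAD = bytes.fromhex("00 00 04 00 00 00 63 34 37 38 01 00 00 00 01 00")

if __name__ == "__main__":
    print(decode_indirect_block(GOOD))  # [608293, 608294, 608295, 608296]
    print(decode_indirect_block(BAD))   # [262144, 878903296, 79927, 65536]
```

The good image decodes to the start of the contiguous run 608293-608342 shown for logical blocks 12-61, while the bad image decodes to exactly the values listed for (12) through (15) in the corrupted inode. That is consistent with file data having been read as if it were a table of block pointers.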
So in your case either this block was not copied properly, or something else copied over it afterwards.

BTW: in comment #18, if after these steps:

    boot livecd
    install to HD
    boot into f8, gather e2fsck -n and e2image from /dev/sda6

e2fsck finds corruption, no need to try the updates. also, just to be certain, can you run memtestx86 on the box for a while?

memtest ran successfully for more than two passes. however smartctl said that the drive that holds the / partition was overheated once in the past. now i'll proceed with the reinstall and then i'll report back.

before first f9a boot, fsck says clean! however there's one little change, i didn't install grub this time.

e2i before first boot. http://www.sendspace.com/file/7lh6iv

after firstboot ; c-a-del , filesystem checked from f8 is still clean! no login yet on f9.

Could you please describe the exact steps you took initially when you got the corrupted filesystem, including each boot/reboot, what you booted into, what you clicked/ran/mounted/updated, etc? Thanks, -Eric

e2i after firstboot http://www.sendspace.com/file/ebqctf until now, the only difference is that i no longer installed grub. i'll go now and login as root in console and adduser ... passwd ... . then i'll go offline to check the filesystem.

after i login as root in console, first thing i've run was e2fsck -n /dev/sda6. and it reported errors! then c-a-del, back into f8, e2fsck says clean. the dump: http://www.sendspace.com/file/sgzl2g

after useradd in console and reboot, f8 says the filesystem is clean, dump is: http://www.sendspace.com/file/jj2b0j update time :)

well, as usual, while updates were being installed and i was browsing internet with firefox, my mouse froze, my keyboard froze, so i cold rebooted the machine, and then i did it: instead of letting f8 start, i selected f9 :( i pressed ctrl-alt-del while the filesystem was about to recover journal. so, these results are not pure, maybe :) files will follow soon.

e2i after partial update and maybe after partial recovery of journal (?!?) http://www.sendspace.com/file/0w4p2s

Created attachment 294399 [details]
fsck output after partial update
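Some of the e2fsck -n runs in this report were made while /dev/sda6 was mounted (one attachment is titled "fsck -n before update and on mounted fs"), and a check on a mounted filesystem can show errors that are not really on disk. A minimal sketch of a pre-check follows, assuming the device appears under its plain name in the first column of /proc/mounts. That is not always the case (a root filesystem may be listed under another name), so treat a negative answer with care. The function name is_mounted is made up for the example.

```python
def is_mounted(device: str, mounts_text: str) -> bool:
    """Return True if `device` is the source (first field) of any line
    in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False


if __name__ == "__main__":
    # Sample text in /proc/mounts format; on a live system one would instead
    # pass the contents of /proc/mounts, e.g. open("/proc/mounts").read().
    sample = (
        "/dev/sda6 / ext3 rw,relatime 0 0\n"
        "/dev/sda1 /boot ext3 rw,relatime 0 0\n"
    )
    for dev in ("/dev/sda6", "/dev/sda5"):
        state = "mounted, e2fsck -n output may be misleading" if is_mounted(dev, sample) else "not listed"
        print(dev, "->", state)
```

Running the check from another installed system or a live CD, as was done here from f8, avoids the question entirely because the filesystem under test is never mounted.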
re: comment #30, if you are doing e2fsck on a mounted filesystem you can expect to see errors. I can't see any rhyme or reason to your corruption; I'm very tempted to blame hardware at this point. If you can completely and accurately describe the simplest set of steps you can take to get from live CD boot to the corrupted filesystem, it would help.

ok. i'll do it once again and write everything on paper. ( but first i'll test the hard disk drives )

hard disk long test completed successfully.

steps to reproduce the error:

installing f9a x86_64
    during install, i choose bucharest, not utc
    custom layout
    edit sdb6 , / , format
    format,continue,ignore
    no grub
    logout
    c-a-f1,c-a-del

at first boot:
    complete firstboot
    c-a-f1, login as root
    useradd guzu, passwd, ctrl-d, c-a-f7
    login as guzu
    authenticate (pulseaudio)
    view updates
    apply updates
    ----downloading
    ----updating

after about 40% of updates, progress bar moves very fast till the middle (like updating very fast) and then software updater disappears. mouse is still moving, c-a-f1, c-a-del does nothing due to shutdown being unable to run, so i cold reboot. files will follow.

Created attachment 294451 [details]
fsck after partial update 9 february 2008 8:20
Created attachment 294452 [details]
e2i after partial update 9 february 2008 8:15
my smolt profile: http://rafb.net/p/G45YO713.html

Ok, where are you at with this one; we never seemed to get to the bottom of it. Do you have any new info, does the problem persist? I was never able to reproduce... though I should still try to follow your exact steps, above. Thanks, -Eric

no new info, i am unable to boot the system. i may try to repair it from a live cd but, i wanted to see if you have a better idea. i may also try to install the i386 version or download and burn again the x86_64 version. or i can just wait for the next rawhide livecd.

i've overwritten that partition with f9alpha installed from i686 live kde (two times) and everything just works.

it's no longer present in f9 beta x86_64. thnx again for your help.

For what it's worth, I think we finally had a good reproducer and a resolution for this one; see bug #442106 which is probably what you were hitting... Basically what was happening was that the livecd image was not completely copied to the system, resulting in some stale data on the disk, and corrupted fs. However, the behavior would differ depending on the particular incarnation of the filesystem in the snapshot. Based on the fsck output in this bug, pretty sure it's what you hit, so I'll dup it. Sorry I didn't get to the bottom of it earlier!

*** This bug has been marked as a duplicate of 442106 ***