|Summary:||File system errors left uncorrected, failure to install alongside Mac OS, Mactel boot|
|Product:||[Fedora] Fedora||Reporter:||Chris Murphy <bugzilla>|
|Component:||anaconda||Assignee:||Anaconda Maintenance Team <anaconda-maint-list>|
|Status:||CLOSED CANTFIX||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Version:||17||CC:||anaconda-maint-list, awilliam, esandeen, felix, g.kaviyarasu, jfeeney, jonathan, robatino, satellitgo, the.ridikulus.rat, vanmeeuwen+fedora|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2012-07-18 17:46:11 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
Description Chris Murphy 2012-04-28 12:05:14 UTC
Description of problem:
Existing Mac OS 10.7 installation with free space; the installer fails with a "File system errors left uncorrected" message.

Version-Release number of selected component (if applicable):
Fedora-17.TC2-x86_64-Live-Desktop.iso

How reproducible:
1 for 1

Steps to Reproduce:
Installation of Mac OS 10.7:
1. Single partition installation produces three partitions: EFI System, Mac OS, Recovery HD. Use JHFSX as the file system. System installed and rebooted normally.
2. Within Mac OS, used the following to resize JHFSX:
diskutil resizevolume /dev/disk0s2 20G
Result of this command is the same number of partitions as before: EFI, Mac OS, Recovery, but the last ~480GB are unallocated now.

Execute Fedora Installation:
3. dd ISO to USB stick from Mac OS command line.
4. Boot from USB stick choosing EFI Boot option.
5. Run Fedora Installer.
6. Choose installation type "Use Free Space" with review of partition layout, accepted partition layout.

Actual results:
ext4 filesystem check failure immediately after it was created on vg_f17s-lv_root.

Expected results:
Proper creation of the file system, then copying of live image to hard drive.

Additional Information:
10.7.0
MBP4,1
Within the past 7 days the internal disk has received an extended SMART scan, with no read or sector errors.
Within the past 60 days ATA Secure Erase was issued to the drive.

When I e2fsck -f I get hundreds of errors to fix and just gave up. The file system is completely fakaked as far as I can tell.
Comment 4 Chris Murphy 2012-04-28 12:20:18 UTC
Did the following immediately after saving logs: Reboot Mac OS. Fine. Reboot off USB stick, EFI Boot F17 TC2. Choose Replace Existing Linux Systems, and review part layout. Accept the default layout, write changes to disk. Right after ext4 file system creation on vg_f17s-lv_root, but before live image starts getting copied, I get the same error message. Attaching 2nd copy program.log as well as storage.state which I forgot to do the first time around.
Comment 5 Chris Murphy 2012-04-28 12:21:23 UTC
Created attachment 580965 [details] storage.state (attempt 2)
Comment 6 Chris Murphy 2012-04-28 12:22:11 UTC
Created attachment 580966 [details] program.log (attempt 2)
Comment 7 Chris Murphy 2012-04-28 12:44:01 UTC
OK and now 4 for 4 failure/attempts. Twice with Use Free Space and twice with Replace Existing.
Comment 8 Chris Murphy 2012-04-28 21:05:48 UTC
Does not occur when installing from Live-Desktop ISO burned to DVD-RW media.
Does not occur when installing from DVD ISO burned to DVD-RW media.
Does occur when installing from Live-Desktop ISO to USB stick using dd.

The GRUB option to "Verify and Boot" Fedora goes by so quickly that it's not at all obvious it has in fact failed. And I'm not finding a log or console where the results of this verification test are found. After removing rhgb and quiet, and using a moderate shutter speed, I was able to photograph what I could not see. Although blurry, the message plainly reads:

Checking 004.8%
The media check is complete, the result is <cutoff>
It is not recommended to use this media.
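Since only the dd-written stick fails, one way to rule the write step in or out is to read the stick back and compare it with the ISO. The following is a minimal sketch, not part of the original report: the function name verify_written_image is hypothetical, and it assumes head and cmp behave as in GNU coreutils/diffutils.

```shell
# verify_written_image IMAGE DEVICE
# Compare the first size-of-IMAGE bytes of DEVICE against IMAGE.
# Returns 0 if they are identical, non-zero otherwise.
verify_written_image() {
    image=$1
    device=$2
    # Arithmetic expansion strips any padding that wc may print.
    size=$(( $(wc -c < "$image") ))
    # The stick is usually larger than the ISO, so only read back
    # as many bytes as were written.
    head -c "$size" "$device" | cmp -s - "$image"
}
```

Usage would look like `verify_written_image Fedora-17.TC2-x86_64-Live-Desktop.iso /dev/sdX && echo OK`, where /dev/sdX is a placeholder for the stick. It is probably worth re-plugging the stick before reading it back, so the data comes from the device rather than from the page cache.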
Comment 9 Chris Murphy 2012-04-28 21:30:38 UTC
Created attachment 581006 [details]
photo of failed verification

This probably needs to be reassigned to livecd-tools; it's not an anaconda problem. The media fails verification, but the user is not informed.
Comment 10 Chris Murphy 2012-04-29 22:48:24 UTC
Since media fails verification, I think this bug should be closed or marked as duplicate of bug 817419.
Comment 11 Matthew Garrett 2012-04-30 15:23:13 UTC
I suspect that this is unrelated to the media verification failure.
Comment 12 Chris Murphy 2012-04-30 15:40:29 UTC
This bug is not reproducible when the USB stick is produced with livecd-iso-to-disk.
Comment 13 Matthew Garrett 2012-04-30 17:24:40 UTC
Can't reproduce this here - I wrote TC2 to a USB stick, booted it, installed it into 40GB of free space and the install completed successfully.
Comment 14 Chris Murphy 2012-04-30 17:43:21 UTC
I can only try it on one Lexar 2G stick and a MBP 4,1, but have re-imaged the stick twice, zeroing in between, and have reproduced the results 6+ times in a row. It's jerry-rigged, but I could dd a 4G CF card in a Firewire adapter and see what happens. MBP8,2 hangs so not testable. And a Kingston 16G stick isn't compatible with the MBP4,1, taking so long to load initramfs that GRUB times out and reboots, and Mac OS X takes 45 minutes to boot.
Comment 15 Eric Sandeen 2012-04-30 17:52:22 UTC
Can you pop over to a shell after a failure, and see what's in dmesg after the failed mount? Ideally, getting an "e2image -r" of the problematic device might be interesting so I can see just what's corrupted. Is it an encrypted root? Does the storage configuration affect it at all? What if you install to a plain partition?
Comment 16 Chris Murphy 2012-04-30 18:50:23 UTC
I didn't originally capture dmesg so I will need to reimage the USB stick to get it. I am not configuring an encrypted root on the target disk; however, the target disk did previously contain Mac OS X and F17 Beta encrypted roots. The target disk was not wiped, merely repartitioned for this round of testing.

Does not fail with "Use All Space"; only fails when installing alongside Mac OS X partitions: EFI System (FAT32), Mac OS X (JHFSX, case-sensitive journaled), and Recovery HD (JHFS+, case-insensitive journaled).
Comment 17 Chris Murphy 2012-04-30 20:31:26 UTC
*sigh* OK now with the same stick imaged a 3rd time, I get a hang at dracut:starting plymouth daemon. So I'm not sure if there are still problems dd'ing the image to stick, or if the stick itself is having issues, and I don't have a substitute as this MBP's firmware (or whatever) is beyond fussy about booting off USB anyway. dd'ing to a FW400 CF card, boots, installs, no errors.
Comment 18 Chris Murphy 2012-04-30 22:19:45 UTC
Same stick imaged 4th time, boot, installs, no errors. *shrug* It's possible it's stick related. It's possible the state of the target disk, if it could possibly be a factor, has changed enough that the problem won't reproduce until I regress way back (full disk encryption) and then try again. For grins, I did reinstall Mac OS X as a single partition, and resize, as in steps 1 and 2 in the Description. The problem still does not occur.
Comment 19 Chris Murphy 2012-05-03 10:47:29 UTC
Created attachment 581825 [details]
dmesg.txt

Reproduced it. But I don't know exactly which option I'm choosing causes it.

1. DVD ISO. "Use All Space" + encryption + XFS for lv_root and lv_home + default packages.
2. Firstboot, launched Firefox, went to a random web site.
3. Reboot Mac OS X 10.7 from external disk (or DVD), repartition single partition, install OS, no encryption.
4. Reboot to new installation of Mac OS.
5. Resize file system to 20.4GB.
6. Reboot off dd'd USB stick of F17 TC2.
7. Default "Replace Existing" installation type + Review partitions.

Fail after formatting, before copying image. I have a dmesg, will attach.

# e2image -r /dev/mapper/vg_f17s/lv_root e2img_lvroot.img
e2image: Corrupt extent header while iterating over inode 20

Resulting file size for *img is 0 so nothing to attach.
Comment 20 Eric Sandeen 2012-05-03 14:54:04 UTC
# e2image -r /dev/mapper/vg_f17s/lv_root e2img_lvroot.img
e2image: Corrupt extent header while iterating over inode 20

Perhaps a dd of the first .... meg or so of /dev/mapper/vg_f17s/lv_root might offer a clue as to what's in there.

-Eric
Comment 21 Chris Murphy 2012-05-03 18:40:56 UTC
Created attachment 581928 [details]
dd first 1.5MB of lv_root

dd if=/dev/vg_f17s/lv_root of=lv_root.img bs=1536K count=1
Comment 22 Eric Sandeen 2012-05-03 19:22:13 UTC
It appears that the corruption in inode 20 is well beyond this, out at 265 megs or so:

debugfs: stat <20>
Inode: 20 Type: regular Mode: 0644 Flags: 0x80000
Generation: 480441713 Version: 0x00000000:00000001
User: 0 Group: 0 Size: 38453248
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 75104
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4f9b5918:8a1e4594 -- Fri Apr 27 21:42:32 2012
atime: 0x4f9b5906:670113ec -- Fri Apr 27 21:42:14 2012
mtime: 0x4f9b5906:50da59c4 -- Fri Apr 27 21:42:14 2012
crtime: 0x4f9b574a:19c2a0e0 -- Fri Apr 27 21:34:50 2012
Size of extra inode fields: 28
Extended attributes stored in inode body:
selinux = "system_u:object_r:rpm_var_lib_t:s0\000" (35)
EXTENTS:
debugfs:

hm, no extents?

debugfs: dump_extents <20>
Level Entries Logical Physical Length Flags
0/ 1 1/ 1 0 - 9387 67959 9388
<no more>

There is another extent node out at physical block 67959 ... so if you want, you could attach the 67959'th 4k block - or poke around with debugfs as above

I was hoping to see "MAC WAS HERE" in ASCII or something but no luck so far.
Comment 23 Chris Murphy 2012-05-03 21:02:57 UTC
Created attachment 581964 [details]
dd 4K at 67959

dd if=/dev/vg_f17s/lv_root of=lv_root_67959.img bs=4K skip=67959 count=1

I'm not debugfs proficient. I can make this computer available by ssh if you want to poke around, just contact me offline.
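The dd invocation above generalizes to pulling any single filesystem block out of a device or image. A small sketch follows; the function names extract_fs_block and block_byte_offset are hypothetical helpers, and the 4096-byte default block size is an assumption based on the "4k block" mentioned in the thread.

```shell
# extract_fs_block SOURCE BLOCK_NUMBER OUTPUT [BLOCK_SIZE]
# Copy exactly one block from SOURCE into OUTPUT. With bs equal to the
# block size, skip=N positions dd at the start of block N.
extract_fs_block() {
    src=$1
    block=$2
    out=$3
    bs=${4:-4096}
    dd if="$src" of="$out" bs="$bs" skip="$block" count=1 2>/dev/null
}

# block_byte_offset BLOCK_NUMBER [BLOCK_SIZE]
# Print the byte offset at which the given block starts.
block_byte_offset() {
    echo $(( $1 * ${2:-4096} ))
}
```

For block 67959, block_byte_offset gives 278360064 bytes, roughly 265 MiB into the volume, which agrees with the "265 megs or so" estimate in comment 22.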
Comment 24 Eric Sandeen 2012-05-03 23:15:33 UTC
I logged in with the info you gave me; not sure if you were on at the same time, but the /dev/vg_f17s/lv_root device disappeared while I was there.

Looking at dmesg, there were then lots of errors:

[ 7257.140183] SQUASHFS error: xz_dec_run error, data probably corrupt
[ 7257.140189] SQUASHFS error: squashfs_read_data failed to read block 0x25c6ec
[ 7257.140191] SQUASHFS error: Unable to read data cache entry [25c6ec]
[ 7257.140193] SQUASHFS error: Unable to read page, block 25c6ec, size 3e7c
[ 7257.140199] SQUASHFS error: Unable to read data cache entry [25c6ec]
[ 7257.140200] SQUASHFS error: Unable to read page, block 25c6ec, size 3e7c
[ 7257.140205] SQUASHFS error: Unable to read data cache entry [25c6ec]
[ 7257.140207] SQUASHFS error: Unable to read page, block 25c6ec, size 3e7c
[ 7257.140212] SQUASHFS error: Unable to read data cache entry [25c6ec]
[ 7257.140213] SQUASHFS error: Unable to read page, block 25c6ec, size 3e7c
[ 7257.140218] SQUASHFS error: Unable to read data cache entry [25c6ec]
[ 7257.140220] SQUASHFS error: Unable to read page, block 25c6ec, size 3e7c
[ 7257.140224] Buffer I/O error on device loop3, logical block 15904
....

Have you done a memtest on the box? I hate to ask, but something seems pretty odd here.
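Errors like these are easy to miss in a long kernel log. As a rough sketch, the helper below (count_media_errors is a hypothetical name) counts log lines matching the two message prefixes quoted above; the patterns are taken from this thread only and are not an exhaustive list of install-media failures.

```shell
# count_media_errors
# Read kernel log text on stdin and print the number of lines that
# report SquashFS read failures or buffer I/O errors.
count_media_errors() {
    # grep -c prints the count; it exits 1 when the count is 0,
    # which is not an error for this purpose.
    grep -c -E 'SQUASHFS error|Buffer I/O error' || true
}
```

From a shell on the live system this could be run as `dmesg | count_media_errors`; any non-zero result before or during the install would point at the install medium rather than at the target disk.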
Comment 25 Chris Murphy 2012-05-03 23:23:54 UTC
We may have bumped each other; I decided to see if the state of the computer is still reproducing the problem, and it is. But I'm out of it now if you want to do more work.

As mentioned in my email, I remain suspicious of memory corruption when booted in EFI mode on Apple hardware, as this has been a problem manifesting itself in other ways, like with nouveau, which appears thus far to be fixed. There is no memtest86 for EFI as 16-bit isn't available in EFI mode. But I have run it in CSM-BIOS mode extensively (days) with no errors. And of course Apple supplies an EFI-based tool that does an extended memory test, and other hardware tests as well. And it comes up clean for all.
Comment 26 Chris Murphy 2012-05-03 23:26:10 UTC
When you're done, if you want I can reboot and try to do another "Use All Space" and see what happens. In all previous attempts, Use All Space does not reproduce this bug, all other things equal.
Comment 27 Eric Sandeen 2012-05-03 23:49:39 UTC
and /dev/loop3: :3 (/LiveOS/ext3fs.img) and that's the USB stick no? Hrm.

If i look at the bad lvm image:

[root@f16s ~]# debugfs /dev/mapper/vg_f16s-lv_root
debugfs 1.42 (29-Nov-2011)
debugfs: stat <20>
Inode: 20 Type: regular Mode: 0644 Flags: 0x80000
Generation: 480441713 Version: 0x00000000:00000001
User: 0 Group: 0 Size: 38453248
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 75104
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4f9b5918:8a1e4594 -- Fri Apr 27 22:42:32 2012
atime: 0x4f9b5906:670113ec -- Fri Apr 27 22:42:14 2012
mtime: 0x4f9b5906:50da59c4 -- Fri Apr 27 22:42:14 2012
crtime: 0x4f9b574a:19c2a0e0 -- Fri Apr 27 22:34:50 2012
Size of extra inode fields: 28
Extended attributes stored in inode body:
selinux = "system_u:object_r:rpm_var_lib_t:s0\000" (35)
EXTENTS:
(ETB0):67959

there should be more extents there. On the original image:

[root@f16s ~]# debugfs /mnt/squashfs/LiveOS/ext3fs.img
debugfs 1.42 (29-Nov-2011)
debugfs: stat <20>
Inode: 20 Type: regular Mode: 0644 Flags: 0x80000
Generation: 480441713 Version: 0x00000000:00000001
User: 0 Group: 0 Size: 38453248
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 75104
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4f9b5918:8a1e4594 -- Fri Apr 27 22:42:32 2012
atime: 0x4f9b5906:670113ec -- Fri Apr 27 22:42:14 2012
mtime: 0x4f9b5906:50da59c4 -- Fri Apr 27 22:42:14 2012
crtime: 0x4f9b574a:19c2a0e0 -- Fri Apr 27 22:34:50 2012
Size of extra inode fields: 28
Extended attributes stored in inode body:
selinux = "system_u:object_r:rpm_var_lib_t:s0\000" (35)
EXTENTS:
(ETB0):67959, (0):33280, (1):34304, (2):33281, (3-2047):65536-67580, (2048-4095):69632-71679, (4096-5885):110592-112381, (5887-6143):112383-112639, (6144-8191):159744-161791, (8192-9387):231424-232619

we start out the same, but the extent list is correct - i.e. the extent block at 67959 isn't corrupt.

I dd'd that block out of each, and it's completely different on the system vs. the original image:

# cmp -l orig.img lv_root_67959.img | wc -l
4085

(I guess we got about 11 "lucky" bytes that matched ;)

So if I understand right - at one point, anaconda was doing a mkfs+fsck, and that failed with corruption. In this case it seems to have gotten past that, but now the image which was copied onto the laptop's storage is _also_ corrupt....

For some reason the image on the hard drive is also lacking a journal. Random corruption?

And when I look at the USB stick I get IO errors:

# cmp -l /dev/vg_f16s/lv_root /mnt/squashfs/LiveOS/ext3fs.img | more
cmp: 1073 215 140
1074 17 131
1075 243 233
1083 2 1
1117 70 74
1249 0 10
1401 271 265
1185537 0 200
1185538 0 201
1185544 0 10
1185545 0 111
/mnt/squashfs/LiveOS/ext3fs.img: Input/output error

... not sure what's going on here but IO errors from the install medium are not encouraging ....
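The "11 lucky bytes" figure follows from the block size minus the cmp -l line count: 4096 - 4085 = 11. A minimal sketch of that arithmetic is below; count_differing_bytes and matching_bytes are hypothetical helper names, and the sketch assumes both files have the same length, since cmp -l only reports differences up to the end of the shorter file.

```shell
# count_differing_bytes FILE_A FILE_B
# Print how many byte positions differ between the two files.
count_differing_bytes() {
    # cmp -l prints one line per differing byte. It exits non-zero
    # when the files differ, which is expected here.
    { cmp -l "$1" "$2" 2>/dev/null || true; } | wc -l | tr -d ' '
}

# matching_bytes FILE_A FILE_B
# Print how many byte positions of FILE_A match FILE_B.
matching_bytes() {
    total=$(( $(wc -c < "$1") ))
    echo $(( total - $(count_differing_bytes "$1" "$2") ))
}
```

Running `matching_bytes orig.img lv_root_67959.img` on the two 4 KiB dumps from this comment would be expected to print 11, given the 4085 differences reported above.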
Comment 28 Chris Murphy 2012-05-04 04:52:12 UTC
Installation type "Use All Space" = same error.
dd /dev/zero'd the first 21G of the target drive and tried again = same error.
Powered off, let the computer sit for 2 hours, tried again, "Use All Space" = same error.

So an intermittent/failing USB stick is my best guess. Odd that in 5 or 6 separate dd imaging attempts of this stick, only the first and the most recent manifest this bug.
Comment 29 Jesse Keating 2012-07-18 17:46:11 UTC
We can't really fix broken install sources :/ Looks like you figured out the root cause though.