Bug 573628

Summary:

anaconda failure with Advanced Format disks (4K internal / 512 external)

Product:

[Fedora] Fedora

Reporter:

Steve Perkins <steve.perkins>

Component:

anaconda

Assignee:

Hans de Goede <hdegoede>

Status:

CLOSED DUPLICATE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

medium

Docs Contact:

Priority:

low

Version:

rawhide

CC:

esandeen, jbrier, jonathan, jordan_hargrave, linux-bugs, matt_domsch, meyering, mkp, steve.perkins, stuart_hayes, vanmeeuwen+fedora

Target Milestone:

---

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

563218

Environment:

Last Closed:

2010-03-23 17:31:30 UTC

Bug Depends On:

574220, 586144

Bug Blocks:

Attachments:

Description	Flags
mbr after parted process	none
mbr after fdisk, w	none
mbr after fdisk and parted process	none

Description Steve Perkins 2010-03-15 12:00:38 UTC

+++ This bug was initially created as a clone of Bug #563218 +++

New bug to track the anaconda failure.

+++ This bug was initially created as a clone of Bug #553518 +++

--- Additional comment from steve.perkins on 2010-03-11 12:37:03 EST ---

Created an attachment (id=399402)
Debug dump from FC13 Alpha failed install

I have tried FC13 Alpha with a WD10-EARS Advanced Format (4K sector) drive and the default installation (anaconda) fails with an exception when creating the file systems on a blank disk.

The attached debug dump shows that Anaconda is creating the first partition on sector 63 which will be misaligned on a 4K sector drive. There seems to be some attempt to move the start of the partition from sector 63 to 2048 but verifying with fdisk for example shows that /dev/sda1 starts on sector 63.

I suspect that the issue may be at the mkfs stage, where mke2fs 1.41.10 will notice the poorly aligned data and ask a yes/no question which the installation scripts can't cope with. I believe that mke2fs 1.41.11 will add a -F option to override the question, but the real answer is to align the partitions correctly so that the drives perform at their best.
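
To make the alignment arithmetic concrete, here is a minimal sketch that checks where a partition start falls relative to a 4K physical sector. It assumes 512-byte logical sectors, 4096-byte physical sectors, and a drive whose LBA 0 sits on a physical sector boundary; the function names are illustrative and are not taken from anaconda or parted.

```python
LOGICAL_SIZE = 512     # assumed logical sector size exposed by the drive
PHYSICAL_SIZE = 4096   # assumed internal sector size of an Advanced Format drive


def bytes_past_boundary(start_sector, logical=LOGICAL_SIZE, physical=PHYSICAL_SIZE):
    """Bytes between the previous physical sector boundary and the partition start."""
    return (start_sector * logical) % physical


def bytes_to_next_boundary(start_sector, logical=LOGICAL_SIZE, physical=PHYSICAL_SIZE):
    """Bytes from the partition start up to the next physical sector boundary."""
    return (physical - bytes_past_boundary(start_sector, logical, physical)) % physical


for start in (63, 2048):
    past = bytes_past_boundary(start)
    if past == 0:
        print("sector %d: aligned" % start)
    else:
        print("sector %d: %d bytes past a boundary, %d bytes short of the next"
              % (start, past, bytes_to_next_boundary(start)))
```

Under these assumptions sector 63 lands 3584 bytes into a physical sector, which is 512 bytes short of the next boundary, while sector 2048 sits exactly on one. The 512-byte figure appears to match the mkfs.ext4 message quoted later in this report, though how mke2fs computes its number is not stated here.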

Please let me know if you need any more details.

Steve

--- Additional comment from hdegoede on 2010-03-12 03:32:18 EST ---

Hi Steve,

Thanks for testing this!

Looking at the debuglog we never get around to creating the ext2 fs. We start
by writing an empty label to the disk and that is where we fail.

Could you please do the following:

1) start F-13 alpha installer
2) Switch to tty2 (ctrl + alt + F2)
3) Run python, and then at the python prompt type:
import parted
dev = parted.Device(path="/dev/sda")
disk = parted.freshDisk(device=dev, ty="msdos")
print disk
disk.commitToDevice()
disk.commitToOS()
exit()


###

I think the last commit call will fail. If that is the case there are 2 possible
causes:
1) The kernel does not grok partition tables on 4k disks
2) The device is somehow busy

Either way, please copy and paste the output of the above commands here.
Also it would be great if you could also do:

dd if=/dev/zero of=/dev/sda bs=512 count=1
fdisk /dev/sda
w

And see if fdisk is capable of writing an empty label ?

--- Additional comment from hdegoede on 2010-03-12 03:49:55 EST ---

p.s.

Could you do:
dd if=/dev/sda of=mbr.bin bs=512 count=1 after running the parted code,
and attach mbr.bin here ?

And then, after the dd to fill with 0 and the fdisk, w attempt, do:
dd if=/dev/sda of=mbr.bin bs=512 count=1
and attach this second mbr.bin here too?
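
For anyone who wants to inspect such a 512-byte dump locally, the following is a rough sketch of a parser for the classic msdos (MBR) layout: four 16-byte partition entries starting at byte 446, followed by the 0x55 0xAA signature at byte 510. The function name parse_mbr and the dictionary keys are made up for illustration; this is not how anaconda or parted read the label.

```python
import struct


def parse_mbr(sector):
    """Return (signature_ok, entries) for a 512-byte MBR sector.

    Each entry is 16 bytes: status, 3 bytes CHS start, type,
    3 bytes CHS end, 32-bit LE start LBA, 32-bit LE sector count.
    """
    if len(sector) != 512:
        raise ValueError("expected exactly 512 bytes")
    signature_ok = sector[510:512] == b"\x55\xaa"
    entries = []
    for index in range(4):
        raw = sector[446 + 16 * index:446 + 16 * (index + 1)]
        status, ptype, start_lba, sectors = struct.unpack("<B3xB3xII", raw)
        if ptype != 0:  # type 0 marks an unused slot
            entries.append({
                "index": index + 1,
                "status": status,
                "type": ptype,
                "start_lba": start_lba,
                "sectors": sectors,
            })
    return signature_ok, entries
```

A dump could then be checked with something like `parse_mbr(open("mbr.bin", "rb").read())`; a freshly written empty label would be expected to give a valid signature and an empty entry list.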

Thanks!

Comment 1 Steve Perkins 2010-03-15 12:23:26 UTC

Created attachment 400198 [details]
mbr after parted process

Comment 2 Steve Perkins 2010-03-15 12:24:24 UTC

Created attachment 400200 [details]
mbr after fdisk, w

Comment 3 Steve Perkins 2010-03-15 12:25:30 UTC

Created attachment 400202 [details]
mbr after fdisk and parted process

Comment 4 Hans de Goede 2010-03-15 12:42:08 UTC

Hi,

Thanks for running these tests. Did all tests run successfully, or did you get errors in some cases?

Regards,

Hans

Comment 5 Steve Perkins 2010-03-15 13:00:49 UTC

Hi Hans,

No errors reported. All parted functions were accepted or returned True.

I've just noticed that the last entry I tried to add to bugzilla didn't appear above - sorry about that, I'll try again.

[root@localhost liveuser]# python
Python 2.6.4 (r264:75706, Feb 11 2010, 21:00:07) 
[GCC 4.4.3 20100208 (Red Hat 4.4.3-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import parted
>>> dev = parted.Device(path="/dev/sda")
>>> disk = parted.freshDisk(device=dev, ty="msdos")
>>> print disk
parted.Disk instance --
  type: msdos  primaryPartitionCount: 0
  lastPartitionNumber: -1  maxPrimaryPartitionCount: 4
  partitions: []
  device: <parted.device.Device object at 0x91a4b6c>
  PedDisk: <_ped.Disk object at 0x90a1d2c>
>>> disk.commitToDevice()
True
>>> disk.commitToOS()
True
>>> exit()
[root@localhost liveuser]# 

/* Attached mbr1.bin */

[root@localhost liveuser]# fdisk /dev/sda
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x06806471.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@localhost liveuser]# dd if=/dev/sda of=mbr2.bin bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00120451 s, 425 kB/s
[root@localhost liveuser]# 

Attached mbr_fdisk_w.bin

I ran the parted procedure again and the mbr after this is mbr_fdisk_w_parted.bin.

The mkfs.ext4 output below shows the extra question which may upset automated installs. This was an issue in Ubuntu Lucid alpha 3, but apparently e2fsprogs 1.41.11 will have a -F option to allow the question to be overridden. Ideally the drive should be aligned correctly, but at least the install could continue when the user absolutely insists on misaligned data (an existing partition?).

[root@localhost liveuser]# mkfs.ext4 /dev/sda1
mke2fs 1.41.10 (10-Feb-2009)
/dev/sda1 alignment is offset by 512 bytes.
This may result in very poor performance, (re)-partitioning suggested.
Proceed anyway? (y,n)

Comment 6 Hans de Goede 2010-03-15 13:52:29 UTC

Steve,

Thanks for all the info. Given that the manual parted runs succeed I don't understand what the problem is in the original backtrace. As I hope to have access to one of the involved disks myself soon, I suggest we wait a bit, as that will make things much easier.

Regards,

Hans

p.s.

Thanks for the warning about mkfs going interactive; that was very useful. I was just working on 2 bugs which are caused by this, and I already had a hunch this was going on, but your comment confirms it! See bug 573247, bug 573500 and bug 573643.

Comment 7 Steve Perkins 2010-03-15 14:06:26 UTC

Hi Hans,

FYI - the relevant Ubuntu bug is at https://bugs.launchpad.net/ubuntu/+source/ubiquity/+bug/530071 

Regards
Steve

Comment 8 Eric Sandeen 2010-03-15 15:16:54 UTC

As far as the mkfs problem goes:

%changelog
* Mon Mar 01 2010 Eric Sandeen <sandeen> 1.41.10-5
- Don't ask for confirmation of misaligned mkfs with -F (#569021)

from my upstream commit:
http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=commitdiff;h=ecced2c3586fed83750dad2b82780c9d201f6973

So a mkfs.extN -F won't fail anymore ...

What's in F13 now lets -F override it, upstream just drops the question in all cases.

Is anaconda calling it with -F?  I should probably just drop the question altogether for F13.
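
As an illustration of what a non-interactive call could look like, here is a small sketch that builds an mkfs command line with -F and runs it with stdin closed, so that an unexpected prompt fails instead of hanging. The helper names are hypothetical and this is not anaconda's actual code (Python 3).

```python
import subprocess


def mkfs_command(device, fstype="ext4", force=True):
    """Build the argv list for mkfs.<fstype>, optionally passing -F."""
    cmd = ["mkfs." + fstype]
    if force:
        cmd.append("-F")  # per this thread, lets mke2fs skip the misalignment question
    cmd.append(device)
    return cmd


def run_mkfs(device, fstype="ext4"):
    """Run mkfs without a terminal on stdin; raises CalledProcessError on failure."""
    return subprocess.run(mkfs_command(device, fstype),
                          stdin=subprocess.DEVNULL, check=True)
```

Only mkfs_command is safe to try on a test machine; run_mkfs destroys whatever is on the device it is given.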

As for the misalignment, that sounds like something that should get sorted out, but since this drive in particular misreports sector size that complicates it.

Performance will be horrible though if we are not aligned to 4k on this drive.

-Eric

Comment 9 Steve Perkins 2010-03-15 16:01:23 UTC

(In reply to comment #8)
Hi Eric,
Thanks for the update.
> As far as the mkfs problem goes:
> %changelog
> * Mon Mar 01 2010 Eric Sandeen <sandeen> 1.41.10-5
> - Don't ask for confirmation of misaligned mkfs with -F (#569021)
> from my upstream commit:
> http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=commitdiff;h=ecced2c3586fed83750dad2b82780c9d201f6973
> So a mkfs.extN -F won't fail anymore ...
> What's in F13 now lets -F override it, upstream just drops the question in all
> cases.
> Is anaconda calling it with -F?  I should probably just drop the question
> altogether for F13.
> As for the misalignment, that sounds like something that should get sorted out,
> but since this drive in particular misreports sector size that complicates it.

I wouldn't say that the drive "misreports" sector size. Internally it uses 4K sectors and externally works in 512B logical sectors but this is a valid transform which is reported in the Identify Device data and reflected in sysfs.
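
For reference, a short sketch of how the values reflected in sysfs could be read from Python. It assumes the usual Linux block-layer attributes (queue/logical_block_size, queue/physical_block_size and alignment_offset under /sys/block/<disk>); the function names and the sysfs_root parameter are illustrative additions so the code can be pointed at a test directory.

```python
import os


def read_int(path):
    """Read a single integer from a sysfs-style attribute file."""
    with open(path) as handle:
        return int(handle.read().strip())


def block_topology(disk, sysfs_root="/sys/block"):
    """Return the sector sizes and alignment offset the kernel reports for a disk."""
    base = os.path.join(sysfs_root, disk)
    return {
        "logical_block_size": read_int(os.path.join(base, "queue", "logical_block_size")),
        "physical_block_size": read_int(os.path.join(base, "queue", "physical_block_size")),
        "alignment_offset": read_int(os.path.join(base, "alignment_offset")),
    }
```

On a 512/4K drive of the kind discussed here, block_topology("sda") would be expected to show a logical size of 512 and a physical size of 4096.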

> Performance will be horrible though if we are not aligned to 4k on this drive.

Write performance will be affected if the drives are misaligned depending on the pattern of data access.

> -Eric    
Regards

Steve

Comment 10 Eric Sandeen 2010-03-15 16:17:41 UTC

Ok, it was my understanding that the drive made no mention of the 4k physical sector, and reported physical and logical both as 512 in the Identify Device data, and reflected in sysfs.

If I'm misinformed, I apologize, I don't have one of these drives for testing.

If we are not 4k aligned, and default to 4k blocks starting on a non-4k boundary, then performance will suffer, no?


Thanks,
-Eric

Comment 11 Steve Perkins 2010-03-15 16:33:39 UTC

Hi Eric
(In reply to comment #10)
> Ok, it was my understanding that the drive made no mention of the 4k physical
> sector, and reported physical and logical both as 512 in the Identify Device
> data, and reflected in sysfs.
Apart from some early drives, the reporting is done in Identify Device to reflect the number of logical sectors per physical sector and the offset for LBA 0 alignment. Normally this would be zero but the standards allow for other values.
> If I'm misinformed, I apologize, I don't have one of these drives for testing.
It is always a challenge working without the real hardware! I'll see if I can work something out. 
> If we are not 4k aligned, and default to 4k blocks starting on a non-4k
> boundary, then performance will suffer, no?
Indeed. Chunks of 4K data (as in most file systems these days) are good but if they don't match the 4K boundaries of the internal sectors then you will get a read/modify/write cycle which is to be avoided. Random small block writes are going to give poor latency numbers but some misaligned data patterns are not too tragic. For example large block writes (video data) or large streams of sequential data - even 512 byte blocks - will be coalesced by drive cache so only the first and last accesses will require the read/modify/write cycles.  
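
The read/modify/write point can be sketched with a simple model: a write forces a read/modify/write on every physical sector it only partially covers. The function below counts those sectors for a single contiguous write, assuming 4096-byte physical sectors, LBA 0 on a physical boundary, and perfect coalescing of the whole write; the name rmw_sectors is made up for illustration and real drive behaviour will vary.

```python
def rmw_sectors(offset, length, physical=4096):
    """Count physical sectors only partially covered by a write.

    offset and length are in bytes, measured from the start of the disk.
    """
    if length <= 0:
        return 0
    end = offset + length
    first = offset // physical
    last = (end - 1) // physical
    head_partial = offset % physical != 0
    tail_partial = end % physical != 0
    if first == last:
        return 1 if (head_partial or tail_partial) else 0
    return int(head_partial) + int(tail_partial)


start = 63 * 512  # byte offset of a partition that begins at sector 63
print(rmw_sectors(0, 4096))             # aligned 4K block
print(rmw_sectors(start, 4096))         # 4K block at the start of that partition
print(rmw_sectors(start, 1024 * 1024))  # 1 MiB sequential write at the same place
```

In this model every misaligned 4K block costs two partial sectors, whereas a 1 MiB sequential write from the same start also costs only two, which is why small random writes suffer most.
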
> Thanks,
> -Eric    
Regards

Steve

Comment 12 Martin K. Petersen 2010-03-16 04:03:05 UTC

I went to the lab and did some tests with Fedora 13 Alpha on a variety of drives tonight.

I tried a regular 512/512 drive, a 512/4k drive that was 0-aligned, and one that was 1-aligned.  In all cases anaconda started the first partition on sector 63. My kernel code is reporting the correct values in all cases, and fdisk -cl lists the correct logical/physical and min/opt I/O sizes.  So something is wrong in libparted/anaconda land...

Comment 13 Eric Sandeen 2010-03-16 05:09:01 UTC

Bummer ... but thanks for checking, Martin, and thanks for the update... maybe a bug explicitly for that problem is in order.

Comment 14 Hans de Goede 2010-03-16 06:20:12 UTC

Martin,

Thanks for testing! Was this with F-13 alpha or with F-12 ? F-13 alpha should align to 1 MiB in the 512/512 case at least. And last time I checked it did.

Regards,

Hans

Comment 15 Steve Perkins 2010-03-16 12:49:15 UTC

I've checked out the FC13 beta and it behaves in the same way as the Alpha. A default install on a 512/512 drive starts partition 1 at sector 63 and partition 2 at sector 1024000. So partition 2 is very well aligned to 1 MB (and more!) but partition 1 is not.

On a 4K/512 drive, we still get the exception we saw in FC13 alpha.

I agree with Martin, the kernel code is working well reporting drive information as do the fdisk, parted, mkfs utilities etc. 

I think it would be useful to consider defaulting to 1MB boundaries in any case, even if the drive appears to be "traditional" 512/512. I have seen cases where external USB adapters do not pass on the drive's alignment / offset data due to the cut-down SCSI command set used by USB to SATA adapters. Identify Device works as a SATA pass-through command, but the kernel works at the SCSI layer and does not get the data required to set up the sysfs alignment parameters.

Cheers

Steve

Comment 16 Martin K. Petersen 2010-03-16 15:33:24 UTC

This was FC13 Alpha Live x86_64 image on a USB stick.  Partition 2 was aligned at 1024000 for me too.

And definitely a yes to aligning on a 1MB boundary by default.  This was what I expected to happen given our lengthy discussions on this topic.

I agree it may be worthwhile for our USB/FireWire subsystems to attempt IDENTIFY DEVICE via ATA passthrough going forward instead of relying on READ CAPACITY(16) being implemented (correctly) by the USB-ATA bridge's SATL. But as usual the trick is finding a suitable heuristic that won't cause existing devices to crap out...

Comment 17 Hans de Goede 2010-03-16 15:48:22 UTC

(In reply to comment #16)
> This was FC13 Alpha Live x86_64 image on a USB stick.  Partition 2 was aligned
> at 1024000 for me too.
> 
> And definitely a yes to aligning on a 1MB boundary by default.  This was what I
> expected to happen given our lengthy discussions on this topic.
> 

Yes, we should be aligning to 1 MiB by default, what you are seeing is a bug. I'll try to reproduce this later today. And get back to you on this.

Comment 18 Hans de Goede 2010-03-16 21:37:04 UTC

Ok, I can reproduce the bug where the first partition starts at sector 63, and I have a patch for it. I've filed a separate bug for tracking this: bug 574220. Please post further comments on this issue there.

I've made an updates.img containing the fix available at:
http://people.fedoraproject.org/~jwrdegoede/updates-574220.img

To use this add:
updates=http://people.fedoraproject.org/~jwrdegoede/updates-574220.img

To the syslinux cmdline when starting anaconda. Note that using an
updates.img is not possible with a livecd install.

Comment 19 Martin K. Petersen 2010-03-17 03:27:17 UTC

I did three installs tonight with the update image in place. In all cases I chose the "Use All Space" option.

512/512: First partition at 2048, i.e. 1 MiB. Second partition at 1026048, i.e. 501 MiB.  Both partitions aligned correctly.

512/4096, 0-aligned: First partition at 2048, i.e. 1 MiB. Second partition at 1026048, i.e. 501 MiB.  Both partitions aligned correctly.

512/4096, 1-aligned: First partition at 63, i.e. 31.5 KiB. Second partition at 1024063. (1024063*512) % 4096 = 3584.  Both partitions aligned correctly.

I'm still not sure why Karel insisted on keeping 31.5 KiB if a device reports 1-alignment.  I'd prefer 1 MiB + 3584 bytes.  But at least things are working now...
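
The alignment claims above can be checked with a short sketch that takes the drive's alignment offset into account. It assumes 512-byte logical sectors, 4096-byte physical sectors, and that a 1-aligned drive corresponds to an alignment offset of 3584 bytes (inferred from the modulo result quoted above, not from any tool output in this report). The function names are illustrative.

```python
def is_aligned(start_sector, alignment_offset=0, logical=512, physical=4096):
    """True if the partition start falls on a physical sector boundary."""
    return (start_sector * logical - alignment_offset) % physical == 0


def first_aligned_start(min_sector, alignment_offset=0, logical=512, physical=4096):
    """Smallest aligned start sector at or above min_sector."""
    sector = min_sector
    while not is_aligned(sector, alignment_offset, logical, physical):
        sector += 1
    return sector


# 0-aligned drive: partitions at 2048 and 1026048
print(is_aligned(2048), is_aligned(1026048))
# 1-aligned drive (assumed offset 3584): partitions at 63 and 1024063
print(is_aligned(63, 3584), is_aligned(1024063, 3584))
# first aligned start at or above 1 MiB on the 1-aligned drive
print(first_aligned_start(2048, 3584))
```

Under these assumptions the first aligned start at or above 1 MiB on the 1-aligned drive is sector 2055, which is 1 MiB plus 3584 bytes.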

Comment 20 Hans de Goede 2010-03-23 17:31:30 UTC

Ok, as discussed by mail, this bug is fixed for new partitions by the fix for bug 574220. When using pre-existing unaligned partitions, you may hit the bug where mke2fs goes interactive to confirm creating an unaligned FS; that has been fixed by a recent e2fsprogs update. So as both issues are fixed, I'm closing this.

*** This bug has been marked as a duplicate of bug 574220 ***