Bug 350101 - Anaconda aligns partitions suboptimally for RAID disks
Summary: Anaconda aligns partitions suboptimally for RAID disks
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 8
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Joel Andres Granados
QA Contact:
URL:
Whiteboard: bzcl34nup
Depends On:
Blocks:
Reported: 2007-10-24 08:06 UTC by Chris Snook
Modified: 2009-05-04 10:01 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-08-06 20:22:10 UTC
Type: ---
Embargoed:



Description Chris Snook 2007-10-24 08:06:47 UTC
Description of problem:
Anaconda tries to align partitions to legacy C/H/S geometry.  This is nice for
dual-booting with other operating systems, but Linux doesn't care about C/H/S
geometry, while RAID arrays *do* care about alignment.  Anaconda should at the
very least offer a non-default option to optimize partitions for RAID use.  EMC
currently recommends that their customers do some expert-mode hacking with fdisk
to partition storage on their SANs, which is inconvenient and prone to error.

Version-Release number of selected component (if applicable):
all released versions

How reproducible:
100%, at least on platforms using MSDOS disklabels.  Probably somewhat of an
issue on all platforms.

Steps to Reproduce:
1. Install RHEL
2. Run parted -s /dev/sda unit s print
  
Actual results:
Number  Start       End         Size        Type     File system  Flags
 1      63s         208844s     208782s     primary  ext3         boot
 2      208845s     312287534s  312078690s  primary               lvm
 3      312287535s  312496379s  208845s     primary  ext3
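
The misalignment in this output can be checked with a little arithmetic. The
sketch below assumes 512-byte sectors and a 64 KiB stripe (both are assumptions,
not values read from the disk), and the helper name misalignment is made up for
illustration. A result of 0 would mean the partition starts on a stripe boundary.

```python
SECTOR_BYTES = 512          # assumed sector size
STRIPE_BYTES = 64 * 1024    # assumed RAID stripe size


def misalignment(start_sector, sector_bytes=SECTOR_BYTES, stripe_bytes=STRIPE_BYTES):
    """Return how many bytes past the previous stripe boundary a partition starts."""
    return (start_sector * sector_bytes) % stripe_bytes


# start sectors taken from the parted output above
for start in (63, 208845, 312287535):
    print("start %ds is %d bytes past a stripe boundary" % (start, misalignment(start)))
```

None of the three start sectors yields 0, so every partition begins partway
into a stripe.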

Expected results:
Keeping the current behavior by default is okay, but an enterprise OS should
give the user a convenient option to align partitions for RAID storage, which
typically has 32768 or 65536 byte stripes.

Additional info:
It might also be nice to use this info to set the ext3 stripe size, when possible.
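
As a rough illustration of that idea: mke2fs accepts a stride extended option,
expressed in filesystem blocks. The sketch below assumes that the stripe figure
quoted above is the per-disk chunk size and that the filesystem uses 4096-byte
blocks. The helper name ext3_stride and the device /dev/sda1 are hypothetical.

```python
def ext3_stride(chunk_bytes, fs_block_bytes=4096):
    """Return the mke2fs 'stride' value: the RAID chunk size in filesystem blocks."""
    if chunk_bytes % fs_block_bytes:
        raise ValueError("chunk size must be a multiple of the filesystem block size")
    return chunk_bytes // fs_block_bytes


# assumed 64 KiB chunk and the default 4 KiB block size
stride = ext3_stride(64 * 1024)
print("mke2fs -j -E stride=%d /dev/sda1" % stride)
```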

It would be quite intuitive if I could add '--align 64k' to a 'part' command in
a kickstart file and have it do the right thing.

Comment 1 Chris Snook 2007-10-25 18:30:08 UTC
Playing around with things a bit, it seems that fdisk and sfdisk strongly
encourage C/H/S geometry, while parted is quite happy with arbitrary resolution.
The big catch is that parted treats the end address of a partition as
*inclusive*, which is arguably incorrect.  For example:

parted -s /dev/sda mkpart primary 1 2

Will create a partition whose first sector is at precisely 1 MiB and whose last
sector is at precisely 2 MiB.  If you want to do it right, you need to set "unit
s" in parted and subtract 1 from the end address.  For example:

parted -s /dev/sda unit s mkpart primary 128 204927

Will create a partition that is precisely 100 MiB in size, 64 KiB aligned, just
after the MBR and partition table, assuming 512 byte sectors.
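
The subtract-one rule is easy to get wrong by hand, so here is a minimal sketch
of the arithmetic. The function name inclusive_sector_range is made up for
illustration, and the 512-byte default is the same assumption as above.

```python
def inclusive_sector_range(start_bytes, size_bytes, sector_bytes=512):
    """Return (first, last) sector numbers for parted's inclusive 'unit s' addressing."""
    if start_bytes % sector_bytes or size_bytes % sector_bytes:
        raise ValueError("start and size must be whole numbers of sectors")
    first = start_bytes // sector_bytes
    # the end address is inclusive, so the last sector is one before first + length
    last = first + size_bytes // sector_bytes - 1
    return first, last


# 100 MiB partition starting at 64 KiB
first, last = inclusive_sector_range(64 * 1024, 100 * 1024 * 1024)
print("parted -s /dev/sda unit s mkpart primary %d %d" % (first, last))
```

With the default sector size this reproduces the 128 and 204927 used in the
command above.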

It should be noted that some SANs present 2048 byte sectors to the OS, and 4096
byte sectors will soon be standard, so the sector size and precise sector count
should be read explicitly.
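
One possible way to read those values without parsing parted output is sysfs.
This is only a sketch: it assumes a kernel that exposes
queue/logical_block_size under /sys/block, which is not something the scripts
below rely on, and the function name read_geometry is hypothetical.

```python
import os


def read_geometry(dev_name, sysfs_root="/sys/block"):
    """Return (sector_bytes, total_sectors) for a block device such as 'sda'.

    The sysfs 'size' attribute is always counted in 512-byte units, so it is
    converted here to the device's own logical sector size.
    """
    base = os.path.join(sysfs_root, dev_name)
    with open(os.path.join(base, "queue", "logical_block_size")) as f:
        sector_bytes = int(f.read())
    with open(os.path.join(base, "size")) as f:
        size_in_512 = int(f.read())
    return sector_bytes, size_in_512 * 512 // sector_bytes
```

Calling read_geometry("sda") on a machine with such a disk returns the two
numbers that the shell front-end below extracts from parted.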

Since my python sucks, but I needed arbitrary precision integer math to handle
large volumes, I developed a shell front-end and python back-end to create
partitions of specified sizes (plus using the rest in a final partition, so
specifying no sizes uses the whole disk) with the specified alignment.  I have
used these successfully in %pre on a test system.  Please consider these
examples only.  This was my first complete python script.

#!/bin/bash
# partalign.sh
# front-end to partalign.py

if [ $# -lt 2 ]; then
        echo 'usage:'
        echo 'partalign.sh device align_kiB [part1_MiB...]'
        exit 1
fi

DEVICE=$1

# wipe the first 16 MiB so no stale partition table or metadata survives
dd if=/dev/zero of="$DEVICE" bs=4k count=4k
parted -s "$DEVICE" mklabel msdos

# read the sector size (bytes) and the total sector count from parted
SECTOR_B=$(parted -s "$DEVICE" unit B print | grep -F Sector | cut -d ' ' -f 4 | cut -d B -f 1)
TOTAL_S=$(parted -s "$DEVICE" unit s print | grep -F Disk | cut -d ' ' -f 3 | cut -d s -f 1)

python partalign.py "$SECTOR_B" "$TOTAL_S" "$@"

#!/usr/bin/python
# partalign.py

import os
import sys

def dopart(dev, part, start, end):
        # spawnvp expects argv[0] (the program name) as the first element
        rc = os.spawnvp(os.P_WAIT, "parted", ("parted", "-s", dev, "unit", "s",
                        "mkpart", part, str(start), str(end)))
        if rc != 0:
                sys.exit(rc)

if len(sys.argv) < 5:
        print("Not enough arguments")
        sys.exit(1)

# int() handles arbitrarily large values and, unlike long(), exists in Python 3
sector_b = int(sys.argv[1])
total_s = int(sys.argv[2])
device = sys.argv[3]
align_kb = int(sys.argv[4])

if align_kb <= 0:
        print("Invalid alignment")
        sys.exit(2)
elif align_kb > 1024:
        print("Alignment too large")
        sys.exit(3)

# use integer division so the sector counts stay exact integers
align_s = (align_kb * 1024) // sector_b

# truncate the slack, and remember that parted uses inclusive addressing
last_s = (total_s - (total_s % align_s)) - 1

# leave plenty of room for mbr, partition table, etc.
reserve_kb = 64

reserve_s = (reserve_kb * 1024) // sector_b
while reserve_s < align_s:
        reserve_s *= 2

start_s = reserve_s

parttype = "primary"
count = 0
for arg in sys.argv[5:]:
        count += 1
        # do we need an extended partition?
        if count == 4:
                # use the full physical space, so the slack can be used later
                dopart(device, "extended", start_s, total_s - 1)
                count += 1
                start_s += reserve_s
                parttype = "logical"
                # no "continue" here: this size becomes the first logical partition
        # linux only supports 15 partitions per disk
        elif count == 15:
                break
        # convert before comparing; a string never compares sensibly with 0
        part_mb = int(arg)
        if part_mb <= 0:
                break
        part_s = (part_mb * 1024 * 1024) // sector_b
        # end_s is inclusive
        end_s = start_s + part_s - 1
        if end_s > last_s:
                break
        dopart(device, parttype, start_s, end_s)
        start_s += part_s

# if there's any room left, use it
if start_s <= last_s:
        dopart(device, parttype, start_s, last_s)


Comment 2 Bug Zapper 2008-04-04 14:15:46 UTC
Based on the date this bug was created, it appears to have been reported
during the development of Fedora 8. In order to refocus our efforts as
a project we are changing the version of this bug to '8'.

If this bug still exists in rawhide, please change the version back to
rawhide.
(If you're unable to change the bug's version, add a comment to the bug
and someone will change it for you.)

Thanks for your help and we apologize for the interruption.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

Comment 3 Andy Lindeberg 2008-06-03 20:14:37 UTC
Does this problem remain true for Fedora 9?

Comment 4 Kirby Zhou 2009-05-04 10:01:57 UTC
Does this problem remain true for RHEL-5.3?

