Red Hat Bugzilla – Bug 350101
Anaconda aligns partitions suboptimally for RAID disks
Last modified: 2009-05-04 06:01:57 EDT
Description of problem:
Anaconda tries to align partitions to legacy C/H/S geometry. This is nice for
dual-booting with other operating systems, but Linux doesn't care about C/H/S
geometry, while RAID arrays *do* care about alignment. Anaconda should at the
very least offer a non-default option to optimize partitions for RAID use. EMC
currently recommends that their customers do some expert-mode hacking with fdisk
to partition storage on their SANs, which is inconvenient and prone to error.
Version-Release number of selected component (if applicable):
all released versions
How reproducible:
100%, at least on platforms using MSDOS disklabels. It is probably an issue to
some degree on all platforms.
Steps to Reproduce:
1. Install RHEL
2. Run parted -s /dev/sda unit s print
Actual results:
Number  Start       End         Size        Type     File system  Flags
 1      63s         208844s     208782s     primary  ext3         boot
 2      208845s     312287534s  312078690s  primary               lvm
 3      312287535s  312496379s  208845s     primary  ext3
Expected results:
Keeping the current behavior by default is okay, but an enterprise OS should
give the user a convenient option to align partitions for RAID storage, which
typically has 32768 or 65536 byte stripes.
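To make the misalignment concrete, here is a minimal sketch that checks the start sectors from the parted output above against a 64 KiB stripe. The function name misalignment is made up for illustration, and 512 byte sectors are assumed.

```python
SECTOR_BYTES = 512  # assumption: 512 byte sectors, as on the disk shown above


def misalignment(start_sector, stripe_bytes, sector_bytes=SECTOR_BYTES):
    """Return how many bytes past the previous stripe boundary a partition starts.

    A result of 0 means the partition start is stripe aligned.
    """
    return (start_sector * sector_bytes) % stripe_bytes


if __name__ == "__main__":
    # start sectors taken from the parted listing in this report
    for start in (63, 208845, 312287535):
        off = misalignment(start, 64 * 1024)
        print("start %ds: %d bytes past a 64 KiB boundary" % (start, off))
```

None of the three partitions starts on a stripe boundary, so stripe-sized I/O inside them can straddle two stripes.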
It might also be nice to use this info to set the ext3 stripe size, when possible.
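As a rough sketch of what that could look like: mke2fs accepts a stride extended option expressed in filesystem blocks, so the value follows from the stripe size and the block size. The helper name ext3_stride is hypothetical, and this assumes the stripe figure quoted above is the per-disk chunk size and that the filesystem uses 4096 byte blocks.

```python
def ext3_stride(chunk_bytes, fs_block_bytes=4096):
    """Number of filesystem blocks per RAID chunk, for 'mke2fs -E stride=N'."""
    if chunk_bytes <= 0 or chunk_bytes % fs_block_bytes != 0:
        raise ValueError("chunk size must be a positive multiple of the block size")
    return chunk_bytes // fs_block_bytes


if __name__ == "__main__":
    # assumption: 64 KiB chunks and 4 KiB filesystem blocks
    stride = ext3_stride(64 * 1024)
    # print the command rather than running it; /dev/sda1 is only an example
    print("mke2fs -j -b 4096 -E stride=%d /dev/sda1" % stride)
```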
It would be quite intuitive if I could add '--align 64k' to a 'part' command in
a kickstart file and have it do the right thing.
Playing around with things a bit, it seems that fdisk and sfdisk strongly
encourage C/H/S geometry, while parted is quite happy with arbitrary resolution.
The big catch is that parted treats the end of a partition as *inclusive*
(arguably incorrectly) when sizing it. For example:
parted -s /dev/sda mkpart primary 1 2
Will create a partition whose first sector is at precisely 1 MiB and whose last
sector is at precisely 2 MiB. If you want to do it right, you need to set "unit
s" in parted and subtract 1 from the end address. For example:
parted -s /dev/sda unit s mkpart primary 128 204927
Will create a partition that is precisely 100 MiB in size, 64 KiB aligned, just
after the MBR and partition table, assuming 512 byte sectors.
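The subtract-one rule can be sketched as a small helper that turns a byte offset and a byte size into the first and last sector for parted's "unit s" mode. The name sector_range is made up for illustration. The default of 512 byte sectors is an assumption.

```python
def sector_range(start_bytes, size_bytes, sector_bytes=512):
    """Return (first_sector, last_sector) with an inclusive last sector."""
    if start_bytes % sector_bytes or size_bytes % sector_bytes:
        raise ValueError("start and size must be whole numbers of sectors")
    first = start_bytes // sector_bytes
    # parted's end address is inclusive, so the last sector is one less than
    # the first sector of whatever follows this partition
    last = first + size_bytes // sector_bytes - 1
    return first, last


if __name__ == "__main__":
    first, last = sector_range(64 * 1024, 100 * 1024 * 1024)
    print("parted -s /dev/sda unit s mkpart primary %d %d" % (first, last))
```

With the defaults this reproduces the 128 and 204927 used in the example above.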
It should be noted that some SANs present 2048 byte sectors to the OS, and 4096
byte sectors will soon be standard, so the sector size and precise sector count
should be read explicitly.
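One way to read those values without parsing parted output is through sysfs. This is only a sketch: it assumes a Linux kernel that exposes queue/hw_sector_size and size under /sys/block/<device>, and that size is counted in 512 byte units. The function name read_geometry and its sysfs_root parameter are hypothetical.

```python
import os


def read_geometry(dev_name, sysfs_root="/sys/block"):
    """Return (sector_bytes, total_sectors) for a block device such as 'sda'."""
    base = os.path.join(sysfs_root, dev_name)
    with open(os.path.join(base, "queue", "hw_sector_size")) as f:
        sector_bytes = int(f.read().strip())
    with open(os.path.join(base, "size")) as f:
        # assumption: this file counts 512 byte units regardless of sector size
        size_512 = int(f.read().strip())
    total_sectors = size_512 * 512 // sector_bytes
    return sector_bytes, total_sectors
```

Calling read_geometry("sda") would then supply the two numbers that the alignment arithmetic needs. The sysfs_root parameter exists so the function can be pointed at a test directory.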
Since my python sucks, but I needed arbitrary precision integer math to handle
large volumes, I developed a shell front-end and python back-end to create
partitions of specified sizes (plus using the rest in a final partition, so
specifying no sizes uses the whole disk) with the specified alignment. I have
used these successfully in %pre on a test system. Please consider these
examples only. This was my first complete python script.
#!/bin/sh
# front-end to partalign.py
if [ $# -lt 2 ]; then
    echo 'partalign.sh device align_kiB [part1_MiB...]'
    exit 1
fi
DEVICE=$1
# wipe the first 16 MiB, then create a fresh msdos disklabel
dd if=/dev/zero of="$DEVICE" bs=4k count=4k
parted -s "$DEVICE" mklabel msdos
# read the sector size (bytes) and the total sector count from parted
SECTOR_B=$(parted -s "$DEVICE" unit B print | grep -F Sector | cut -d ' ' -f 4 | cut -d B -f 1)
TOTAL_S=$(parted -s "$DEVICE" unit s print | grep -F Disk | cut -d ' ' -f 3 | cut -d s -f 1)
python partalign.py $SECTOR_B $TOTAL_S "$@"
#!/usr/bin/env python
# partalign.py: back-end for partalign.sh
# usage: partalign.py sector_B total_s device align_kiB [part1_MiB...]
import os
import sys

def dopart(dev, part, start, end):
    # the first element of the argument tuple becomes argv[0] of the child
    rc = os.spawnvp(os.P_WAIT, "parted", ("parted", "-s", dev, "unit", "s",
                    "mkpart", part, str(start), str(end)))
    if rc != 0:
        print("parted failed")
        sys.exit(1)

if len(sys.argv) < 5:
    print("Not enough arguments")
    sys.exit(1)

# int is arbitrary precision (Python 2 promotes it to long automatically)
sector_b = int(sys.argv[1])
total_s = int(sys.argv[2])
device = sys.argv[3]
align_kb = int(sys.argv[4])
if align_kb <= 0:
    print("Alignment too small")
    sys.exit(1)
elif align_kb > 1024:
    print("Alignment too large")
    sys.exit(1)

align_s = (align_kb * 1024) // sector_b
# truncate the slack, and remember that parted uses inclusive addressing
last_s = (total_s - (total_s % align_s)) - 1
# leave plenty of room for mbr, partition table, etc.
reserve_kb = 64
reserve_s = (reserve_kb * 1024) // sector_b
while reserve_s < align_s:
    reserve_s *= 2
start_s = reserve_s
parttype = "primary"
count = 0
# the partition sizes (in MiB) follow the four fixed arguments
for arg in sys.argv[5:]:
    count += 1
    # do we need an extended partition?
    if count == 4:
        # use the full physical space, so the slack can be used later
        dopart(device, "extended", start_s, total_s - 1)
        count += 1
        start_s += reserve_s
        parttype = "logical"
    # linux only supports 15 partitions per disk
    elif count == 15:
        # stop here; the remaining space goes to the final partition
        break
    # convert before comparing; a string never compares sensibly to 0
    part_mb = int(arg)
    if part_mb <= 0:
        print("Invalid partition size")
        sys.exit(1)
    part_s = (part_mb * 1024 * 1024) // sector_b
    # end_s is inclusive
    end_s = start_s + part_s - 1
    if end_s > last_s:
        # does not fit; the remaining space goes to the final partition
        break
    dopart(device, parttype, start_s, end_s)
    start_s += part_s
# if there's any room left, use it
if start_s <= last_s:
    dopart(device, parttype, start_s, last_s)
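The layout arithmetic of partalign.py can be tried without touching a disk. The following is a simplified dry-run sketch, not the script itself: it only covers the primary-partition case (at most three sized partitions plus the remainder), the name plan_layout is made up for illustration, and the disk size in the example is an arbitrary assumption.

```python
def plan_layout(sector_b, total_s, align_kb, sizes_mb, reserve_kb=64):
    """Return a list of (first_sector, last_sector) pairs, ends inclusive."""
    if len(sizes_mb) > 3:
        raise ValueError("this sketch only handles primary partitions")
    align_s = (align_kb * 1024) // sector_b
    # last usable sector once the unaligned slack at the end is dropped
    last_s = total_s - (total_s % align_s) - 1
    # room for the MBR and partition table, grown to at least one alignment unit
    reserve_s = (reserve_kb * 1024) // sector_b
    while reserve_s < align_s:
        reserve_s *= 2
    start_s = reserve_s
    plan = []
    for size_mb in sizes_mb:
        part_s = (size_mb * 1024 * 1024) // sector_b
        end_s = start_s + part_s - 1  # inclusive end, as parted expects
        if end_s > last_s:
            break
        plan.append((start_s, end_s))
        start_s += part_s
    # whatever is left becomes the final partition
    if start_s <= last_s:
        plan.append((start_s, last_s))
    return plan


if __name__ == "__main__":
    # assumption: a 1000000 sector disk with 512 byte sectors, 64 KiB alignment
    for first, last in plan_layout(512, 1000000, 64, [100]):
        print("mkpart primary %d %d" % (first, last))
```

Because partition sizes are given in whole MiB and the alignment is capped at 1024 KiB, every start sector stays aligned whenever the alignment is a power of two.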
Based on the date this bug was created, it appears to have been reported
during the development of Fedora 8. In order to refocus our efforts as
a project we are changing the version of this bug to '8'.
If this bug still exists in rawhide, please change the version back to rawhide.
(If you're unable to change the bug's version, add a comment to the bug
and someone will change it for you.)
Thanks for your help and we apologize for the interruption.
The process we're following is outlined here:
We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.
Does this problem remain true for Fedora 9?
Does this problem remain true for RHEL-5.3?