Red Hat Bugzilla – Bug 1252260
[RFE] Partition Ceph OSD disks automatically during installation
Last modified: 2016-06-17 23:44:53 EDT
Description of problem:
The Ceph storage role with hiera customization may require extensive disk preparation, including:
1. Create partitions on disks prior to installation
2. Label disks as GPT
3. Delete partitions + MBR to wipe Ceph fsid prior to reinstall
4. Delete LVM PVs, VGs, and LVs from disk partitions
Currently Ironic works mostly with disk images, so these steps must be completed manually before or after installation.
Version-Release number of selected component (if applicable):
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1074769 but this bug is specific to adding automatic partitioning and clean up to Ceph after advanced partitioning features are added to ironic.
Hi! After giving it more thought, I don't think it is Ironic that should do the partitioning. It's not the preferred approach upstream either. Ironic is responsible for writing your OS and creating swap and the configdrive for you. Once the OS is booted, anything else can be (and IIRC may already be) used for partitioning the remaining space.
Mike, who would be a better person to talk to about post-deploy customization?
That sounds reasonable.
The intention of the bug is to avoid a workflow where we must deploy the ceph nodes to prepare their disks for customization and then redeploy them.
Whether through ironic, os-collect-config, or cloud-init, we need to prepare disks for ceph customization with only one overcloud deployment.
Moving this to rhel-osp-installer. This is probably an RFE for os-disk-config.
Solving this properly seems outside the scope of y2.
The short-term answer here is to use custom firstboot scripts to partition the secondary disks as needed.
We used to do automatic partitioning of the data disks but I think we can now do partitioning, from the puppet module, of the journal disks too, as described in: https://bugzilla.redhat.com/show_bug.cgi?id=1256103#c12
In which case, it seems to me the only 'functionality' missing in ironic is to actually erase the disks (at least the first bytes where the MBR is) so that automatic partitioning can be done across re-deployments.
Jacob, does this sound correct to you too?
Giulio & James,
Yes that sounds correct. The installer should wipe existing partitions on the specified journal disks, create partitions where needed, and make the appropriate file system links between the journals and underlying partitions/devices.
I disagree that we should use first boot scripts to partition the journals. Ceph should be tightly integrated with OSPd. Best practices for configuring the journal disks should be embedded in the product. The journal partition should be no different than PGs, replicas, etc in this respect: preconfigured with sensible defaults but customizable via hiera.
Jacob, I think firstboot was more of a temporary measure; we can do partitioning of the journal disks with hiera as per https://bugzilla.redhat.com/show_bug.cgi?id=1256103#c12
We can use firstboot (again as a temporary measure maybe) to erase the disks too, before the partitioning happens; in which case I think the scope of this RFE is limited to getting Ironic able to erase the disks.
(In reply to Giulio Fidente from comment #8)
> We used to do automatic partitioning of the data disks but I think we can
> now do partitioning, from the puppet module, of the journal disks too, as
> described in: https://bugzilla.redhat.com/show_bug.cgi?id=1256103#c12
> In which case, it seems to me the only 'functionality' missing in ironic is
> to actually erase the disks (at least the first bytes where the MBR is) so
> that automatic partitioning can be done across re-deployments.
Ironic already supports disk erasing when used with the IPA ramdisk. The ramdisk can also be customized to include more advanced tasks such as updating firmware/BIOS, checking hardware consistency, etc. This is how Rackspace uses it on OnMetal.
Luca, this is great; it seems we just need to migrate to the new IPA ramdisk then.
Giulio, how much work remains to resolve this bug? Could you summarize, please?
This is currently *also* blocked by bz #1256103
The puppet module does the partitioning, but it expects the disks to have a GPT label and not to be completely empty. I think we can instruct the IPA ramdisk to create that label though? Ben?
I see this bug is targeted at OSP 7, which to me means IPA is out of the question. I expect we'll be moving to that for OSP 8, but it would be a huge change at this point in the release to switch discovery and provisioning images like that.
Can we not just do either a ceph-disk zap or sgdisk -Zog on all of the configured OSD disks before doing the Ceph install? See the cleanup script one of the sales folks wrote: https://github.com/dcostakos/osp-director-helpers/blob/master/scripts/zap_ceph_disk.sh#L13
We wouldn't need anything quite so complex because we'll be running in a more consistent environment. I'm pretty sure we should always have ceph-disk zap available on Ceph nodes, for example.
FWIW, I'm also not clear that this would even be reasonable to do in the deploy ramdisk. I'm pretty sure that doesn't have access to all of the hiera data to know which disks would need to be wiped. We might be able to add that, but I also don't see any reason this _has_ to be done at deploy time, since it's dealing exclusively with secondary storage and not the root partition. Once the OS is up and running we just need a step before the Ceph deploy to wipe the disks.
I also don't know if this is actually blocked by 1256103. My understanding is that at this point that is more of a doc bug, and auto-creating journal partitions works fine as long as you use the correct syntax for it. If we address this bug, then from a functional perspective the Ceph deploy will be capable of taking care of its own partitioning.
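The wipe step suggested above could be sketched roughly as follows. This is only an illustration, not an existing director script: the function names and the device names are hypothetical, and the disk list would in practice have to come from the same hiera data that configures the OSDs. Splitting the wipe into two sgdisk invocations is an assumption of this sketch, made because the sgdisk man page describes -Z (--zap-all) as exiting after the zap.

```python
import subprocess


def zap_commands(disks):
    """Return the sgdisk command lines that wipe each disk once.

    Duplicate device names are dropped so a disk listed twice
    (for example as both data and journal device) is wiped only once.
    """
    unique = []
    for disk in disks:
        if disk not in unique:
            unique.append(disk)

    commands = []
    for disk in unique:
        # --zap-all destroys the GPT and MBR data structures.
        commands.append(["sgdisk", "--zap-all", disk])
        # --clear resets the partition data and --mbrtogpt converts any
        # remaining MBR label, leaving an empty GPT label behind.
        commands.append(["sgdisk", "--clear", "--mbrtogpt", disk])
    return commands


def wipe_disks(disks, run=subprocess.check_call):
    """Execute the wipe commands; pass run=print for a dry run."""
    for argv in zap_commands(disks):
        run(argv)


if __name__ == "__main__":
    # Hypothetical device names, printed only (dry run).
    wipe_disks(["/dev/sdb", "/dev/sdc"], run=print)
```

Keeping the command construction separate from execution makes it possible to review exactly what would be run on a node before anything destructive happens.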
Ben, thanks. The remaining issue with 1256103 is that the journal partitions are created *only* if the disk already has an empty GPT label. They are not created if the disk has no partition table (or uses MBR), which is probably what a disk erasure would leave us with.
Using a deployment script seems fine as long as we make sure we don't re-erase the disks in case of a node reboot or stack update. I would like to investigate this further.
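One way to avoid re-erasing on reboot or stack update is a marker file on the root disk. The sketch below is a minimal illustration under that assumption; run_once and the marker path are hypothetical names, not part of any existing tool.

```python
import os


def run_once(marker_path, action):
    """Call action() only if marker_path does not exist yet.

    Returns True if the action ran, False if it was skipped.
    """
    if os.path.exists(marker_path):
        return False

    action()

    # The marker is written only after the action succeeded, so a
    # failed wipe is retried on the next run instead of being skipped.
    marker_dir = os.path.dirname(marker_path)
    if marker_dir and not os.path.isdir(marker_dir):
        os.makedirs(marker_dir)
    with open(marker_path, "w") as marker:
        marker.write("done\n")
    return True
```

A call such as run_once("/var/lib/osd-disk-wipe.done", wipe) would then be safe to repeat. Because the marker lives on the root disk, it survives reboots and stack updates, but it disappears when the node is redeployed with a fresh OS image, so the wipe runs again exactly in the re-deployment case discussed earlier in this bug.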
Here is an example firstboot script to automate creation of GPT labels.
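The attached script is not reproduced in this report. As a rough sketch of the same idea, and not the attachment itself, the following creates a GPT label only on disks that do not already have one. It assumes parted is installed on the node and that its output is in English (LC_ALL=C is set for that reason); all function names are hypothetical.

```python
import os
import subprocess


def partition_table_type(parted_output):
    """Extract the label type from `parted -s <dev> print` output."""
    for line in parted_output.splitlines():
        if line.startswith("Partition Table:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


def probe(disk):
    """Return parted's report for a disk as text.

    parted may exit non-zero on a blank disk, so the exit status
    is deliberately ignored and only the output is inspected.
    """
    proc = subprocess.Popen(
        ["parted", "-s", disk, "print"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        env=dict(os.environ, LC_ALL="C"),
    )
    return proc.communicate()[0]


def ensure_gpt(disk, probe=probe, run=subprocess.check_call):
    """Create an empty GPT label unless the disk already has a GPT label.

    Returns True if a new label was written, False otherwise.
    """
    if partition_table_type(probe(disk)) == "gpt":
        return False
    run(["parted", "-s", disk, "mklabel", "gpt"])
    return True
```

Note that this sketch relabels any disk that is not already GPT, including disks with an MBR partition table, so it should only ever be pointed at disks dedicated to Ceph.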
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
I've reassigned this to Dmitry Tantsur to ensure it gets proper attention.
There is general agreement in the Ironic community that we don't touch user instances, and in particular that we don't partition user data.
There are two ways out: use whole disk images (i.e. images with all the partitioning built in) or partition the disk on the first boot. The former option won't allow partitioning several physical devices, so I'd go with the latter, which is probably a documentation-only change, as far as I know. Moving the hot potato to Steve to confirm.
For Ceph, we'll be creating both the GPT label and the partitions from puppet-ceph (most probably via ceph-disk). This happens already for the Ceph data disks, just not yet for the journal disks.
*** This bug has been marked as a duplicate of bug 1256103 ***