Bug 1252260 - [RFE] Partition Ceph OSD disks automatically during installation
Summary: [RFE] Partition Ceph OSD disks automatically during installation
Keywords:
Status: CLOSED DUPLICATE of bug 1256103
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Steven Hardy
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On: 1256103
Blocks: 1267690 1299906
 
Reported: 2015-08-11 05:44 UTC by jliberma@redhat.com
Modified: 2016-06-18 03:44 UTC
CC List: 18 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Cloned to: 1267690
Environment:
Last Closed: 2016-05-04 10:33:17 UTC
Target Upstream Version:
Embargoed:



Description jliberma@redhat.com 2015-08-11 05:44:31 UTC
Description of problem:

A Ceph storage role with hiera customization may require extensive disk preparation, including:
1. Create partitions on disks prior to installation
2. Label disks as GPT
3. Delete partitions + MBR to wipe Ceph fsid prior to reinstall
4. Delete LVM PVs, VGs, and LVs from disk partitions

Currently Ironic works mostly with disk images, so these steps must be completed manually before or after installation.
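
For illustration only, the manual cleanup referred to above amounts to roughly the following (a sketch; /dev/sdb is a placeholder OSD disk):

  #!/bin/bash
  # Sketch of the manual pre-reinstall cleanup described above; /dev/sdb is a placeholder.
  DISK=/dev/sdb

  # Remove any LVM PVs/VGs/LVs layered on the disk or its partitions.
  for pv in $(pvs --noheadings -o pv_name | grep "${DISK}"); do
      vg=$(pvs --noheadings -o vg_name "$pv" | tr -d ' ')
      [ -n "$vg" ] && vgremove -f "$vg"
      pvremove -ff -y "$pv"
  done

  # Wipe the partition table and MBR (and with them the old Ceph fsid markers),
  # then relabel the disk as GPT.
  sgdisk --zap-all "$DISK"
  parted -s "$DISK" mklabel gpt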

Version-Release number of selected component (if applicable):

python-rdomanager-oscplugin-0.0.8-44.el7ost.noarch

Additional info:

Related to https://bugzilla.redhat.com/show_bug.cgi?id=1074769, but this bug is specific to adding automatic partitioning and cleanup for Ceph once advanced partitioning features are added to Ironic.

Comment 4 Dmitry Tantsur 2015-10-08 15:51:10 UTC
Hi! After giving it one more thought, I don't think it is Ironic that should do the partitioning; that isn't the preferred approach upstream either. Ironic is responsible for writing your OS and creating the swap and configdrive for you. Once the OS is booted, anything else can be (and IIRC already may be) used to partition the remaining space.
Mike, who would be a better person to talk to about post-deploy customization?

Comment 5 jliberma@redhat.com 2015-10-08 17:40:49 UTC
That sounds reasonable. 

The intention of this bug is to avoid a workflow where we must deploy the Ceph nodes to prepare their disks for customization and then redeploy them.

Whether through Ironic, os-collect-config, or cloud-init, we need to prepare the disks for Ceph customization with only one overcloud deployment.

Comment 6 Mike Burns 2015-10-14 12:43:17 UTC
Moving this to rhel-osp-installer. This is probably an RFE for os-disk-config.

Comment 7 James Slagle 2015-10-14 12:55:36 UTC
Solving this properly seems outside the scope of y2.

The short-term answer here is to use custom firstboot scripts to partition the secondary disks as needed.
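
A minimal sketch of what such a firstboot script could do (the journal device, partition count, and sizes below are placeholder values, not a recommendation):

  #!/bin/bash
  # Example firstboot-style script: label a secondary (journal) disk as GPT and
  # carve out journal partitions. /dev/sdd and the 10 GiB size are placeholders.
  JOURNAL_DISK=/dev/sdd

  # Only touch the disk if it has no partitions yet, so a reboot does not wipe it again.
  if [ "$(lsblk -rno TYPE "$JOURNAL_DISK" | grep -c part)" -eq 0 ]; then
      parted -s "$JOURNAL_DISK" mklabel gpt
      # One journal partition per OSD data disk; two data disks assumed here.
      sgdisk --new=1:0:+10G "$JOURNAL_DISK"
      sgdisk --new=2:0:+10G "$JOURNAL_DISK"
  fi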

Comment 8 Giulio Fidente 2015-10-14 13:09:01 UTC
We used to do automatic partitioning of the data disks, and I think we can now also partition the journal disks from the puppet module, as described in: https://bugzilla.redhat.com/show_bug.cgi?id=1256103#c12

In that case, it seems to me the only 'functionality' missing in Ironic is to actually erase the disks (at least the first bytes, where the MBR is) so that automatic partitioning can be done across re-deployments.

Jacob, does this sound correct to you too?

Comment 9 jliberma@redhat.com 2015-10-14 14:59:28 UTC
Giulio & James,

Yes that sounds correct. The installer should wipe existing partitions on the specified journal disks, create partitions where needed, and make the appropriate file system links between the journals and underlying partitions/devices.

I disagree that we should use firstboot scripts to partition the journals. Ceph should be tightly integrated with OSPd. Best practices for configuring the journal disks should be embedded in the product. In this respect the journal partitions should be no different from PGs, replicas, etc.: preconfigured with sensible defaults but customizable via hiera.

Thanks, Jacob

Comment 10 Giulio Fidente 2015-10-14 17:15:46 UTC
Jacob, I think firstboot was more of a temporary measure; we can do the partitioning of the journal disks with hiera as per https://bugzilla.redhat.com/show_bug.cgi?id=1256103#c12

We can use firstboot (again, perhaps as a temporary measure) to erase the disks too, before the partitioning happens; in that case I think the scope of this RFE is limited to making Ironic able to erase the disks.

Comment 11 Lucas Alvares Gomes 2015-10-16 10:55:33 UTC
Hi,

(In reply to Giulio Fidente from comment #8)
> We used to do automatic partitioning of the data disks but I think we can
> now do partitioning, from the puppet module, of the journal disks too, as
> described in: https://bugzilla.redhat.com/show_bug.cgi?id=1256103#c12
> 
> In which case, it seems to me the only 'functionality' missing in ironic is
> to actually erase the disks (at least the first bytes where the MBR is) so
> that automatic partitioning can be done across re-deployments.

Ironic already supports disk erasing when used with the IPA ramdisk [1][2]. The ramdisk can also be customized to include more advanced tasks such as updating the firmware/BIOS, checking hardware consistency, etc. This is how Rackspace uses it for OnMetal [3].

[1] https://github.com/openstack/ironic-python-agent/blob/ef57379342ec44180b26c33a2a3aa59391b8e627/ironic_python_agent/hardware.py#L301-L309

[2] https://github.com/openstack/ironic-python-agent/blob/ef57379342ec44180b26c33a2a3aa59391b8e627/ironic_python_agent/hardware.py#L233-L251

[3] https://github.com/rackerlabs/onmetal-ironic-hardware-manager/blob/b7aaa3cb2f99dcb5b4f9d49ef299111ff4dbebe7/onmetal_ironic_hardware_manager/__init__.py#L83-L145

Comment 12 Giulio Fidente 2015-10-16 11:08:53 UTC
Lucas, this is great; it seems we just need to migrate to the new IPA ramdisk then.

Comment 13 Jaromir Coufal 2015-10-20 12:07:06 UTC
Giulio, how much work/effort remains in order to resolve this bug, then? Could you summarize, please?

Comment 14 Giulio Fidente 2015-10-22 11:50:36 UTC
This is currently *also* blocked by bz #1256103

The puppet module does the partitioning, but it expects the disks to have a GPT label and not to be completely empty. I think we can instruct the IPA ramdisk to do that, though? Ben?

Comment 15 Ben Nemec 2015-10-22 15:03:25 UTC
I see this bug is targeted at OSP 7, which to me means IPA is out of the question. I expect we'll be moving to that for OSP 8, but it would be a huge change at this point in the release to switch discovery and provisioning images like that.

Can we not just do either a ceph-disk zap or sgdisk -Zog on all of the configured OSD disks before doing the Ceph install? See the cleanup script one of the sales folks wrote: https://github.com/dcostakos/osp-director-helpers/blob/master/scripts/zap_ceph_disk.sh#L13

We wouldn't need anything quite so complex because we'll be running in a more consistent environment. I'm pretty sure we should always have ceph-disk zap available on Ceph nodes, for example.
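
For illustration, such a wipe step could look roughly like this (the disk list below is a placeholder; in practice it would come from the Ceph hieradata):

  #!/bin/bash
  # Sketch of a pre-Ceph-install wipe; the disk list is a placeholder.
  for disk in /dev/sdb /dev/sdc /dev/sdd; do
      if command -v ceph-disk >/dev/null 2>&1; then
          ceph-disk zap "$disk"
      else
          # -Z destroys the GPT and MBR data structures, -o clears the partition
          # table, -g writes a fresh GPT.
          sgdisk -Zog "$disk"
      fi
  done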

FWIW, I'm also not clear that this would even be reasonable to do in the deploy ramdisk. I'm pretty sure that doesn't have access to all of the hiera data to know which disks would need to be wiped. We might be able to add that, but I also don't see any reason this _has_ to be done at deploy time, since it's dealing exclusively with secondary storage and not the root partition. Once the OS is up and running we just need a step before the Ceph deploy to wipe the disks.

I also don't know if this is actually blocked by 1256103. My understanding is that at this point that is more of a doc bug, and auto-creating journal partitions works fine as long as you use the correct syntax. If we address this bug, then from a functional perspective the Ceph deploy will be capable of taking care of its own partitioning.

Comment 16 Giulio Fidente 2015-10-22 15:16:14 UTC
Ben, thanks. The remaining issue with 1256103 is that the journal partitions are created *only* if the disk already has an empty GPT label, not if it has no partition table at all (or uses MBR), which is what a disk erasure would probably leave us with.

Using a deployment script seems fine as long as we make sure we don't re-erase the disks on a node reboot or stack update. I would like to investigate it more.

Comment 23 John Fulton 2016-03-01 19:41:37 UTC
Here is an example firstboot script to automate creation of GPT labels. 

 https://access.redhat.com/solutions/2186391

Comment 24 Mike Burns 2016-04-07 20:47:27 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 26 Hugh Brock 2016-05-04 09:38:43 UTC
I've reassigned this to Dmitry Tantsur to ensure it gets proper attention.

Comment 27 Dmitry Tantsur 2016-05-04 10:29:30 UTC
It's a common agreement in the Ironic community that we don't touch user instances, and in particular that we don't partition the user data.

There are two ways out: use whole-disk images (i.e. images with all the partitioning built in) or partition the disks on first boot. The former option won't allow partitioning several physical devices, so I'd go with the latter, which is probably a documentation-only change, as far as I know. Moving the hot potato to Steve to confirm.

Comment 28 Giulio Fidente 2016-05-04 10:31:52 UTC
For Ceph, we'll be creating both the GPT label and the partitions from puppet-ceph (most probably via ceph-disk). This happens already for the Ceph data disks, just not yet for the journal disks.
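
(Under the hood that amounts to something along the lines of the following; the device paths are placeholders:)

  # ceph-disk prepare creates the GPT partitions on the data disk and, when a
  # journal device is given, a journal partition on it as well.
  ceph-disk prepare /dev/sdb /dev/sdd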

Comment 29 Giulio Fidente 2016-05-04 10:33:17 UTC

*** This bug has been marked as a duplicate of bug 1256103 ***

