Bug 1613918

Summary: [Docs] The Ceph Guide for OpenStack should have a disk cleaning recommendation
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: John Fulton <johfulto>
Component: Documentation-RHHI4C Assignee: Aron Gunn <agunn>
Status: CLOSED CURRENTRELEASE QA Contact: Rachana Patel <racpatel>
Severity: medium Docs Contact: Anjana Suparna Sriram <asriram>
Priority: unspecified    
Version: 3.2CC: agunn, joea, pgrist, racpatel, sasha, srevivo, vereddy
Target Milestone: rc   
Target Release: 4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-01-12 20:39:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description John Fulton 2018-08-08 14:44:58 UTC
The Deploying an Overcloud with Containerized Red Hat Ceph document [1], section 2.1, has an example of how to clean disks. This bug asks that the section be revised to recommend a safer alternative available in OSP13. 

We have suggested that users could set "clean_nodes=true" in undercloud.conf, but warned them of the side effects and explained why the default is false: if an operator makes a mistake and deletes a node, restoring that node will be much harder if its data has been automatically cleaned away. As a safer alternative, anyone deploying Ceph can keep automatic cleaning off and just run the following for each Ceph node between deployments.

  openstack baremetal node manage $node
  openstack baremetal node clean $node --clean-steps \
      '[{"interface": "deploy", "step": "erase_devices_metadata"}]'
  openstack baremetal node provide $node
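To clean several nodes, the three commands above can be wrapped in a small loop. The sketch below is only a dry run over hypothetical node names (ceph-0, ceph-1, ceph-2 are placeholders for the output of "openstack baremetal node list" in your environment): it prints the command sequence rather than running it, so nothing is erased until you pipe the output to sh or drop the echos.

```shell
#!/bin/sh
# Hypothetical node names; substitute the Ceph nodes from
# "openstack baremetal node list" in your environment.
CEPH_NODES="ceph-0 ceph-1 ceph-2"

# Print the per-node cleaning sequence. This is a dry run: pipe the
# output to "sh" (or remove the echos) to actually run the commands.
clean_commands() {
  for node in $CEPH_NODES; do
    echo openstack baremetal node manage "$node"
    echo openstack baremetal node clean "$node" --clean-steps \
      "'[{\"interface\": \"deploy\", \"step\": \"erase_devices_metadata\"}]'"
    echo openstack baremetal node provide "$node"
  done
}

clean_commands
```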



[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/deploying_an_overcloud_with_containerized_red_hat_ceph/
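For reference, the automatic-cleaning alternative mentioned above is a single setting in undercloud.conf. A minimal fragment would look like the following (the default is false for the reasons given; the comments are mine, not from the shipped file):

```
[DEFAULT]
# When true, Ironic erases disk metadata whenever a node is unprovisioned.
# Left false by default so an accidentally deleted node's data survives.
clean_nodes = true
```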

Comment 1 John Fulton 2018-08-08 14:57:47 UTC
If you don't clean overcloud nodes that will host Ceph OSDs before deployment, then the deployment will fail. 

A warning like the above ^ should be added to this section of the document so that it's clear to the user WHY they should clean their disks.

As for why the deployment will fail: ceph-disk [1] won't prepare a disk that isn't clean. Deployers new to Ceph may not realize this, and deployment tools that trigger ceph-disk will fail to prepare the requested OSDs, usually on the second deployment; the first deployment works if the disks happen to be factory clean, for example. ceph-disk itself has a zap option to do the cleaning, but it doesn't do so implicitly, and by design neither do automation tools like ceph-ansible or puppet-ceph. Implicit zapping has been proposed in the past but was ultimately rejected because it could lead to accidental data loss. Instead, all of these tools default to the safer option: don't delete data unless the user opts in to deleting it.

[1] http://docs.ceph.com/docs/hammer/man/8/ceph-disk
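For completeness, the opt-in zap mentioned above is a single destructive command. In the sketch below, /dev/sdb is a placeholder for the OSD disk to be cleaned; do not run it against a device holding data you want to keep.

```shell
# DESTRUCTIVE: wipes the partition table and any Ceph metadata on the
# device so that ceph-disk prepare will accept it on the next attempt.
# /dev/sdb is a placeholder; substitute the actual OSD disk.
ceph-disk zap /dev/sdb
```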

Comment 4 Rachana Patel 2019-02-27 08:20:46 UTC
lgtm

Comment 5 John Fulton 2019-06-26 12:48:36 UTC
*** Bug 1722567 has been marked as a duplicate of this bug. ***