1437142 – [RFE] Always clean (at least metadata) of all the disks of deployed nodes

Summary: [RFE] Always clean (at least metadata) of all the disks of deployed nodes

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	instack-undercloud
Sub Component:
Version:	11.0 (Ocata)
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	Upstream M2
Target Release:	14.0 (Rocky)
Assignee:	Dmitry Tantsur
QA Contact:	mlammon
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-03-29 15:36 UTC by Dan Yasny
Modified:	2018-05-02 22:44 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-05-02 09:44:40 UTC
Target Upstream Version:
Embargoed:

Description Dan Yasny 2017-03-29 15:36:16 UTC

Description of problem:
Suppose all of the following hold when deploying a node: the root device hint points to a disk that is not the first boot device, IPMI cannot update the boot device (apparently some IPMI drivers behave this way), and there is a bootable disk earlier in the boot sequence. In that case the node boots from the earlier disk. Even though the overcloud image is written to the disk matched by the root device hint (by WWN, HCTL, etc.), a different OS is what actually boots. This causes the overcloud (OC) deployment to hang until it times out, leaving it in a failed state.

Version-Release number of selected component (if applicable):
OSP11 - any version

How reproducible:
always

Steps to Reproduce:
1. Deploy on virtual machines with two disks, where the first disk has a working OS
2. Point the root device hint at the second disk
3. Try to deploy the overcloud

Actual results:
The OC deployment fails with a timeout. Checking the nodes, I can see they booted from the first disk, and if I manually mount the second disk, I can see it contains the correct image. The root hint worked, but the node booted from the first disk.

Expected results:
If the existing disks are cleaned during deployment, the firmware skips the disks that are no longer bootable and continues down the boot sequence until it reaches the disk the overcloud was deployed to.
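The failure mode and the expected behaviour can be modelled with a small Python sketch. It assumes, as the report describes, that IPMI cannot change the boot device, so the firmware simply boots the first disk in its fixed boot order that still looks bootable. The `Disk` class and the `firmware_boot` and `deploy` functions are hypothetical names for illustration only; they are not part of Ironic or instack-undercloud.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Disk:
    name: str
    bootable: bool      # has a partition table / boot loader the firmware accepts
    os_label: str = ""  # what ends up running if the firmware picks this disk


def firmware_boot(disks: List[Disk]) -> Optional[str]:
    """Hypothetical firmware: boot the first bootable disk in boot order."""
    for disk in disks:
        if disk.bootable:
            return disk.os_label
    return None  # nothing bootable at all


def deploy(disks: List[Disk], root_hint: str, clean_others: bool) -> List[Disk]:
    """Hypothetical deploy: write the overcloud image to the disk matching
    root_hint and, if clean_others is set, wipe metadata on every other disk
    so that it is no longer bootable."""
    result = []
    for disk in disks:
        if disk.name == root_hint:
            result.append(Disk(disk.name, True, "overcloud"))
        elif clean_others:
            result.append(Disk(disk.name, False))
        else:
            result.append(disk)  # stale OS left in place
    return result


if __name__ == "__main__":
    disks = [Disk("sda", bootable=True, os_label="old OS"), Disk("sdb", bootable=False)]
    print(firmware_boot(deploy(disks, "sdb", clean_others=False)))  # old OS
    print(firmware_boot(deploy(disks, "sdb", clean_others=True)))   # overcloud
```

Without cleaning, the stale OS on `sda` wins even though the image landed on `sdb`; once the other disks are no longer bootable, the firmware falls through to the disk selected by the root device hint.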

Additional info:

Comment 1 Ramon Acedo 2017-03-29 15:55:54 UTC

Currently undercloud.conf has clean_nodes = false by default. The goal would be:

1. Allow specifying whether to clean data + metadata, just metadata, or nothing
2. Default to clean metadata
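One way the three cleaning levels requested above could be modelled is sketched below in Python. `CleanMode`, `clean_steps`, `parse_mode` and the step names are all hypothetical; they do not correspond to an existing undercloud.conf option or Ironic clean step. The sketch only pins down the intended semantics: nothing, metadata only, or data plus metadata, with metadata as the default.

```python
from enum import Enum
from typing import List


class CleanMode(Enum):
    """Hypothetical cleaning levels (not a real undercloud.conf option)."""
    NONE = "none"
    METADATA = "metadata"
    FULL = "full"  # data + metadata


DEFAULT_MODE = CleanMode.METADATA


def clean_steps(mode: CleanMode = DEFAULT_MODE) -> List[str]:
    """Return the (hypothetical) ordered steps run on each disk for a mode."""
    if mode is CleanMode.NONE:
        return []
    steps = ["erase_metadata"]
    if mode is CleanMode.FULL:
        steps.append("erase_data")
    return steps


def parse_mode(value: str) -> CleanMode:
    """Parse a user-supplied value; an empty value selects the default.
    Unknown values raise ValueError."""
    if not value:
        return DEFAULT_MODE
    return CleanMode(value.strip().lower())
```

FULL deliberately includes the metadata step, so every mode other than NONE leaves the disk unbootable even if a long data erase is interrupted.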

Comment 2 Dmitry Tantsur 2017-04-03 11:17:21 UTC

Do I understand correctly that we need to set clean_nodes=True by default? I agree, but that's definitely an RFE. We can try to fit it into 12, since code-wise it's a single variable flip, but I'd prefer to get buy-in from the TripleO team, possibly via the mailing list (ML).

> Default to clean metadata

This is done.

> Allow specifying whether to clean data + metadata, just metadata, or nothing

This can be done via hiera overrides.
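Metadata-only cleaning, which the comment above notes is already the default, removes partition tables and similar on-disk signatures instead of overwriting every block. That makes it far cheaper than a full erase while still leaving the disk unbootable. The Python sketch below illustrates the idea on a disk image file by zeroing its first and last MiB, the regions that hold the MBR, the primary GPT header and entries, and the backup GPT copy at the end of the disk. It is an assumption-laden illustration, not how ironic-python-agent actually cleans disks: `wipe_metadata`, `has_boot_signature` and the 1 MiB size are hypothetical choices, the sketch refuses anything that is not a regular file, and it does not touch filesystem signatures inside partitions.

```python
import os

# Assumption: 1 MiB at each end covers the MBR (sector 0), the primary GPT
# header and entries (LBA 1-33 with 512-byte sectors) and the backup GPT.
WIPE_BYTES = 1024 * 1024


def wipe_metadata(path: str, wipe_bytes: int = WIPE_BYTES) -> None:
    """Zero the head and tail of a disk image file in place."""
    if not os.path.isfile(path):
        # Safety guard for this sketch: never touch block devices.
        raise ValueError(f"refusing to wipe non-regular file: {path}")
    size = os.path.getsize(path)
    n = min(wipe_bytes, size)
    zeros = bytes(n)
    with open(path, "r+b") as f:  # r+b: write in place without truncating
        f.seek(0)
        f.write(zeros)            # MBR / protective MBR and primary GPT
        f.seek(size - n)
        f.write(zeros)            # backup GPT header and entries
        f.flush()
        os.fsync(f.fileno())


def has_boot_signature(path: str) -> bool:
    """True if sector 0 ends with the 0x55AA MBR boot signature."""
    with open(path, "rb") as f:
        sector0 = f.read(512)
    return len(sector0) == 512 and sector0[510:512] == b"\x55\xaa"
```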

Comment 3 Dan Yasny 2017-04-03 12:12:09 UTC

Whether or not this is an RFE is a matter of semantics: a (more) reasonable default doesn't always need to be an RFE, and that reason is pretty much why the BZ was opened.

But that's not really a big deal at the moment, since we're talking about a corner case. What we really need here is a good documentation note about cleaning that warns customers their disks will be cleaned even when the root hint doesn't point at them. We don't want to take destructive actions without giving the user fair warning.

Comment 4 Dmitry Tantsur 2017-04-11 15:33:16 UTC

Given the state of upstream CI and the size of our backlog, this has to be moved to 13.

Comment 6 Dmitry Tantsur 2017-10-02 11:24:45 UTC

Removing from the release. Given that Queens is focused on stabilization, this had better be deferred.

Comment 8 Dmitry Tantsur 2018-05-02 09:44:40 UTC

Hi all,

it seems that enabling cleaning by default meets resistance from both the TripleO team and TripleO consumers (see the thread at http://lists.openstack.org/pipermail/openstack-dev/2018-April/129826.html). Thus, I'm closing this RFE in favour of a separate command to run cleaning: https://bugzilla.redhat.com/show_bug.cgi?id=1573790

We can re-evaluate it if the upstream situation changes.

Note You need to log in before you can comment on or make changes to this bug.