Bug 1265069

Summary: RFE: validate that the user didn't forget to introspect the nodes
Product: Red Hat OpenStack
Reporter: Udi Kalifon <ukalifon>
Component: python-rdomanager-oscplugin
Assignee: Marek Aufart <maufart>
Status: CLOSED NOTABUG
QA Contact: yeylon <yeylon>
Severity: high
Priority: medium
Version: 7.0 (Kilo)
CC: bnemec, calfonso, jslagle, mburns, mcornea, rhel-osp-director-maint, srevivo, ukalifon
Keywords: FutureFeature, Triaged
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Doc Type: Enhancement
Last Closed: 2016-01-18 14:07:03 UTC
Type: Bug

Description Udi Kalifon 2015-09-22 05:17:00 UTC
Description of problem:
I ran a deployment and waited close to 3 hours before I decided that something wasn't right. I then tried other deployments, hoping that one would work, but they always hung. I could have wasted a lot more time on it if I hadn't started to suspect that I had skipped introspection (it's not the first time I've missed a step in the process). I ran introspection, tried the deployment again, and it finally succeeded.

The system should give clearer indications as to what's wrong, and not just keep the user constantly guessing.


Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.10-5.el7ost.noarch

Comment 3 Marek Aufart 2015-11-03 12:01:57 UTC
Hi, dtantsur pointed out in the review of https://review.openstack.org/#/c/229269/ that we should also support the flow in which introspection is not run, so the patch would not make sense.

Could someone specify what behaviour is wanted?

Comment 4 Udi Kalifon 2015-11-03 12:27:45 UTC
I can't make a decision on the desired behaviour, but I believe that some kind of validation is necessary to help the user avoid mistakes. I think we need to test that the user either ran introspection or assigned roles manually as per the instructions in section 6.2.3. "Manually Tagging the Nodes" in https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Scenario_2_Using_the_CLI_to_Create_a_Basic_Overcloud.html#sect-Manually_Tagging_the_Nodes

Comment 5 Ben Nemec 2016-01-07 23:06:08 UTC
A couple of thoughts:

First, introspection is not strictly required. If you've provided the appropriate hardware values in instackenv.json and aren't using AHC, then introspection doesn't actually buy you much. Upstream we actually don't run introspection in two of the three CI jobs because it doesn't increase our test coverage and we don't need it (and, most importantly, it adds unnecessary time to the test runs).

Second, we _have_ added checks to the deployment process to verify things like the appropriate number of nodes are available, which I would expect to catch something like this.

I'm a little unsure how just skipping introspection would have caused a deployment to hang for three hours though.  Worst case scenario there is that the Ironic nodes have bogus hardware values and Nova refuses to schedule to them because they don't meet the flavor requirements.  But even in that case Nova would only try to schedule an instance three times and then give up, which would fail the Heat stack and the deployment.  I guess it's possible this somehow tripped a bug in Heat or Nova that caused a failed deployment to be ignored, but that's a different bug and not an indication that introspection should be mandatory.
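A minimal sketch of the kind of pre-deployment check discussed in this thread, assuming node data is already available as plain dicts shaped like the Ironic node `properties` field (`cpus`, `memory_mb`, `local_gb`). The names `nodes_missing_hardware` and `validate_before_deploy` are hypothetical and not part of python-rdomanager-oscplugin. Rather than making introspection mandatory, it only flags nodes whose hardware values are absent or zero, regardless of whether those values were meant to come from instackenv.json or from introspection.

```python
from typing import List, Mapping

# Properties that Nova's scheduler compares against flavor requirements.
# Assumption: each node is a plain dict shaped like Ironic's node "properties"
# field; fetching the data from Ironic is deliberately left out of this sketch.
REQUIRED_PROPERTIES = ("cpus", "memory_mb", "local_gb")


def _as_positive_int(value) -> int:
    """Convert a property value (may be a string or an int) to int, or 0."""
    try:
        return max(int(value), 0)
    except (TypeError, ValueError):
        return 0


def nodes_missing_hardware(nodes: Mapping[str, Mapping]) -> List[str]:
    """Return sorted names of nodes with any required property absent or zero.

    The values may come from instackenv.json or from introspection; the check
    does not care which, so it does not make introspection mandatory.
    """
    missing = []
    for name, properties in nodes.items():
        if any(_as_positive_int(properties.get(key)) == 0
               for key in REQUIRED_PROPERTIES):
            missing.append(name)
    return sorted(missing)


def validate_before_deploy(nodes: Mapping[str, Mapping]) -> None:
    """Hypothetical pre-deployment hook: fail fast instead of hanging."""
    bad = nodes_missing_hardware(nodes)
    if bad:
        raise ValueError(
            "Nodes without hardware properties (run introspection or set them "
            "in instackenv.json): " + ", ".join(bad))


if __name__ == "__main__":
    example = {
        "node-0": {"cpus": "4", "memory_mb": "8192", "local_gb": "40"},
        "node-1": {"cpus": "0", "memory_mb": "8192", "local_gb": "40"},
        "node-2": {},
    }
    print(nodes_missing_hardware(example))  # ['node-1', 'node-2']
```

Raising from a check like this before the Heat stack is created would report the problem immediately instead of leaving the user guessing. It does not cover the manual profile tagging from section 6.2.3; that would require inspecting each node's `capabilities` property as well.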

Comment 6 Marek Aufart 2016-01-08 10:01:49 UTC
@bnemec Makes sense.

It looks to me like this bug/RFE should be closed as WONTFIX. I will do that within a couple of days if no new information appears.

Comment 7 Marek Aufart 2016-01-18 14:07:03 UTC
Running node introspection is not mandatory.