Red Hat Bugzilla – Bug 1265069
RFE: validate that the user didn't forget to introspect the nodes
Last modified: 2016-04-18 02:57:48 EDT
Description of problem:
I ran a deployment and waited close to 3 hours before I decided that something wasn't right. I then tried other deployments, hoping that one would work, but they always hung. I could have wasted a lot more time on it if I hadn't started suspecting that I had skipped introspection (it's not the first time I've missed a step in the process). I ran introspection, tried the deployment again, and finally succeeded.
The system should give clearer indications of what's wrong instead of leaving the user guessing.
Hi, dtantsur mentioned in the review of https://review.openstack.org/#/c/229269/ that we should also support the flow in which introspection is not run, so the patch would not make sense.
Could someone specify what behaviour is wanted?
I can't decide on the desired behaviour, but I believe some kind of validation is necessary to help the user avoid mistakes. I think we need to check that the user either ran introspection or assigned roles manually as per the instructions in section 6.2.3. "Manually Tagging the Nodes" in https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Scenario_2_Using_the_CLI_to_Create_a_Basic_Overcloud.html#sect-Manually_Tagging_the_Nodes
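A validation along the lines requested above could look roughly like the following. This is a minimal Python sketch, not TripleO code: the node dictionaries, the `introspected` flag and the helper names `has_profile` and `find_unprepared_nodes` are all hypothetical. The comma-separated capabilities string with a `profile:<name>` entry is assumed to mirror what manual tagging sets on a node.

```python
"""Hypothetical pre-deployment check: every node must be introspected or tagged."""


def has_profile(node):
    # Assumed format: capabilities is a comma-separated "key:value" string,
    # e.g. "profile:compute,boot_option:local".
    caps = node.get("properties", {}).get("capabilities", "")
    for item in caps.split(","):
        key, _, value = item.partition(":")
        if key.strip() == "profile" and value.strip():
            return True
    return False


def find_unprepared_nodes(nodes):
    """Return UUIDs of nodes that were neither introspected nor manually tagged."""
    return [
        node["uuid"]
        for node in nodes
        if not node.get("introspected", False) and not has_profile(node)
    ]


# Hypothetical inventory: one introspected node, one tagged by hand, one forgotten.
nodes = [
    {"uuid": "node-1", "introspected": True, "properties": {}},
    {"uuid": "node-2", "introspected": False,
     "properties": {"capabilities": "profile:compute,boot_option:local"}},
    {"uuid": "node-3", "introspected": False, "properties": {}},
]

missing = find_unprepared_nodes(nodes)
if missing:
    print("Not introspected or tagged:", ", ".join(missing))
```

Failing early with a message like this, before the deployment starts, is the kind of hint the original report asks for.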
A couple of thoughts:
First, introspection is not strictly required. If you've provided the appropriate hardware values in instackenv.json and aren't using AHC, then introspection doesn't actually buy you much. Upstream, we don't run introspection in two of the three CI jobs because it doesn't increase our test coverage and we don't need it (and, most importantly, it adds unnecessary time to the test runs).
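If introspection is skipped, the hardware values in instackenv.json are what the deployment relies on, so a lighter validation could simply check that they are present. The sketch below assumes the common layout of a top-level `nodes` list whose entries carry `cpu`, `memory` (MB), `disk` (GB) and `arch` next to the power-management fields; those key names and the helper `nodes_missing_hardware` are assumptions for illustration, not a documented validator.

```python
import json

# Assumed instackenv.json hardware keys; adjust if your file uses different ones.
HARDWARE_KEYS = ("cpu", "memory", "disk", "arch")


def nodes_missing_hardware(instackenv_text):
    """Map each node's index to the hardware keys it lacks (None or blank counts as missing)."""
    data = json.loads(instackenv_text)
    missing = {}
    for index, node in enumerate(data.get("nodes", [])):
        absent = []
        for key in HARDWARE_KEYS:
            value = node.get(key)
            if value is None or str(value).strip() == "":
                absent.append(key)
        if absent:
            missing[index] = absent
    return missing


# Hypothetical two-node file: the second node lacks most hardware values.
example = """
{"nodes": [
  {"pm_type": "pxe_ipmitool", "mac": ["52:54:00:aa:bb:01"],
   "cpu": "4", "memory": "6144", "disk": "40", "arch": "x86_64"},
  {"pm_type": "pxe_ipmitool", "mac": ["52:54:00:aa:bb:02"], "cpu": "4"}
]}
"""

print(nodes_missing_hardware(example))
```

A check like this keeps the no-introspection flow supported while still catching the case where neither introspection nor manual hardware values were provided.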
Second, we _have_ added checks to the deployment process to verify things like the appropriate number of nodes are available, which I would expect to catch something like this.
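The node-count check mentioned above amounts to comparing what each role requests with what is available. A simplified Python sketch follows; the inputs (one profile name per available node, and a mapping of profile to requested count) and the function name `check_node_counts` are hypothetical and do not reflect how the actual deployment checks are implemented.

```python
from collections import Counter


def check_node_counts(available_profiles, requested):
    """Return a problem string for every profile that requests more nodes than exist.

    available_profiles: one profile name per available node.
    requested: mapping of profile name -> number of nodes the deployment asks for.
    """
    have = Counter(available_profiles)
    problems = []
    for profile, wanted in sorted(requested.items()):
        if have[profile] < wanted:
            problems.append(f"{profile}: requested {wanted}, available {have[profile]}")
    return problems


# One controller and one compute node available, but two compute nodes requested.
print(check_node_counts(["control", "compute"], {"control": 1, "compute": 2}))
```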
I'm a little unsure how just skipping introspection would have caused a deployment to hang for three hours, though. The worst-case scenario there is that the Ironic nodes have bogus hardware values and Nova refuses to schedule to them because they don't meet the flavor requirements. But even in that case, Nova would only try to schedule an instance three times and then give up, which would fail the Heat stack and the deployment. I guess it's possible this somehow tripped a bug in Heat or Nova that caused a failed deployment to be ignored, but that's a different bug and not an indication that introspection should be mandatory.
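The argument above is that scheduling against bogus hardware values should fail in bounded time, not hang. The toy model below illustrates that reasoning; it is not Nova's scheduler. The `>=` resource filter, the `build` callback, the retry-on-build-failure loop and the `NoValidHost` class are simplified assumptions, and only the limit of three attempts comes from the comment.

```python
class NoValidHost(Exception):
    """Hypothetical stand-in for a scheduling failure; not Nova's own class."""


def node_fits(node, flavor):
    # Simplified filter: the node must report at least the flavor's CPU, RAM and disk.
    return (node["cpu"] >= flavor["vcpus"]
            and node["memory_mb"] >= flavor["ram_mb"]
            and node["disk_gb"] >= flavor["disk_gb"])


def schedule(nodes, flavor, build, max_attempts=3):
    """Pick a fitting node, retrying on build failure up to max_attempts times."""
    tried = set()
    for attempt in range(1, max_attempts + 1):
        candidates = [n for n in nodes
                      if n["uuid"] not in tried and node_fits(n, flavor)]
        if not candidates:
            raise NoValidHost(f"no node fits the flavor (attempt {attempt})")
        node = candidates[0]
        if build(node):
            return node["uuid"]
        tried.add(node["uuid"])  # exclude the failed node on the next attempt
    raise NoValidHost(f"gave up after {max_attempts} attempts")


flavor = {"vcpus": 1, "ram_mb": 4096, "disk_gb": 40}
# Nodes whose hardware values were never filled in correctly.
bogus = [{"uuid": f"node-{i}", "cpu": 0, "memory_mb": 0, "disk_gb": 0} for i in range(3)]

try:
    schedule(bogus, flavor, build=lambda node: True)
except NoValidHost as exc:
    print("Deployment fails fast:", exc)
```

Whether no node passes the filter or every build fails, the loop ends after at most `max_attempts` iterations. In this model, a multi-hour hang therefore points to something other than the missing introspection.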
@bnemec Makes sense.
It looks to me like this bug/RFE should be closed as WONTFIX. I will do that within a couple of days if no new information appears.
Running node introspection is not mandatory.