Bug 1411571
Summary: | RHV+OSP+CFME+OCP Deployment failed: Went to status ERROR due to "Message: Unknown, Code: Unknown" | | |
---|---|---|---
Product: | Red Hat Quickstart Cloud Installer | Reporter: | Landon LaSmith <llasmith>
Component: | Installation - RHELOSP | Assignee: | Jason Montleon <jmontleo>
Status: | CLOSED NOTABUG | QA Contact: | Landon LaSmith <llasmith>
Severity: | unspecified | Docs Contact: | Dan Macpherson <dmacpher>
Priority: | unspecified
Version: | 1.1 | CC: | bthurber, jmatthew, llasmith, qci-bugzillas, smallamp
Target Milestone: | --- | Keywords: | Triaged
Target Release: | 1.1
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-02-07 21:55:01 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: |
Bug Depends On: | 1353464
Bug Blocks: |
Attachments: |

As a test, I attempted to deploy OSP+CFME since the RHV deployment succeeded, but it failed with a different error, which was reported in BZ1411935.

It's possible this is a duplicate of BZ1411935 even though the error is a bit different. I've seen it succeed, fail with the error in BZ1411935, and fail with yet other errors. Which follow-up commands in the script fail, if any, comes down to timing. Please retest after https://github.com/fusor/egon/pull/92 makes it into a compose.

I've also encountered the error below on an OSP+CFME API deployment with 1 controller and 2 compute nodes. Other OSP+CFME deployments have succeeded with the same ISO. I think both errors are symptoms of the same unknown issue; which error gets reported just depends on whether the Compute or Controller stack resource fails first.

ERROR: deployment failed with status: CREATE_FAILED and reason: Resource CREATE failed: ResourceInError: resources.Compute.resources[1].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"

QCI Media Version: QCI-1.1-RHEL-7-20170116.t.0
QCIOOO Media Version: QCIOOO-10.0-RHEL-7-20170113.t.0

The reason isn't unknown. It's "Message: No valid host was found. There are not enough hosts available., Code: 500"

Your hosts are locked up, possibly from only partial deletion of an old deployment or something else going wrong.

What do ironic node-list and ironic node-show <id> show for each node?

(In reply to Jason Montleon from comment #6)
> The reason isn't unknown. It's "Message: No valid host was found. There are
> not enough hosts available., Code: 500"
>
> Your hosts are locked up, possibly from only partial deletion of an old
> deployment or something else going wrong.
>
> What do ironic node-list and ironic node-show <id> show for each node?

Comment 5 was from a clean environment and a fresh deployment of OSP+CFME with no previous stack deployment.
The environment is no longer available, but I think the controller and 1 of the 2 compute nodes were in an ERROR state in ironic node-list. The other compute node was powered on and active.

This is probably https://bugzilla.redhat.com/show_bug.cgi?id=1353464

Maybe try the workaround suggested there and set the max concurrent builds to 2 or even 1.

crudini --set /etc/nova/nova.conf DEFAULT max_concurrent_builds 2; openstack-service restart nova

It's probably caused by load induced from doing more builds than the director can keep up with in the virt environment.

openstack-service isn't available in the QCIOOO iso for OSP 10 but the

(In reply to Jason Montleon from comment #13)
> This is probably https://bugzilla.redhat.com/show_bug.cgi?id=1353464
>
> Maybe try the workaround suggested there and set the max concurrent
> builds to 2 or even 1.
>
> crudini --set /etc/nova/nova.conf DEFAULT max_concurrent_builds 2;
> openstack-service restart nova
>
> It's probably caused by load induced from doing more builds than the
> director can keep up with in the virt environment.

openstack-service isn't available as part of the QCIOOO ISO install for OSP 10, so you can replace it with a systemctl command:

crudini --set /etc/nova/nova.conf DEFAULT max_concurrent_builds 2; systemctl restart openstack-nova-api openstack-nova-scheduler

See: https://access.redhat.com/documentation/en/red-hat-openstack-platform/10/paged/director-installation-and-usage/chapter-9-troubleshooting-director-issues (Section 9.8 Tuning the undercloud)

I haven't seen any recurrence of this issue when setting max_concurrent_builds manually. QE is updating the automated job runs to include this setting in all OSP runs for more data points.

Closing the bug. Will re-open if needed.
- Sudhir
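
For anyone scripting the systemctl variant of the workaround from the comments above, a minimal sketch follows. The variables NOVA_CONF, MAX_BUILDS and DRY_RUN and the helpers run and apply_workaround are hypothetical names introduced here for illustration. The config path, option name and service names are copied from the comment and have not been verified against a QCIOOO install. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the max_concurrent_builds workaround discussed in this bug.
# All variable and function names below are illustrative assumptions.
set -eu

NOVA_CONF="${NOVA_CONF:-/etc/nova/nova.conf}"   # path taken from the comment
MAX_BUILDS="${MAX_BUILDS:-2}"                   # the thread suggests 2 or even 1
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

apply_workaround() {
  # Lower the number of concurrent builds in the DEFAULT section.
  run crudini --set "$NOVA_CONF" DEFAULT max_concurrent_builds "$MAX_BUILDS"
  # openstack-service is not on the QCIOOO iso, so restart via systemctl.
  run systemctl restart openstack-nova-api openstack-nova-scheduler
}

apply_workaround
```

Running it with DRY_RUN=0 as root on the undercloud would execute the two commands. Afterwards, crudini --get /etc/nova/nova.conf DEFAULT max_concurrent_builds can be used to confirm the stored value.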
Created attachment 1238948 [details]
Log from the deployment

Description of problem:
During an all-in-one API deployment of RHV+OSP+CFME+OCP, the OSP deployment failed at 30% with the message:

ERROR: deployment failed with status: CREATE_FAILED and reason: Resource CREATE failed: ResourceInError: resources.Controller.resources[0].resources.Controller: Went to status ERROR due to "Message: Unknown, Code: Unknown"

QCI Media Version: QCI-1.1-RHEL-7-20170106.t.0
QCIOOO Media Version: QCIOOO-10.0-RHEL-7-20170104.t.0

How reproducible:
First occurrence

Steps to Reproduce:
1. Install QCI & QCIOOO from iso
2. Provision resources for RHV and OSP
3. Create and start deployment of RHV+OSP+CFME+OCP

Actual results:
Deployment fails during task Actions::Fusor::Deployment::OpenStack::Deploy

Expected results:
OSP deployment succeeds
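
Because the thread notes that the reported error depends on whether the Compute or Controller stack resource fails first, it can be useful to pull the failing resource path and the quoted reason out of the failure message when comparing runs. The following is only a sketch using sed on the message from this report. The variable names are illustrative, and the patterns assume the exact message layout shown in the description.

```shell
# Failure message copied from the description of this bug.
msg='ERROR: deployment failed with status: CREATE_FAILED and reason: Resource CREATE failed: ResourceInError: resources.Controller.resources[0].resources.Controller: Went to status ERROR due to "Message: Unknown, Code: Unknown"'

# Heat resource path that went into ERROR (text between
# "ResourceInError: " and ": Went to status ERROR").
resource=$(printf '%s\n' "$msg" | sed -n 's/.*ResourceInError: \(.*\): Went to status ERROR.*/\1/p')

# Reason string inside the trailing double quotes.
reason=$(printf '%s\n' "$msg" | sed -n 's/.*due to "\(.*\)"$/\1/p')

echo "resource: $resource"
echo "reason:   $reason"
```

Applied to the Compute failure quoted in the comments, the same patterns would yield the NovaCompute resource path and the "No valid host was found" reason instead.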