Bug 1344457 - Assign Nodes step fails with "A plan with the name overcloud does not exist."
Summary: Assign Nodes step fails with "A plan with the name overcloud does not exist."
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - RHELOSP
Version: 1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ga
Target Release: 1.0
Assignee: Jason Montleon
QA Contact: Landon LaSmith
Docs Contact: Dan Macpherson
URL:
Whiteboard:
Depends On:
Blocks: qci-sprint-17
Reported: 2016-06-09 18:18 UTC by Jean-Francois Saucier
Modified: 2016-09-13 16:29 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-13 16:29:44 UTC
Target Upstream Version:
Embargoed:


Attachments
foreman production log (28.25 KB, text/plain)
2016-06-09 18:18 UTC, Jean-Francois Saucier
no flags
fusor-undercloud-installer (350.53 KB, application/octet-stream)
2016-07-13 15:49 UTC, Jean-Francois Saucier
no flags
Log for failed swift-proxy (3.92 KB, text/plain)
2016-07-14 13:24 UTC, Jean-Francois Saucier
no flags


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2016:1862 | 0 | normal | SHIPPED_LIVE | Red Hat Quickstart Installer 1.0 | 2016-09-13 20:18:48 UTC

Description Jean-Francois Saucier 2016-06-09 18:18:03 UTC
Created attachment 1166385
foreman production log

Description of problem:

I am trying to deploy OSP using QCI version 1.2. Everything goes well until the "Assign Nodes" step. When entering this step, it fails with the following error:

Error retrieving OpenStack data: Adapter operation failed: Expected(200) <=> Actual(404 Not Found) excon.error.response :body => "{\n \"error\": {\n \"message\": \"A plan with the name overcloud does not exist.\"\n }\n}"


Version-Release number of selected component (if applicable):

QCI-1.2-RHEL-7-20160607.t.0-QCI-x86_64-dvd1.iso
QCIOOO-8.0-RHEL-7-20160527.n.0-QCIOOO-x86_64-dvd1.iso


How reproducible:

Always.


Steps to Reproduce:
1. Install QCI and QCIOOO from iso
2. Start an OSP deployment
3. Register two nodes and click "Next" to go to "Assign Nodes" step.


Actual results:

Fail with the mentioned error above.


Expected results:

The UI to assign nodes is displayed.


Additional info:

Comment 1 Jason Montleon 2016-06-21 19:17:37 UTC
If the overcloud plan was not uploaded, it sounds like the QCI OOO run kicked off by fusor-undercloud-installer did not go well. Ideally, we should catch that it did not finish correctly and exit with an error instead of continuing and giving the impression that all is well.
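A minimal sketch of the "exit with an error instead of continuing" idea is shown below. It assumes, as comment 4 implies, that the plan is stored in Swift as a container named `overcloud`, so `swift list` prints that name once the upload has succeeded. `plan_exists` and `require_plan` are hypothetical helper names, not functions from egon or fusor-undercloud-installer.

```shell
#!/bin/bash
# Hypothetical helpers (not from egon or fusor-undercloud-installer).
# Assumption: the plan is a Swift container named after the plan, so
# "swift list" prints "overcloud" once the upload has succeeded.

# Succeeds only if the listing command prints a line that is exactly "$2".
plan_exists() {
    local list_cmd="$1" plan="$2"
    # $list_cmd is intentionally unquoted so "swift list" splits into words.
    # grep -x matches whole lines, so "overcloud-old" does not count.
    $list_cmd 2>/dev/null | grep -qx -- "$plan"
}

# Fail loudly instead of letting the installer carry on as if all is well.
require_plan() {
    local list_cmd="$1" plan="${2:-overcloud}"
    if ! plan_exists "$list_cmd" "$plan"; then
        echo "ERROR: plan '$plan' not found via '$list_cmd'; aborting." >&2
        return 1
    fi
}
```

In an installer script this would be called as `require_plan "swift list" overcloud || exit 1`, so a missing plan stops the run at the undercloud stage rather than surfacing later as a 404 in the "Assign Nodes" step.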

Comment 2 Jean-Francois Saucier 2016-07-13 15:49:01 UTC
Created attachment 1179337
fusor-undercloud-installer

Comment 3 Jean-Francois Saucier 2016-07-13 15:50:18 UTC
Reproduced with the following ISOs:

- QCI-1.2-RHEL-7-20160711.t.1-QCI-x86_64-dvd1.iso
- QCIOOO-8.0-RHEL-7-20160708.t.1-QCIOOO-x86_64-dvd1.iso

Comment 4 Jason Montleon 2016-07-13 16:18:14 UTC
The relevant part of the fusor-undercloud-installer log is below.

This is the step in egon where it runs swift list to check whether the plan exists (and, if it does not, attempts to upload it). Swift isn't ready yet, probably because of either slow disk or high load.

Redirecting to /bin/systemctl restart  openstack-heat-engine.service
/usr/lib/python2.7/site-packages/keystoneclient/service_catalog.py:196: UserWarning: Providing attr without filter_value to get_urls() is deprecated as of the 1.7.0 release and may be removed in the 2.0.0 release. Either both should be provided or neither should be provided.
  'Providing attr without filter_value to get_urls() is '
('Connection aborted.', error(111, 'Connection refused'))
/usr/lib/python2.7/site-packages/keystoneclient/service_catalog.py:196: UserWarning: Providing attr without filter_value to get_urls() is deprecated as of the 1.7.0 release and may be removed in the 2.0.0 release. Either both should be provided or neither should be provided.
  'Providing attr without filter_value to get_urls() is '
2016-07-12 19:07:33.534 20017 ERROR swiftclient [-] ('Connection aborted.', error(111, 'Connection refused'))

I was able to reproduce the same error by running 'openstack-service stop swift' and then running a one-liner like the one below:

while ! swift stat; do echo "Swift is not ready. Sleeping for 30 seconds."; sleep 30; done

After running 'openstack-service start swift', the loop exited.

Per the conversation on IRC, it looks like the swift service did eventually start up as expected; that is, it returned empty output but no connection error.

Perhaps we can drop a line like the one above into egon to ensure the service is started before we try to do anything.

Comment 5 Jason Montleon 2016-07-13 16:24:22 UTC
https://github.com/fusor/egon/pull/74

Comment 6 Jean-Francois Saucier 2016-07-14 13:24:29 UTC
I tried a new deployment with this fix. It stays in the while loop forever, waiting for swift to be ready.

On the undercloud, "openstack-status" reports swift proxy as failed. I will attach the output of the journalctl log.

As reported in the log file, it failed to find the /etc/swift/container.ring.gz file. However, this file exists on the undercloud when I check for it.

What I see is that it tries to access the file at 11:39:40, but the file seems to be created at 11:41:

[root@undercloud ~]# ls -l /etc/swift/container.ring.gz
-rw-r--r--. 1 root root 1731 Jul 14 11:41 /etc/swift/container.ring.gz
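One way to avoid the ordering problem above would be to hold off (re)starting the proxy until the ring files exist. The sketch below assumes the standard Swift ring names (`account.ring.gz`, `container.ring.gz`, `object.ring.gz`) under `/etc/swift`; `wait_for_rings` is a hypothetical helper for illustration, not code from the PRs linked in this bug.

```shell
#!/bin/bash
# Hypothetical helper: wait until every ring file the Swift proxy loads exists,
# or give up after a timeout. Assumes the standard ring names under /etc/swift.
wait_for_rings() {
    local dir="${1:-/etc/swift}" timeout="${2:-300}" interval="${3:-5}"
    local waited=0 ring missing
    while :; do
        # Collect the names of any ring files that are not there yet.
        missing=""
        for ring in account container object; do
            [ -f "$dir/$ring.ring.gz" ] || missing="$missing $ring"
        done
        if [ -z "$missing" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out after ${waited}s; missing ring(s):$missing" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

A caller could then run `wait_for_rings /etc/swift 300 && systemctl restart openstack-swift-proxy`, so the proxy is only started once the file it failed to find at 11:39:40 is actually present.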

Comment 7 Jean-Francois Saucier 2016-07-14 13:24:53 UTC
Created attachment 1179859
Log for failed swift-proxy

Comment 13 John Matthews 2016-07-25 12:44:29 UTC
QCIOOO-8.0-RHEL-7-20160722.n.0-QCIOOO-x86_64-dvd1.iso

Comment 14 Landon LaSmith 2016-08-30 18:39:44 UTC
(In reply to Jason Montleon from comment #8)
> https://github.com/fusor/egon/pull/77
> https://github.com/fusor/fusor-undercloud-installer/pull/42

These PRs wait for swift to load and exit if 'swift stat' does not return success after 5 attempts, preventing the install from continuing when swift is down.
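The bounded wait described above can be sketched as follows. This illustrates the behaviour rather than reproducing the code in those PRs: `wait_for_service` is a hypothetical name, and the 30-second default interval is borrowed from the one-liner in comment 4.

```shell
#!/bin/bash
# Hypothetical bounded readiness check: unlike the open-ended while loop,
# it gives up after a fixed number of attempts and returns non-zero.
wait_for_service() {
    local probe="$1" attempts="${2:-5}" interval="${3:-30}" i=1
    while [ "$i" -le "$attempts" ]; do
        # $probe is unquoted on purpose so "swift stat" splits into words.
        if $probe >/dev/null 2>&1; then
            return 0
        fi
        echo "Attempt $i/$attempts: not ready yet." >&2
        # No point sleeping after the last failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
    return 1
}
```

Used as `wait_for_service "swift stat" 5 30 || exit 1`, the installer stops after roughly two minutes of failed probes instead of hanging forever, which is the failure mode reported in comment 6.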

There is an issue where it won't restart swift if any or all of the systemd units are stopped/inactive, because the glob in 'systemctl is-active openstack-swift-*' does not return inactive units. If any or all of the openstack-swift-* services are inactive, the call will always return 0. Adding '--all' to the systemctl call will return both active and inactive services, and a non-zero exit code if at least one is inactive.
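A sketch of the check with `--all` added, as suggested above, might look like this. It relies on the behaviour reported in this comment for `systemctl is-active --all` with a glob, and uses `openstack-service start swift` (from comment 4) to bring the units back. `ensure_swift_running` and the `SYSTEMCTL`/`SWIFT_START` overrides are hypothetical, added only so the logic can be exercised without a real undercloud.

```shell
#!/bin/bash
# Hypothetical helper. Per the comment above, without --all the glob skips
# inactive units, so the check passes even when swift is down; with --all,
# is-active returns non-zero if at least one matching unit is inactive.
ensure_swift_running() {
    # Overridable commands (hypothetical) so the logic can be tested offline.
    local systemctl_cmd="${SYSTEMCTL:-systemctl}"
    local start_cmd="${SWIFT_START:-openstack-service start swift}"
    # Quote the pattern so systemctl, not the shell, expands the glob.
    if ! $systemctl_cmd is-active --all 'openstack-swift-*' >/dev/null 2>&1; then
        echo "At least one openstack-swift-* unit is inactive; starting swift." >&2
        $start_cmd
    fi
}
```

Starting (rather than restarting) leaves already-running units alone; this mirrors the verification in comment 16, where only the openstack-swift-container* units were stopped.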

QCIOOO ISO Version: QCIOOO-8.0-RHEL-7-20160829.t.0

Comment 15 cchase 2016-08-30 20:32:48 UTC
https://github.com/fusor/egon/pull/83

Comment 16 Landon LaSmith 2016-08-31 20:08:34 UTC
VERIFIED.

I shut down the openstack-swift-container* services prior to execution of the PR code block; egon detected that the swift services were not running and restarted them.

QCIOOO Media Version: QCIOOO-8.0-RHEL-7-20160831.t.0

Comment 18 errata-xmlrpc 2016-09-13 16:29:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1862

