Bug 1593909 - Overcloud Nodes listed as "ERROR" after Upgrade to OSP13
Summary: Overcloud Nodes listed as "ERROR" after Upgrade to OSP13
Keywords:
Status: CLOSED DUPLICATE of bug 1590297
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jiri Stransky
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-21 20:00 UTC by Darin Sorrentino
Modified: 2018-06-25 15:16 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-25 15:16:15 UTC
Target Upstream Version:
Embargoed:


Attachments
sosreport from the server showing 3 overcloud nodes in ERROR state (17.00 MB, application/x-xz)
2018-06-21 20:00 UTC, Darin Sorrentino

Description Darin Sorrentino 2018-06-21 20:00:16 UTC
Created attachment 1453600 [details]
sosreport from the server showing 3 overcloud nodes in ERROR state

Description of problem:
Both Chris J (cjanisze) and I hit this issue.  At the completion of the upgrade to OSP13 on the Director node, some or all of the Overcloud nodes show in an ERROR state:

(undercloud) [stack@ds-hf-ca-undercloud ~]$ openstack server list
+--------------------------------------+------------------------+--------+-----------------------+--------------------------------+---------+
| ID                                   | Name                   | Status | Networks              | Image                          | Flavor  |
+--------------------------------------+------------------------+--------+-----------------------+--------------------------------+---------+
| 3cd682e6-b2c0-4505-af7a-a01786a5cfe4 | overcloud-controller-2 | ACTIVE | ctlplane=172.16.0.105 | overcloud-full_20180619T142126 | control |
| afb6d2a8-0937-488b-85dd-157ac38ad6bf | overcloud-controller-0 | ACTIVE | ctlplane=172.16.0.101 | overcloud-full_20180619T142126 | control |
| 1f57af8d-bdc5-41b9-a58c-b561a7cfe927 | overcloud-compute-0    | ERROR  | ctlplane=172.16.0.112 | overcloud-full_20180619T142126 | compute |
| 2b6f3e6c-83d0-4fe1-856e-a001be10287e | overcloud-compute-1    | ERROR  | ctlplane=172.16.0.103 | overcloud-full_20180619T142126 | compute |
| d3b7b0be-3a55-4a0e-a1fd-15c401b392bb | overcloud-controller-1 | ERROR  | ctlplane=172.16.0.108 | overcloud-full_20180619T142126 | control |
+--------------------------------------+------------------------+--------+-----------------------+--------------------------------+---------+
(undercloud) [stack@ds-hf-ca-undercloud ~]$ 


In my environment (above), 3 nodes are in the ERROR state while 2 remain ACTIVE. Chris had all of his nodes in the ERROR state.

The Overcloud appears to be functional, so we are going to just use nova to reset the state to active.  I am attaching an sosreport from my environment, taken before I force the state change to active.
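
For reference, a likely form of that reset (a sketch only; the exact commands are not captured in this report) using the nova CLI against the ERROR nodes listed above:

(undercloud) [stack@ds-hf-ca-undercloud ~]$ nova reset-state --active overcloud-compute-0
(undercloud) [stack@ds-hf-ca-undercloud ~]$ nova reset-state --active overcloud-compute-1
(undercloud) [stack@ds-hf-ca-undercloud ~]$ nova reset-state --active overcloud-controller-1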

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Jiri Stransky 2018-06-25 15:16:15 UTC
Thanks for the report, Darin. We've hit this recently in other environments too; it's a race condition between nova-compute and ironic-conductor starting up. If nova-compute comes up before ironic-conductor is able to reply to requests, the instances backed by ironic go to ERROR. The workaround is `openstack server set --state active <server-id>`.
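
To apply that workaround to every node currently in ERROR (a minimal sketch, assuming the undercloud admin credentials are sourced on the Director node):

    for id in $(openstack server list --status ERROR -f value -c ID); do
        openstack server set --state active "$id"
    done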

This is being tracked as bug 1590297, so I'll mark this one as a duplicate.

*** This bug has been marked as a duplicate of bug 1590297 ***

