Created attachment 1453600: sosreport from the server showing 3 overcloud nodes in ERROR state

Description of problem:
Both Chris J (cjanisze) and I hit this issue. At the completion of the upgrade to OSP13 on the Director node, some or all of the Overcloud nodes show an ERROR state:

(undercloud) [stack@ds-hf-ca-undercloud ~]$ openstack server list
+--------------------------------------+------------------------+--------+-----------------------+--------------------------------+---------+
| ID                                   | Name                   | Status | Networks              | Image                          | Flavor  |
+--------------------------------------+------------------------+--------+-----------------------+--------------------------------+---------+
| 3cd682e6-b2c0-4505-af7a-a01786a5cfe4 | overcloud-controller-2 | ACTIVE | ctlplane=172.16.0.105 | overcloud-full_20180619T142126 | control |
| afb6d2a8-0937-488b-85dd-157ac38ad6bf | overcloud-controller-0 | ACTIVE | ctlplane=172.16.0.101 | overcloud-full_20180619T142126 | control |
| 1f57af8d-bdc5-41b9-a58c-b561a7cfe927 | overcloud-compute-0    | ERROR  | ctlplane=172.16.0.112 | overcloud-full_20180619T142126 | compute |
| 2b6f3e6c-83d0-4fe1-856e-a001be10287e | overcloud-compute-1    | ERROR  | ctlplane=172.16.0.103 | overcloud-full_20180619T142126 | compute |
| d3b7b0be-3a55-4a0e-a1fd-15c401b392bb | overcloud-controller-1 | ERROR  | ctlplane=172.16.0.108 | overcloud-full_20180619T142126 | control |
+--------------------------------------+------------------------+--------+-----------------------+--------------------------------+---------+
(undercloud) [stack@ds-hf-ca-undercloud ~]$

In my environment (above), 3 nodes are in ERROR state while 2 remain ACTIVE. Chris had all of his nodes in ERROR state. The Overcloud appears to be functional, so we are going to use nova to reset the state to active. I am attaching an sosreport from my environment, captured before I forced the state change to active.
Thanks for the report, Darin. We've hit this recently in other environments too. It's a race condition between nova-compute and ironic-conductor starting up: if nova-compute comes up before ironic-conductor is able to respond to requests, the instances backed by ironic go to ERROR. The workaround is `openstack server set --state active <server-id>`. This is being tracked as bug 1590297, so I'll mark this one as a duplicate.

*** This bug has been marked as a duplicate of bug 1590297 ***
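A minimal Python sketch of applying that workaround to every affected node at once, assuming it runs on the undercloud with admin credentials already loaded (e.g. after `source ~/stackrc`). The helper names `error_server_ids`, `reset_commands` and `main` are hypothetical; the only CLI calls are `openstack server list --status ERROR -f value -c ID` and the `openstack server set --state active` command quoted above. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def error_server_ids(listing: str) -> List[str]:
    """Parse `openstack server list ... -f value -c ID` output: one server ID per line."""
    return [line.strip() for line in listing.splitlines() if line.strip()]


def reset_commands(server_ids: List[str]) -> List[List[str]]:
    """Build one `openstack server set --state active <id>` argv per server."""
    return [["openstack", "server", "set", "--state", "active", sid] for sid in server_ids]


def main(dry_run: bool = True) -> None:
    # Ask nova only for servers it currently reports as ERROR, printing just their IDs.
    listing = subprocess.run(
        ["openstack", "server", "list", "--status", "ERROR", "-f", "value", "-c", "ID"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    for cmd in reset_commands(error_server_ids(listing)):
        if dry_run:
            # Dry run: show what would be executed without changing anything.
            print(" ".join(cmd))
        else:
            # Resetting server state requires admin credentials.
            subprocess.run(cmd, check=True)
```

Calling `main(dry_run=False)` applies the resets. As in the report, confirm the overcloud is actually functional first: this only clears the ERROR flag in nova and does nothing about the underlying startup ordering between nova-compute and ironic-conductor.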