Bug 1590297

Summary: Ironic driver initialization should retry if ironic reports no conductors
Product: Red Hat OpenStack Reporter: Raviv Bar-Tal <rbartal>
Component: openstack-nova    Assignee: Dan Smith <dasmith>
Status: CLOSED ERRATA QA Contact: Raviv Bar-Tal <rbartal>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 13.0 (Queens)    CC: berrange, bfournie, ccamacho, chjones, dasmith, dbecker, dsorrent, eglynn, jhakimra, jschluet, jstransk, kchamart, lyarwood, mbooth, mbracho, mburns, mcornea, morazi, nlevinki, owalsh, rbartal, sathlang, sbauza, sclewis, sferdjao, sgordon, srevivo, vromanso
Target Milestone: ga    Keywords: Triaged
Target Release: 13.0 (Queens)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-nova-17.0.3-0.20180420001141.el7ost Doc Type: Known Issue
Doc Text:
Story Points: ---
Clone Of:
: 1592427 1592432 1592466 (view as bug list) Environment:
Last Closed: 2018-06-27 13:57:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1592427, 1592432, 1592466    

Description Raviv Bar-Tal 2018-06-12 11:37:23 UTC
Description of problem:
After running an update from the beta to the 2018-06-08.3 puddle and scaling out compute (adding 1 compute node to the overcloud),
all old nodes (3 controllers, 1 compute) are in ERROR status.

On 2 of the controllers, the openstack-memcached docker container is stuck in Restarting:

410b48140798        192.168.24.1:8787/rhosp13/openstack-memcached:2018-06-08.3                   "/bin/bash -c 'sou..."   16 hours ago        Restarting (1) 5 hours ago                       memcached

Trying to check the docker logs generates a permission error:
[heat-admin@controller-0 ~]$ sudo docker logs memcached
/bin/bash: /var/log/memcached.log: Permission denied


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install beta overcloud
2. Update overcloud
3.

Actual results:


Expected results:


Additional info:

Comment 2 Sofer Athlan-Guyot 2018-06-13 12:43:51 UTC
Hi Raviv, 

looking at home/stack/overcloud_scale_up.log I see that the heat process completed successfully:

  2018-06-11 23:08:24Z [overcloud-AllNodesDeploySteps-54k52bq4bzns]: UPDATE_COMPLETE  Stack UPDATE completed successfully
  2018-06-11 23:08:24Z [AllNodesDeploySteps]: UPDATE_COMPLETE  state changed
  2018-06-11 23:08:31Z [overcloud]: UPDATE_COMPLETE  Stack UPDATE completed successfully

Looking at the CI job I couldn't find an obvious error there either.
Could you tell me what "ERROR status" means in that sentence?

> all old nodes (3 controllers, 1 compute) are in ERROR status,

In the meantime, continuing to dig into the memcache issue.

Comment 3 Raviv Bar-Tal 2018-06-13 13:35:53 UTC
Hi Sofer 
The ERROR status is the state the overcloud nodes have in the output of this command:
'openstack server list'

Comment 4 Sofer Athlan-Guyot 2018-06-13 17:14:31 UTC
OK,

so we're waiting for the CI test to be run again to get more info on the "openstack server list" output.  In particular we want the output of "openstack server show <id>" for the exact error message, which may be related to the virt-env configuration:

   <yprokule> rbartal: chem ain't ERROR state of those nodes related to issues
           with vbmc ?                                                  [16:28]

This needs checking (and thus the live env).

For the memcached issue, it may be unrelated to scale out.  Trying to get a reproducer by updating another osp13 environment.

Currently I can see that the /var/log/containers/memcache directory is missing from controller-0,1,2, but that may be a logging artifact (from the sosreport).

We can also see that the container starts to fail after the reboot of the node, which seems to happen in the job after the update:

    <jistr> Jun 11 22:17:29 controller-0 systemd[1]: Stopped Docker Application
            Container Engine.
    <jistr> -- Reboot --
    <jistr> Jun 11 22:18:30 controller-0 systemd[1]: Starting Docker Application
            Container Engine...
    <jistr> and first mention of the error is right after that:             [18:27]
    <jistr> Jun 11 22:18:40 controller-0 dockerd-current[2589]: /bin/bash:
            /var/log/memcached.log: Permission denied

In between there is a reboot.

We have these logs related to the memcached container after the
reboot[1], but we are not sure they are related.

The next step would be to run an update, then reboot, and see what happens
to the containers, as it seems this is happening before scale out.

[1] http://pastebin.test.redhat.com/603667

Comment 6 Marius Cornea 2018-06-13 18:20:51 UTC
(In reply to Sofer Athlan-Guyot from comment #4)

> Next step would be to run an update, and reboot, and see what happen
> to containers, as it seems this is happening before scale out.
> 

I had an updated environment and rebooted the controller nodes post-update, and the memcached container started OK:

[heat-admin@controller-0 ~]$ sudo docker ps | grep memcache
5cb57b77bc17        192.168.24.1:8787/rhosp13/openstack-memcached:2018-06-12.1                 "/bin/bash -c 'sou..."   21 hours ago        Up 23 minutes                                   memcached

[heat-admin@controller-1 ~]$ sudo docker ps | grep memcache
662ac02958b7        192.168.24.1:8787/rhosp13/openstack-memcached:2018-06-12.1                 "/bin/bash -c 'sou..."   21 hours ago        Up 12 minutes                                   memcached


[root@controller-2 heat-admin]# docker ps | grep memcache
9fd750ebc24c        192.168.24.1:8787/rhosp13/openstack-memcached:2018-06-12.1                 "/bin/bash -c 'sou..."   22 hours ago        Up 36 seconds                                          memcached

Comment 9 Sofer Athlan-Guyot 2018-06-14 13:22:17 UTC
Created https://bugzilla.redhat.com/show_bug.cgi?id=1591278 to follow the memcached issue, focusing here on the following error, which happens after the scale out:

    (undercloud) [stack@undercloud-0 ~]$ openstack server list
    +--------------------------------------+--------------+--------+------------------------+---------------------------------+------------+
    | ID                                   | Name         | Status | Networks               | Image                           | Flavor     |
    +--------------------------------------+--------------+--------+------------------------+---------------------------------+------------+
    | d79ce12e-52e7-415a-86d2-953da21a0f81 | compute-1    | ACTIVE | ctlplane=192.168.24.11 | overcloud-full                  | compute    |
    | 5ad98bd6-c879-4b18-84a5-69df3020a0fb | controller-2 | ERROR  | ctlplane=192.168.24.9  | overcloud-full_20180613T152505Z | controller |
    | 82746f0a-d2c9-429d-bb48-648a61c156ae | compute-0    | ERROR  | ctlplane=192.168.24.13 | overcloud-full_20180613T152505Z | compute    |
    | b6732932-9954-4d36-a756-b2f2e796a53e | controller-0 | ERROR  | ctlplane=192.168.24.10 | overcloud-full_20180613T152505Z | controller |
    | b590302c-982a-4a14-b800-5cfd487ac52a | controller-1 | ERROR  | ctlplane=192.168.24.16 | overcloud-full_20180613T152505Z | controller |
    +--------------------------------------+--------------+--------+------------------------+---------------------------------+------------+

    (undercloud) [stack@undercloud-0 ~]$ openstack server show controller-2
    +-------------------------------------+------------------------------------------------------------------------+                                                                                                  
    | Field                               | Value                                                                  |                                                                                                  
    +-------------------------------------+------------------------------------------------------------------------+                                                                                                  
    | OS-DCF:diskConfig                   | MANUAL                                                                 |                                                                                                  
    | OS-EXT-AZ:availability_zone         | nova                                                                   |                                                                                                  
    | OS-EXT-SRV-ATTR:host                | undercloud-0.redhat.local                                              |                                                                                                  
    | OS-EXT-SRV-ATTR:hypervisor_hostname | 4dc3f9d6-7809-4164-9f3f-ba14f3c46ca1                                   |                                                                                                  
    | OS-EXT-SRV-ATTR:instance_name       | instance-00000001                                                      |                                                                                                  
    | OS-EXT-STS:power_state              | Running                                                                |                                                                                                  
    | OS-EXT-STS:task_state               | None                                                                   |                                                                                                  
    | OS-EXT-STS:vm_state                 | error                                                                  |                                                                                                  
    | OS-SRV-USG:launched_at              | 2018-06-13T15:55:58.000000                                             |                                                                                                  
    | OS-SRV-USG:terminated_at            | None                                                                   |                                                                                                  
    | accessIPv4                          |                                                                        |                                                                                                  
    | accessIPv6                          |                                                                        |                                                                                                  
    | addresses                           | ctlplane=192.168.24.9                                                  |                                                                                                  
    | config_drive                        | True                                                                   |                                                                                                  
    | created                             | 2018-06-13T15:51:47Z                                                   |                                                                                                  
    | flavor                              | controller (c0698e0b-5333-410d-8878-1c44490f6d32)                      |                                                                                                  
    | hostId                              | 84894593ace41fc1609184266da515ff275043cd68a4be54a84e6758               |                                                                                                  
    | id                                  | 5ad98bd6-c879-4b18-84a5-69df3020a0fb                                   |                                                                                                  
    | image                               | overcloud-full_20180613T152505Z (fc175c62-1252-48d3-a7ba-b7c5bb9f5def) |                                                                                                  
    | key_name                            | default                                                                |                                                                                                  
    | name                                | controller-2                                                           |                                                                                                  
    | project_id                          | 82315f5afedc4594bbb392c4891453c2                                       |                                                                                                  
    | properties                          |                                                                        |                                                                                                  
    | security_groups                     | name='default'                                                         |                                                                                                  
    | status                              | ERROR                                                                  |                                                                                                  
    | updated                             | 2018-06-13T17:46:38Z                                                   |                                                                                                  
    | user_id                             | 1d157ecf0818414c818bedf56d68518e                                       |                                                                                                  
    | volumes_attached                    |                                                                        |                                                                                                  
    +-------------------------------------+------------------------------------------------------------------------+

Comment 10 Sofer Athlan-Guyot 2018-06-15 09:52:41 UTC
After a long debug session with Jiri, we are still not able to find the root cause of the issue.

According to the timestamp in the db, the VMs went into error either at the end of "openstack undercloud upgrade" or at the beginning of "openstack overcloud image upload --update-existing".

We are trying to get a reproducer by running the CI steps manually.

Regarding the job itself, we noted that a lot of tempest runs happen before the end of the scale out:
 - we would recommend only one tempest run, at the end, as this could very much change the cluster in strange ways;
 - the infrared scale out command is wrong:


infrared cloud-config -o scale-up.yml --tasks
       scale_up --splitstack no --build beta --mirror tlv --scale-nodes
       compute-1

but should be

infrared cloud-config -o scale-up.yml --tasks
       scale_up --splitstack no --build passed_phase2 --mirror tlv --scale-nodes
       compute-1

(build changed to passed_phase2)

Comment 11 Jiri Stransky 2018-06-15 10:03:58 UTC
Note that for making sense of times in the environment, timezones are important: node time (logs etc.) is UTC-4, CLI times (like `openstack server show` pasted in earlier comment) are UTC, and Jenkins logs are UTC+3.

In addition to the failures happening around `openstack undercloud upgrade` and `openstack overcloud image upload --update-existing`, we observed this:

Close to 'updated' time from `openstack server show` there are errors in nova-compute.log:

2018-06-13 13:42:23.917 19257 DEBUG ironicclient.common.http [req-3a652a75-e204-4713-a384-adff28e259a2 - - - - -] Error contacting Ironic server: Unable to attach VIF because VIF 1711a06f-cc58-4552-b55b-9ce4a870f3f3 is already attached to Ironic Port cd2051a2-fc3c-42ee-8892-6d6230f90d24 (HTTP 409). Attempt 1 of 2 wrapper /usr/lib/python2.7/site-packages/ironicclient/common/http.py:258
2018-06-13 13:42:26.334 19257 ERROR ironicclient.common.http [req-3a652a75-e204-4713-a384-adff28e259a2 - - - - -] Error contacting Ironic server: Unable to attach VIF because VIF 1711a06f-cc58-4552-b55b-9ce4a870f3f3 is already attached to Ironic Port cd2051a2-fc3c-42ee-8892-6d6230f90d24 (HTTP 409). Attempt 2 of 2: Conflict: Unable to attach VIF because VIF 1711a06f-cc58-4552-b55b-9ce4a870f3f3 is already attached to Ironic Port cd2051a2-fc3c-42ee-8892-6d6230f90d24 (HTTP 409)

and in ironic app.log:

2018-06-13 13:42:23.911 17079 DEBUG wsme.api [req-c90be820-9a77-47d3-a2ad-4778e5c73b09 a1dde5ec4fff4d4889dc30059a4abe4a 5f6f06c988c24a7eb4d761360980fdc4 - default default] Client-side error: Unable to attach VIF because VIF 1711a06f-cc58-4552-b55b-9ce4a870f3f3 is already attached to Ironic Port cd2051a2-fc3c-42ee-8892-6d6230f90d24
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 226, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 2969, in vif_attach
    task.driver.network.vif_attach(task, vif_info)

  File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/network/common.py", line 551, in vif_attach
    port_like_obj = get_free_port_like_object(task, vif_id, physnets)

  File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/network/common.py", line 172, in get_free_port_like_object
    physnets)

  File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/network/common.py", line 117, in _get_free_portgroups_and_ports
    if _vif_attached(p, vif_id):

  File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/network/common.py", line 55, in _vif_attached
    vif=vif_id, object_uuid=port_like_obj.uuid)

VifAlreadyAttached: Unable to attach VIF because VIF 1711a06f-cc58-4552-b55b-9ce4a870f3f3 is already attached to Ironic Port cd2051a2-fc3c-42ee-8892-6d6230f90d24
 format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222


It's also worth noting that both Ironic and Nova-compute were restarted very shortly before these errors appeared (timestamps of the restarts fall within undercloud update, so that's expected).

Comment 12 Sofer Athlan-Guyot 2018-06-15 15:46:17 UTC
So it's not a scale up issue.  We are able to reproduce it upstream using this simple sequence:

 - install master;
 - update undercloud;
   -> "overcloud (controller,compute)" vm_state switch to error

It seems not even an undercloud reboot is needed.  You just have to wait a little after the undercloud update, and then the overcloud nodes switch to vm_state error (from the undercloud: . stackrc && openstack server list).

It should be noted that the overcloud is working fine and can still be heat updated/upgraded and scaled up, but it looks very bad in the display (see https://bugzilla.redhat.com/show_bug.cgi?id=1590297#c9).

Nova people are looking into it currently, and we are working on a clean reproducer on osp13.

Comment 13 Jiri Stransky 2018-06-15 15:56:03 UTC
Just adding clarification to make sure we don't assume something that's not completely sure:

> So it's not a scale up issue.  We are able to reproduce it upstream using
> this simple sequence:
> 
>  - install master;
>  - update undercldoud;
>    -> "overcloud (controller,compute)" vm_state switch to error
> 
> It seems not even a undercloud reboot is needed.

I saw the VIF errors in the logs *before* the reboot, but I only noticed the Nova VM state in ERROR *after* the reboot. Just waiting (or another service restart) instead of rebooting the undercloud might be enough for the VMs to switch state, but I don't think we're sure of that yet.

Comment 14 Ollie Walsh 2018-06-15 16:08:25 UTC
Looks like a race in the service startup.

(undercloud) [stack@undercloud ~]$ openstack server list
+--------------------------------------+-------------------------+--------+------------------------+------------------------+--------------+
| ID                                   | Name                    | Status | Networks               | Image                  | Flavor       |
+--------------------------------------+-------------------------+--------+------------------------+------------------------+--------------+
| fcda23f8-7b98-41bf-9e45-7a4579164874 | overcloud-controller-1  | ERROR  | ctlplane=192.168.24.7  | overcloud-full_renamed | oooq_control |
| ced0e0db-ad1f-4381-a523-0a79ae0303ff | overcloud-controller-2  | ERROR  | ctlplane=192.168.24.21 | overcloud-full_renamed | oooq_control |
| 0731685f-a8fc-4126-868a-bdf9879238f3 | overcloud-controller-0  | ERROR  | ctlplane=192.168.24.12 | overcloud-full_renamed | oooq_control |
| c4fd4e28-fe44-40d2-9b40-7196b7c3b6a2 | overcloud-novacompute-2 | ERROR  | ctlplane=192.168.24.15 | overcloud-full_renamed | oooq_compute |
| 5bef3afd-4485-4cbc-b76c-f126c85bd015 | overcloud-novacompute-1 | ERROR  | ctlplane=192.168.24.8  | overcloud-full_renamed | oooq_compute |
| e4c9da33-c452-446c-82c4-bd55a6b294d8 | overcloud-novacompute-0 | ERROR  | ctlplane=192.168.24.9  | overcloud-full_renamed | oooq_compute |
+--------------------------------------+-------------------------+--------+------------------------+------------------------+--------------+


Looking at controller-1...

(undercloud) [stack@undercloud ~]$ openstack server show overcloud-controller-1
+-------------------------------------+---------------------------------------------------------------+
| Field                               | Value                                                         |
+-------------------------------------+---------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                        |
| OS-EXT-AZ:availability_zone         | nova                                                          |
| OS-EXT-SRV-ATTR:host                | undercloud                                                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname | b6a32fba-b57a-4e7b-a6ce-99941b4d134d                          |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000005                                             |
| OS-EXT-STS:power_state              | Running                                                       |
| OS-EXT-STS:task_state               | None                                                          |
| OS-EXT-STS:vm_state                 | error                                                         |
| OS-SRV-USG:launched_at              | 2018-06-11T14:56:34.000000                                    |
| OS-SRV-USG:terminated_at            | None                                                          |
| accessIPv4                          |                                                               |
| accessIPv6                          |                                                               |
| addresses                           | ctlplane=192.168.24.7                                         |
| config_drive                        | True                                                          |
| created                             | 2018-06-11T14:53:09Z                                          |
| flavor                              | oooq_control (04f8ba26-e9bd-472f-b92a-919e7ec8bed1)           |
| hostId                              | da8848b6f3dc77a51235f70dcf44df197261eca0529173f670c94ce9      |
| id                                  | fcda23f8-7b98-41bf-9e45-7a4579164874                          |
| image                               | overcloud-full_renamed (4d965d80-61fc-45d0-88e1-e6365c4afd57) |
| key_name                            | default                                                       |
| name                                | overcloud-controller-1                                        |
| project_id                          | 3b778414471a47e4b0760bbecc9d4070                              |
| properties                          |                                                               |
| security_groups                     | name='default'                                                |
| status                              | ERROR                                                         |
| updated                             | 2018-06-15T09:29:17Z                                          |
| user_id                             | 356b9a5451c643bb8162b9349bc9487b                              |
| volumes_attached                    |                                                               |
+-------------------------------------+---------------------------------------------------------------+


From /var/log/ironic/ironic-conductor.log I can see that ironic-conductor started loading extensions after the updated timestamp:

2018-06-15 09:29:18.576 1408 DEBUG oslo_concurrency.lockutils [req-6ad8cb26-b607-4f65-a8b7-30ac72a7a997 - - - - -] Lock "extension_manager" acquired by "ironic.common.driver_factory._init_extension_manager" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273

The errors in /var/log/ironic/app.log occurred before this, because there wasn't a conductor registered that supports ipmi:

2018-06-15 09:29:14.419 1793 DEBUG wsme.api [req-81ce866e-1c15-4499-a452-044a44efa1c0 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
2018-06-15 09:29:15.304 1792 DEBUG wsme.api [req-be8a2443-12ad-4115-9640-b91b39d15765 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
2018-06-15 09:29:16.105 1793 DEBUG wsme.api [req-0f2c2010-3bce-4600-8c8a-54a6aa0b8965 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
2018-06-15 09:29:16.797 1792 DEBUG wsme.api [req-e660c0cd-8a25-4b93-815f-edc26c15ddba 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
2018-06-15 09:29:17.596 1793 DEBUG wsme.api [req-ffa50a06-3a62-4f3e-937c-4304818dca4c 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
2018-06-15 09:29:18.277 1793 DEBUG wsme.api [req-c391da42-997e-49ee-b604-11ab81f875e2 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222

And /var/log/nova/nova-compute.log:

2018-06-15 09:29:12.007 2623 INFO nova.service [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] Starting compute node (version 18.0.0-0.20180601221704.f902e0d.el7)
2018-06-15 09:29:12.135 2623 DEBUG nova.servicegroup.drivers.db [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] Seems service nova-compute on host undercloud is down. Last heartbeat was 2018-06-15 09:27:40. Elapsed time is 92.135605 is_up /usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py:79
2018-06-15 09:29:12.260 2623 DEBUG nova.compute.manager [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] Checking state _get_power_state /usr/lib/python2.7/site-packages/nova/compute/manager.py:1167
2018-06-15 09:29:13.819 2623 DEBUG nova.virt.ironic.driver [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] plug: instance_uuid=5bef3afd-4485-4cbc-b76c-f126c85bd015 vif=[{"profile": {}, "ovs_interfaceid": null, "preserve_on_delete": true, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "192.168.24.8"}], "version": 4, "meta": {"dhcp_server": "192.168.24.5"}, "dns": [{"meta": {}, "version": 4, "type": "dns", "address": "192.168.23.1"}], "routes": [{"interface": null, "cidr": "169.254.169.254/32", "meta": {}, "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.24.1"}}], "cidr": "192.168.24.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.24.1"}}], "meta": {"injected": false, "tenant_id": "3b778414471a47e4b0760bbecc9d4070", "mtu": 1500}, "id": "1fe54003-4d8c-4b37-9f20-f04216ac4e26", "label": "ctlplane"}, "devname": "tapcbd2c5be-32", "vnic_type": "baremetal", "qbh_params": null, "meta": {}, "details": {}, "address": "00:1a:24:52:db:70", "active": true, "type": "other", "id": "cbd2c5be-32fd-4c5c-9a14-43e323071912", "qbg_params": null}] _plug_vifs /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:1397
2018-06-15 09:29:14.422 2623 ERROR nova.virt.ironic.driver [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] Cannot attach VIF cbd2c5be-32fd-4c5c-9a14-43e323071912 to the node 059ac0f4-3a78-4f67-9f75-57a7a1f9ec5c due to error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. (HTTP 400): BadRequest: No valid host was found. Reason: No conductor service registered which supports driver ipmi. (HTTP 400)
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] Vifs plug failed: VirtualInterfacePlugException: Cannot attach VIF cbd2c5be-32fd-4c5c-9a14-43e323071912 to the node 059ac0f4-3a78-4f67-9f75-57a7a1f9ec5c due to error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. (HTTP 400)
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] Traceback (most recent call last):
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 942, in _init_instance
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015]     self.driver.plug_vifs(instance, net_info)
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1441, in plug_vifs
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015]     self._plug_vifs(node, instance, network_info)
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1408, in _plug_vifs
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015]     raise exception.VirtualInterfacePlugException(msg)
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] VirtualInterfacePlugException: Cannot attach VIF cbd2c5be-32fd-4c5c-9a14-43e323071912 to the node 059ac0f4-3a78-4f67-9f75-57a7a1f9ec5c due to error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. (HTTP 400)
2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015]

etc... for the other instances
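
(As a debugging aid, not something that was run on this environment: `openstack baremetal driver list` shows each enabled driver/hardware type together with its active conductor host(s), so during the race window the ipmi entry would be missing or show no active hosts.)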

Comment 15 Jiri Stransky 2018-06-18 10:20:27 UTC
I think the hypothesis of a race in the service startup is correct. I'm able to reproduce the 'ERROR' state of the overcloud in `openstack server list` by simply doing:

sudo systemctl stop openstack-ironic-conductor
sudo systemctl restart openstack-nova-compute

After this ^ the nodes go to 'ERROR'. I tried to recover from it this way:

sudo systemctl start openstack-ironic-conductor
sudo systemctl restart openstack-nova-compute

but the nodes still stay in 'ERROR'.

Comment 16 Sofer Athlan-Guyot 2018-06-18 12:07:41 UTC
Hi,

Reproduced on OSP13 as well:

    (undercloud) [stack@undercloud-0 ~]$ openstack server list
    +--------------------------------------+--------------+--------+------------------------+----------------+------------+
    | ID                                   | Name         | Status | Networks               | Image          | Flavor     |
    +--------------------------------------+--------------+--------+------------------------+----------------+------------+
    | 9f80c38a-9f33-4a18-88e0-b89776e62150 | compute-0    | ACTIVE | ctlplane=192.168.24.18 | overcloud-full | compute    |
    | e87efe17-b939-4df2-af0c-8e2effd58c95 | controller-1 | ACTIVE | ctlplane=192.168.24.9  | overcloud-full | controller |
    | 5a3ea20c-75e8-49fe-90b6-edad01fc0a48 | controller-2 | ACTIVE | ctlplane=192.168.24.13 | overcloud-full | controller |
    | ba0f26e7-ec2c-4e61-be8e-05edf00ce78a | controller-0 | ACTIVE | ctlplane=192.168.24.8  | overcloud-full | controller |
    +--------------------------------------+--------------+--------+------------------------+----------------+------------+
    (undercloud) [stack@undercloud-0 ~]$ sudo systemctl stop openstack-ironic-conductor
    (undercloud) [stack@undercloud-0 ~]$ sudo systemctl restart openstack-nova-compute
    (undercloud) [stack@undercloud-0 ~]$ openstack server list
    +--------------------------------------+--------------+--------+------------------------+----------------+------------+
    | ID                                   | Name         | Status | Networks               | Image          | Flavor     |
    +--------------------------------------+--------------+--------+------------------------+----------------+------------+
    | 9f80c38a-9f33-4a18-88e0-b89776e62150 | compute-0    | ERROR  | ctlplane=192.168.24.18 | overcloud-full | compute    |
    | e87efe17-b939-4df2-af0c-8e2effd58c95 | controller-1 | ERROR  | ctlplane=192.168.24.9  | overcloud-full | controller |
    | 5a3ea20c-75e8-49fe-90b6-edad01fc0a48 | controller-2 | ERROR  | ctlplane=192.168.24.13 | overcloud-full | controller |
    | ba0f26e7-ec2c-4e61-be8e-05edf00ce78a | controller-0 | ERROR  | ctlplane=192.168.24.8  | overcloud-full | controller |
    +--------------------------------------+--------------+--------+------------------------+----------------+------------+

Version (osp13 beta):

    (undercloud) [stack@undercloud-0 ~]$ yum list | egrep 'ironic|nova'
    openstack-ironic-api.noarch      1:10.1.2-3.el7ost      @rhelosp-13.0-puddle
    openstack-ironic-common.noarch   1:10.1.2-3.el7ost      @rhelosp-13.0-puddle
    openstack-ironic-conductor.noarch
    openstack-ironic-inspector.noarch
    openstack-ironic-staging-drivers.noarch
    openstack-nova-api.noarch        1:17.0.3-0.20180420001138.el7ost
    openstack-nova-common.noarch     1:17.0.3-0.20180420001138.el7ost
    openstack-nova-compute.noarch    1:17.0.3-0.20180420001138.el7ost
    openstack-nova-conductor.noarch  1:17.0.3-0.20180420001138.el7ost
    openstack-nova-placement-api.noarch
    openstack-nova-scheduler.noarch  1:17.0.3-0.20180420001138.el7ost
    puppet-ironic.noarch             12.4.0-0.20180329034302.8285d85.el7ost
    puppet-nova.noarch               12.4.0-3.el7ost        @rhelosp-13.0-puddle
    python-ironic-inspector-client.noarch
    python-ironic-lib.noarch         2.12.1-1.el7ost        @rhelosp-13.0-puddle
    python-nova.noarch               1:17.0.3-0.20180420001138.el7ost
    python-novaclient.noarch         1:9.1.1-1.el7ost       @rhelosp-13.0-puddle
    python2-ironic-neutron-agent.noarch
    python2-ironicclient.noarch      2.2.0-1.el7ost         @rhelosp-13.0-puddle
    openstack-ironic-python-agent.noarch
    openstack-ironic-ui.noarch       3.1.2-1.el7ost         rhelosp-13.0-puddle
    openstack-nova.noarch            1:17.0.3-0.20180420001138.el7ost
    openstack-nova-cells.noarch      1:17.0.3-0.20180420001138.el7ost
    openstack-nova-console.noarch    1:17.0.3-0.20180420001138.el7ost
    openstack-nova-migration.noarch  1:17.0.3-0.20180420001138.el7ost
    openstack-nova-network.noarch    1:17.0.3-0.20180420001138.el7ost
    openstack-nova-novncproxy.noarch 1:17.0.3-0.20180420001138.el7ost
    openstack-nova-serialproxy.noarch
    openstack-nova-spicehtml5proxy.noarch
    python-nova-tests.noarch         1:17.0.3-0.20180420001138.el7ost
    python-novajoin.noarch           1.0.18-2.el7ost        rhelosp-13.0-puddle
    python2-ironic-python-agent.noarch


Re-assigning this one to DFG:Compute.  Besides the race/wrong handling
of the vm_state, it would be nice to have a way to recover from the
error state.

Thanks,

Comment 18 Matthew Booth 2018-06-18 12:48:26 UTC
From comment 14, it looks like the service ordering issue is between ironic conductor and ironic api, so nova is the victim here. Guessing the simplest solution (for GA, at least) would be to ensure conductor comes up first, which sounds like a PIDONE thing? I'll let you make that call, though. Please let me know asap if compute can help.

Comment 20 Dmitry Tantsur 2018-06-18 13:21:14 UTC
Hi all,

We did some brainstorming on IRC, and that's what we came up with:

1. On nova-compute start, it tries to attach VIFs to ACTIVE instances. This behavior has been around for a very long time, and it seems correct.

2. A VIF attach call for Ironic requires a conductor to process it. With no ironic-conductor instances up we cannot handle it. The error message is a bit unfortunate but in fact correct: actions for the "ipmi" driver cannot be handled by an alive conductor, since we don't have any alive conductors at all. Again, this is the correct behavior given the Ironic architecture.

3. It seems like we've had this issue for a long time; I'm not entirely sure why we've never seen it before. Is it possible that something has changed in the upgrade/update ordering?

4. In any case, we have two options:

4.1. Make Nova somehow retry VIF attachment. This is probably too invasive and also probably incorrect for VMs.

4.2. Make sure we never start a nova-compute instance when no ironic-conductor instances are up (this also applies to the overcloud)

I think the last option is the most viable for an urgent fix. Can someone please provide their insights?

Comment 21 Ollie Walsh 2018-06-18 13:35:09 UTC
(In reply to Sofer Athlan-Guyot from comment #16)
> [...]
> Re-assigning this one to DFG:Compute.  Besides the race/wrong handling of
> the vm_state, it would be nice to have a way to recover from the error
> state.

nova reset-state?

> 
> Thanks,

Comment 22 Matthew Booth 2018-06-18 13:45:54 UTC
(In reply to Jiri Stransky from comment #15)
> I think the hypothesis with race in service startup is correct. I'm able to
> reproduce the 'ERROR' state of overcloud in `openstack server list` by
> simply doing:
> 
> sudo systemctl stop openstack-ironic-conductor
> sudo systemctl restart openstack-nova-compute
> 
> After this ^ nodes go to 'ERROR'. Tried to recover from it this way:
> 
> sudo systemctl start openstack-ironic-conductor
> sudo systemctl restart openstack-nova-compute
> 
> but the nodes still stay in 'ERROR'.

A hard reboot is intended to resolve this in the general case. However, in this specific case a reset-state would probably be more appropriate, as no action should actually be required on the ironic node.

Comment 23 Carlos Camacho 2018-06-18 14:32:52 UTC
Hello,

We can reset the node state with:

    (undercloud) [stack@undercloud ~]$ openstack server list
    +--------------------------------------+-------------------------+--------+------------------------+----------------+--------------+
    | ID                                   | Name                    | Status | Networks               | Image          | Flavor       |
    +--------------------------------------+-------------------------+--------+------------------------+----------------+--------------+
    | a8c447c1-7f47-42be-807a-c475dc76ea89 | overcloud-controller-0  | ERROR  | ctlplane=192.168.24.6  | overcloud-full | oooq_control |
    | 8ef23886-298e-451b-8094-f10c6c6c02fb | overcloud-novacompute-0 | ERROR  | ctlplane=192.168.24.11 | overcloud-full | oooq_compute |
    +--------------------------------------+-------------------------+--------+------------------------+----------------+--------------+
    (undercloud) [stack@undercloud ~]$ openstack server set --state active a8c447c1-7f47-42be-807a-c475dc76ea89
    (undercloud) [stack@undercloud ~]$ openstack server list
    +--------------------------------------+-------------------------+--------+------------------------+----------------+--------------+
    | ID                                   | Name                    | Status | Networks               | Image          | Flavor       |
    +--------------------------------------+-------------------------+--------+------------------------+----------------+--------------+
    | a8c447c1-7f47-42be-807a-c475dc76ea89 | overcloud-controller-0  | ACTIVE | ctlplane=192.168.24.6  | overcloud-full | oooq_control |
    | 8ef23886-298e-451b-8094-f10c6c6c02fb | overcloud-novacompute-0 | ERROR  | ctlplane=192.168.24.11 | overcloud-full | oooq_compute |
    +--------------------------------------+-------------------------+--------+------------------------+----------------+--------------+



Hmm, in my case the nodes kept working the whole time; I just had to manually set the state back to ACTIVE.

Comment 24 Jiri Stransky 2018-06-18 14:42:28 UTC
The `openstack server set --state active` workaround seems to work for me (at least as far as the CLI status output is concerned, not sure if something still stays broken under the hood).

However, `nova reset-state` brings the node to ERROR again. I tested on an environment where the nodes had never gone to ERROR previously, and `nova reset-state` moves them to ERROR anyway...
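
(This is expected: by default `nova reset-state` sets the instance to the error state, as it is meant for clearing a stuck task_state. `nova reset-state --active <server>` should be the equivalent of the `openstack server set --state active` workaround above, though I haven't re-checked that on this environment.)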

Comment 25 Matthew Booth 2018-06-18 15:13:02 UTC
After more discussion, this isn't a PIDONE thing, as that's not relevant to the undercloud. We're going to tackle this for GA by ensuring that if nova-compute and ironic-conductor are both installed and running, then ironic-conductor starts first. We're going to do this in two ways:

1. Adding a systemd Before= dependency on nova-compute to ironic-conductor
2. Adding corresponding dependencies to the puppet modules in instack-undercloud to ensure the same happens during upgrade.
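
For illustration, here is a minimal sketch of what (1) could look like as a systemd drop-in. The drop-in path and file name are assumptions made for this example; the actual change goes into the packaged unit file:

    # /etc/systemd/system/openstack-ironic-conductor.service.d/10-order.conf
    # (hypothetical drop-in path, for illustration only)
    [Unit]
    # Start ironic-conductor before nova-compute when both are started in the
    # same transaction (e.g. at boot). systemd only waits until the unit
    # reports started, not until the conductor is fully initialised.
    Before=openstack-nova-compute.service

A `systemctl daemon-reload` is needed afterwards for the new ordering to take effect.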

Comment 26 Sofer Athlan-Guyot 2018-06-18 15:52:42 UTC
Hi,

so:

   for i in controller-{0,1,2} compute-0 ; do 
      openstack server set --state active  $i ;
   done

is working as expected.

Message posted after a bz mid-air collision ...

> 1. Adding a systemd Before= dependency on nova-compute to ironic-conductor
> 2. Adding a corresponding dependencies to the puppet modules in instack-undercloud to ensure the same happens during upgrade.

So I have a review (https://review.openstack.org/576165) for 2, the undercloud ordering, but that still leaves open:
 a. containerized undercloud ordering;
 b. overcloud container ordering (as I understand it, this may be an issue there as well, right?)

I think we should tackle a. in an upstream Launchpad bug only, and b. in its own bz?

Comment 27 Matthew Booth 2018-06-19 08:49:46 UTC
(In reply to Sofer Athlan-Guyot from comment #26)
> [...]
> So I have a review (https://review.openstack.org/576165) for 2, the
> undercloud ordering, but that still leaves open:
>  a. containerized undercloud ordering;
>  b. overcloud container ordering (as I understand it, this may be an issue
> there as well, right?)
> 
> I think we should tackle a. in an upstream Launchpad bug only, and b. in
> its own bz?

I already created bug 1592466 for that. We need a separate bz to land it anyway, as it's in a different component. I've moved your gerrit review over there and assigned you to it.

Comment 29 Matthew Booth 2018-06-19 15:56:31 UTC
I tested the proposed systemd change on an OSP13 undercloud, and although nova-compute does indeed start after ironic-conductor, I only observed a delay of approximately 1 second. I could also see from the logs that nova compute was coming up before ironic conductor initialisation had completed.

For the sake of posterity here are my notes on the issue:

In the systemd config for ironic conductor we have added:
  Before=openstack-nova-compute.service
 
Ironic conductor does:
    mgr = rpc_service.RPCService(CONF.host,
                                 'ironic.conductor.manager',
                                 'ConductorManager')
 
   # Part 1 below
    launcher = service.launch(CONF, mgr, restart_method='mutate')
 
   # Part 2 below
    launcher.wait()
 
# Part 1
service.launch() does ServiceLauncher.launch_service(cond manager)
ServiceLauncher.launch_service() does self.services.add(cond manager)
Services.add() does threadgroup.add_thread(run_service, cond manager, done event)
 
run_service is executed in a separate thread which does:
cond manager.start()
done.wait()
 
Ironic RPCService.start does:
  ...
  self.manager.init_host(admin_context)
...
 
So... the thread calls start() on the service then immediately waits for the event to signal that it should exit. Note that there was no systemd call in the thread.
 
# Part 2
ServiceLauncher.wait() does:
  systemd.notify_once()
  ...wait for all service threads to exit...
 
So... notice that we called systemd.notify_once() immediately without waiting for the completion of service.start().
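
To make that concrete, here is a minimal standalone sketch of the same ordering, using plain threading and a print() in place of oslo.service/eventlet and systemd.notify_once(); the names and the 5 second delay are illustrative only:

    # Minimal standalone sketch -- NOT the oslo.service code.
    import threading
    import time

    def init_host():
        # stands in for ConductorManager.init_host(): driver loading, hash ring, ...
        time.sleep(5)
        print("conductor init_host() finished")

    def launch_service():
        # "Part 1": the launcher runs the service's start() in a separate thread
        thread = threading.Thread(target=init_host)
        thread.start()
        return thread

    worker = launch_service()
    print("READY=1 (systemd.notify_once())")  # "Part 2": wait() notifies immediately
    worker.join()

Running this prints READY=1 roughly 5 seconds before init_host() finishes, which is why a Before= ordering on nova-compute only guarantees process start order, not that the conductor is registered and usable.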

Comment 30 Matthew Booth 2018-06-19 15:58:50 UTC
We're currently investigating hacking something into nova's ironic driver instead. This would be a downstream-only change intended to get us to GA, with a clear goal to remove it in favour of a proper fix for z1.

The proper fix would be either (but preferably both) of bug 1592427 and bug 1592432.

Comment 31 Sofer Athlan-Guyot 2018-06-19 16:34:59 UTC
To add to Matthew's comment: we have added a retry loop in the nova-compute code, but both reviews (ordering restart and retry loop) would be needed to make sure the retry loop doesn't fail too often.

Comment 32 Sofer Athlan-Guyot 2018-06-19 17:06:00 UTC
Adding the gerrit early downport of the ordering restart in case it's needed (not yet merged in master).

Comment 33 Dan Smith 2018-06-19 17:22:43 UTC
I don't think the systemd service ordering or notification changes will solve the problem. I also don't think a static hack of sleep(10) before starting nova-compute is reliable enough to be considered a serious fix.

I've proposed an upstream retry loop in plug_vifs() which can differentiate a normal failure from the race-at-service-startup case, and only retries (with a delay) in that situation. The downstream pre-backport is proposed here:

https://code.engineering.redhat.com/gerrit/#/c/141934

It would be good to get ironic people to validate that approach before we move ahead, but my suggestion would be to go with that slightly more invasive code fix because I think it will pay off in reliability.
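
For reference, here is a rough sketch of the retry-loop idea; this is NOT the actual upstream patch, and the helper name, attempt count and delay are illustrative assumptions:

    # Rough sketch of the retry-loop idea -- not the actual upstream change.
    import time

    RETRY_LIMIT = 6   # illustrative
    RETRY_DELAY = 5   # seconds, illustrative

    def _is_conductor_startup_race(exc):
        # The startup race surfaces as HTTP 400 "No valid host was found.
        # Reason: No conductor service registered which supports driver ..."
        # (see the nova-compute and ironic app.log excerpts in comment 14).
        return 'No conductor service registered' in str(exc)

    def plug_vif_with_retry(ironic_client, node_uuid, vif_id):
        """Attach a VIF, retrying only for the no-conductor startup race."""
        for attempt in range(1, RETRY_LIMIT + 1):
            try:
                ironic_client.node.vif_attach(node_uuid, vif_id)
                return
            except Exception as exc:
                if _is_conductor_startup_race(exc) and attempt < RETRY_LIMIT:
                    time.sleep(RETRY_DELAY)   # conductor may still be coming up
                    continue
                raise   # a genuine failure: surface it, don't mask it

The key point is the differentiation: only the specific "no conductor registered" failure is retried; anything else is raised immediately.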

Comment 34 Carlos Camacho 2018-06-20 19:30:18 UTC
Hi Dan FYI,

Please use the cloned BZs to track this issue/fix in the different components involved:

openstack-nova: https://bugzilla.redhat.com/show_bug.cgi?id=1592427
openstack-ironic: https://bugzilla.redhat.com/show_bug.cgi?id=1592432


We are using this BZ for merging and making available the fix for the instack-undercloud component.

The fix we will track in this BZ will be https://review.openstack.org/#/c/576165/ with the corresponding backport/package build.

I know it's a little messier to use a different BZ for each fix in each component, but this way each PM should be able to prioritize the work involved for their DFG.

Thanks!

Comment 40 Jiri Stransky 2018-06-25 15:16:15 UTC
*** Bug 1593909 has been marked as a duplicate of this bug. ***

Comment 42 errata-xmlrpc 2018-06-27 13:57:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086