Bug 1714023

Summary: [RFE] Consider re-enabling power sync between Nova and Ironic
Product: Red Hat OpenStack
Component: openstack-nova
Version: 15.0 (Stein)
Target Release: 16.1 (Train on RHEL 8.2)
Target Milestone: Upstream M2
Status: CLOSED ERRATA
Severity: high
Priority: medium
Keywords: FutureFeature, Triaged
Hardware: x86_64
OS: Linux
Reporter: bkopilov <bkopilov>
Assignee: RHOS Maint <rhos-maint>
QA Contact: OSP DFG:Compute <osp-dfg-compute>
CC: acanan, bfournie, bkopilov, dasmith, dcadzow, dtantsur, eglynn, jhakimra, jkreger, kchamart, lkuchlan, ltoscano, mbooth, mschuppe, sbauza, sgordon, tshefi, vromanso
Whiteboard: docs-accepted
Fixed In Version: openstack-nova-20.2.1-0.20200528080027.1e95025.el8ost
Last Closed: 2020-07-29 07:49:26 UTC
Type: Bug
Bug Blocks: 1720203    
Attachments:
- undercloud logs rhos15 for nova stop / start command
- undercloud logs after nova stop / start - reproduced again

Description bkopilov 2019-05-26 17:00:06 UTC
Description of problem:
nova stop 2afc7428-6ae9-4752-b24a-64d96b94850f  ------->  Power State stays "Running" (on rhos14 it goes to off)

+--------------------------------------+--------------+---------+------------+-------------+------------------------+
| ID                                   | Name         | Status  | Task State | Power State | Networks               |
+--------------------------------------+--------------+---------+------------+-------------+------------------------+
| 2afc7428-6ae9-4752-b24a-64d96b94850f | compute-0    | SHUTOFF | -          | Running     | ctlplane=192.168.24.11 |
| a1e11e81-89df-481e-a7ee-697cfecc8f06 | compute-1    | ACTIVE  | -          | Running     | ctlplane=192.168.24.9  |
| 7afd4919-83cb-42ed-ba9d-ad1760ea24b6 | controller-0 | ACTIVE  | -          | Running     | ctlplane=192.168.24.8  |
| 66df7a56-08c3-483b-ace0-26b562f0922c | controller-1 | ACTIVE  | -          | Running     | ctlplane=192.168.24.10 |
| 4bcbb70d-2d46-4784-8c70-5179c5001db4 | controller-2 | ACTIVE  | -          | Running     | ctlplane=192.168.24.21 |
+--------------------------------------+--------------+---------+------------+-------------+------------------------+



nova start 2afc7428-6ae9-4752-b24a-64d96b94850f

--------->  Power State changed to "Shutdown" after start?!
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 2afc7428-6ae9-4752-b24a-64d96b94850f | compute-0    | ACTIVE | -          | Shutdown    | ctlplane=192.168.24.11 |


Version-Release number of selected component (if applicable):


How reproducible:
Run nova stop on the undercloud: Power State stays "Running".
Run nova start: Power State changes to "Shutdown".



Actual results:
nova start reports the instance's Power State as "Shutdown".

Expected results:
nova start should start the instance.

Additional info:

Comment 1 bkopilov 2019-05-27 03:50:17 UTC
Created attachment 1573734 [details]
undercloud logs rhos15 for nova stop / start command

Comment 2 Martin Schuppert 2019-05-27 09:13:53 UTC
2019-05-26 23:45:02.696 8 DEBUG nova.virt.ironic.driver [-] [instance: 2afc7428-6ae9-4752-b24a-64d96b94850f] Still waiting for ironic node 5d21946a-b852-4af2-853e-e28410872ecc to power on: power_state="power off", target_power_state="power on", provision_state="active", target_provision_state=None _log_ironic_polling /usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py:130
2019-05-26 23:45:04.698 8 DEBUG nova.virt.ironic.driver [-] [instance: 2afc7428-6ae9-4752-b24a-64d96b94850f] Still waiting for ironic node 5d21946a-b852-4af2-853e-e28410872ecc to power on: power_state="power off", target_power_state="power on", provision_state="active", target_provision_state=None _log_ironic_polling /usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py:130
2019-05-26 23:45:06.698 8 DEBUG nova.virt.ironic.driver [-] [instance: 2afc7428-6ae9-4752-b24a-64d96b94850f] Still waiting for ironic node 5d21946a-b852-4af2-853e-e28410872ecc to power on: power_state="power off", target_power_state="power on", provision_state="active", target_provision_state=None _log_ironic_polling /usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py:130
2019-05-26 23:45:08.699 8 DEBUG nova.virt.ironic.driver [-] [instance: 2afc7428-6ae9-4752-b24a-64d96b94850f] Still waiting for ironic node 5d21946a-b852-4af2-853e-e28410872ecc to power on: power_state="power off", target_power_state="power on", provision_state="active", target_provision_state=None _log_ironic_polling /usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py:130
2019-05-26 23:45:10.699 8 DEBUG nova.virt.ironic.driver [-] [instance: 2afc7428-6ae9-4752-b24a-64d96b94850f] Still waiting for ironic node 5d21946a-b852-4af2-853e-e28410872ecc to power on: power_state="power off", target_power_state="power on", provision_state="active", target_provision_state=None _log_ironic_polling /usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py:130
2019-05-26 23:45:11.481 8 DEBUG oslo_service.periodic_task [req-729a7d94-ccca-4a2c-803c-4a3ac52662c5 - - - - -] Running periodic task ComputeManager._poll_rescued_instances run_periodic_tasks /usr/lib/python3.6/site-packages/oslo_service/periodic_task.py:217
2019-05-26 23:45:12.699 8 DEBUG nova.virt.ironic.driver [-] [instance: 2afc7428-6ae9-4752-b24a-64d96b94850f] Still waiting for ironic node 5d21946a-b852-4af2-853e-e28410872ecc to power on: power_state="power off", target_power_state="power on", provision_state="active", target_provision_state=None _log_ironic_polling /usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py:130
2019-05-26 23:45:14.699 8 INFO nova.virt.ironic.driver [req-06862a17-5734-4c5d-9c67-69323320d5c9 5491fcd93c2d48c8b2bd5b0eff94d0e4 4201997a545a4323a8edd47823ebc2df - default default] [instance: 2afc7428-6ae9-4752-b24a-64d96b94850f] Successfully powered on Ironic node 5d21946a-b852-4af2-853e-e28410872ecc
2019-05-26 23:45:14.700 8 DEBUG nova.compute.manager [req-06862a17-5734-4c5d-9c67-69323320d5c9 5491fcd93c2d48c8b2bd5b0eff94d0e4 4201997a545a4323a8edd47823ebc2df - default default] [instance: 2afc7428-6ae9-4752-b24a-64d96b94850f] Checking state _get_power_state /usr/lib/python3.6/site-packages/nova/compute/manager.py:1269


Is port 6232 the one associated with instance: 2afc7428-6ae9-4752-b24a-64d96b94850f / ironic node 5d21946a-b852-4af2-853e-e28410872ecc ?

From ironic log we see that the power status for one of the ipmi power status requests had state off:

2019-05-26 23:46:39.202 9 DEBUG ironic.common.utils [req-669e01c2-0e04-4589-8ea5-ac5d4165fc42 - - - - -] Execution completed, command line is "ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6232 -U admin -v -R 12 -N 5 -f /tmp/tmpypeoxnao power status" execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:77
2019-05-26 23:46:39.202 9 DEBUG ironic.common.utils [req-669e01c2-0e04-4589-8ea5-ac5d4165fc42 - - - - -] Command stdout is: "Chassis Power is off" execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:78

Comment 3 bkopilov 2019-05-27 12:50:33 UTC
Reinstalled rhos15: same issue. Adding ironic output:

(undercloud) (undercloud) [root@undercloud-0 stack]# ironic node-list

+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 77670189-6a91-4448-aa7d-1b8a33c3250f | compute-0    | ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d | power on    | active             | False       |
| 5d21946a-b852-4af2-853e-e28410872ecc | compute-1    | afaccb5f-88aa-40c6-8307-d4ef949a7b43 | power on    | active             | False       |
| b9e6ebe9-3cb2-4bf2-8e8a-70e16d8618b1 | controller-0 | 1707a068-0be3-4299-8291-0d026c9da24c | power on    | active             | False       |
| df96e9bf-6e9b-4e93-acb1-2ee2dd25b806 | controller-1 | 1b7d71e7-b38c-44d7-b021-124ee11b7a6b | power on    | active             | False       |
| 35550a91-eb19-4b1e-ac24-1a84376cc762 | controller-2 | 8a9f0f65-75ac-49f4-abaf-d1c976761688 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
(undercloud) (undercloud) [root@undercloud-0 stack]# nova list

+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| afaccb5f-88aa-40c6-8307-d4ef949a7b43 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 8a9f0f65-75ac-49f4-abaf-d1c976761688 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.14 |
| 1707a068-0be3-4299-8291-0d026c9da24c | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.16 |
| 1b7d71e7-b38c-44d7-b021-124ee11b7a6b | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+


Stop instance:
+--------------------------------------+--------------+---------+------------+-------------+------------------------+
| ID                                   | Name         | Status  | Task State | Power State | Networks               |
+--------------------------------------+--------------+---------+------------+-------------+------------------------+
| ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d | compute-0    | SHUTOFF | -          | Running     | ctlplane=192.168.24.10 |
| afaccb5f-88aa-40c6-8307-d4ef949a7b43 | compute-1    | ACTIVE  | -          | Running     | ctlplane=192.168.24.18 |
| 8a9f0f65-75ac-49f4-abaf-d1c976761688 | controller-0 | ACTIVE  | -          | Running     | ctlplane=192.168.24.14 |
| 1707a068-0be3-4299-8291-0d026c9da24c | controller-1 | ACTIVE  | -          | Running     | ctlplane=192.168.24.16 |
| 1b7d71e7-b38c-44d7-b021-124ee11b7a6b | controller-2 | ACTIVE  | -          | Running     | ctlplane=192.168.24.13 |
+--------------------------------------+--------------+---------+------------+-------------+------------------------+


ironic status:
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 77670189-6a91-4448-aa7d-1b8a33c3250f | compute-0    | ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d | power off   | active             | False       |
| 5d21946a-b852-4af2-853e-e28410872ecc | compute-1    | afaccb5f-88aa-40c6-8307-d4ef949a7b43 | power on    | active             | False       |
| b9e6ebe9-3cb2-4bf2-8e8a-70e16d8618b1 | controller-0 | 1707a068-0be3-4299-8291-0d026c9da24c | power on    | active             | False       |
| df96e9bf-6e9b-4e93-acb1-2ee2dd25b806 | controller-1 | 1b7d71e7-b38c-44d7-b021-124ee11b7a6b | power on    | active             | False       |
| 35550a91-eb19-4b1e-ac24-1a84376cc762 | controller-2 | 8a9f0f65-75ac-49f4-abaf-d1c976761688 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
(undercloud) (undercloud) [root@undercloud-0 stack]# 




When starting the instance:
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d | compute-0    | ACTIVE | -          | Shutdown    | ctlplane=192.168.24.10 |
| afaccb5f-88aa-40c6-8307-d4ef949a7b43 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 8a9f0f65-75ac-49f4-abaf-d1c976761688 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.14 |
| 1707a068-0be3-4299-8291-0d026c9da24c | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.16 |
| 1b7d71e7-b38c-44d7-b021-124ee11b7a6b | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+


ironic status:
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 77670189-6a91-4448-aa7d-1b8a33c3250f | compute-0    | ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d | power on    | active             | False       |
| 5d21946a-b852-4af2-853e-e28410872ecc | compute-1    | afaccb5f-88aa-40c6-8307-d4ef949a7b43 | power on    | active             | False       |
| b9e6ebe9-3cb2-4bf2-8e8a-70e16d8618b1 | controller-0 | 1707a068-0be3-4299-8291-0d026c9da24c | power on    | active             | False       |
| df96e9bf-6e9b-4e93-acb1-2ee2dd25b806 | controller-1 | 1b7d71e7-b38c-44d7-b021-124ee11b7a6b | power on    | active             | False       |
| 35550a91-eb19-4b1e-ac24-1a84376cc762 | controller-2 | 8a9f0f65-75ac-49f4-abaf-d1c976761688 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+

Comment 4 bkopilov 2019-05-27 12:52:49 UTC
Created attachment 1573920 [details]
undercloud logs after nova stop / start - reproduced again

Comment 6 Bob Fournier 2019-05-31 18:10:24 UTC
Per Martin's question in Comment 2 - can you provide the instackenv.json file used to import the nodes so we can see the IPMI ports being used? Also can you check the output of virsh list when you do the stop and start?  Seems like some kind of mismatch.
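
For reference, in a virtualized environment like this one the nodes' BMCs typically share a single host IP and are distinguished only by port, and that per-node port is carried in instackenv.json. A hedged sketch of what one node entry looks like (values are illustrative, borrowed from the ipmitool lines above; the MAC is made up):

    {
      "nodes": [
        {
          "name": "compute-0",
          "pm_type": "ipmi",
          "pm_addr": "172.16.0.1",
          "pm_port": "6232",
          "pm_user": "admin",
          "pm_password": "<redacted>",
          "mac": ["52:54:00:aa:bb:01"]
        }
      ]
    }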

Comment 7 Bob Fournier 2019-05-31 19:36:04 UTC
I'm assuming the IPMI port for the logs from Comment 3 is 6234.  Looking at Ironic logs, all looks well.

Node 77670189-6a91-4448-aa7d-1b8a33c3250f, power is ON:
ironic-conductor.log:2019-05-27 08:46:42.350 9 DEBUG ironic.common.utils [req-669e01c2-0e04-4589-8ea5-ac5d4165fc42 - - - - -] Execution completed, command line is "ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6234 -U admin -v -R 12 -N 5 -f /tmp/tmpyoo7gk0x power status" execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:77
ironic-conductor.log:2019-05-27 08:46:42.350 9 DEBUG ironic.common.utils [req-669e01c2-0e04-4589-8ea5-ac5d4165fc42 - - - - -] Command stdout is: "Chassis Power is on" execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:78

Node 77670189-6a91-4448-aa7d-1b8a33c3250f is powered off:
2019-05-27 08:47:25.953 9 DEBUG ironic.common.utils [-] Execution completed, command line is "ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6234 -U admin -v -R 12 -N 5 -f /tmp/tmp1f2b6e4p power status" execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:77
2019-05-27 08:47:25.953 9 DEBUG ironic.common.utils [-] Command stdout is: "Chassis Power is off" execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:78
2019-05-27 08:47:25.977 9 INFO ironic.conductor.utils [req-7e83ac36-45ec-4505-9975-911fcaf53579 688c000b558e4cdab5c8be18e5d2fb96 b15fe71e61ad4c34b9916773e7e7c190 - default default] Successfully set node 77670189-6a91-4448-aa7d-1b8a33c3250f power state to power off by soft power off.

Node 77670189-6a91-4448-aa7d-1b8a33c3250f is powered back on:
2019-05-27 08:49:04.222 9 DEBUG oslo_concurrency.processutils [-] CMD "ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6234 -U admin -v -R 12 -N 5 -f /tmp/tmpimtl8m4c power status" returned: 0 in 0.028s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409
2019-05-27 08:49:04.223 9 DEBUG ironic.common.utils [-] Execution completed, command line is "ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6234 -U admin -v -R 12 -N 5 -f /tmp/tmpimtl8m4c power status" execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:77
2019-05-27 08:49:04.223 9 DEBUG ironic.common.utils [-] Command stdout is: "Chassis Power is on" execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:78
2019-05-27 08:49:04.256 9 INFO ironic.conductor.utils [req-2c61aed9-0e89-45eb-a11c-3674f4143a2c 688c000b558e4cdab5c8be18e5d2fb96 b15fe71e61ad4c34b9916773e7e7c190 - default default] Successfully set node 77670189-6a91-4448-aa7d-1b8a33c3250f power state to power on by power on.

The “Power State” output in “ironic node-list” is also correct and matches this.

The nova “Status” is also correct. It's only the nova “Power State” that is out of sync.

In nova logs we also see the node successfully transitioned to off:
2019-05-27 08:47:05.646 8 DEBUG nova.virt.ironic.driver [-] [instance: ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d] Still waiting for ironic node 77670189-6a91-4448-aa7d-1b8a33c3250f to soft power off: power_state="power on", target_power_state="power off", provision_state="active", target_provision_state=None _log_ironic_polling /usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py:130
…
2019-05-27 08:47:13.657 8 DEBUG nova.virt.ironic.driver [-] [instance: ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d] Still waiting for ironic node 77670189-6a91-4448-aa7d-1b8a33c3250f to soft power off: power_state="power on", target_power_state="power off", provision_state="active", target_provision_state=None _log_ironic_polling /usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py:130
2019-05-27 08:47:27.711 8 INFO nova.virt.ironic.driver [req-420cdcea-94c2-4975-87e4-fa73f77cc672 5491fcd93c2d48c8b2bd5b0eff94d0e4 4201997a545a4323a8edd47823ebc2df - default default] [instance: ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d] Successfully soft powered off Ironic node 77670189-6a91-4448-aa7d-1b8a33c3250f

But the cache is still being used; has the “Power State” not been updated yet?
2019-05-27 08:48:17.357 8 DEBUG nova.virt.ironic.driver [req-729a7d94-ccca-4a2c-803c-4a3ac52662c5 - - - - -] Using cache for node 77670189-6a91-4448-aa7d-1b8a33c3250f, age: 0.43614840507507324 _sync_node_from_cache /usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py:941

Likewise the Power On sequence looks fine:
2019-05-27 08:48:53.399 8 DEBUG nova.virt.ironic.driver [req-f09ad125-f9c2-4a08-9148-c91602cbb7a0 5491fcd93c2d48c8b2bd5b0eff94d0e4 4201997a545a4323a8edd47823ebc2df - default default] [instance: ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d] Power on called for instance power_on /usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py:1448
2019-05-27 08:48:53.613 8 DEBUG nova.virt.ironic.driver [-] [instance: ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d] Still waiting for ironic node 77670189-6a91-4448-aa7d-1b8a33c3250f to power on: power_state="power off", target_power_state="power on", provision_state="active", target_provision_state=None _log_ironic_polling /usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py:130
…
2019-05-27 08:49:03.605 8 DEBUG nova.virt.ironic.driver [-] [instance: ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d] Still waiting for ironic node 77670189-6a91-4448-aa7d-1b8a33c3250f to power on: power_state="power off", target_power_state="power on", provision_state="active", target_provision_state=None _log_ironic_polling /usr/lib/python3.6/site-packages/nova/virt/ironic/driver.py:130
2019-05-27 08:49:05.615 8 INFO nova.virt.ironic.driver [req-f09ad125-f9c2-4a08-9148-c91602cbb7a0 5491fcd93c2d48c8b2bd5b0eff94d0e4 4201997a545a4323a8edd47823ebc2df - default default] [instance: ffe50055-6d5b-46d0-aac8-0a3f5e6ecb7d] Successfully powered on Ironic node 77670189-6a91-4448-aa7d-1b8a33c3250f


So everything looks like it's working: the instance transitions successfully and the node is power-cycled correctly. The only issue is that the 'nova list' "Power State" appears to be using a cached value. Can you give it some time after the 'stop' to see if/when the "Power State" is updated?

Comment 8 Bob Fournier 2019-06-03 14:34:26 UTC
This doesn't seem to be an Ironic bug, as the node power is being cycled correctly and reflected in the UI. It's just that the nova "Power State" field isn't being updated in a timely manner to match the nova "Status" field. Sending back to the nova team to look at how this field is updated.

Comment 9 Matthew Booth 2019-06-07 13:35:55 UTC
This looks like a bug in the nova ironic driver. Instance list is returned by n-api from the db without hitting the driver, so the incorrect value of 'Power State' reported by nova is because it's incorrect in the nova db.

In stop_instance, nova compute does:

            self._power_off_instance(context, instance, clean_shutdown)
            instance.power_state = self._get_power_state(context, instance)
            instance.vm_state = vm_states.STOPPED
            instance.task_state = None
            instance.save(expected_task_state=expected_task_state)

_power_off_instance() calls down to the driver to power off, and waits for the power off to complete. Nova compute then calls down into the driver again to get the power state, and uses the value returned by this second call to actually set the power state.

The ironic driver's power_off calls ironic's node.set_power_state, then polls ironic until the power is off. It fetches data from ironic via the driver's _validate_instance_and_node function. On success it logs "Successfully soft powered off Ironic node", which we can see in the logs:

2019-05-26 23:44:26.896 [N-CPU] 8 INFO nova.virt.ironic.driver [req-833a62ff-5638-475b-94c2-703d7c9492c8 5491fcd93c2d48c8b2bd5b0eff94d0e4 4201997a545a4323a8edd47823ebc2df - default default] [instance: 2afc7428-6ae9-4752-b24a-64d96b94850f] Successfully soft powered off Ironic node 5d21946a-b852-4af2-853e-e28410872ecc

Nova compute's _get_power_state() calls the driver's get_info(), and here is where I believe the problem is. The signature of this method is:

    def get_info(self, instance, use_cache=True):

Nova compute hasn't passed use_cache=False, so this method returns cached values. We can also verify this in the logs: this call did not make a second trip to ironic. The cache in question seems to be populated only by the ironic driver's _refresh_cache(), which is mainly updated by the resource tracker loop... which hasn't run yet. Specifically, this cache was *not* updated by power_off, so it's out of date.

The ironic driver's get_info therefore returns out-of-date info to nova compute, which stores it in the db, and the Power State is reported incorrectly.

Note that the exact same logic explains why, subsequently, when starting the instance the power state is reported as Shutdown: the node cache was updated in the intervening period and now reflects that the power is off.
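
To make the failure mode concrete, here is a minimal, self-contained Python sketch of the cache interaction described above (simplified stand-in names, not the actual nova code):

    # Toy model of the caching behaviour; illustrative only.
    class FakeIronicDriver:
        def __init__(self):
            self._node_cache = {}  # what nova last learned via _refresh_cache()
            self._ironic = {}      # stands in for the real power state in ironic

        def _refresh_cache(self):
            # In nova this only happens from periodic resource tracker work,
            # NOT from power_off()/power_on(); that is the heart of the bug.
            self._node_cache = dict(self._ironic)

        def power_off(self, node):
            self._ironic[node] = 'power off'  # ironic powers the node off...
            # ...but the node cache is deliberately not touched here.

        def get_info(self, node, use_cache=True):
            if use_cache:                 # the compute manager's default path
                return self._node_cache[node]
            return self._ironic[node]     # a fresh trip to ironic

    driver = FakeIronicDriver()
    driver._ironic['node-1'] = 'power on'
    driver._refresh_cache()               # periodic task ran at some point

    driver.power_off('node-1')
    print(driver.get_info('node-1'))      # 'power on': the stale value that
                                          # lands in the db after 'nova stop'
    driver._refresh_cache()               # periodic task fires in between
    print(driver.get_info('node-1'))      # 'power off': explains 'Shutdown'
                                          # being shown after 'nova start'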

Comment 10 Bob Fournier 2019-06-07 13:58:01 UTC
Thanks for looking at this, Matt! Switching back to HardProv. Dmitry identified this change, https://review.opendev.org/#/c/640772/, which sets use_cache to True, as a possible source of the problem.

Comment 11 Bob Fournier 2019-06-07 14:28:31 UTC
I'd like to reduce the severity here if there are no objections - as noted, this isn't an issue with starting or stopping instances, only with the reporting of the Power State field in the deprecated nova commands. The "openstack server list" command doesn't include this field, so it wouldn't even be seen there.

Comment 12 Bob Fournier 2019-06-07 22:19:58 UTC
Possibly related fix - https://review.opendev.org/#/c/636132/

Comment 13 Dmitry Tantsur 2019-09-06 14:29:59 UTC
We have explicitly disabled nova power sync in the undercloud: https://github.com/openstack/tripleo-heat-templates/blob/master/environments/undercloud.yaml#L93. If my memory serves me well, this is because otherwise Nova would force whatever power state it thinks is correct onto a node.
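
For reference, the nova option behind this switch is sync_power_state_interval, where a negative value disables the _sync_power_states periodic task; a nova.conf sketch of the effect (the linked heat environment sets this through tripleo parameters whose exact names may differ):

    [DEFAULT]
    # Interval, in seconds, of nova's _sync_power_states periodic task.
    # A negative value disables the task, so nova stops forcing its own
    # idea of the power state onto ironic-managed nodes.
    sync_power_state_interval = -1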

As of the Train (OSP 16) release, ironic has the ability to tell nova the correct power state. Given the risks, I think re-enabling power sync is still an RFE. It does not require a lot of code, but it will require some QA coverage.

Comment 16 Bob Fournier 2020-07-14 18:59:57 UTC
It looks like the Power State is being properly reported now via 'nova list'.  This is with OSP-16.1, compose 16.1_20200625.1.

(undercloud) [stack@hardprov-dl360-g9-01 ~]$ nova list
+--------------------------------------+------+--------+------------+-------------+-----------------------------+
| ID                                   | Name | Status | Task State | Power State | Networks                    |
+--------------------------------------+------+--------+------------+-------------+-----------------------------+
| 38d656e6-cb89-40d8-a60d-03705cd56411 | dell | ACTIVE | -          | Running     | ctlplane=fe32:dead:beef::29 |
+--------------------------------------+------+--------+------------+-------------+-----------------------------+
(undercloud) [stack@hardprov-dl360-g9-01 ~]$ nova stop dell
Request to stop server dell has been accepted.

<after 10 seconds>
+--------------------------------------+------+---------+------------+-------------+-----------------------------+
| ID                                   | Name | Status  | Task State | Power State | Networks                    |
+--------------------------------------+------+---------+------------+-------------+-----------------------------+
| 38d656e6-cb89-40d8-a60d-03705cd56411 | dell | SHUTOFF | -          | Shutdown    | ctlplane=fe32:dead:beef::29 |
+--------------------------------------+------+---------+------------+-------------+-----------------------------+

(undercloud) [stack@hardprov-dl360-g9-01 ~]$ nova start dell
Request to start server dell has been accepted.

(undercloud) [stack@hardprov-dl360-g9-01 ~]$ nova list
+--------------------------------------+------+--------+------------+-------------+-----------------------------+
| ID                                   | Name | Status | Task State | Power State | Networks                    |
+--------------------------------------+------+--------+------------+-------------+-----------------------------+
| 38d656e6-cb89-40d8-a60d-03705cd56411 | dell | ACTIVE | -          | Running     | ctlplane=fe32:dead:beef::29 |
+--------------------------------------+------+--------+------------+-------------+-----------------------------+

Comment 25 errata-xmlrpc 2020-07-29 07:49:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148