Bug 1672972

Summary: [RFE] After a Nova compute shutdown, the instances' state held by the hypervisor is not updated
Product: Red Hat OpenStack Reporter: vivek koul <vkoul>
Component: openstack-nova Assignee: melanie witt <mwitt>
Status: CLOSED ERRATA QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high Docs Contact:
Priority: low    
Version: 10.0 (Newton) CC: alifshit, bdobreli, dasmith, egallen, eglynn, greartes, igallagh, jamsmith, jhakimra, joflynn, jparker, kchamart, lyarwood, mariel, mwitt, sbauza, scohen, sgordon, skovili, spower, stephenfin, vromanso
Target Milestone: ga Keywords: FutureFeature, Triaged, ZStream
Target Release: 17.1 Flags: ifrangs: needinfo? (mwitt)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-nova-23.2.2-1.20221021161917.a9e8162.el9ost Doc Type: Enhancement
Doc Text:
This enhancement helps cloud users determine whether they are unable to access an "ACTIVE" instance because the Compute node that hosts the instance is unreachable. RHOSP administrators can now configure the following parameters to enable a custom policy that provides a status in the `host_status` field to cloud users when they run the `openstack server show` command, if the host Compute node is unreachable:
* `NovaApiHostStatusPolicy`: Specifies the role the custom policy applies to.
* `NovaShowHostStatus`: Specifies the level of host status to show to the cloud user, for example, "UNKNOWN".
Story Points: ---
Clone Of:
: 1994072 Environment:
Last Closed: 2023-08-16 01:09:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version: Ussuri
Embargoed:
Bug Depends On:    
Bug Blocks: 1994072, 2150082    

Description vivek koul 2019-02-06 11:15:25 UTC
Description of problem: 
The customer noticed that after a Nova compute shutdown, the instances' state held by the hypervisor is not updated and it seems that they're active and running when in fact they aren't.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Spawn an instance.
2. Shut down the compute node on which the instance was spawned:
     # shutdown -h now
3. Run nova list. The instances are still in the ACTIVE state.

Actual results:
Instances are still in the ACTIVE state.

Expected results:
Instances should be in a shutdown state.

Additional info:

Comment 2 Matthew Booth 2019-02-08 15:40:51 UTC
This is something we could look to improve cosmetically in Nova. Possibly marking instances unknown if the host state is also unknown?

Comment 10 melanie witt 2019-05-08 16:25:21 UTC
As discussed in earlier comments on this BZ, the host status is available in its own field:

$ nova list --fields name,status,task_state,power_state,host_status

+--------------------------------------+---------+---------+------------+-------------+-------------+
| ID                                   | Name    | Status  | Task State | Power State | Host Status |
+--------------------------------------+---------+---------+------------+-------------+-------------+
| 4xxx750b-6ba5-4284-xxxd-128c0bfda783 | vivek   | ACTIVE  | None       | Running     | UNKNOWN     |
+--------------------------------------+---------+---------+------------+-------------+-------------+

with the 'host_status' field being controlled by policy. If you want non-admins to be able to view the 'host_status' field, the policy.json must be adjusted.

Hypervisor details cannot be known if nova-api is not able to communicate with nova-compute. At best, the status is "unknown". The 'host_status' field contains this information. Policy defaults to not revealing hypervisor details in server fields and such details are only available to admin. If the customer wishes to reveal this information to end users, they must adjust their policy.json.
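As a rough illustration of why the field reads UNKNOWN in this case, the sketch below derives a host status from a compute service record. It is a simplified model, not nova's code: the ServiceRecord type, the derive_host_status function, and the 60-second heartbeat threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed threshold for this sketch: a service with no heartbeat
# for longer than this is treated as not reporting.
HEARTBEAT_TIMEOUT = timedelta(seconds=60)


@dataclass
class ServiceRecord:
    """Hypothetical view of a nova-compute service record."""
    last_heartbeat: datetime
    forced_down: bool = False


def derive_host_status(service: ServiceRecord, now: datetime) -> str:
    """Return UP, DOWN, or UNKNOWN for the host that runs the service."""
    if service.forced_down:
        # An operator explicitly marked the host as down.
        return "DOWN"
    if now - service.last_heartbeat <= HEARTBEAT_TIMEOUT:
        return "UP"
    # No recent heartbeat: the host may be powered off or only unreachable.
    return "UNKNOWN"


now = datetime(2019, 5, 8, 16, 25, tzinfo=timezone.utc)
healthy = ServiceRecord(last_heartbeat=now - timedelta(seconds=10))
silent = ServiceRecord(last_heartbeat=now - timedelta(minutes=30))
print(derive_host_status(healthy, now))  # UP
print(derive_host_status(silent, now))   # UNKNOWN
```

The point of the model is that a silent host cannot be told apart from a powered-off one, so UNKNOWN is the most that can be reported.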

Comment 11 melanie witt 2019-06-06 22:26:02 UTC
To add more detail, the way that server status is shown is expected behavior and not a bug. There is code for detecting the power state of a server and updating the status accordingly, but that code runs on the compute host, in the nova-compute service. There is a periodic task that queries libvirt for the virtual machine state, and if the state is found to be powered down, the nova server status is updated. If the periodic task happens to run while the compute host is powering down, after the libvirt domain is shut down but before nova-compute is stopped, you will see the server status updated to SHUTOFF (this is a matter of fortunate timing). Once nova-compute is stopped, the task no longer runs and the last recorded status remains.
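The periodic check described above can be modelled roughly as follows. This is a sketch under stated assumptions, not the nova-compute implementation: sync_power_state and the dictionary of hypervisor states are hypothetical stand-ins for the libvirt query.

```python
from typing import Dict, Optional


def sync_power_state(server_status: Dict[str, str],
                     hypervisor_states: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Return the server statuses after one pass of the periodic check.

    hypervisor_states maps a server name to the power state reported by the
    hypervisor. None models a compute host whose service is not running:
    the check never executes, so nothing changes.
    """
    if hypervisor_states is None:
        return dict(server_status)
    updated = dict(server_status)
    for name, power_state in hypervisor_states.items():
        # Only a server recorded as ACTIVE but found powered down is corrected.
        if updated.get(name) == "ACTIVE" and power_state == "shutdown":
            updated[name] = "SHUTOFF"
    return updated


recorded = {"vivek": "ACTIVE"}
print(sync_power_state(recorded, {"vivek": "shutdown"}))  # {'vivek': 'SHUTOFF'}
print(sync_power_state(recorded, None))                   # {'vivek': 'ACTIVE'}
```

The second call corresponds to the reported scenario: with the compute host already off, no check runs and the server stays ACTIVE.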

Today, the only way to get additional information about the server is via the 'host_status' field, which is accessible by running the 'nova list --fields name,status,task_state,power_state,host_status' command.

With 'host_status', the host status can reflect the state UNKNOWN, which indicates that the host could have been powered down. Host status will also reveal whether the hypervisor is UP or DOWN (forced_down for maintenance). Note that the policy.json must be adjusted to expose 'host_status' to non-admin users, if that is considered acceptable. The rule that controls the field, shown here restricted to admins, is:

  "os_compute_api:servers:show:host_status": "rule:admin_api"

If the policy is not set to allow the user, they will get the error: "ERROR (CommandError): Non-existent fields are specified: [u'host_status']".
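A minimal sketch of generating such an override by script, assuming the operator maintains policy.json fragments programmatically. The build_policy_override helper is hypothetical, and the rule string "rule:admin_or_owner" used in the example is an assumption for illustration only; substitute a rule that your deployment's policy actually defines.

```python
import json

# Policy key quoted in the comment above.
HOST_STATUS_POLICY = "os_compute_api:servers:show:host_status"


def build_policy_override(rule: str) -> str:
    """Return a policy.json fragment that sets the host_status rule."""
    if not rule.strip():
        raise ValueError("rule must not be empty")
    return json.dumps({HOST_STATUS_POLICY: rule}, indent=2)


# Assumed rule name, for illustration only.
print(build_policy_override("rule:admin_or_owner"))
```

Review the generated fragment before merging it into the deployed policy, because widening this rule exposes host state to more users.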

Comment 12 melanie witt 2019-06-06 22:33:24 UTC
I started a thread to get feedback from the upstream community about changing the way server status is shown:

http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006526.html

There was some amount of support for the idea of showing server status as UNKNOWN if the underlying host status is UNKNOWN. This would not leak any details about hypervisors to end users.
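The idea floated in that thread can be sketched as below. This only illustrates the proposal as described here, not behavior that nova ships; displayed_server_status is a hypothetical name.

```python
def displayed_server_status(server_status: str, host_status: str) -> str:
    """Status an end user would see under the proposal discussed upstream."""
    if host_status == "UNKNOWN":
        # The host cannot be reached, so the recorded status may be stale.
        return "UNKNOWN"
    return server_status


print(displayed_server_status("ACTIVE", "UNKNOWN"))  # UNKNOWN
print(displayed_server_status("ACTIVE", "UP"))       # ACTIVE
```

Only the server's own status changes, so the user learns nothing about the hypervisor beyond the fact that the recorded status cannot be confirmed.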

If the community accepts the proposal, it would require a new nova compute API microversion to add the functionality. So, the change would not be backportable.

I'm working on a draft for a nova spec proposal presently.

Comment 13 melanie witt 2019-06-18 22:37:17 UTC
Spec has been proposed here: https://review.opendev.org/666181

Comment 18 melanie witt 2019-09-09 21:03:11 UTC
This blueprint was deferred to the U (Ussuri) release upstream:

https://blueprints.launchpad.net/nova/+spec/policy-rule-for-host-status-unknown

Comment 20 melanie witt 2019-11-04 23:20:53 UTC
https://review.opendev.org/679181 has merged upstream.

Comment 27 spower 2022-07-05 15:08:41 UTC
The TRAC team has stated that there will be no RFEs in z-streams for OSP 17.0, so this is being moved to 17.1. If you have any questions, please contact rhos-trac.

Comment 49 Joanne O'Flynn 2023-06-09 09:30:14 UTC
This enhancement helps cloud users determine whether they are unable to access an "ACTIVE" instance because the Compute node that hosts the instance is unreachable. RHOSP administrators can now configure the `NovaShowHostStatus` and `NovaApiHostStatusPolicy` parameters to enable a custom policy that displays the "UNKNOWN" `host_status` to cloud users when they run the `openstack server show` command, if the host Compute node is unreachable.

Comment 57 errata-xmlrpc 2023-08-16 01:09:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577