Description of problem:
The customer noticed that after a Nova compute node is shut down, the state of the instances hosted on that hypervisor is not updated: the instances appear to be active and running when in fact they are not.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Spawn an instance.
2. Shut down the Compute node on which the instance was spawned:
   # shutdown -h now
3. In the output of nova list, the instances are still in the ACTIVE state.

Actual results:
Instances are still in the ACTIVE state.

Expected results:
Instances should be in a shut-down (SHUTOFF) state.

Additional info:
This is something we could look into improving cosmetically in Nova. Possibly by marking instances as UNKNOWN when the host state is also UNKNOWN?
As discussed in earlier comments on this BZ, the host status is available in its own field, 'host_status':

$ nova list --fields name,status,task_state,power_state,host_status
+--------------------------------------+---------+---------+------------+-------------+-------------+
| ID                                   | Name    | Status  | Task State | Power State | Host Status |
+--------------------------------------+---------+---------+------------+-------------+-------------+
| 4xxx750b-6ba5-4284-xxxd-128c0bfda783 | vivek   | ACTIVE  | None       | Running     | UNKNOWN     |
+--------------------------------------+---------+---------+------------+-------------+-------------+

The 'host_status' field is controlled by policy. By default, policy does not reveal hypervisor details in server fields; such details are available only to admins. If the customer wants non-admin users to be able to view the 'host_status' field, they must adjust their policy.json.

Hypervisor details cannot be known if nova-api is not able to communicate with nova-compute. At best, the status is "unknown", and that is exactly the information the 'host_status' field carries.
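As a rough illustration of why UNKNOWN is the best the API can report, the sketch below derives 'host_status' purely from the nova-compute service record stored in the database, without contacting the compute host. It is a simplified sketch, not Nova's code: `ComputeService`, `derive_host_status`, and the fixed 60-second heartbeat timeout are assumptions (upstream, the timeout corresponds to the `service_down_time` option), and the order of the checks is assumed to follow upstream Nova and may differ between releases.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed heartbeat timeout (upstream Nova uses the service_down_time option).
SERVICE_DOWN_TIME = timedelta(seconds=60)


@dataclass
class ComputeService:
    """Hypothetical, simplified nova-compute service record."""
    forced_down: bool
    disabled: bool
    last_heartbeat: datetime


def derive_host_status(service: Optional[ComputeService], now: datetime) -> str:
    """Map a service record to a host_status string without contacting the host."""
    if service is None:
        return ""  # the instance is not associated with any compute host
    if service.forced_down:
        return "DOWN"
    if service.disabled:
        return "MAINTENANCE"
    if now - service.last_heartbeat <= SERVICE_DOWN_TIME:
        return "UP"
    # No recent heartbeat: the host may be powered off, unreachable, or just
    # slow, and the API cannot tell which, so the best it can say is UNKNOWN.
    return "UNKNOWN"


now = datetime(2019, 5, 20, 12, 0, tzinfo=timezone.utc)
silent = ComputeService(forced_down=False, disabled=False,
                        last_heartbeat=now - timedelta(minutes=5))
print(derive_host_status(silent, now))  # UNKNOWN
```

A powered-off host and a host behind a network partition look identical from the API side: both simply stop sending heartbeats.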
To add more detail: the way the server status is shown is expected behavior and not a bug.

There is code that detects the power state of a server and updates its status accordingly, but that code runs on the compute host, in the nova-compute service. A periodic task queries libvirt for the virtual machine state and, if the domain is found to be powered off, updates the Nova server status. If the periodic task happens to run while the compute host is powering down, after the libvirt domain has shut down but before nova-compute has stopped, you will see the server status updated to SHUTOFF. This only happens with fortunate timing; once the host is down, nothing is left running to update the status.

Today, the only way to get additional information about the server is through the 'host_status' field, which can be displayed with:

nova list --fields name,status,task_state,power_state,host_status

The host status can be UNKNOWN, which indicates that the host may have been powered down. It also reveals whether the hypervisor is UP or DOWN (DOWN means the compute service has been forced down, for example for maintenance).

Note that policy.json must be adjusted if it is considered acceptable to expose 'host_status' to non-admin users. The default rule is:

"os_compute_api:servers:show:host_status": "rule:admin_api"

If the policy does not allow the user to see the field, they will get the error:

ERROR (CommandError): Non-existent fields are specified: [u'host_status']
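The timing issue can be illustrated with a small simulation. Everything here is hypothetical and heavily simplified (the `Server` class, the `sync_power_states` function, and the state strings are illustrative); in upstream Nova the real logic lives in the `_sync_power_states` periodic task of the compute manager, which handles many more cases. The point it models is that the status can only be corrected while nova-compute is still running.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Server:
    """Hypothetical record of what the Nova database says about a server."""
    name: str
    status: str = "ACTIVE"
    power_state: str = "Running"


def sync_power_states(servers: List[Server],
                      hypervisor_states: Dict[str, str],
                      compute_service_running: bool) -> None:
    """Simplified model of the periodic power-state sync in nova-compute.

    hypervisor_states maps a server name to the state libvirt reports.
    """
    if not compute_service_running:
        # The task lives inside nova-compute: once the host is down, it never
        # runs, so the database keeps the last status it recorded.
        return
    for server in servers:
        if hypervisor_states.get(server.name) == "shutdown" and server.status == "ACTIVE":
            server.status = "SHUTOFF"
            server.power_state = "Shutdown"


# Fortunate timing: the task runs after the domain stops but before nova-compute does.
lucky = [Server("vivek")]
sync_power_states(lucky, {"vivek": "shutdown"}, compute_service_running=True)
print(lucky[0].status)  # SHUTOFF

# Usual case: the whole host is gone before the task's next run.
usual = [Server("vivek")]
sync_power_states(usual, {}, compute_service_running=False)
print(usual[0].status)  # ACTIVE
```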
I started a thread to get feedback from the upstream community about changing the way server status is shown: http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006526.html There was some support for the idea of showing the server status as UNKNOWN if the underlying host status is UNKNOWN. This would not leak any details about hypervisors to end users. If the community accepts the proposal, it would require a new nova compute API microversion, so the change would not be backportable. I am currently drafting a nova spec proposal.
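The proposal can be sketched as a filter applied to the server view before it is returned to a non-admin user. The dictionary layout and the `user_view` function are hypothetical; the point is that only the public status value changes, while 'host_status' itself is dropped, so no hypervisor detail is exposed.

```python
def user_view(server: dict) -> dict:
    """Hypothetical non-admin view of a server under the proposal.

    The admin-only 'host_status' key is removed, and the public status is
    replaced with UNKNOWN when the host status is UNKNOWN.
    """
    view = {key: value for key, value in server.items() if key != "host_status"}
    if server.get("host_status") == "UNKNOWN":
        view["status"] = "UNKNOWN"
    return view


server = {"name": "vivek", "status": "ACTIVE", "host_status": "UNKNOWN"}
print(user_view(server))  # {'name': 'vivek', 'status': 'UNKNOWN'}
```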
The spec has been proposed here: https://review.opendev.org/666181
This blueprint was deferred to the U (Ussuri) release upstream: https://blueprints.launchpad.net/nova/+spec/policy-rule-for-host-status-unknown
https://review.opendev.org/679181 has merged upstream.
The TRAC team has stated that there will be no RFEs in z-streams for OSP 17.0, so this is being moved to 17.1. For any questions, please contact rhos-trac.
This enhancement helps cloud users determine whether they are unable to access an "ACTIVE" instance because the Compute node that hosts the instance is unreachable. RHOSP administrators can now configure the `NovaShowHostStatus` and `NovaApiHostStatusPolicy` parameters to enable a custom policy that displays the "UNKNOWN" `host_status` to cloud users when they run the `openstack server show` command, if the host Compute node is unreachable.
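The visibility rules involved can be sketched as a small decision function. This is a hypothetical model, not RHOSP or Nova code: `can_see_all` and `can_see_unknown_only` stand in for the outcomes of two policy checks (the blueprint above added a separate policy rule for the UNKNOWN-only case), and blanking non-UNKNOWN states versus omitting the field entirely is a choice made for this illustration. Check the Nova release notes for the exact behavior.

```python
from typing import Optional


def visible_host_status(host_status: str,
                        can_see_all: bool,
                        can_see_unknown_only: bool) -> Optional[str]:
    """Return the host_status value shown to a caller, or None to omit it.

    can_see_all and can_see_unknown_only are hypothetical stand-ins for the
    results of two policy checks; they are not Nova or RHOSP APIs.
    """
    if can_see_all:
        return host_status  # e.g. admins: UP, DOWN, MAINTENANCE, UNKNOWN
    if can_see_unknown_only:
        # Reveal only UNKNOWN; blank out every other state (a choice of this sketch).
        return host_status if host_status == "UNKNOWN" else ""
    return None  # field omitted entirely, as with the default admin-only policy


for status in ("UP", "UNKNOWN"):
    print(status, "->", repr(visible_host_status(status, False, True)))
# UP -> ''
# UNKNOWN -> 'UNKNOWN'
```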
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577