Bug 1183757 - RHOS: Unable to start a suspended instance after relationships & power states refresh
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Insight
Version: 5.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: GA
Target Release: 5.4.0
Assignee: Greg Blomquist
QA Contact: Jan Krocil
URL:
Whiteboard:
Depends On:
Blocks: 1187770
Reported: 2015-01-19 17:20 UTC by Jan Krocil
Modified: 2015-06-16 12:47 UTC (History)

Fixed In Version: 5.4.0.0.11
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1187770
Environment:
Last Closed: 2015-06-16 12:47:58 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2015:1100 | 0 | normal | SHIPPED_LIVE | CFME 5.4.0 bug fixes, and enhancement update | 2015-06-16 16:28:42 UTC

Description Jan Krocil 2015-01-19 17:20:10 UTC
Description of problem:
User is unable to start a suspended instance after running "Relationships and power states refresh", because the refresh changes the instance's "raw_power_state" to a value that the start operation does not handle.

Version-Release number of selected component (if applicable):
5.3.2.6 - 5.3.2.6.20150108100920_387a856

How reproducible:
Always (RHOS3 & RHOS4)

Steps to Reproduce:
1. Spawn an instance on RHOS
2. Suspend it (the power state in CFME web UI will change to "suspended")
3. Refresh relationships and power states
4. Power state will change to "off" (raw_power_state is "SHUTDOWN" in the DB)
5. Try to start the instance (Power > Start)

No corresponding request appears in fog.log. See Additional info for more.

Actual results:
Instance remains suspended.

Expected results:
Instance starts up.

Additional info:

Looking at the code (vmdb/app/models/vm_openstack/operations/power.rb:6-14):
```
  def raw_start
    with_provider_connection do |connection|
      case raw_power_state
      when "PAUSED"    then connection.unpause_server(ems_ref)
      when "SUSPENDED" then connection.resume_server(ems_ref)
      when "STOPPED"   then connection.start_server(ems_ref)
      end
    end
  end
```

"SHUTDOWN" is unaccounted-for.

The real issue (I think), though, is that CFME uses RHOS's "Power State" ("SHUTDOWN") as its raw_power_state instead of using RHOS's "Status".

Alternatively, if we want to keep using RHOS's Power State, the code above should probably match on Power State values instead, and the raw_power_state that CFME sets manually after a power control action is invoked should use values of that kind as well.

Just thinking out loud...
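
To illustrate why nothing shows up in fog.log, here is a minimal, self-contained sketch of the same case expression run against a stand-in connection. FakeConnection and dispatch_start are hypothetical names used only for this illustration; they are not part of CFME or fog. A Ruby case with no matching "when" and no "else" evaluates to nil, so for "SHUTDOWN" no connection method is called at all.

```ruby
# Stand-in for the provider connection; records each request it receives.
# FakeConnection and dispatch_start are hypothetical, for illustration only.
class FakeConnection
  attr_reader :requests

  def initialize
    @requests = []
  end

  def unpause_server(ref)
    @requests << [:unpause_server, ref]
  end

  def resume_server(ref)
    @requests << [:resume_server, ref]
  end

  def start_server(ref)
    @requests << [:start_server, ref]
  end
end

# Same case expression as raw_start, with the state passed in explicitly.
def dispatch_start(connection, raw_power_state, ems_ref)
  case raw_power_state
  when "PAUSED"    then connection.unpause_server(ems_ref)
  when "SUSPENDED" then connection.resume_server(ems_ref)
  when "STOPPED"   then connection.start_server(ems_ref)
  end
end

conn = FakeConnection.new
result = dispatch_start(conn, "SHUTDOWN", "instance-1")
p result         # nil: no branch matched
p conn.requests  # []: nothing was sent to the provider
```

This matches the observed behaviour: the start action completes without error, but the instance stays suspended.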

Comment 1 Jan Krocil 2015-01-19 17:22:50 UTC
Here's a related, closed bug: https://bugzilla.redhat.com/show_bug.cgi?id=1135606

Comment 2 Jan Krocil 2015-01-19 17:27:16 UTC
Just a note:
If you do a suspend, let the instance's power state be manually set by CFME and then go for "Power > Start" without refreshing the relationships and power states (and letting the power state change to "off") the instance will start back up.

Comment 5 Greg Blomquist 2015-02-16 21:30:11 UTC
There are a few problems going on here:

1) relying on "status" from openstack is problematic because "status" can also report transitional states, such as "stopping".

2) there seems to be a discrepancy between the openstack "status" and the openstack "power_state" in this case.  The "status" shows "suspended", but the "power_state" shows "shutdown".  I'm guessing this is because of the underlying driver (libvirt) not correctly handling the "suspended" power status.  And, this is a problem because...

3) Nova is not currently handling "status"/"power_state" discrepancies when the "status" is "suspended":

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L5912-L5918

Taking psav's original suggestion of just handling "shutdown" as if it were "suspended" is apparently the right approach.  This is risky, because I'm not altogether sure what "shutdown" is really supposed to mean in openstack.  There's already a "stopped" power state.  So, this may come back to bite us later.
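
A minimal sketch of what treating "shutdown" as "suspended" could look like. The helper start_action_for is a hypothetical name introduced here so the snippet runs on its own; the actual one-line change to power.rb is not shown in this report and may be written differently.

```ruby
# Hypothetical helper: maps a raw power state to the name of the
# connection method that would be called to start the instance.
def start_action_for(raw_power_state)
  case raw_power_state
  when "PAUSED"                then :unpause_server
  # Assumption: "SHUTDOWN" is what a suspended instance reports after a
  # refresh, so it is resumed in the same way as "SUSPENDED".
  when "SUSPENDED", "SHUTDOWN" then :resume_server
  when "STOPPED"               then :start_server
  end
end

p start_action_for("SHUTDOWN")   # :resume_server
p start_action_for("STOPPED")    # :start_server
```

The trade-off is the one noted above: an instance that reports "shutdown" for any other reason would also be sent a resume request.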

Comment 6 Greg Blomquist 2015-02-16 21:30:45 UTC
d'oh, gave credit to psav, when it was jan all along!  my bad

Comment 7 Greg Blomquist 2015-02-16 22:32:25 UTC
https://github.com/ManageIQ/manageiq/pull/1731

Comment 8 CFME Bot 2015-02-17 05:10:44 UTC
New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/67fea97e94384c622e0642c23b3b662805e89ad9

commit 67fea97e94384c622e0642c23b3b662805e89ad9
Author:     Greg Blomquist <gblomqui>
AuthorDate: Mon Feb 16 16:33:23 2015 -0500
Commit:     Greg Blomquist <gblomqui>
CommitDate: Mon Feb 16 17:31:46 2015 -0500

    Resume "shutdown" instances in OpenStack
    
    There's a number of problems leading to the cause of this bug.  This fix
    attempts to handle one of those problems.  But, it's possible that this problem
    will creep up again, and we may have to deal with this same problem again later.
    
    The issue is because of three things happening at the same time:
    
    1) OpenStack does not correctly set the power_state for suspended instances.
       The vm_state is set to "suspended" while the power_state ends up as "shutdown".
       It's possible that this is specific to libvirt instances.  I'm not altogether
       certain yet.
    
    2) OpenStack does not handle discrepancies between vm_status and power_state
       when the vm_status is "suspended":
    https://github.com/openstack/nova/blob/47fc1a6e5674fadecca253629d36430ceb5c8471/nova/compute/manager.py#L5912-L5918
    
    3) ManageIQ did not previously handle starting a "shutdown" openstack instance.
    
    With this patch, ManageIQ handles "shutdown" openstack instances as if they are
    "suspended" to match the way OpenStack handled suspended libvirt instances.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1183757

 vmdb/app/models/vm_openstack/operations/power.rb                         | 1 +
 .../ems_refresh/refreshers/openstack_refresher_rhos_havana_spec.rb       | 1 +
 2 files changed, 2 insertions(+)

Comment 9 CFME Bot 2015-02-17 17:30:48 UTC
New commit detected on cfme/5.3.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=72dff993d5f8cbe9cf08aaf68218945a58e4689d

commit 72dff993d5f8cbe9cf08aaf68218945a58e4689d
Author:     Greg Blomquist <gblomqui>
AuthorDate: Mon Feb 16 16:33:23 2015 -0500
Commit:     Greg Blomquist <gblomqui>
CommitDate: Tue Feb 17 12:30:02 2015 -0500

    Resume "shutdown" instances in OpenStack
    
    There's a number of problems leading to the cause of this bug.  This fix
    attempts to handle one of those problems.  But, it's possible that this problem
    will creep up again, and we may have to deal with this same problem again later.
    
    The issue is because of three things happening at the same time:
    
    1) OpenStack does not correctly set the power_state for suspended instances.
       The vm_state is set to "suspended" while the power_state ends up as "shutdown".
       It's possible that this is specific to libvirt instances.  I'm not altogether
       certain yet.
    
    2) OpenStack does not handle discrepancies between vm_status and power_state
       when the vm_status is "suspended":
    https://github.com/openstack/nova/blob/47fc1a6e5674fadecca253629d36430ceb5c8471/nova/compute/manager.py#L5912-L5918
    
    3) ManageIQ did not previously handle starting a "shutdown" openstack instance.
    
    With this patch, ManageIQ handles "shutdown" openstack instances as if they are
    "suspended" to match the way OpenStack handled suspended libvirt instances.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1183757

 vmdb/app/models/vm_openstack/operations/power.rb                         | 1 +
 .../ems_refresh/refreshers/openstack_refresher_rhos_havana_spec.rb       | 1 +
 2 files changed, 2 insertions(+)

Comment 11 Jan Krocil 2015-04-24 10:23:08 UTC
Verified fixed in 5.4.0.0.22 - 5.4.0.0.22.20150420163946_26004d1.

running > suspend > suspended (no rel. refresh) > start > on (OK)

running > suspend > suspended > rel. refresh > off > start > on (OK)

Note:
There is also this (more general) bug, related to openstack power control:
https://bugzilla.redhat.com/show_bug.cgi?id=1115557

Comment 13 errata-xmlrpc 2015-06-16 12:47:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1100.html

