Bug 1306830

Summary: un-retiring a VM does not unset the retirement_state field in vmdb
Product: Red Hat CloudForms Management Engine
Reporter: Colin Arnott <carnott>
Component: Automate
Assignee: Tina Fitzgerald <tfitzger>
Status: CLOSED DUPLICATE
QA Contact: Taras Lehinevych <tlehinev>
Severity: high
Docs Contact:
Priority: high
Version: 5.5.0
CC: benglish, carnott, dajohnso, jdeubel, jhardy, jprause, mkanoor, obarenbo, tfitzger
Target Milestone: GA
Keywords: ZStream
Target Release: 5.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: vm:retirement:rest
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1309825
Environment:
Last Closed: 2016-03-09 20:32:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1309825    
Attachments:
Description                               Flags
automate SCVMM retirement model change    none
automate tag retirement model change      none
Microsoft provider vm destroy change      none

Description Colin Arnott 2016-02-11 20:11:24 UTC
Description of problem:
When you un-retire a retired VM by setting its retirement date in the future, the `retirement_state` field in the VMDB is not reset.

Version-Release number of selected component (if applicable):
cfme-5.5.2.4-1.el7cf.x86_64

How reproducible:
easily

Steps to Reproduce:
1. Retire a VM
2. Un-retire the VM by setting its retirement date in the future
3. Confirm the `retired` field is set to false

Actual results:
`retirement_state` field is set to `retired`

Expected results:
`retirement_state` field is unset

Comment 1 mkanoor 2016-02-11 22:30:40 UTC
With the default database and the default retirement state machine the VM is deleted from the VMDB.
In this use case, are you using a customized state machine or an old legacy retirement model?

Comment 2 Colin Arnott 2016-02-11 22:49:57 UTC
I am using a freshly installed CloudForms 4.0 instance with the default state machine and retirement model.

Comment 4 mkanoor 2016-02-12 14:30:31 UTC
Can you please send us a log taken just after you have done the retirement?
After retirement the vm instance should get deleted from the database. If that is not happening we might have an issue in retirement itself.

Comment 5 Jared Deubel 2016-02-12 15:20:58 UTC
Madhu, 
Based on the default retirement process, we should just mark the VM as retired (R) and then not allow anybody to start it, including vCenter. It should not delete the VM from the VMDB unless specified in the state machine, as shown in (https://access.redhat.com/documentation/en/red-hat-cloudforms/4.0/provisioning-virtual-machines-and-hosts/chapter-6-retirement).

It seems that this is not the case, and since we are not changing the value, the UI is showing (R), which is causing confusion in the customer environment.

We can provide the logs if needed; please let us know.

Comment 6 Colin Arnott 2016-02-12 19:50:06 UTC
So, I have tested this issue further and there appear to be some inconsistencies with the handling of retirement across the infrastructure providers:

  rhev) remove the vm from rhev, and remove it from vmdb
 scvmm) set the VM state to `retired` and do not modify the state scvmm is in
vmware) remove the vm from vmware, and archive the vm in CloudForms, but still in the `retiring` state
   osp) not tested

This seems like a bug itself, but if we assume that the scvmm behavior is default we can test with that.

As such, I made the following pull request to assist in fixing the code:
https://github.com/ManageIQ/manageiq/pull/6649

Comment 8 mkanoor 2016-02-15 15:15:13 UTC
Jared,
Can you please get this information from the customer?
(a) Are they using the default State Machine, which deletes the VM from the VMDB?
(b) Are they using their old State Machines from a previous version?
(c) Which provider are they using (SCVMM or VMware)?
(d) Can we get both logs, evm.log and automation.log?

Thanks,
Madhu

Comment 9 Tina Fitzgerald 2016-02-15 20:34:21 UTC
Hi Colin,

We think the customer's problem is caused by issues that were previously reported and are fixed in the upcoming release.  Would you apply the attached fixes in your environment to validate that the VMs are removed during retirement?

To apply fixes:
1. Unlock the ManageIQ domain. 
  Using db access: in the miq_ae_namespace table, find the row with name = ManageIQ and system = true, and set system to false
  Using rails console:    MiqAeNamespace.find_by_name("ManageIQ").update_attributes(:system => false)
2. Using the Automate Explorer, rename the ManageIQ domain to preserve the original version.
3. Stop server.
4. Extract the contents of the tar files (3 files).
5. Restart server.
6. Retire an SCVMM instance and validate that the instance has been removed from the provider and that it no longer exists in the VMDB.

Fixes:
automate_retirement_scvmm_changes.tar 
db/fixtures/ae_datastore/ManageIQ/Infrastructure/VM/Retirement/StateMachines/Methods.class/checkpreretirement.yaml
db/fixtures/ae_datastore/ManageIQ/Infrastructure/VM/Retirement/StateMachines/Methods.class/preretirement.yaml

automate_retirement_tag_change.tar 
 db/fixtures/ae_datastore/ManageIQ/Cloud/VM/Retirement/StateMachines/Methods.class/__methods__/remove_from_provider.rb
 db/fixtures/ae_datastore/ManageIQ/Infrastructure/VM/Retirement/StateMachines/Methods.class/__methods__/remove_from_provider.rb

model_vm_destroy_change.tar 
app/models/manageiq/providers/microsoft/infra_manager.rb

Let me know if you have any questions.

Thanks,
Tina

Comment 10 Tina Fitzgerald 2016-02-15 20:37:19 UTC
Created attachment 1127424 [details]
automate SCVMM retirement model change

Comment 11 Tina Fitzgerald 2016-02-15 20:37:56 UTC
Created attachment 1127425 [details]
automate tag retirement model change

Comment 12 Tina Fitzgerald 2016-02-15 20:38:55 UTC
Created attachment 1127426 [details]
Microsoft provider vm destroy change

Comment 14 Colin Arnott 2016-02-17 16:58:06 UTC
Tina,

Can we open up a second bug to track this SCVMM issue, as it does not impact the client? Their issue was only with the retirement_state field in the VMDB, and that should be fixed by Madhu's pull request upstream.

I am happy to test out your fix, but it would be against a plain cfme-5.5 box, not anything related to the client.

Comment 15 Tina Fitzgerald 2016-02-17 19:58:46 UTC
Hi Colin,

The pull request that Madhu created was to fix a symptom of the problem.  

The default (ManageIQ domain) retirement process is designed to perform the following steps:
1. Preretirement - (Powers off a powered on VM).
2. Checkpreretirement - (Check that the VM is powered off, wait and retry if necessary).
3. Remove the VM from the provider.
4. Send email regarding the retired VM.  
5. Remove the VM from the VMDB.
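
As a rough illustration of the ordering above, the five steps can be modeled as a list of states run in sequence, where a step may ask to be retried. This is a standalone sketch, not the Automate state machine itself; the step names, the `:ok`/`:retry`/`:error` return values, and `max_retries` are assumptions made for the example.

```ruby
# Hypothetical step names mirroring the five steps listed above.
RETIREMENT_STEPS = %i[
  pre_retirement
  check_pre_retirement
  remove_from_provider
  email_owner
  delete_from_vmdb
].freeze

# Runs each step in order. Each handler is a callable returning
# :ok or :error; any other value (for example :retry) is treated
# as a request to run the same step again.
# Returns the list of completed steps, or raises if a step reports
# an error or runs out of retries.
def run_retirement(handlers, max_retries: 3)
  completed = []
  RETIREMENT_STEPS.each do |step|
    attempts = 0
    loop do
      result = handlers.fetch(step).call
      break if result == :ok
      raise "#{step} failed" if result == :error

      attempts += 1
      raise "#{step} did not finish after #{max_retries} retries" if attempts > max_retries
    end
    completed << step
  end
  completed
end
```

In this model, a failure in an early step (for example, the provider-specific power off) stops the run before the later removal steps are reached.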

Since there are multiple providers, code changes are sometimes necessary for the provider specific functionality required to handle the power off and removal from provider.

This is the issue the customer is experiencing while trying to retire the SCVMM VMs.

There are several pieces to the issue.
1. The customer is missing automate-specific instance changes to call the preretirement (power off) and checkpreretirement (check powered off) methods.

symptom: The issue would present itself through the VM entering retirement in a powered-on state (unless it had already been powered off outside of retirement).

fix: The "automate SCVMM retirement model change" attachment addresses this issue.

2. The customer is missing SCVMM specific code to remove the VM from the provider.

symptom: The VM would NOT get removed from the provider.

fix: The "Microsoft provider vm destroy change" attachment addresses this issue.

3. The issue is that a VM not created by our provision process will not be fully retired and removed from the provider.

*Note - I'm not sure if this is an issue with the customer, but the fix is small and worth including here.

symptom: The retirement process would end without error, but the VM would be left in the provider and in the VMDB.   

fix: The "automate tag retirement model change" attachment addresses this issue.

The fix from Madhu would not be necessary if the retirement process was working properly.

Let me know if you have any questions.

Thanks,
Tina

Comment 16 mkanoor 2016-02-17 20:02:08 UTC
This ticket is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1297351

Comment 17 CFME Bot 2016-02-23 20:15:55 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/8bc5f47daafb7f37057bdf31a1a5ec2b59fd664c

commit 8bc5f47daafb7f37057bdf31a1a5ec2b59fd664c
Author:     Madhu Kanoor <mkanoor>
AuthorDate: Mon Feb 15 15:59:47 2016 -0500
Commit:     Madhu Kanoor <mkanoor>
CommitDate: Tue Feb 23 13:38:09 2016 -0500

    Reset retirement_state
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1306830
    
    If the retirment date/time is changed, reset the retirment_state
    to nil, log a warning message if someone changes the date/time
    when a retirment is in progress.

 app/models/mixins/retirement_mixin.rb        | 14 +++++++++-----
 spec/models/vm/retirement_management_spec.rb | 20 ++++++++++++++++++++
 2 files changed, 29 insertions(+), 5 deletions(-)
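
For readers without the diff at hand, the behavior described in the commit message can be sketched roughly as follows. This is a standalone illustration, not the contents of retirement_mixin.rb; the class name `SketchVm`, the `warnings` array, the "retiring" state string used to detect an in-progress retirement, and the handling of the `retired` flag are all assumptions made for the example.

```ruby
# Hypothetical stand-in for a VM record; this is not the ManageIQ model.
class SketchVm
  attr_reader :retires_on, :retirement_state, :retired, :warnings

  def initialize(retires_on: nil, retirement_state: nil, retired: false)
    @retires_on = retires_on
    @retirement_state = retirement_state
    @retired = retired
    @warnings = []
  end

  # Changing the retirement date/time resets retirement_state to nil.
  # If a retirement appears to be in progress, a warning is recorded first.
  def retires_on=(new_time)
    return if new_time == @retires_on

    if @retirement_state == "retiring"
      @warnings << "retirement date changed while a retirement was in progress"
    end

    @retires_on = new_time
    # Assumption: a cleared or future date also clears the retired flag.
    @retired = false if new_time.nil? || new_time > Time.now.utc
    @retirement_state = nil
  end
end

# Usage: un-retire a retired VM by moving its date into the future.
vm = SketchVm.new(retires_on: Time.now.utc - 86_400,
                  retirement_state: "retired", retired: true)
vm.retires_on = Time.now.utc + 86_400
```

After the assignment in this sketch, `vm.retirement_state` is nil rather than staying at `retired`, which is the outcome the bug description lists under expected results.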

Comment 18 CFME Bot 2016-02-23 20:15:59 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/00025318ccdf7b7551132be551579971fd5313bc

commit 00025318ccdf7b7551132be551579971fd5313bc
Author:     Madhu Kanoor <mkanoor>
AuthorDate: Mon Feb 22 15:32:54 2016 -0500
Commit:     Madhu Kanoor <mkanoor>
CommitDate: Tue Feb 23 13:38:09 2016 -0500

    PR Review
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1306830

 spec/models/vm/retirement_management_spec.rb | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

Comment 19 CFME Bot 2016-02-23 20:16:05 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/7124a0e89c001215dd1a4017ef59847df15f4276

commit 7124a0e89c001215dd1a4017ef59847df15f4276
Author:     Madhu Kanoor <mkanoor>
AuthorDate: Tue Feb 23 11:16:05 2016 -0500
Commit:     Madhu Kanoor <mkanoor>
CommitDate: Tue Feb 23 13:38:09 2016 -0500

    Rubocop issues
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1306830

 app/models/mixins/retirement_mixin.rb        | 6 +++---
 spec/models/vm/retirement_management_spec.rb | 6 ++----
 2 files changed, 5 insertions(+), 7 deletions(-)

Comment 20 Tina Fitzgerald 2016-02-29 17:49:02 UTC
Hi Colin,

Can you validate that the information provided in Comment 15 resolves the customer-reported issue in your environment?

The fixes made by Madhu are not intended to resolve this issue.

Let me know if you have any questions.

Thanks,
Tina

Comment 21 Colin Arnott 2016-02-29 18:05:46 UTC
Yes, the customer's issue has been resolved. Thank you for your assistance.

Comment 22 Colin Arnott 2016-03-01 19:04:43 UTC
I have installed your patches on my test appliance and I was able to retire a VM. However, I do not appear to be able to start or stop instances, and the retirement did not stop or remove the VM.

To best assist you, I will give you access to this machine (out of band), so you can see exactly what is going on. Let me know if you have any other questions.

Comment 23 Tina Fitzgerald 2016-03-08 15:22:33 UTC
Hi Colin,

Did the fixes resolve the customer issue?

Thanks,
Tina


> On Tue, Mar 1, 2016 at 2:07 PM, Colin Arnott <carnott> wrote:
>
>> Colin Arnott:
>>> admin:smartvm.redhat.com
>>>
>>> Let me know if you need anything else.
>>>
>> The machine that was used to test this issue is 'patrick-rhel-6.6' on
>> the 'cf-scvmm-1' provider.
>>
>> --
>> Colin Arnott | Associate Technical Support Engineer
>> US CEE Cloud SD
>> Red Hat, Inc.
>>
>> Better technology. Faster innovation. Powered by community collaboration.
>> See how it works at redhat.com
>>
>

---------- Forwarded message ----------
From: Tina Fitzgerald <tfitzger>
Date: Tue, Mar 1, 2016 at 4:13 PM
Subject: Re: [Bug 1306830] reproducer credentials
To: Colin Arnott <carnott>


Hi Colin,

Thanks for your email.

I checked your environment and the ManageIQ model changes are not in your ManageIQ domain. The only change I see is the code change for vm_destroy from the "Microsoft provider vm destroy change" attachment.

Your environment is missing:
"automate SCVMM retirement model change"
"automate tag retirement model change"
Retirement should work properly once those fixes are in place.
Let me know if you need help setting it up. 

Thanks,
Tina


--------- Forwarded message ----------
From: Tina Fitzgerald <tfitzger>
Date: Tue, Mar 1, 2016 at 5:42 PM
Subject: Re: [Bug 1306830] reproducer credentials
To: Colin Arnott <carnott>


Hi Colin,

I can see the files in the proper location.  The ManageIQ domain doesn't get loaded if there is already a ManageIQ domain in the datastore.

Can you rename the ManageIQ domain through the Automate Explorer and restart the server? That will create a new ManageIQ domain that should have the correct model changes.

Thanks,
Tina


That is strange. I have stopped the evm server, re-extracted the tar
archives (they are stored at /tmp/), and restarted the evm server. Am I
missing a step, or is there something else wrong?

For my exact actions, look in the bash history when you ssh in.

--
Colin Arnott | Associate Technical Support Engineer
US CEE Cloud SD
Red Hat, Inc.


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Comment 24 Colin Arnott 2016-03-08 16:25:45 UTC
Tina,

The customer stopped responding to the ticket, but last I spoke they were content with the solution that was provided.

It looks like the fix you gave me had some issues, and my results on a lab machine are above.

Comment 25 Tina Fitzgerald 2016-03-09 20:31:41 UTC
This issue has been resolved by fixes contained in these tickets:

https://bugzilla.redhat.com/show_bug.cgi?id=1297351
https://bugzilla.redhat.com/show_bug.cgi?id=1299069

Comment 26 Tina Fitzgerald 2016-03-09 20:32:18 UTC

*** This bug has been marked as a duplicate of bug 1297351 ***