Bug 1377415

Summary: Can not retire existing VMs
Product: Red Hat CloudForms Management Engine Reporter: Chen <cchen>
Component: Automate Assignee: William Fitzgerald <wfitzger>
Status: CLOSED NOTABUG QA Contact: Dave Johnson <dajohnso>
Severity: high Docs Contact:
Priority: unspecified    
Version: 5.6.0 CC: cchen, jhardy, mkanoor, obarenbo, tfitzger
Target Milestone: GA   
Target Release: 5.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-09-21 07:18:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Chen 2016-09-19 15:46:02 UTC
Description of problem:

Existing VMs cannot be retired.

Version-Release number of selected component (if applicable):

CFME 5.6.1.2-20160810181333_8ba817b
rhevm-3.5.1.1-0.1.el6ev.noarch

How reproducible:

100%

Steps to Reproduce:
1. Connect the CFME to RHEV-3.5 which has some existing VMs
2. Tag one VM with "lifecycle/Fully retire VM and remove from Provider" tag
3. Retire that VM from CFME

Actual results:

The VM cannot be retired, and the automate job fails after 100 retries.

Expected results:

The VM should be retired successfully.

Additional info:

The CheckRemovedFromProvider state always returns retry.

[----] I, [2016-09-19T10:50:50.033202 #13580:d59990]  INFO -- : Followed  Relationship [miqaedb:/Infrastructure/VM/Retirement/StateMachines/Methods/CheckRemovedFromProvider#create]
[----] I, [2016-09-19T10:50:50.033318 #13580:d59990]  INFO -- : Processed  State=[CheckRemovedFromProvider] with Result=[retry]

There are no problems with VMs that were provisioned through CFME.
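To see how many times a state has come back as retry, the "Processed State=[...] with Result=[...]" lines can be tallied. The following is a minimal sketch, assuming the log keeps the line format shown in the excerpt above; the constant STATE_RESULT and the helper count_state_results are hypothetical names introduced for illustration and are not part of CFME.

```ruby
# Hypothetical helper, not part of CFME: tally automate state results
# from log lines shaped like the excerpt in this report.
STATE_RESULT = /Processed\s+State=\[(?<state>[^\]]+)\] with Result=\[(?<result>[^\]]+)\]/

def count_state_results(lines)
  counts = Hash.new(0)
  lines.each do |line|
    m = STATE_RESULT.match(line)
    # Lines that do not report a state result are ignored.
    counts[[m[:state], m[:result]]] += 1 if m
  end
  counts
end

# The two log lines quoted in this report, used as sample input.
sample = [
  '[----] I, [2016-09-19T10:50:50.033202 #13580:d59990]  INFO -- : Followed  Relationship [miqaedb:/Infrastructure/VM/Retirement/StateMachines/Methods/CheckRemovedFromProvider#create]',
  '[----] I, [2016-09-19T10:50:50.033318 #13580:d59990]  INFO -- : Processed  State=[CheckRemovedFromProvider] with Result=[retry]'
]

count_state_results(sample).each do |(state, result), n|
  puts "#{state} #{result} #{n}"
end
```

Against a real log, the same helper could be fed File.foreach with the path of the log file that contains these lines; the path depends on the appliance and is left out here.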

Comment 3 William Fitzgerald 2016-09-19 21:18:16 UTC
Chen,

I looked at the logs.  I logged in and tried to retire the VM that failed.  It was just starting the check_removed_from_provider retry loop.

I saw these errors in the evm.log at the same time retirement was in the retry loop.
I am going to research these messages.
 
[----] E, [2016-09-19T17:00:27.639073 #31714:175a3e0] ERROR -- : <API> MIQ(ApiController.api_error) API Error
[----] E, [2016-09-19T17:00:27.639185 #31714:175a3e0] ERROR -- : <API> MIQ(ApiController.api_error) ApiController::AuthenticationError: Invalid Authentication Token 53045c1b453459ce2c16b0718060771d specified
[----] I, [2016-09-19T17:00:28.016440 #31587:108798c]  INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::EventCatcher::Runner#process_event) EMS [10.66.219.172] as [admin@internal] Caught event [UNKNOWN]
[----] I, [2016-09-19T17:00:28.024454 #31587:108798c]  INFO -- : MIQ(MiqQueue.put) Message id: [1000000021523],  id: [], Zone: [default], Role: [event], Server: [], Ident: [ems], Target id: [1000000000001], Instance id: [], Task id: [], Command: [EmsEvent.add_rhevm], Timeout: [600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: [{:id=>"223502", :href=>"/api/events/223502", :description=>"ETL Service Started", :severity=>"normal", :code=>9700, :time=>2016-09-19 09:01:02 -0400, :name=>"UNKNOWN"}]
[----] I, [2016-09-19T17:00:28.024537 #31587:108798c]  INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::EventCatcher::Runner#process_event) EMS [10.66.219.172] as [admin@internal] Caught event [UNKNOWN]
[----] I, [2016-09-19T17:00:28.029273 #31587:108798c]  INFO -- : MIQ(MiqQueue.put) Message id: [1000000021524],  id: [], Zone: [default], Role: [event], Server: [], Ident: [ems], Target id: [1000000000001], Instance id: [], Task id: [], Command: [EmsEvent.add_rhevm], Timeout: [600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: [{:id=>"223503", :href=>"/api/events/223503", :description=>"ETL service start has encountered an error. Please consult the service log for more details.", :severity=>"error", :code=>9704, :time=>2016-09-19 09:01:02 -0400, :name=>"UNKNOWN"}]
[----] I, [2016-09-19T17:00:28.029351 #31587:108798c]  INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::EventCatcher::Runner#process_event) EMS [10.66.219.172] as [admin@internal] Caught event [UNKNOWN]
[----] I, [2016-09-19T17:00:28.034497 #31587:108798c]  INFO -- : MIQ(MiqQueue.put) Message id: [1000000021525],  id: [], Zone: [default], Role: [event], Server: [], Ident: [ems], Target id: [1000000000001], Instance id: [], Task id: [], Command: [EmsEvent.add_rhevm], Timeout: [600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: [{:id=>"223504", :href=>"/api/events/223504", :description=>"ETL Service Stopped", :severity=>"normal", :code=>9701, :time=>2016-09-19 09:01:02 -0400, :name=>"UNKNOWN"}]


Billy

Comment 4 William Fitzgerald 2016-09-19 21:42:58 UTC
Chen,

From the evm.log, you are having a problem with this provider.  

You need to fix the provider problem before CheckRemovedFromProvider will be successful.

Billy
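The retry behaviour discussed here can be illustrated with a small simulation. This is a sketch only, not the CFME automate method: it assumes the check passes once the provider's view no longer contains the VM, takes the limit of 100 from the description of this bug, and uses the hypothetical names MAX_RETRIES and run_check_state.

```ruby
# Hypothetical simulation of a retrying check state; not CFME code.
MAX_RETRIES = 100 # from the "fails after 100" observation in the description

# The block stands in for whatever the real check inspects. It receives the
# attempt number and returns true while the VM still appears on the provider.
def run_check_state(max_retries: MAX_RETRIES, &still_on_provider)
  attempts = 0
  loop do
    attempts += 1
    return { result: 'ok', attempts: attempts } unless still_on_provider.call(attempts)
    return { result: 'error', attempts: attempts } if attempts >= max_retries
    # A real state machine would wait for a retry interval here; omitted in this sketch.
  end
end

# Healthy provider: the VM disappears from the provider's view on the third check.
healthy = run_check_state { |n| n < 3 }
puts "#{healthy[:result]} after #{healthy[:attempts]} attempts"

# Unhealthy provider: the view never updates, so the state exhausts its retries.
stuck = run_check_state { |_n| true }
puts "#{stuck[:result]} after #{stuck[:attempts]} attempts"
```

Under these assumptions, a provider whose view never updates uses up every retry, which matches the symptom reported in the description.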

Comment 5 Chen 2016-09-20 01:27:49 UTC
Hi Billy,

Thank you very much for your reply.

>[----] E, [2016-09-19T17:00:27.639073 #31714:175a3e0] ERROR -- : <API> MIQ(ApiController.api_error) API Error
>[----] E, [2016-09-19T17:00:27.639185 #31714:175a3e0] ERROR -- : <API> MIQ(ApiController.api_error) ApiController::AuthenticationError: Invalid Authentication Token 53045c1b453459ce2c16b0718060771d specified

But why could the VMs that were provisioned by CFME be retired without any problem?

Best Regards,
Chen

Comment 6 William Fitzgerald 2016-09-20 15:41:08 UTC
Chen,

I'm not sure why the other VMs had no problem.  I suspect that the provider was OK when you provisioned them but encountered an error afterwards.

Can you provision a VM and retire it immediately?

Thanks

Billy

Comment 7 Chen 2016-09-21 01:01:55 UTC
Hi Billy,

What I did was:

1. Provision a VM from CFME and retire it - Success
2. Retire an existing VM - Failed
3. Repeat 1 again - Success
4. Repeat 2 again - Failed

The log shows that the ETL service is not running. I also found that ovirt-engine-dwhd cannot start and throws a Java NullPointerException. I think I need to fix that first. But I'm curious why provisioned VMs and existing VMs behave differently. I mean, why would existing VMs need the ETL service?

I will try a VMware provider as well.

Best Regards,
Chen

Comment 8 Chen 2016-09-21 07:18:16 UTC
Hi Billy,

There is no problem with VMware, so I believe the problem was with the RHEV provider.

Thank you for your help. I will close the bug.

Best Regards,
Chen