Bug 1374995

Summary: [RHV 4.0] Provision VM ends up with "Validating New Vm" endless retries
Product: Red Hat CloudForms Management Engine Reporter: Ilanit Stein <istein>
Component: ProvidersAssignee: Piotr Kliczewski <pkliczew>
Status: CLOSED CURRENTRELEASE QA Contact: Ilanit Stein <istein>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 5.7.0CC: jfrey, jhardy, obarenbo, simaishi, ykaul
Target Milestone: GAKeywords: Regression, TestOnly, ZStream
Target Release: 5.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: rhev
Fixed In Version: 5.7.0.2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1376528 (view as bug list) Environment:
Last Closed: 2017-01-11 19:55:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1359139    
Bug Blocks: 1376528    
Attachments:
Description Flags
evm.log
none
CFME_rhevm.log
none
RHV_4.0.4_engine.log
none
RHV_4.0.3_engine.log none

Description Ilanit Stein 2016-09-11 15:03:09 UTC
Description of problem:

Provision VM from template, from CF side ends up with VM created on RHV side,
but for this VM there is no request for start from CF side.
The Request seem remain 'Active', and has "Validating New Vm" endless retries.

Last Message: "[EVM] VM [<vm name>] Step [CheckProvisioned] Status [Creating VM] Message [Validating New Vm] Current Retry Number [47]" 

Version-Release number of selected component (if applicable):
CFME 5.7.0.0.20160906172503_12f65cb
RHV-4.0.4.2-0.1.el7ev

How reproducible:
Tried to provision VM 3 times, and got the same result.

Steps to Reproduce:
1.Provision VM from template on RHV-4.0.4

Actual results:
VM is created but the request seem to be stuck in "Validating New Vm" endless retries. 

Expected results:
VM should be created and started in a reasonable time of several minutes.

Additional info:

For RHEVM-3.6, from CF, Provision VM from a template, ends up successfully. A VM is created, and also started. Request's last message: [EVM] VM [test_prov_boot] Provisioned Successfully,

For RHV-4.0.3-0.1.el7ev, From CF, provision VM from template, ends up with ERROR:
[EVM] VM [test_prov] Step [CheckProvisioned] Status [[Ovirt::MissingResourceError]: Ovirt::MissingResourceError] Message [[Ovirt::MissingResourceError]: Ovirt::MissingResourceError]
VM from template is created, but CFME do not send a request to start it.

For the same template, from RHV UI, create VM from template, add nic, and start works fine.

Comment 2 Ilanit Stein 2016-09-11 15:20:14 UTC
Created attachment 1199885 [details]
evm.log

* RHEVM-3.6 successful provision:
Request ID 1000000000001
[EVM] VM [test_prov_boot] Provisioned Successfully
Created On Sun, 11 Sep 2016 03:51:18 +0000
Last Update Sun, 11 Sep 2016 03:54:15 +0000
Completed Sun, 11 Sep 2016 03:54:15 +0000

* RHV-4.0.3 
Request ID 1000000000002
[EVM] VM [test_prov] Step [CheckProvisioned] Status [[Ovirt::MissingResourceError]: Ovirt::MissingResourceError] Message [[Ovirt::MissingResourceError]: Ovirt::MissingResourceError]
Created On Sun, 11 Sep 2016 07:05:13 +0000
Last Update Sun, 11 Sep 2016 07:06:51 +0000
Completed Sun, 11 Sep 2016 07:05:52 +0000


* RHV-4.0.4
Request ID 1000000000003
Last Message [EVM] VM [istein] Step [CheckProvisioned] Status [State=<CheckProvisioned> running raised exception: <number of retries <101> exceeded maximum of <100>>] Message [Validating New Vm]
Created On Sun, 11 Sep 2016 09:09:17 +0000
Last Update Sun, 11 Sep 2016 11:13:51 +0000
Completed Sun, 11 Sep 2016 11:13:51 +0000

(istein  VM has been deleted manually, while there were retries like:
Last Message [EVM] VM [istein] Step [CheckProvisioned] Status [Creating VM] Message [Validating New Vm] Current Retry Number [83] ....)

VMs istein, istein, istein_1, istein_2 (Request IDs: 1000000000004, 1000000000005, 1000000000005) all failed the same.

Comment 3 Ilanit Stein 2016-09-11 15:31:24 UTC
Created attachment 1199886 [details]
CFME_rhevm.log

Comment 4 Ilanit Stein 2016-09-11 15:35:37 UTC
Created attachment 1199887 [details]
RHV_4.0.4_engine.log

Comment 5 Ilanit Stein 2016-09-11 15:38:31 UTC
Created attachment 1199888 [details]
RHV_4.0.3_engine.log

Comment 6 Ilanit Stein 2016-09-15 05:57:35 UTC
I see same problem with versions: 
RHV-4.0.3-0.1.el7ev
CFME-5.6.1.2.20160810181333_8ba817b
VM is created, provision fail on CFME side, and VM is not run. Running it manually works fine.

Comment 7 Piotr Kliczewski 2016-09-15 09:11:19 UTC
Here are errors in the logs:

I see two types of issues on manageiq sied:

1. User info fetch fails

evm.log

[----] E, [2016-09-11T10:21:29.382488 #13495:12d1148] ERROR -- : <RHEVM> Ovirt::Service#parse_error_response Return from URL: <https://jenkins-vm-09.scl.lab.tlv.redhat.com/ovirt-engine/api/users/4fa6316f-4f28-4ed1-b454-37183141f4eb> Data:
[----] E, [2016-09-11T10:21:29.383134 #13495:12d1148] ERROR -- : MIQ(MiqQueue#deliver) Message id: [1000000007193], Error: [Ovirt::MissingResourceError]
[----] E, [2016-09-11T10:21:29.383430 #13495:12d1148] ERROR -- : [Ovirt::MissingResourceError]: Ovirt::MissingResourceError  Method:[rescue in deliver]
[----] E, [2016-09-11T10:21:29.383524 #13495:12d1148] ERROR -- : /opt/rh/cfme-gemset/gems/ovirt-0.13.0/lib/ovirt/service.rb:311:in `parse_error_response'
/opt/rh/cfme-gemset/gems/ovirt-0.13.0/lib/ovirt/service.rb:282:in `block in resource_verb'
/opt/rh/cfme-gemset/gems/rest-client-2.0.0/lib/restclient/request.rb:858:in `process_result'
/opt/rh/cfme-gemset/gems/rest-client-2.0.0/lib/restclient/request.rb:776:in `block in transmit'
/opt/rh/rh-ruby23/root/usr/share/ruby/net/http.rb:853:in `start'
/opt/rh/cfme-gemset/gems/rest-client-2.0.0/lib/restclient/request.rb:766:in `transmit'
/opt/rh/cfme-gemset/gems/rest-client-2.0.0/lib/restclient/request.rb:215:in `execute'
/opt/rh/cfme-gemset/gems/rest-client-2.0.0/lib/restclient/request.rb:52:in `execute'
/opt/rh/cfme-gemset/gems/rest-client-2.0.0/lib/restclient/resource.rb:51:in `get'
/opt/rh/cfme-gemset/gems/ovirt-0.13.0/lib/ovirt/service.rb:277:in `resource_verb'
/opt/rh/cfme-gemset/gems/ovirt-0.13.0/lib/ovirt/service.rb:251:in `resource_get'
/opt/rh/cfme-gemset/gems/ovirt-0.13.0/lib/ovirt/base.rb:231:in `find_by_href'
/var/www/miq/vmdb/app/models/manageiq/providers/redhat/infra_manager/event_parser.rb:38:in `block in event_to_hash'

rhevm.log

[----] E, [2016-09-11T10:21:29.382612 #13495:12d1148] ERROR -- : Ovirt::Service#parse_error_response Return from URL: <https://jenkins-vm-09.scl.lab.tlv.redhat.com/ovirt-engine/api/users/4fa6316f-4f28-4ed1-b454-37183141f4eb> Data:


2, Vm fetch fails

evm.log

[----] E, [2016-09-11T10:28:32.544122 #39407:12d1148] ERROR -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Refresher#refresh) EMS: [vm09], id: [1000000000003] Refresh failed
[----] E, [2016-09-11T10:28:32.544314 #39407:12d1148] ERROR -- : [NoMethodError]: undefined method `detect' for nil:NilClass  Method:[rescue in block in refresh]
[----] E, [2016-09-11T10:28:32.544487 #39407:12d1148] ERROR -- : /var/www/miq/vmdb/app/models/manageiq/providers/redhat/infra_manager/refresh_parser.rb:223:in `block in host_inv_to_switch_hashes'
/var/www/miq/vmdb/app/models/manageiq/providers/redhat/infra_manager/refresh_parser.rb:220:in `each'
/var/www/miq/vmdb/app/models/manageiq/providers/redhat/infra_manager/refresh_parser.rb:220:in `host_inv_to_switch_hashes'
/var/www/miq/vmdb/app/models/manageiq/providers/redhat/infra_manager/refresh_parser.rb:103:in `block in host_inv_to_hashes'
/var/www/miq/vmdb/app/models/manageiq/providers/redhat/infra_manager/refresh_parser.rb:78:in `each'
/var/www/miq/vmdb/app/models/manageiq/providers/redhat/infra_manager/refresh_parser.rb:78:in `host_inv_to_hashes'
/var/www/miq/vmdb/app/models/manageiq/providers/redhat/infra_manager/refresh_parser.rb:12:in `ems_inv_to_hashes'


rhevm.log

[----] E, [2016-09-11T10:28:32.127919 #39407:bd7b56c] ERROR -- : Ovirt::Service#parse_error_response Return from URL: <https://jenkins-vm-09.scl.lab.tlv.redhat.com/ovirt-engine/api/vms/08b65709-173c-44a5-851d-e760abda521d> Data:


I see both issues occurring from time to time


In the engine log I see two types of failures:

2016-09-10 10:12:52,023 ERROR [org.ovirt.engine.core.sso.utils.SsoUtils] (default task-15) [] OAuthException invalid_scope: The requested scope '[ovirt-ext=revoke:revoke-all]' is invalid, unknown, malformed, or exceeds the scope granted by the resource owner.
2016-09-10 10:12:52,192 ERROR [org.ovirt.engine.core.sso.utils.SsoUtils] (default task-12) [] OAuthException invalid_scope: The requested scope '[ovirt-ext=revoke:revoke-all]' is invalid, unknown, malformed, or exceeds the scope granted by the resource owner.

and

2016-09-10 10:33:17,422 ERROR [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-63) [] Can't find relative path for class "org.ovirt.engine.api.resource.VmDisksResource", will return null
2016-09-10 10:33:17,423 ERROR [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-63) [] Can't find relative path for class "org.ovirt.engine.api.resource.VmDisksResource", will return null
2016-09-10 10:33:17,508 ERROR [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-28) [] Can't find relative path for class "org.ovirt.engine.api.resource.VmDisksResource", will return null
2016-09-10 10:33:17,508 ERROR [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-28) [] Can't find relative path for class "org.ovirt.engine.api.resource.VmDisksResource", will return null
2016-09-10 10:33:17,553 ERROR [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-60) [] Can't find relative path for class "org.ovirt.engine.api.resource.VmDisksResource", will return null
2016-09-10 10:33:17,553 ERROR [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-60) [] Can't find relative path for class "org.ovirt.engine.api.resource.VmDisksResource", will return null
2016-09-10 10:33:17,694 ERROR [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-61) [] Can't find relative path for class "org.ovirt.engine.api.resource.TemplateDisksResource", will return null
2016-09-10 10:33:17,694 ERROR [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-61) [] Can't find relative path for class "org.ovirt.engine.api.resource.TemplateDisksResource", will return null

Still investigating

Comment 8 Piotr Kliczewski 2016-09-15 12:37:35 UTC
This issue is related to BZ #1359139 and it was fixed in 4.0.4.

There is still issue with stating provisioned vm. When vm is stored in db using wrong url:

[----] I, [2016-09-15T08:14:35.945656 #1845:e85990]  INFO -- : MIQ(EmsRefresh.save_vms_inventory) EMS: [rhv-40-ipinto], id: [1000000000005] Updating Vm [istein] id: [1000000000069] location: [6b93cb29-4978-45e5-b6bd-e0b2276e0211.ovf] storage id: [1000000000037] uid_ems: [6b93cb29-4978-45e5-b6bd-e0b2276e0211] ems_ref: [/api/vms/6b93cb29-4978-45e5-b6bd-e0b2276e0211]


and when we look for specific vm in db we fail because we use different url:

[----] I, [2016-09-15T08:14:47.582551 #3143:e85990]  INFO -- : Q-task_id([miq_provision_1000000000003]) MIQ(ManageIQ::Providers::Redhat::InfraManager::Provision#poll_destination_in_vmdb) Unable to find Vm [istein] with ems_ref [/ovirt-engine/api/vms/6b93cb29-4978-45e5-b6bd-e0b2276e0211], will retry

Comment 10 Satoe Imaishi 2016-10-10 12:56:32 UTC
PR: https://github.com/ManageIQ/manageiq/pull/11285

Comment 11 Ilanit Stein 2016-10-13 07:34:43 UTC
Verified on CFME-5.7.0.3.20160927165516_075d0f3
and RHV-4.0.4.2-0.1.el7ev