Bug 1509020 - VM retirement fails when using ovirt-engine SDK (V4)
Summary: VM retirement fails when using ovirt-engine SDK (V4)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.9.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: GA
: 5.10.0
Assignee: Piotr Kliczewski
QA Contact: Ilanit Stein
URL:
Whiteboard:
Depends On:
Blocks: 1511029 1511032 1511036
TreeView+ depends on / blocked
 
Reported: 2017-11-02 17:08 UTC by Ievgen Zapolskyi
Modified: 2018-06-21 21:00 UTC (History)
10 users (show)

Fixed In Version: 5.10.0.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1510459 1511029 1511032 1511036 (view as bug list)
Environment:
Last Closed: 2018-06-21 21:00:00 UTC
Category: Bug
Cloudforms Team: RHEVM
Target Upstream Version:
Embargoed:
izapolsk: automate_bug-


Attachments (Terms of Use)
cfme logs (1.96 MB, application/x-bzip)
2017-11-02 17:08 UTC, Ievgen Zapolskyi
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github ManageIQ manageiq-providers-ovirt pull 131 0 'None' 'closed' 'Fix vm removal for apiv4' 2019-11-11 15:49:50 UTC
oVirt gerrit 83645 0 'None' 'MERGED' 'Vm remove fails due to spelling mistake' 2019-11-11 15:49:50 UTC

Description Ievgen Zapolskyi 2017-11-02 17:08:08 UTC
Created attachment 1347082 [details]
cfme logs

Description of problem:
When vm is provisioned in rhv41, CFME receives USER_ADD_VM_STARTED/USER_ADD_VM_FINISHED_SUCCESS.
appliance doesn't create appropriate MiqEvents for such provider events atm.
It also doesn't display them in Timelines and ignores them in other places like Control.


Version-Release number of selected component (if applicable):
5.9.0.4
5.8.2.3

How reproducible:
100%

Steps to Reproduce:
1. Add rhv41 provider
2. Provision VM
3. Look for vm_create events in VM Timelines and event_streams table

Actual results:
Timelines doesn't display any create vm event for such vm
event_streams table contains appropriate EmsEvents (USER_ADD_VM_STARTED, USER_ADD_VM_FINISHED_SUCCESS) but doesn't have any MiqEvents.

Expected results:
appropriate vm_create events should be present in event_streams table, Timelines, etc

Comment 3 Piotr Kliczewski 2017-11-03 11:27:05 UTC
I attempted to reproduce the issue by creating and running a vm from both rhv admin portal and manageiq classic ui. Both vms were created and started correctly. I inspected event_streams table and I see events for both vms. When I checked VM Timelines I do not see the events but I presume that the view is updated after some time.

When checking the logs I found 2 issues which were not reproduced in my env.
@Juan are those known/fixed or something new:

[----] E, [2017-11-02T11:12:12.688264 #13184:1257140] ERROR -- : MIQ(MiqQueue#m_callback) Message id: [3513]: undefined method `destroy' for #<ManageIQ::Providers::Redhat::InfraManager::OvirtServices::Strategies::V4::VmProxyDecorator:0x00000011388c70>
[----] E, [2017-11-02T11:12:12.689005 #13184:1257140] ERROR -- : [NoMethodError]: undefined method `destroy' for #<ManageIQ::Providers::Redhat::InfraManager::OvirtServices::Strategies::V4::VmProxyDecorator:0x00000011388c70>  Method:[block in method_missing]
[----] E, [2017-11-02T11:12:12.689502 #13184:1257140] ERROR -- : /opt/rh/rh-ruby23/root/usr/share/ruby/delegate.rb:87:in `method_missing'
/var/www/miq/vmdb/app/models/mixins/provider_object_mixin.rb:15:in `block in with_provider_object'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-ovirt-57b7ca2155d6/app/models/manageiq/providers/redhat/infra_manager/api_integration.rb:131:in `with_provider_connection'
/var/www/miq/vmdb/app/models/mixins/provider_object_mixin.rb:12:in `with_provider_object'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-ovirt-57b7ca2155d6/app/models/manageiq/providers/redhat/infra_manager/vm/operations.rb:9:in `raw_destroy'
/var/www/miq/vmdb/app/models/mixins/miq_policy_mixin.rb:115:in `check_policy_prevent_callback'
/var/www/miq/vmdb/app/models/miq_queue.rb:495:in `m_callback'
/var/www/miq/vmdb/app/models/miq_queue.rb:464:in `delivered'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:116:in `deliver_queue_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loop'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `do_work'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `block in do_work_loop'


[----] E, [2017-11-02T11:19:23.657929 #13158:1257140] ERROR -- : Q-task_id([miq_provision_4]) MIQ(ManageIQ::Providers::Redhat::InfraManager::Provision#provision_error) [[OvirtSDK4::Error]: HTTP response code is 200.] encountered during phase [customize_destination]
[----] E, [2017-11-02T11:19:23.658093 #13158:1257140] ERROR -- : Q-task_id([miq_provision_4]) /opt/rh/rh-ruby23/root/usr/share/gems/gems/ovirt-engine-sdk-4.1.9/lib/ovirtsdk4/connection.rb:509:in `raise_error'
/opt/rh/rh-ruby23/root/usr/share/gems/gems/ovirt-engine-sdk-4.1.9/lib/ovirtsdk4/service.rb:333:in `internal_read_body'
/opt/rh/rh-ruby23/root/usr/share/gems/gems/ovirt-engine-sdk-4.1.9/lib/ovirtsdk4/service.rb:239:in `block in internal_update'
/opt/rh/rh-ruby23/root/usr/share/gems/gems/ovirt-engine-sdk-4.1.9/lib/ovirtsdk4/service.rb:46:in `wait'
/opt/rh/rh-ruby23/root/usr/share/gems/gems/ovirt-engine-sdk-4.1.9/lib/ovirtsdk4/service.rb:244:in `internal_update'
/opt/rh/rh-ruby23/root/usr/share/gems/gems/ovirt-engine-sdk-4.1.9/lib/ovirtsdk4/services.rb:26671:in `update'
/opt/rh/rh-ruby23/root/usr/share/ruby/delegate.rb:83:in `method_missing'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-ovirt-57b7ca2155d6/app/models/manageiq/providers/redhat/infra_manager/ovirt_services/strategies/v4.rb:348:in `update_memory!'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-ovirt-57b7ca2155d6/app/models/manageiq/providers/redhat/infra_manager/provision/configuration/container.rb:17:in `configure_memory'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-ovirt-57b7ca2155d6/app/models/manageiq/providers/redhat/infra_manager/provision/configuration.rb:34:in `block in configure_container'
/var/www/miq/vmdb/app/models/mixins/provider_object_mixin.rb:15:in `block in with_provider_object'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-ovirt-57b7ca2155d6/app/models/manageiq/providers/redhat/infra_manager/api_integration.rb:131:in `with_provider_connection'
/var/www/miq/vmdb/app/models/mixins/provider_object_mixin.rb:12:in `with_provider_object'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-ovirt-57b7ca2155d6/app/models/manageiq/providers/redhat/infra_manager/provision/configuration.rb:32:in `configure_container'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-ovirt-57b7ca2155d6/app/models/manageiq/providers/redhat/infra_manager/provision/state_machine.rb:44:in `customize_destination'
/var/www/miq/vmdb/app/models/miq_request_task/state_machine.rb:21:in `signal'
/var/www/miq/vmdb/app/models/miq_provision/state_machine.rb:50:in `poll_destination_in_vmdb'
/var/www/miq/vmdb/app/models/miq_queue.rb:452:in `block in dispatch_method'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout'

Comment 4 Juan Hernández 2017-11-03 12:25:05 UTC
In the second error message that Piotr mentions it looks like RHV returned an empty request when asked to update the virtual machine. The SDK generates an error message which is not very clear, we will improve it. But to determine what happened in the RHV side we need the logs, in particular the following files:

  /var/log/ovirt-engine/engine.log
  /var/log/ovirt-engine/server.log
  /var/log/httpd/ovirt-requests-log
  /var/log/httpd/ssl_accesss_log

If the RHV system is still running, it would also help if we can connect with SSH and take a look.

I am still looking at the first error message that Piotor mentions.

Comment 5 Juan Hernández 2017-11-03 13:00:17 UTC
The first error that Piotr mentions seems to be related to a request to destroy a virtual machine:

  [----] I, [2017-11-02T11:12:12.574584 #13184:1257140]  INFO -- : MIQ(MiqAlert.evaluate_alerts) [request_vm_destroy] Target: ManageIQ::Providers::Redhat::InfraManager::Vm Name: [test-rhv41-events-test], Id: [117]

Not sure where that originates from, may it be manual VM retirement? Anyhow I think that we don't have the required `destroy` method implemented, thus the error.

Ilanit, are we testing VM retirement? Can you check if it works and open a separate bug if it doesn't?

Comment 6 Ievgen Zapolskyi 2017-11-03 15:57:32 UTC
please find me in irc or gitter, I'll give you credentials to that rhv box.
btw, I can provide appliance with rhv41 provider. You can see that Miq event "create vm" is absent for appropriate Evm event.

Comment 7 Juan Hernández 2017-11-03 18:08:57 UTC
Looking at the logs provided by Ievgen I see this, in the web server log:

[02/Nov/2017:11:19:03 -0400] 10.8.197.125 "Correlation-Id: -" "Duration: 20027513us" "PUT /ovirt-engine/api/vms/1ca05054-a786-4204-9b40-9fe13f920385 HTTP/1.1" -

That means that the web server didn't write a response body.

Then, in the engine log I see this, for the same request:

2017-11-02 11:19:23,631-04 ERROR [org.ovirt.engine.api.restapi.resource.validation.IOExceptionMapper] (default task-34) [] IO exception while processing "PUT" request for path "/vms/1ca05054-a786-4204-9b40-9fe13f920385"
2017-11-02 11:19:23,631-04 ERROR [org.ovirt.engine.api.restapi.resource.validation.IOExceptionMapper] (default task-34) [] Exception: java.io.IOException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Premature end of file.

That indicates that the engine never received the request body. But apparently the client send it correctly.

In order to obtain more information I would need to modify the provider, to generate a debug file containing all the HTTP traffic sent and received from the server. Is that possible?

Comment 8 Juan Hernández 2017-11-03 18:47:08 UTC
I saw a behaviour similar to this if I run a script that performs multiple VM updates in a row, but only if I kill the script when it is in the middle of waiting for the response to a 'Expect: 100-continue' request. As the oVirt server doesn't support that, I am modifying the SDK to avoid that, just in case:

  Don't send 'Expect: 100-continue'
  https://gerrit.ovirt.org/83595

That should make updates from the provider faster, but I am not convinced it is the cause of this issue, I still would like to have a dump of the HTTP traffic, if possible.

Comment 9 Ilanit Stein 2017-11-06 08:03:33 UTC
Juan,

I've tested on CFME-5.8.2.3/RHV-4.1.6 VM retirement, 
and the behavior was as expected:

1. For a VM that was provisioned from CFME = It was stopped, and removed (turned into Archived, in CFME side)

2. For a VM that was not provisioned from CFME - It was stopped.

Comment 10 Juan Hernández 2017-11-06 08:19:29 UTC
So we need to understand in what situation is that `destroy` method called, and we need to implement it. Piotr, any idea?

Comment 11 Piotr Kliczewski 2017-11-06 10:10:00 UTC
The only idea I have is to reproduce it and see where exactly it is failing. I am on it.

Comment 12 Piotr Kliczewski 2017-11-06 12:44:23 UTC
I was able to reproduce it by retiring a vm.

Comment 15 Ilanit Stein 2017-11-07 12:53:48 UTC
As the fix attached to this bug (already merged) refer to another problem - 
VM retire failure, for V4 (V4 is default from CFME-5.9),
that was found along the investigation, 
I am turning this bug into VM retire failure, and opening another for the  
"cfme ignores create vm events from rvh41 provider"

Comment 18 Piotr Kliczewski 2017-11-07 15:35:49 UTC
Based on the provided env I can see that for received event:

[----] I, [2017-11-07T10:12:28.579519 #13923:31f13c]  INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::EventCatcher::Runner#queue_event) EMS [env-rhv41-01-mgr.cfme2.lab.eng.rdu2.redhat.com] as [admin@internal] Caught event [USER_ADD_VM_STARTED]
[----] I, [2017-11-07T10:12:28.856247 #13923:31f13c]  INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager#with_provider_connection) Connecting through ManageIQ::Providers::Redhat::InfraManager: [rhv41]
[----] I, [2017-11-07T10:12:28.905401 #13923:31f13c]  INFO -- : MIQ(MiqQueue.put) Message id: [429],  id: [], Zone: [default], Role: [event], Server: [], Ident: [ems], Target id: [1], Instance id: [], Task id: [], Command: [EmsEvent.add], Timeout: [600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: [{:event_type=>"USER_ADD_VM_STARTED", :source=>"RHEVM", :message=>"VM test41 creation was initiated by admin@internal-authz.", :timestamp=>Tue, 07 Nov 2017 10:12:23 -0500, :username=>"admin@internal-authz",

There is corresponding entry in the db:

#<EmsEvent:0x00000009853348
 id: 206,
 event_type: "USER_ADD_VM_STARTED",
 message: "VM test41 creation was initiated by admin@internal-authz.",
 timestamp: Tue, 07 Nov 2017 15:12:23 UTC +00:00,

but it is not part of the timeline. I checked for `USER_STARTED_VM` and `USER_RUN_VM` but no events in the time line. When checking the logs I see:

[----] W, [2017-11-07T10:31:59.751405 #13300:e82344]  WARN -- : security warning, CSP violation report follows: {"csp-report"=>{"blocked-uri"=>"data", "document-uri"=>"https://10.8.198.4/dashboard/timeline/", "original-policy"=>"default-src https://10.8.198.4; connect-src https://10.8.198.4 wss://10.8.198.4; frame-src https://10.8.198.4; script-src 'unsafe-eval' 'unsafe-inline' https://10.8.198.4; style-src 'unsafe-inline' https://10.8.198.4; report-uri https://10.8.198.4/dashboard/csp_report", "referrer"=>"https://10.8.198.4/vm_infra/explorer", "violated-directive"=>"default-src https://10.8.198.4"}}

which may be the root cause of the issue. It seems not to be related to RHV. Can you check whether it is possible to overcome the warning?

Comment 19 Ievgen Zapolskyi 2017-11-08 09:40:06 UTC
Piotr,
 1. actually this is not RHEVM issue because CFME receives appropriate EmsEvents.
 But it doesn't produce internal MiqEvent (event_type = 'vm_create') when it receives EmsEvents(USER_ADD_VM_STARTED/USER_ADD_VM_FINISHED_SUCCESS). That's the point.
 2. I guess you can disable csp in Firefox's if you set security.csp.enable = false in about:config
 3. I found those events in rhv41 provider Timelines. So, it seems those weren't displayed in appropriate vm Timelines because some relationship was missing in EmsEvent.

Comment 20 Piotr Kliczewski 2017-11-08 10:00:02 UTC
(In reply to Ievgen Zapolskyi from comment #19)
> Piotr,
>  1. actually this is not RHEVM issue because CFME receives appropriate
> EmsEvents.
>  But it doesn't produce internal MiqEvent (event_type = 'vm_create') when it
> receives EmsEvents(USER_ADD_VM_STARTED/USER_ADD_VM_FINISHED_SUCCESS). That's
> the point.

OK. Let's find the reason for it.

>  2. I guess you can disable csp in Firefox's if you set security.csp.enable
> = false in about:config

Will check

>  3. I found those events in rhv41 provider Timelines. So, it seems those
> weren't displayed in appropriate vm Timelines because some relationship was
> missing in EmsEvent.

This could be related to your 1.


Note You need to log in before you can comment on or make changes to this bug.