1528856 – [RHV][Graph Refresh] Hosts not displayed on CFME, after moved to maintenance

Bug 1528856 - [RHV][Graph Refresh] Hosts not displayed on CFME, after moved to maintenance

Summary: [RHV][Graph Refresh] Hosts not displayed on CFME, after moved to maintenance

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Providers
Sub Component:
Version:	5.9.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	GA
Target Release:	5.10.0
Assignee:	Boriso
QA Contact:	Angelina Vasileva
Docs Contact:
URL:
Whiteboard:	rhev:graph refresh:host target refresh
Depends On:
Blocks:	1532255 1539775
TreeView+	depends on / blocked

Reported:	2017-12-24 15:53 UTC by Ilanit Stein
Modified:	2019-08-01 11:49 UTC (History)
CC List:	10 users (show)
Fixed In Version:	5.10.0.0
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1539775 (view as bug list)
Environment:
Last Closed:	2018-06-21 21:16:20 UTC
Category:	Bug
Cloudforms Team:	RHEVM
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
evm.log (2.10 MB, application/x-gzip) 2017-12-24 15:57 UTC, Ilanit Stein	no flags	Details
automation.log (1.11 MB, application/x-gzip) 2017-12-24 15:57 UTC, Ilanit Stein	no flags	Details
rhevm.log (1.31 MB, application/x-gzip) 2017-12-24 15:58 UTC, Ilanit Stein	no flags	Details
RHV engine.log (11.85 MB, application/x-gzip) 2017-12-25 07:54 UTC, Ilanit Stein	no flags	Details
host_mixed_2_is_displayed_in_maintenance.png (83.88 KB, image/png) 2018-01-04 14:46 UTC, Ilanit Stein	no flags	Details
host_mixed_2_is_gone.png (83.41 KB, image/png) 2018-01-04 14:47 UTC, Ilanit Stein	no flags	Details
reproduce_evm.log (1.20 MB, application/x-gzip) 2018-01-04 14:48 UTC, Ilanit Stein	no flags	Details
reproduce_automation.log (157.87 KB, application/x-gzip) 2018-01-04 14:48 UTC, Ilanit Stein	no flags	Details
reproduce_rhevm.log (76.68 KB, application/x-gzip) 2018-01-04 14:49 UTC, Ilanit Stein	no flags	Details
View All

Description Ilanit Stein 2017-12-24 15:53:54 UTC

Description of problem:
Have CFME-5.9 connected to a Scale RHV env, 
with ~1800VMs, and ~400 hosts.
On RHV side, move one host to maintenance.

After ~15 min host is not displayed at all under CFME UI, Compute->Infrastructure->Hosts.

Later, after moving a second host to maintenance, the first host, appeared on CFME UI, in status maintenance.

Moving both hosts back up => CFME UI displayed those 2 hosts in UP state, quite quickly.

Then moving on RHV side the same 2 hosts to maintenance => even after ~20 min,
both hosts were not displayed at all on CFME side.

Moving both hosts back up => CFME UI displayed those 2 hosts in UP state.

In evm.log/automation.log, there are no errors

automation.log report on UNKNOWN event:

[----] I, [2017-12-24T10:37:06.637397 #22706:120f138]  INFO -- : Instantiating [/System/Process/Event?EventStream%3A%3Aevent_stream=32507&Host%3A%3Ahost=1410&MiqServer%3A%3Amiq_server=1&User%3A%3Auser=1&event_id=32507&event_stream_id=32507&event_type=UNKNOWN&host_id=1410&object_name=Event&vmdb_object_type=event_stream]
[----] I, [2017-12-24T10:37:06.660628 #22706:120f138]  INFO -- : Updated namespace [/System/Process/Event?EventStream%3A%3Aevent_stream=32507&Host%3A%3Ahost=1410&MiqServer%3A%3Amiq_server=1&User%3A%3Auser=1&event_id=32507&event_stream_id=32507&event_type=UNKNOWN&host_id=1410&object_name=Event&vmdb_object_type=event_stream  ManageIQ/System]
[----] I, [2017-12-24T10:37:06.724341 #22706:120f138]  INFO -- : Following Relationship [miqaedb:/System/Event/EmsEvent/RHEVM/UNKNOWN#create]
[----] I, [2017-12-24T10:37:06.747575 #22706:120f138]  INFO -- : Updated namespace [miqaedb:/System/Event/EmsEvent/RHEVM/UNKNOWN#create  ManageIQ/System/Event/EmsEvent]
[----] I, [2017-12-24T10:37:06.806212 #22706:120f138]  INFO -- : Instance [/ManageIQ/System/Event/EmsEvent/RHEVM/UNKNOWN] not found in MiqAeDatastore - trying [.missing]

and this is reflected in evm.log as well:

[----] I, [2017-12-24T10:41:48.250751 #22706:120f138]  INFO -- : MIQ(MiqAeEngine.deliver) Delivering {:event_id=>32512, :event_stream_id=>32512, :event_type=>"UNKNOWN", "Host::host"=>1410, :host_id=>1410} for object [EmsEvent.32512] with state [] to Automate

Version-Release number of selected component (if applicable):
CFME-5.9.0.14,
RHV-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Ilanit Stein 2017-12-24 15:57:01 UTC

Created attachment 1371911 [details]
evm.log

First host moved to maintenance: hera10.eng.lab.tlv.redhat.com
Second host moved to maintenance: nested_host_from_cluster_nested_10.35.51.157

Looking for "UNKNOWN" show in the log the "unknown" event

Comment 3 Ilanit Stein 2017-12-24 15:57:32 UTC

Created attachment 1371912 [details]
automation.log

First host moved to maintenance: hera10.eng.lab.tlv.redhat.com
Second host moved to maintenance: nested_host_from_cluster_nested_10.35.51.157

Looking for "UNKNOWN" show in the log the "unknown" event

Comment 4 Ilanit Stein 2017-12-24 15:58:08 UTC

Created attachment 1371913 [details]
rhevm.log

Comment 5 Oved Ourfali 2017-12-25 06:14:36 UTC

Can you also attach the RHV logs?

Comment 6 Ilanit Stein 2017-12-25 07:54:43 UTC

Created attachment 1372071 [details]
RHV engine.log

Comment 7 Ilanit Stein 2017-12-25 10:05:20 UTC

Important info:
The bug was reported for CFME-5.9.0.14 + RHV Graph refresh - ON

The problems that follow moving host to maintenance, on RHV side,
and have on CFME in response: host vanish, and not displayed in state maintenance, for a very long time, is specifically related to RHV Graph refresh.

For CFME-5.9.0.14 + RHV Graph refresh - OFF,
changing host to maintenance, on RHV side, 
is followed by (after ~1 minute) update of this host state, on the CFME side to maintenance (was tested 3 times).

Comment 8 Piotr Kliczewski 2018-01-03 14:26:01 UTC

Checking the logs I see that there are no issues during refresh nor the host seems not to be deleted. I only found:

[----] E, [2017-12-24T09:02:32.772102 #22784:148de8c] ERROR -- : MIQ(host_controller-show): Can't access selected records

I checked the steps defined in comment #1 and I was not able to reproduce it on the latest master.

Ilanit, please retest and provide new logs

Comment 9 Ilanit Stein 2018-01-04 14:46:22 UTC

Reproduced for a small scale RHV.

1. On RHV side, move host to maintenance.
2. On CFME side, Check Compute->Hosts: The host is no longer displayed.
   See screenshot attachment "host_mixed_2_is_gone.png"
3. Run Provider full refresh & check again Compute->Hosts:
   Now Host is displayed in status maintenance.
   See screenshot attachment "host_mixed_2_is_displayed_in_maintenance.png"
4. On RHV side, Activate host.
5. On CFME side, Check Compute->Hosts: The host is no longer displayed.
6. Run Provider full refresh & check again Compute->Hosts:
Now Host is displayed in status UP. 

evm\rhevm\automation logs attached.

Check for host_mixed_2 moved to maintenance, in evm.log, it is @ 2018-01-04T09:16

Comment 10 Ilanit Stein 2018-01-04 14:46:51 UTC

Created attachment 1376990 [details]
host_mixed_2_is_displayed_in_maintenance.png

Comment 11 Ilanit Stein 2018-01-04 14:47:23 UTC

Created attachment 1376992 [details]
host_mixed_2_is_gone.png

Comment 12 Ilanit Stein 2018-01-04 14:48:08 UTC

Created attachment 1376993 [details]
reproduce_evm.log

Comment 13 Ilanit Stein 2018-01-04 14:48:36 UTC

Created attachment 1376994 [details]
reproduce_automation.log

Comment 14 Ilanit Stein 2018-01-04 14:49:02 UTC

Created attachment 1376995 [details]
reproduce_rhevm.log

Comment 15 Ilanit Stein 2018-01-04 14:57:34 UTC

In evm.log (attached reproduce_evm.log), 
there is an error (that also repeat later on),
but I am not sure what is it related to:

[----] E, [2018-01-04T09:14:11.523087 #60277:12fb13c] ERROR -- : MIQ(MiqQueue#deliver) Message id: [256292], Error: [no implicit conversion of Symbol into Integer]
[----] E, [2018-01-04T09:14:11.523282 #60277:12fb13c] ERROR -- : [TypeError]: no implicit conversion of Symbol into Integer  Method:[block in method_missing]
[----] E, [2018-01-04T09:14:11.523450 #60277:12fb13c] ERROR -- : /var/www/miq/vmdb/app/models/ems_event.rb:81:in `[]'
/var/www/miq/vmdb/app/models/ems_event.rb:81:in `add'
/var/www/miq/vmdb/app/models/miq_queue.rb:449:in `block in dispatch_method'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:448:in `dispatch_method'
/var/www/miq/vmdb/app/models/miq_queue.rb:425:in `block in deliver'
/var/www/miq/vmdb/app/models/user.rb:253:in `with_user_group'
/var/www/miq/vmdb/app/models/miq_queue.rb:425:in `deliver'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:104:in `deliver_queue_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loop'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `do_work'
/var/www/miq/vmdb/app/models/miq_event_handler/runner.rb:24:in `do_work'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:329:in `block in do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:326:in `loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:326:in `do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:153:in `run'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:127:in `start'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:22:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:357:in `block in start_runner_via_fork'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/var/www/miq/vmdb/app/models/miq_worker.rb:355:in `start_runner_via_fork'
/var/www/miq/vmdb/app/models/miq_worker.rb:349:in `start_runner'
/var/www/miq/vmdb/app/models/miq_worker.rb:396:in `start'
/var/www/miq/vmdb/app/models/miq_worker.rb:266:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:153:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_worker.rb:153:in `times'
/var/www/miq/vmdb/app/models/miq_worker.rb:153:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:53:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:22:in `monitor_workers'
/var/www/miq/vmdb/app/models/miq_server.rb:329:in `block in monitor'
/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-d865ee63c000/lib/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-d865ee63c000/lib/gems/pending/util/extensions/miq-benchmark.rb:28:in `realtime_block'
/var/www/miq/vmdb/app/models/miq_server.rb:329:in `monitor'
/var/www/miq/vmdb/app/models/miq_server.rb:351:in `block (2 levels) in monitor_loop'
/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-d865ee63c000/lib/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-d865ee63c000/lib/gems/pending/util/extensions/miq-benchmark.rb:35:in `realtime_block'
/var/www/miq/vmdb/app/models/miq_server.rb:351:in `block in monitor_loop'
/var/www/miq/vmdb/app/models/miq_server.rb:350:in `loop'
/var/www/miq/vmdb/app/models/miq_server.rb:350:in `monitor_loop'
/var/www/miq/vmdb/app/models/miq_server.rb:234:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:27:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:48:in `start'
/var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>'

Piotr,
Would you please take a look?

thanks.

Comment 16 Piotr Kliczewski 2018-01-04 16:04:18 UTC

I do not see this failure in my log and it could be the cause of the issue. I will investigate.

Comment 17 Satoe Imaishi 2018-01-29 15:14:02 UTC

PR: https://github.com/ManageIQ/manageiq-providers-ovirt/pull/155

Note You need to log in before you can comment on or make changes to this bug.