Bug 1528856

Summary: [RHV][Graph Refresh] Hosts not displayed on CFME, after moved to maintenance
Product: Red Hat CloudForms Management Engine
Reporter: Ilanit Stein <istein>
Component: Providers
Assignee: Boriso <bodnopoz>
Status: CLOSED CURRENTRELEASE
QA Contact: Angelina Vasileva <anikifor>
Severity: high
Docs Contact:
Priority: high
Version: 5.9.0
CC: bodnopoz, cpelland, gblomqui, istein, jfrey, jhardy, mperina, obarenbo, simaishi, smallamp
Target Milestone: GA
Keywords: Regression, TestOnly
Target Release: 5.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: rhev:graph refresh:host target refresh
Fixed In Version: 5.10.0.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1539775 (view as bug list)
Environment:
Last Closed: 2018-06-21 21:16:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: Bug
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: RHEVM
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1532255, 1539775    
Attachments (flags: none for all):
  evm.log
  automation.log
  rhevm.log
  RHV engine.log
  host_mixed_2_is_displayed_in_maintenance.png
  host_mixed_2_is_gone.png
  reproduce_evm.log
  reproduce_automation.log
  reproduce_rhevm.log

Description Ilanit Stein 2017-12-24 15:53:54 UTC
Description of problem:
CFME 5.9 is connected to a scale RHV environment with ~1800 VMs and ~400 hosts.
On the RHV side, move one host to maintenance.

After ~15 minutes the host is not displayed at all in the CFME UI under Compute->Infrastructure->Hosts.

Later, after moving a second host to maintenance, the first host appeared in the CFME UI in maintenance status.

Moving both hosts back up => the CFME UI displayed those 2 hosts in UP state quite quickly.

Then moving the same 2 hosts to maintenance on the RHV side => even after ~20 minutes, both hosts were not displayed at all on the CFME side.

Moving both hosts back up => the CFME UI displayed those 2 hosts in UP state.

In evm.log/automation.log there are no errors.

automation.log reports an UNKNOWN event:

[----] I, [2017-12-24T10:37:06.637397 #22706:120f138]  INFO -- : Instantiating [/System/Process/Event?EventStream%3A%3Aevent_stream=32507&Host%3A%3Ahost=1410&MiqServer%3A%3Amiq_server=1&User%3A%3Auser=1&event_id=32507&event_stream_id=32507&event_type=UNKNOWN&host_id=1410&object_name=Event&vmdb_object_type=event_stream]
[----] I, [2017-12-24T10:37:06.660628 #22706:120f138]  INFO -- : Updated namespace [/System/Process/Event?EventStream%3A%3Aevent_stream=32507&Host%3A%3Ahost=1410&MiqServer%3A%3Amiq_server=1&User%3A%3Auser=1&event_id=32507&event_stream_id=32507&event_type=UNKNOWN&host_id=1410&object_name=Event&vmdb_object_type=event_stream  ManageIQ/System]
[----] I, [2017-12-24T10:37:06.724341 #22706:120f138]  INFO -- : Following Relationship [miqaedb:/System/Event/EmsEvent/RHEVM/UNKNOWN#create]
[----] I, [2017-12-24T10:37:06.747575 #22706:120f138]  INFO -- : Updated namespace [miqaedb:/System/Event/EmsEvent/RHEVM/UNKNOWN#create  ManageIQ/System/Event/EmsEvent]
[----] I, [2017-12-24T10:37:06.806212 #22706:120f138]  INFO -- : Instance [/ManageIQ/System/Event/EmsEvent/RHEVM/UNKNOWN] not found in MiqAeDatastore - trying [.missing]

and this is reflected in evm.log as well:

[----] I, [2017-12-24T10:41:48.250751 #22706:120f138]  INFO -- : MIQ(MiqAeEngine.deliver) Delivering {:event_id=>32512, :event_stream_id=>32512, :event_type=>"UNKNOWN", "Host::host"=>1410, :host_id=>1410} for object [EmsEvent.32512] with state [] to Automate
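For context on the UNKNOWN event type above: the ".missing" fallback in automation.log is what happens when an event arrives with a type that has no matching Automate instance. A common pattern in event catchers is to map raw provider event names to known event types and fall back to a generic "UNKNOWN" when no mapping exists; the sketch below illustrates only that pattern, with hypothetical names (EVENT_TYPE_MAP, parse_event_type), and is not the actual ManageIQ/RHV event parser:

# Illustrative sketch only; EVENT_TYPE_MAP and parse_event_type are hypothetical
# names, not the actual ManageIQ/RHV event-parsing code.
EVENT_TYPE_MAP = {
  "USER_VDS_MAINTENANCE" => "HostMaintenance",
  "VDS_DETECTED"         => "HostActivate"
}.freeze

def parse_event_type(raw_name)
  # An event with no explicit mapping is delivered to Automate as "UNKNOWN";
  # no instance .../Event/EmsEvent/RHEVM/UNKNOWN exists in the datastore, so
  # the lookup falls back to .missing, as seen in automation.log above.
  EVENT_TYPE_MAP.fetch(raw_name, "UNKNOWN")
end

puts parse_event_type("USER_VDS_MAINTENANCE")    # => "HostMaintenance"
puts parse_event_type("SOME_UNMAPPED_RHV_EVENT") # => "UNKNOWN"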

Version-Release number of selected component (if applicable):
CFME-5.9.0.14,
RHV-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Ilanit Stein 2017-12-24 15:57:01 UTC
Created attachment 1371911 [details]
evm.log

First host moved to maintenance: hera10.eng.lab.tlv.redhat.com
Second host moved to maintenance: nested_host_from_cluster_nested_10.35.51.157

Searching for "UNKNOWN" in the log shows the "unknown" event.

Comment 3 Ilanit Stein 2017-12-24 15:57:32 UTC
Created attachment 1371912 [details]
automation.log

First host moved to maintenance: hera10.eng.lab.tlv.redhat.com
Second host moved to maintenance: nested_host_from_cluster_nested_10.35.51.157

Searching for "UNKNOWN" in the log shows the "unknown" event.

Comment 4 Ilanit Stein 2017-12-24 15:58:08 UTC
Created attachment 1371913 [details]
rhevm.log

Comment 5 Oved Ourfali 2017-12-25 06:14:36 UTC
Can you also attach the RHV logs?

Comment 6 Ilanit Stein 2017-12-25 07:54:43 UTC
Created attachment 1372071 [details]
RHV engine.log

Comment 7 Ilanit Stein 2017-12-25 10:05:20 UTC
Important info:
The bug was reported for CFME-5.9.0.14 + RHV Graph refresh - ON

The problems that follow moving a host to maintenance on the RHV side
(on CFME the host vanishes and is not displayed in maintenance state for a very long time)
are specifically related to RHV Graph refresh.

For CFME-5.9.0.14 + RHV Graph refresh - OFF,
changing a host to maintenance on the RHV side
is followed (after ~1 minute) by an update of this host's state to maintenance on the CFME side (tested 3 times).

Comment 8 Piotr Kliczewski 2018-01-03 14:26:01 UTC
Checking the logs, I see no issues during refresh, nor does the host appear to have been deleted. I only found:

[----] E, [2017-12-24T09:02:32.772102 #22784:148de8c] ERROR -- : MIQ(host_controller-show): Can't access selected records

I checked the steps defined in comment #1 and I was not able to reproduce it on the latest master.

Ilanit, please retest and provide new logs.

Comment 9 Ilanit Stein 2018-01-04 14:46:22 UTC
Reproduced on a small-scale RHV environment.

1. On the RHV side, move a host to maintenance.
2. On the CFME side, check Compute->Hosts: the host is no longer displayed.
   See screenshot attachment "host_mixed_2_is_gone.png"
3. Run a provider full refresh (see the console sketch at the end of this comment) and check Compute->Hosts again:
   the host is now displayed in maintenance status.
   See screenshot attachment "host_mixed_2_is_displayed_in_maintenance.png"
4. On the RHV side, activate the host.
5. On the CFME side, check Compute->Hosts: the host is no longer displayed.
6. Run a provider full refresh and check Compute->Hosts again:
   the host is now displayed in UP status.

evm\rhevm\automation logs attached.

To find host_mixed_2 being moved to maintenance in evm.log, look at 2018-01-04T09:16.
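For reference, the provider full refresh in steps 3 and 6 can also be triggered from the appliance Rails console. A minimal sketch, assuming a provider named "rhv-provider" (substitute the actual provider name); verify against the running appliance before relying on it:

# From the appliance Rails console (cd /var/www/miq/vmdb; rails c).
# "rhv-provider" is an example name, not this environment's actual provider.
ems = ExtManagementSystem.find_by(:name => "rhv-provider")
EmsRefresh.refresh(ems)   # performs a full refresh of the provider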

Comment 10 Ilanit Stein 2018-01-04 14:46:51 UTC
Created attachment 1376990 [details]
host_mixed_2_is_displayed_in_maintenance.png

Comment 11 Ilanit Stein 2018-01-04 14:47:23 UTC
Created attachment 1376992 [details]
host_mixed_2_is_gone.png

Comment 12 Ilanit Stein 2018-01-04 14:48:08 UTC
Created attachment 1376993 [details]
reproduce_evm.log

Comment 13 Ilanit Stein 2018-01-04 14:48:36 UTC
Created attachment 1376994 [details]
reproduce_automation.log

Comment 14 Ilanit Stein 2018-01-04 14:49:02 UTC
Created attachment 1376995 [details]
reproduce_rhevm.log

Comment 15 Ilanit Stein 2018-01-04 14:57:34 UTC
In evm.log (attached reproduce_evm.log)
there is an error (that also repeats later on),
but I am not sure what it is related to:

[----] E, [2018-01-04T09:14:11.523087 #60277:12fb13c] ERROR -- : MIQ(MiqQueue#deliver) Message id: [256292], Error: [no implicit conversion of Symbol into Integer]
[----] E, [2018-01-04T09:14:11.523282 #60277:12fb13c] ERROR -- : [TypeError]: no implicit conversion of Symbol into Integer  Method:[block in method_missing]
[----] E, [2018-01-04T09:14:11.523450 #60277:12fb13c] ERROR -- : /var/www/miq/vmdb/app/models/ems_event.rb:81:in `[]'
/var/www/miq/vmdb/app/models/ems_event.rb:81:in `add'
/var/www/miq/vmdb/app/models/miq_queue.rb:449:in `block in dispatch_method'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:448:in `dispatch_method'
/var/www/miq/vmdb/app/models/miq_queue.rb:425:in `block in deliver'
/var/www/miq/vmdb/app/models/user.rb:253:in `with_user_group'
/var/www/miq/vmdb/app/models/miq_queue.rb:425:in `deliver'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:104:in `deliver_queue_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loop'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `do_work'
/var/www/miq/vmdb/app/models/miq_event_handler/runner.rb:24:in `do_work'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:329:in `block in do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:326:in `loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:326:in `do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:153:in `run'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:127:in `start'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:22:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:357:in `block in start_runner_via_fork'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/var/www/miq/vmdb/app/models/miq_worker.rb:355:in `start_runner_via_fork'
/var/www/miq/vmdb/app/models/miq_worker.rb:349:in `start_runner'
/var/www/miq/vmdb/app/models/miq_worker.rb:396:in `start'
/var/www/miq/vmdb/app/models/miq_worker.rb:266:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:153:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_worker.rb:153:in `times'
/var/www/miq/vmdb/app/models/miq_worker.rb:153:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:53:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:22:in `monitor_workers'
/var/www/miq/vmdb/app/models/miq_server.rb:329:in `block in monitor'
/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-d865ee63c000/lib/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-d865ee63c000/lib/gems/pending/util/extensions/miq-benchmark.rb:28:in `realtime_block'
/var/www/miq/vmdb/app/models/miq_server.rb:329:in `monitor'
/var/www/miq/vmdb/app/models/miq_server.rb:351:in `block (2 levels) in monitor_loop'
/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-d865ee63c000/lib/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-d865ee63c000/lib/gems/pending/util/extensions/miq-benchmark.rb:35:in `realtime_block'
/var/www/miq/vmdb/app/models/miq_server.rb:351:in `block in monitor_loop'
/var/www/miq/vmdb/app/models/miq_server.rb:350:in `loop'
/var/www/miq/vmdb/app/models/miq_server.rb:350:in `monitor_loop'
/var/www/miq/vmdb/app/models/miq_server.rb:234:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:27:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:48:in `start'
/var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>'
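For what it's worth, "no implicit conversion of Symbol into Integer" is the TypeError Ruby raises when an Array is indexed with a Symbol, i.e. when code expecting a Hash of event attributes is handed an Array instead. A minimal illustration (the variables below are hypothetical, not the actual ems_event.rb code):

# Hypothetical illustration of the TypeError above, not the actual EmsEvent.add code.
event_hash = { :event_type => "UNKNOWN", :host_id => 1410 }
event_hash[:event_type]    # => "UNKNOWN" (Hash lookup by Symbol works)

event_array = [event_hash] # the same data accidentally wrapped in an Array
event_array[:event_type]   # raises TypeError: no implicit conversion of Symbol into Integer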

Piotr,
Would you please take a look?

Thanks.

Comment 16 Piotr Kliczewski 2018-01-04 16:04:18 UTC
I do not see this failure in my log and it could be the cause of the issue. I will investigate.