Bug 1286811

Summary: VimBroker Worker cache did not exit on "change in broker cache scope"
Product: Red Hat CloudForms Management Engine Reporter: Alex Krzos <akrzos>
Component: ProvidersAssignee: Adam Grare <agrare>
Status: CLOSED ERRATA QA Contact: Jiri Stefanisin <jistefan>
Severity: high Docs Contact:
Priority: high    
Version: 5.5.0CC: jfrey, jhardy, jprause, mfeifer, obarenbo
Target Milestone: GA   
Target Release: 5.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 5.6.0.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1289746 (view as bug list) Environment:
Last Closed: 2016-06-29 15:13:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1289746    
Attachments:
Description Flags
Grepped evm log for Vim Broker and the complete evm.log file as well. none

Description Alex Krzos 2015-11-30 19:07:11 UTC
Created attachment 1100581 [details]
Grepped evm log for Vim Broker and the complete evm.log file as well.

Description of problem:
While adjusting appliance roles then adding a provider, a refresh completes however host inventory is missing.  This is due to the Vim Broker worker not exiting and restarting with the correct cache scope.

Refresh worker Logs showed:
[----] W, [2015-11-30T13:17:00.836192 #14927:661994]  WARN -- : MIQ(ManageIQ::Providers::Vmware::InfraManager::RefreshParser.host_inv_to_hashes) Missing configuration for Host [host-105]: [product]. Skipping.
...
[----] W, [2015-11-30T13:17:01.644311 #14927:661994]  WARN -- : MIQ(EmsRefresh.save_hosts_inventory) EMS: [vmware-small], id: [1] Processing Host: [host-105] failed with error [MiqException::MiqIncompleteData: Incomplete data from EMS]. Skipping Host.

This indicates that the Broker's cache was not set correctly, however I had enabled the inventory role (After adding the provider) and the Broker Worker existed at the time of the refresh.

Version-Release number of selected component (if applicable):
5.5.0.12

How reproducible:
Reproduced in this instance, not sure if it will always occur.

Steps to Reproduce:
1. Deploy new 5.5 appliance
2. Turn off inventory role
3. Add vmware provider
4. Turn on inventory role
5. See if host inventory is missing due to VimBroker Worker not exiting on broker cache change

Actual results:
The refresh completes with the warnings in the log and the hosts are not in the inventory.

Expected results:
The VimBroker Worker to exit on enabling of the ems_inventory and being replaced by a new Vim Broker worker with the correct cache to complete a full refresh.

Additional info:
I was able to resolve the issue by restarting evmserverd, which spawned a new VimBrokerWorker with the correct cache containing my hosts.

I suspect I could have killed/restarted the Vim Broker worker rather than restarting the entire application to achieve the same result.

On 5.4, in the same scenario, when you enable ems_inventory, the Broker worker is restarted so it obtains the correct cache_scope and you will see a message in the logs: "Exiting due to change in broker cache scope. Stopping Broker."  This log line can not be found on 5.5 in the same scenario.  

Attached is grepped output from evm.log for the VIMBroker Worker, Refresh Worker and AuditSucess(Which shows I saved ems_inventory as a role before the refresh completed).  Also attached is the entire evm.log during which this occurred for further analysis.


Also note there is several stack traces in the complete log which might be the reason why the VimBroker worker never restarted after it's cache should have changed.

Comment 2 Greg Blomquist 2015-12-01 15:11:08 UTC
Adam, can you take a look at this?  

See if you can reproduce what Alex is seeing with 5.4 vs. 5.5.

Comment 3 Adam Grare 2015-12-14 22:13:51 UTC
This seems to be caused by the other roles that were disabled.

From the logs I see the following roles were removed:
Roles removed:   ["automate", "ems_inventory", "event", "reporting", "smartstate", "web_services"]

But only the inventory role was added back:
Roles added:     ["ems_inventory"]

If I just remove ems_inventory everything works correctly.  If I remove all of those roles then I see the same result as reported.

This is still a regression from 5.4, I'll work on finding out which role is causing the issue.

Comment 5 CFME Bot 2016-01-11 19:36:05 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/4547a4c7b4c3e5dcd5d7d4c2aa73b5c7e21d5464

commit 4547a4c7b4c3e5dcd5d7d4c2aa73b5c7e21d5464
Author:     Adam Grare <agrare>
AuthorDate: Mon Jan 11 11:44:19 2016 -0500
Commit:     Adam Grare <agrare>
CommitDate: Mon Jan 11 12:04:23 2016 -0500

    Fix for disabling the web_services worker
    
    Disabling the web_services role fails to terminate the worker
    due to 'pop' no longer being defined for associations
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1286811

 app/models/mixins/miq_web_server_worker_mixin.rb | 1 +
 1 file changed, 1 insertion(+)

Comment 6 Adam Grare 2016-01-11 21:44:09 UTC
Root cause was using 'pop()' on an active-record association which was deprecated in rails 4.1.

Steps to reproduce:
Go to configuration settings (Configure -> Configuration), down to Server Roles.  Uncheck "Web Services" and hit save.

In the evm.log you should see the following

MIQ(MiqServer#log_role_changes) Server's roles have changed:
MIQ(MiqServer#log_role_changes)   Roles removed:   ["web_services"]
MIQ(MiqWebServiceWorker.sync_workers) Workers are being synchronized: Current #: [1], Desired #: [0]

Before the fix you should see:
MIQ(MiqServer#monitor) undefined method `pop' for #<MiqWebServiceWorker::ActiveRecord_Relation: 0x00000004663510>
[NoMethodError]: undefined method `pop' for #<MiqWebServiceWorker::ActiveRecord_Relation: 0x00000004663510>  Method:[rescue in monitor]
MIQ(MiqServer#monitor) Reconnecting to database after error...

After the fix you should see:
MIQ(MiqWebServiceWorker.sync_workers) Unreserved port=4000, Current ports in use: []
MIQ(MiqServer#stop_worker) Stopping Worker [MiqWebServiceWorker] with ID: [10], PID: [8597], GUID: [922b471e-b8ab-11e5-9965-54ee753e66dc], status [started]...
MIQ(MiqWebServiceWorker::Runner) ID [10] PID [8597] GUID [922b471e-b8ab-11e5-9965-54ee753e66dc] Exit request received. Worker exiting.

Comment 9 errata-xmlrpc 2016-06-29 15:13:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348