Created attachment 1100581 [details]
Grepped evm log for Vim Broker and the complete evm.log file as well.
Description of problem:
While adjusting appliance roles then adding a provider, a refresh completes however host inventory is missing. This is due to the Vim Broker worker not exiting and restarting with the correct cache scope.
Refresh worker Logs showed:
[----] W, [2015-11-30T13:17:00.836192 #14927:661994] WARN -- : MIQ(ManageIQ::Providers::Vmware::InfraManager::RefreshParser.host_inv_to_hashes) Missing configuration for Host [host-105]: [product]. Skipping.
[----] W, [2015-11-30T13:17:01.644311 #14927:661994] WARN -- : MIQ(EmsRefresh.save_hosts_inventory) EMS: [vmware-small], id:  Processing Host: [host-105] failed with error [MiqException::MiqIncompleteData: Incomplete data from EMS]. Skipping Host.
This indicates that the Broker's cache was not set correctly, however I had enabled the inventory role (After adding the provider) and the Broker Worker existed at the time of the refresh.
Version-Release number of selected component (if applicable):
Reproduced in this instance, not sure if it will always occur.
Steps to Reproduce:
1. Deploy new 5.5 appliance
2. Turn off inventory role
3. Add vmware provider
4. Turn on inventory role
5. See if host inventory is missing due to VimBroker Worker not exiting on broker cache change
The refresh completes with the warnings in the log and the hosts are not in the inventory.
The VimBroker Worker to exit on enabling of the ems_inventory and being replaced by a new Vim Broker worker with the correct cache to complete a full refresh.
I was able to resolve the issue by restarting evmserverd, which spawned a new VimBrokerWorker with the correct cache containing my hosts.
I suspect I could have killed/restarted the Vim Broker worker rather than restarting the entire application to achieve the same result.
On 5.4, in the same scenario, when you enable ems_inventory, the Broker worker is restarted so it obtains the correct cache_scope and you will see a message in the logs: "Exiting due to change in broker cache scope. Stopping Broker." This log line can not be found on 5.5 in the same scenario.
Attached is grepped output from evm.log for the VIMBroker Worker, Refresh Worker and AuditSucess(Which shows I saved ems_inventory as a role before the refresh completed). Also attached is the entire evm.log during which this occurred for further analysis.
Also note there is several stack traces in the complete log which might be the reason why the VimBroker worker never restarted after it's cache should have changed.
Adam, can you take a look at this?
See if you can reproduce what Alex is seeing with 5.4 vs. 5.5.
This seems to be caused by the other roles that were disabled.
From the logs I see the following roles were removed:
Roles removed: ["automate", "ems_inventory", "event", "reporting", "smartstate", "web_services"]
But only the inventory role was added back:
Roles added: ["ems_inventory"]
If I just remove ems_inventory everything works correctly. If I remove all of those roles then I see the same result as reported.
This is still a regression from 5.4, I'll work on finding out which role is causing the issue.
New commit detected on ManageIQ/manageiq/master:
Author: Adam Grare <email@example.com>
AuthorDate: Mon Jan 11 11:44:19 2016 -0500
Commit: Adam Grare <firstname.lastname@example.org>
CommitDate: Mon Jan 11 12:04:23 2016 -0500
Fix for disabling the web_services worker
Disabling the web_services role fails to terminate the worker
due to 'pop' no longer being defined for associations
app/models/mixins/miq_web_server_worker_mixin.rb | 1 +
1 file changed, 1 insertion(+)
Root cause was using 'pop()' on an active-record association which was deprecated in rails 4.1.
Steps to reproduce:
Go to configuration settings (Configure -> Configuration), down to Server Roles. Uncheck "Web Services" and hit save.
In the evm.log you should see the following
MIQ(MiqServer#log_role_changes) Server's roles have changed:
MIQ(MiqServer#log_role_changes) Roles removed: ["web_services"]
MIQ(MiqWebServiceWorker.sync_workers) Workers are being synchronized: Current #: , Desired #: 
Before the fix you should see:
MIQ(MiqServer#monitor) undefined method `pop' for #<MiqWebServiceWorker::ActiveRecord_Relation: 0x00000004663510>
[NoMethodError]: undefined method `pop' for #<MiqWebServiceWorker::ActiveRecord_Relation: 0x00000004663510> Method:[rescue in monitor]
MIQ(MiqServer#monitor) Reconnecting to database after error...
After the fix you should see:
MIQ(MiqWebServiceWorker.sync_workers) Unreserved port=4000, Current ports in use: 
MIQ(MiqServer#stop_worker) Stopping Worker [MiqWebServiceWorker] with ID: , PID: , GUID: [922b471e-b8ab-11e5-9965-54ee753e66dc], status [started]...
MIQ(MiqWebServiceWorker::Runner) ID  PID  GUID [922b471e-b8ab-11e5-9965-54ee753e66dc] Exit request received. Worker exiting.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.