Bug 1372940 - Cannot Reconfigure VM logs show that the "Broker Worker is not available"
Summary: Cannot Reconfigure VM logs show that the "Broker Worker is not available"
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.6.0
Hardware: All
OS: All
urgent
urgent
Target Milestone: GA
: 5.7.0
Assignee: Adam Grare
QA Contact: Dave Johnson
URL:
Whiteboard:
Depends On:
Blocks: 1374014
TreeView+ depends on / blocked
 
Reported: 2016-09-04 02:25 UTC by Josh Carter
Modified: 2019-12-16 06:37 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1374014 (view as bug list)
Environment:
Last Closed: 2016-09-27 13:01:00 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:


Attachments (Terms of Use)

Description Josh Carter 2016-09-04 02:25:51 UTC
Description of problem:

Customer ticket description
"Each time we attempt to reconfigure a VM we get a message in the service request logs that the "Broker Worker is not available"

Running a bin/rake evm:status shows that the MiqVimBrokerWorker is starting, but it never does, and eventually it then says stopping only to repeat the process all over again."

Log snippet 

[----] I, [2016-09-03T18:33:10.768636 #21627:11fb994]  INFO -- : <AutomationEngine> Followed  Relationship [miqaedb:/System/Policy/request_approved#create]
[----] I, [2016-09-03T18:33:10.768910 #21627:11fb994]  INFO -- : <AutomationEngine> Followed  Relationship [miqaedb:/System/Event/RequestEvent/Request/request_
approved#create]
[----] I, [2016-09-03T18:33:10.771642 #21627:11fb994]  INFO -- : MIQ(MiqQueue#delivered) Message id: [1000000045394], State: [ok], Delivered in [1.779454409] s
econds
[----] I, [2016-09-03T18:33:11.810710 #21606:11fb994]  INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(ManageIQ::Providers::Vmware::InfraManager::
Vm#set_spec_option) numCoresPerSocket was NOT set due to nil
[----] I, [2016-09-03T18:33:11.810832 #21606:11fb994]  INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(ManageIQ::Providers::Vmware::InfraManager::
Vm#set_spec_option) memoryMB was set to 2048 (Fixnum)
[----] I, [2016-09-03T18:33:11.810955 #21606:11fb994]  INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(ManageIQ::Providers::Vmware::InfraManager::
Vm#set_spec_option) numCPUs was NOT set due to nil
[----] I, [2016-09-03T18:33:11.811096 #21606:11fb994]  INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(VmReconfigureTask#do_request) Config spec: 
 (VimHash) xsiType: <VirtualMachineConfigSpec>  vimType: <>
[----] I, [2016-09-03T18:33:11.811202 #21606:11fb994]  INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(VmReconfigureTask#do_request) Config spec: 
["memoryMB"](Fixnum) = 2048
[----] E, [2016-09-03T18:33:11.820982 #21606:11fb994] ERROR -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(VmReconfigureTask#execute) [Error: VM/Template <CFTest> with Id: <1000000000074>: Provider authentication failed.] encountered during VM Reconfigure
[----] E, [2016-09-03T18:33:11.821150 #21606:11fb994] ERROR -- : Q-task_id([vm_reconfigure_task_1000000000002]) [RuntimeError]: VM/Template <CFTest> with Id: <1000000000074>: Provider authentication failed.  Method:[rescue in execute]
[----] E, [2016-09-03T18:33:11.821297 #21606:11fb994] ERROR -- : Q-task_id([vm_reconfigure_task_1000000000002]) /var/www/miq/vmdb/app/models/vm_or_template.rb:327:in `run_command_via_parent'
/var/www/miq/vmdb/app/models/vm_or_template/operations/configuration.rb:104:in `spec_reconfigure'
/var/www/miq/vmdb/app/models/vm_reconfigure_task.rb:47:in `do_request'
/var/www/miq/vmdb/app/models/miq_request_task.rb:193:in `execute'
/var/www/miq/vmdb/app/models/miq_queue.rb:345:in `block in deliver'
/opt/rh/rh-ruby22/root/usr/share/ruby/timeout.rb:89:in `block in timeout'
/opt/rh/rh-ruby22/root/usr/share/ruby/timeout.rb:34:in `block in catch'
/opt/rh/rh-ruby22/root/usr/share/ruby/timeout.rb:34:in `catch'
/opt/rh/rh-ruby22/root/usr/share/ruby/timeout.rb:34:in `catch'
/opt/rh/rh-ruby22/root/usr/share/ruby/timeout.rb:104:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:341:in `deliver'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:106:in `deliver_queue_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loop'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `do_work'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:334:in `block in do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:153:in `run'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:128:in `start'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:347:in `block in start'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/var/www/miq/vmdb/app/models/miq_worker.rb:345:in `start'
/var/www/miq/vmdb/app/models/miq_worker.rb:274:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:154:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_worker.rb:154:in `times'
/var/www/miq/vmdb/app/models/miq_worker.rb:154:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:52:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server.rb:173:in `start'
/var/www/miq/vmdb/app/models/miq_server.rb:265:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:65:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:92:in `start'
/var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>'
[----] I, [2016-09-03T18:33:11.847348 #21606:11fb994]  INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(MiqQueue#delivered) Message id: [1000000045396], State: [ok], Delivered in [1.813261218] seconds
[----] I, [2016-09-03T18:33:11.849331 #21606:11fb994]  INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(MiqQueue#m_callback) Message id: [1000000045396], Invoking Callback with args: ["ok", "Message delivered successfully", "nil"]

Version-Release number of selected component (if applicable): 5.6.1


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 4 Adam Grare 2016-09-07 18:20:55 UTC
The problem looks to be that the worker never reaches the "Started" state

It looks like the worker starts:
[----] I, [2016-09-03T18:21:44.892846 #21068:11fb994]  INFO -- : MIQ(MiqVimBrokerWorker#start) Worker started: ID [1000000000293], PID [21779], GUID [cbfaf632-7224-11e6-8fe2-0050568c6f5a]

But later we see:
[----] W, [2016-09-03T18:31:46.203494 #21068:11fb994]  WARN -- : MIQ(MiqServer#wait_for_started_workers) After waiting 600 seconds, no longer waiting for the following workers to start:[----] W, [2016-09-03T18:31:46.203665 #21068:11fb994]  WARN -- : Worker type: MiqVimBrokerWorker, pid: 21779, guid: cbfaf632-7224-11e6-8fe2-0050568c6f5a, status: starting

If the worker's state isn't Started then it will not be returned from MiqWorker#find_current, which means connect will return "Broker Worker is not available" and it will not heartbeat causing it to be killed later.


There was speculation that a "Bad Gateway" error caused the broker to lock-up somehow, however the first time the broker process hit this was well after the "Broker Worker is not available" error:

Broker Worker not available error:
[----] E, [2016-09-03T18:37:37.453312 #21619:11fb994] ERROR -- : Q-task_id([vm_reconfigure_task_1000000000003]) MIQ(ManageIQ::Providers::Vmware::InfraManager#connect) Broker Worker is  not available

First occurrence of "Bad Gateway" for process 21779
[----] E, [2016-09-03T18:40:36.455081 #21779:13961b4] ERROR -- : <VIM> Response <#<Handsoap::Response:0x0000000314e800 @http_body="<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body bgcolor=\"white\">\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.6.3</center>\r\n</body>\r\n</html>\r\n", @soap_namespace="http://schemas.xmlsoap.org/soap/envelope/", @document=nil, @fault=nil>> has no XML document


Note You need to log in before you can comment on or make changes to this bug.