Hide Forgot
Description of problem: Customer ticket description "Each time we attempt to reconfigure a VM we get a message in the service request logs that the "Broker Worker is not available" Running a bin/rake evm:status shows that the MiqVimBrokerWorker is starting, but it never does, and eventually it then says stopping only to repeat the process all over again." Log snippet [----] I, [2016-09-03T18:33:10.768636 #21627:11fb994] INFO -- : <AutomationEngine> Followed Relationship [miqaedb:/System/Policy/request_approved#create] [----] I, [2016-09-03T18:33:10.768910 #21627:11fb994] INFO -- : <AutomationEngine> Followed Relationship [miqaedb:/System/Event/RequestEvent/Request/request_ approved#create] [----] I, [2016-09-03T18:33:10.771642 #21627:11fb994] INFO -- : MIQ(MiqQueue#delivered) Message id: [1000000045394], State: [ok], Delivered in [1.779454409] s econds [----] I, [2016-09-03T18:33:11.810710 #21606:11fb994] INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(ManageIQ::Providers::Vmware::InfraManager:: Vm#set_spec_option) numCoresPerSocket was NOT set due to nil [----] I, [2016-09-03T18:33:11.810832 #21606:11fb994] INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(ManageIQ::Providers::Vmware::InfraManager:: Vm#set_spec_option) memoryMB was set to 2048 (Fixnum) [----] I, [2016-09-03T18:33:11.810955 #21606:11fb994] INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(ManageIQ::Providers::Vmware::InfraManager:: Vm#set_spec_option) numCPUs was NOT set due to nil [----] I, [2016-09-03T18:33:11.811096 #21606:11fb994] INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(VmReconfigureTask#do_request) Config spec: (VimHash) xsiType: <VirtualMachineConfigSpec> vimType: <> [----] I, [2016-09-03T18:33:11.811202 #21606:11fb994] INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(VmReconfigureTask#do_request) Config spec: ["memoryMB"](Fixnum) = 2048 [----] E, [2016-09-03T18:33:11.820982 #21606:11fb994] ERROR -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(VmReconfigureTask#execute) [Error: VM/Template <CFTest> with Id: <1000000000074>: Provider authentication failed.] encountered during VM Reconfigure [----] E, [2016-09-03T18:33:11.821150 #21606:11fb994] ERROR -- : Q-task_id([vm_reconfigure_task_1000000000002]) [RuntimeError]: VM/Template <CFTest> with Id: <1000000000074>: Provider authentication failed. Method:[rescue in execute] [----] E, [2016-09-03T18:33:11.821297 #21606:11fb994] ERROR -- : Q-task_id([vm_reconfigure_task_1000000000002]) /var/www/miq/vmdb/app/models/vm_or_template.rb:327:in `run_command_via_parent' /var/www/miq/vmdb/app/models/vm_or_template/operations/configuration.rb:104:in `spec_reconfigure' /var/www/miq/vmdb/app/models/vm_reconfigure_task.rb:47:in `do_request' /var/www/miq/vmdb/app/models/miq_request_task.rb:193:in `execute' /var/www/miq/vmdb/app/models/miq_queue.rb:345:in `block in deliver' /opt/rh/rh-ruby22/root/usr/share/ruby/timeout.rb:89:in `block in timeout' /opt/rh/rh-ruby22/root/usr/share/ruby/timeout.rb:34:in `block in catch' /opt/rh/rh-ruby22/root/usr/share/ruby/timeout.rb:34:in `catch' /opt/rh/rh-ruby22/root/usr/share/ruby/timeout.rb:34:in `catch' /opt/rh/rh-ruby22/root/usr/share/ruby/timeout.rb:104:in `timeout' /var/www/miq/vmdb/app/models/miq_queue.rb:341:in `deliver' /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:106:in `deliver_queue_message' /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message' /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work' /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loop' /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `do_work' /var/www/miq/vmdb/app/models/miq_worker/runner.rb:334:in `block in do_work_loop' /var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `loop' /var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `do_work_loop' /var/www/miq/vmdb/app/models/miq_worker/runner.rb:153:in `run' /var/www/miq/vmdb/app/models/miq_worker/runner.rb:128:in `start' /var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `start_worker' /var/www/miq/vmdb/app/models/miq_worker.rb:347:in `block in start' /opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork' /opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork' /var/www/miq/vmdb/app/models/miq_worker.rb:345:in `start' /var/www/miq/vmdb/app/models/miq_worker.rb:274:in `start_worker' /var/www/miq/vmdb/app/models/miq_worker.rb:154:in `block in sync_workers' /var/www/miq/vmdb/app/models/miq_worker.rb:154:in `times' /var/www/miq/vmdb/app/models/miq_worker.rb:154:in `sync_workers' /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:52:in `block in sync_workers' /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each' /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers' /var/www/miq/vmdb/app/models/miq_server.rb:173:in `start' /var/www/miq/vmdb/app/models/miq_server.rb:265:in `start' /var/www/miq/vmdb/lib/workers/evm_server.rb:65:in `start' /var/www/miq/vmdb/lib/workers/evm_server.rb:92:in `start' /var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>' [----] I, [2016-09-03T18:33:11.847348 #21606:11fb994] INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(MiqQueue#delivered) Message id: [1000000045396], State: [ok], Delivered in [1.813261218] seconds [----] I, [2016-09-03T18:33:11.849331 #21606:11fb994] INFO -- : Q-task_id([vm_reconfigure_task_1000000000002]) MIQ(MiqQueue#m_callback) Message id: [1000000045396], Invoking Callback with args: ["ok", "Message delivered successfully", "nil"] Version-Release number of selected component (if applicable): 5.6.1 How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
The problem looks to be that the worker never reaches the "Started" state It looks like the worker starts: [----] I, [2016-09-03T18:21:44.892846 #21068:11fb994] INFO -- : MIQ(MiqVimBrokerWorker#start) Worker started: ID [1000000000293], PID [21779], GUID [cbfaf632-7224-11e6-8fe2-0050568c6f5a] But later we see: [----] W, [2016-09-03T18:31:46.203494 #21068:11fb994] WARN -- : MIQ(MiqServer#wait_for_started_workers) After waiting 600 seconds, no longer waiting for the following workers to start:[----] W, [2016-09-03T18:31:46.203665 #21068:11fb994] WARN -- : Worker type: MiqVimBrokerWorker, pid: 21779, guid: cbfaf632-7224-11e6-8fe2-0050568c6f5a, status: starting If the worker's state isn't Started then it will not be returned from MiqWorker#find_current, which means connect will return "Broker Worker is not available" and it will not heartbeat causing it to be killed later. There was speculation that a "Bad Gateway" error caused the broker to lock-up somehow, however the first time the broker process hit this was well after the "Broker Worker is not available" error: Broker Worker not available error: [----] E, [2016-09-03T18:37:37.453312 #21619:11fb994] ERROR -- : Q-task_id([vm_reconfigure_task_1000000000003]) MIQ(ManageIQ::Providers::Vmware::InfraManager#connect) Broker Worker is not available First occurrence of "Bad Gateway" for process 21779 [----] E, [2016-09-03T18:40:36.455081 #21779:13961b4] ERROR -- : <VIM> Response <#<Handsoap::Response:0x0000000314e800 @http_body="<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body bgcolor=\"white\">\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.6.3</center>\r\n</body>\r\n</html>\r\n", @soap_namespace="http://schemas.xmlsoap.org/soap/envelope/", @document=nil, @fault=nil>> has no XML document