Bug 1505401 - "Event Monitor Thread" crash during the event processing from OpenStack via AMQP
Summary: "Event Monitor Thread" crash during the event processing from OpenStack via AMQP
Keywords:
Status: CLOSED DUPLICATE of bug 1451437
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.8.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: GA
: cfme-future
Assignee: Tzu-Mainn Chen
QA Contact: Dave Johnson
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-10-23 13:55 UTC by Andrea Perotti
Modified: 2021-03-11 16:04 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-23 15:53:42 UTC
Category: Bug
Cloudforms Team: Openstack
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Andrea Perotti 2017-10-23 13:55:53 UTC
Description of problem:

AMQP, it seems that "Event Monitor Thread" crash during the event processing:

Appliance is running 5.8.1.5.  Exception caught here:

	[----] I, [2017-10-18T10:47:28.949504 #8485:bf1134]  INFO -- : MIQ(ManageIQ::Providers::Openstack::CloudManager::EventCatcher::Runner#queue_event) EMS [apidev.fastcloud.it] as [cloudforms] Caught event []
	[----] I, [2017-10-18T10:47:28.957821 #8485:bf1134]  INFO -- : EMS [apidev.fastcloud.it] as [cloudforms] ID [1000000002120] PID [8485] GUID [5fef097a-b3e0-11e7-a1dc-005056a52c50] An error has occurred during work processing: undefined method `key?' for nil:NilClass
	/var/www/miq/vmdb/app/models/manageiq/providers/openstack/cloud_manager/event_parser.rb:19:in `event_to_hash'
	/var/www/miq/vmdb/app/models/manageiq/providers/openstack/cloud_manager/event_catcher/runner.rb:5:in `add_openstack_queue'
	/var/www/miq/vmdb/app/models/manageiq/providers/openstack/event_catcher_mixin.rb:69:in `queue_event'
	/var/www/miq/vmdb/app/models/manageiq/providers/base_manager/event_catcher/runner.rb:131:in `process_event'
	/var/www/miq/vmdb/app/models/manageiq/providers/base_manager/event_catcher/runner.rb:193:in `block in process_events'
	/var/www/miq/vmdb/app/models/manageiq/providers/base_manager/event_catcher/runner.rb:191:in `each'
	/var/www/miq/vmdb/app/models/manageiq/providers/base_manager/event_catcher/runner.rb:191:in `process_events'
	/var/www/miq/vmdb/app/models/manageiq/providers/base_manager/event_catcher/runner.rb:210:in `do_work'
	/var/www/miq/vmdb/app/models/miq_worker/runner.rb:340:in `block in do_work_loop'
	/var/www/miq/vmdb/app/models/miq_worker/runner.rb:337:in `loop'
	/var/www/miq/vmdb/app/models/miq_worker/runner.rb:337:in `do_work_loop'
	/var/www/miq/vmdb/app/models/miq_worker/runner.rb:160:in `run'
	/var/www/miq/vmdb/app/models/miq_worker/runner.rb:134:in `start'
	/var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `start_worker'
	/var/www/miq/vmdb/app/models/miq_worker.rb:339:in `block in start_runner'
	/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
	/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
	/var/www/miq/vmdb/app/models/miq_worker.rb:337:in `start_runner'
	/var/www/miq/vmdb/app/models/miq_worker.rb:348:in `start'
	/var/www/miq/vmdb/app/models/miq_worker.rb:266:in `start_worker'
	/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:68:in `start_worker_for_ems'
	/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:46:in `block in sync_workers'
	/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:45:in `each'
	/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:45:in `sync_workers'
	/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:53:in `block in sync_workers'
	/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each'
	/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers'
	/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:22:in `monitor_workers'
	/var/www/miq/vmdb/app/models/miq_server.rb:349:in `block in monitor'
	/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-8f7d63f441d5/lib/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
	/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-8f7d63f441d5/lib/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
	/var/www/miq/vmdb/app/models/miq_server.rb:349:in `monitor'
	/var/www/miq/vmdb/app/models/miq_server.rb:371:in `block (2 levels) in monitor_loop'
	/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-8f7d63f441d5/lib/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
	/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-8f7d63f441d5/lib/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
	/var/www/miq/vmdb/app/models/miq_server.rb:371:in `block in monitor_loop'
	/var/www/miq/vmdb/app/models/miq_server.rb:370:in `loop'
	/var/www/miq/vmdb/app/models/miq_server.rb:370:in `monitor_loop'
	/var/www/miq/vmdb/app/models/miq_server.rb:253:in `start'
	/var/www/miq/vmdb/lib/workers/evm_server.rb:65:in `start'
	/var/www/miq/vmdb/lib/workers/evm_server.rb:91:in `start'
	/var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>'

So we are likely looping through keys in a hash when we call key? and the object is a nilClass instead of the actual Hash object we expect it to be.

Looking at the event catcher:

	module ManageIQ::Providers::Openstack::EventCatcherMixin
	- - - - 8< - - - -
	  def queue_event(event)
	    _log.info "#{log_prefix} Caught event [#{event.payload["event_type"]}]"

	    event_hash = {}
	    # copy content
	    content = event.payload
	    event_hash[:content] = content.reject { |k, _v| k.start_with? "_context_" }

	    # copy context
	    event_hash[:context] = {}
	    content.select { |k, _v| k.start_with? "_context_" }.each_pair do |k, v|
	      event_hash[:context][k] = v
	    end

	    # copy attributes
	    event_hash[:user_id]      = event.metadata[:user_id]
	    event_hash[:priority]     = event.metadata[:priority]
	    event_hash[:content_type] = event.metadata[:content_type]
	    add_openstack_queue(event_hash)
	  end

We call add_openstack_queue with an event_hash as an argument:

	class ManageIQ::Providers::Openstack::CloudManager::EventCatcher::Runner < ManageIQ::Providers::BaseManager::EventCatcher::Runner
	  include ManageIQ::Providers::Openstack::EventCatcherMixin

	  def add_openstack_queue(event)
	    event_hash = ManageIQ::Providers::Openstack::CloudManager::EventParser.event_to_hash(event, @cfg[:ems_id])
	    EmsEvent.add_queue('add', @cfg[:ems_id], event_hash)
	  end
	end

We take the event_hash and call event_to_hash on it from our EventParser - we except in here then:

	module ManageIQ::Providers::Openstack::CloudManager::EventParser
	  def self.event_to_hash(event, ems_id)
	    log_header = "ems_id: [#{ems_id}] " unless ems_id.nil?

	    _log.debug("#{log_header}event: [#{event[:content]["event_type"]}]")

	    # attributes that are common to all notifications
	    event_hash = {
	      :event_type => event[:content]["event_type"],
	      :source     => "OPENSTACK",
	      :message    => event[:payload],
	      :timestamp  => event[:content]["timestamp"],
	      :username   => event[:content]["_context_user_name"],
	      :full_data  => event,
	      :ems_id     => ems_id
	    }

	    payload = event[:content]["payload"]
	    event_hash[:vm_ems_ref] = payload["instance_id"]                      if payload.key? "instance_id"
	    event_hash[:host_ems_ref] = payload["host"]                           if payload.key? "host"
	    event_hash[:availability_zone_ems_ref] = payload["availability_zone"] if payload.key? "availability_zone"
	    event_hash[:chain_id] = payload["reservation_id"]                     if payload.key? "reservation_id"
	    event_hash
	  end
	end

So in our payload.key? which is the first time we print a hash value, we check for the instance_id of the event and, as it is not a hash, it excepts and travels back up the stack to a rescue.

Good news, the debug logs in the OpenStack cloud provider will tell us about the event, it's content, and what type of event it is because of this line:

	    _log.debug("#{log_header}event: [#{event[:content]["event_type"]}]")



Version-Release number of selected component (if applicable):
Appliance is running 5.8.1.5

How reproducible:
Always

Comment 4 Tzu-Mainn Chen 2017-10-23 15:53:42 UTC

*** This bug has been marked as a duplicate of bug 1451437 ***


Note You need to log in before you can comment on or make changes to this bug.