In PROD, ApplicationsController#create sometimes throws:

MCollective::UnknownRPCError: Cannot validate input fact: Unknown validator: 'typecheck'.

Stacktrace:
…/ruby193/root/usr/share/ruby/mcollective/rpc/client.rb:938:in `process_results_with_block'
…/ruby193/root/usr/share/ruby/mcollective/rpc/client.rb:855:in `block in call_agent'
…t/rh/ruby193/root/usr/share/ruby/mcollective/client.rb:156:in `block (2 levels) in req'
…t/rh/ruby193/root/usr/share/ruby/mcollective/client.rb:151:in `loop'
…t/rh/ruby193/root/usr/share/ruby/mcollective/client.rb:151:in `block in req'
/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
…t/rh/ruby193/root/usr/share/ruby/mcollective/client.rb:148:in `req'
…/ruby193/root/usr/share/ruby/mcollective/rpc/client.rb:851:in `call_agent'
…/ruby193/root/usr/share/ruby/mcollective/rpc/client.rb:244:in `method_missing'
…b/openshift/mcollective_application_container_proxy.rb:3213:in `block (2 levels) in rpc_get_fact'
/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
…b/openshift/mcollective_application_container_proxy.rb:3212:in `block in rpc_get_fact'
…b/openshift/mcollective_application_container_proxy.rb:2326:in `rpc_exec'
…b/openshift/mcollective_application_container_proxy.rb:3206:in `rpc_get_fact'
…b/openshift/mcollective_application_container_proxy.rb:3021:in `rpc_find_available'
…b/openshift/mcollective_application_container_proxy.rb:72:in `find_available_impl'
…er-1.18.3/lib/openshift/application_container_proxy.rb:22:in `find_available'
…/openshift-origin-controller-1.18.3/app/models/gear.rb:97:in `reserve_uid'
…r-1.18.3/app/models/pending_ops/reserve_gear_uid_op.rb:8:in `execute'
…n-controller-1.18.3/app/models/pending_app_op_group.rb:108:in `block in execute'
…n-controller-1.18.3/app/models/pending_app_op_group.rb:97:in `each'
…n-controller-1.18.3/app/models/pending_app_op_group.rb:97:in `execute'
…ift-origin-controller-1.18.3/app/models/application.rb:1511:in `run_jobs'
…ift-origin-controller-1.18.3/app/models/application.rb:667:in `block in add_features'
…ift-origin-controller-1.18.3/app/models/application.rb:1581:in `run_in_application_lock'
…ift-origin-controller-1.18.3/app/models/application.rb:651:in `add_features'
…ift-origin-controller-1.18.3/app/models/application.rb:253:in `create_app'
…ller-1.18.3/app/controllers/applications_controller.rb:157:in `create'
…ntroller-1.18.3/lib/openshift/controller/action_log.rb:80:in `set_logged_request'
…sr/share/gems/gems/journey-1.0.4/lib/journey/router.rb:68:in `block in call'
…sr/share/gems/gems/journey-1.0.4/lib/journey/router.rb:56:in `each'
…sr/share/gems/gems/journey-1.0.4/lib/journey/router.rb:56:in `call'
…per-0.11.1/lib/mongo_mapper/middleware/identity_map.rb:10:in `call'
…oid-3.0.21/lib/rack/mongoid/middleware/identity_map.rb:34:in `block in call'
…e/gems/gems/mongoid-3.0.21/lib/mongoid/unit_of_work.rb:39:in `unit_of_work'
…oid-3.0.21/lib/rack/mongoid/middleware/identity_map.rb:34:in `call'
…3/root/usr/share/gems/gems/rack-1.4.1/lib/rack/etag.rb:23:in `call'
…/share/gems/gems/rack-1.4.1/lib/rack/conditionalget.rb:35:in `call'
…e/gems/gems/rack-1.4.1/lib/rack/session/abstract/id.rb:205:in `context'
…e/gems/gems/rack-1.4.1/lib/rack/session/abstract/id.rb:200:in `call'
…/share/gems/gems/rack-1.4.1/lib/rack/methodoverride.rb:21:in `call'
…oot/usr/share/gems/gems/rack-1.4.1/lib/rack/runtime.rb:17:in `call'
…3/root/usr/share/gems/gems/rack-1.4.1/lib/rack/lock.rb:15:in `call'
…are/gems/gems/rack-cache-1.2/lib/rack/cache/context.rb:136:in `forward'
…are/gems/gems/rack-cache-1.2/lib/rack/cache/context.rb:143:in `pass'
…are/gems/gems/rack-cache-1.2/lib/rack/cache/context.rb:155:in `invalidate'
…are/gems/gems/rack-cache-1.2/lib/rack/cache/context.rb:71:in `call!'
…are/gems/gems/rack-cache-1.2/lib/rack/cache/context.rb:51:in `call'
…r-3.0.21/lib/phusion_passenger/rack/request_handler.rb:97:in `process_request'
…0.21/lib/phusion_passenger/abstract_request_handler.rb:521:in `accept_and_process_next_request'
…0.21/lib/phusion_passenger/abstract_request_handler.rb:274:in `main_loop'
…0.21/lib/phusion_passenger/rack/application_spawner.rb:206:in `start_request_handler'
…0.21/lib/phusion_passenger/rack/application_spawner.rb:79:in `block in spawn_application'
…s/gems/passenger-3.0.21/lib/phusion_passenger/utils.rb:470:in `safe_fork'
…0.21/lib/phusion_passenger/rack/application_spawner.rb:64:in `spawn_application'
…assenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:264:in `spawn_rack_application'
…assenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:137:in `spawn_application'
…assenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:275:in `handle_spawn_application'
…senger-3.0.21/lib/phusion_passenger/abstract_server.rb:357:in `server_main_loop'
…senger-3.0.21/lib/phusion_passenger/abstract_server.rb:206:in `start_synchronously'

Reference id for one occurrence: 32cd89ae595f22b97ce2566b6f1fd89b
See bug 958559
Moving to high; we believe this caused an outage over the weekend. Adding another error from the logs in case it helps.
Created attachment 855812 [details] Sample error
Mike, do you have a corresponding error from a node?
Created attachment 856828 [details] mco log from 312 Node 312 debug of mco log
Created attachment 856829 [details] corresponding node log for 312
I was curious to see if anything was occurring on the activemq servers. I was able to correlate a few of these mcollective restarts for operations with these messages in the logs:

2014-01-28 09:54:43,591 | WARN | Transport Connection to: tcp://10.180.134.221:15018 failed: java.io.IOException: Connection reset by peer | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ NIO Worker 6119819
2014-01-28 09:55:31,568 | WARN | Transport Connection to: tcp://10.180.134.221:26544 failed: java.io.IOException: Connection reset by peer | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ NIO Worker 6119919

I'm not sure if these are related, but I've also seen some of these in the logs:

2014-01-28 06:57:13,046 | WARN | Exception occurred processing: :statusmsgI"OK:EF: data{: factI"active_capacity;F: content-type:text/plain; charset=UTF-8 valueI" destina3.90625;T:hash"%468d51d3f78134e8891b1c33cbbed826at.com_32002 content-length:203 ^H{ senderid""ex-c9-node47.prod.rhcloud.com...6e3162bbe4f7ollect.cloud.redhat.com_3145 Inactive for longer than 300000 ms - removing ... | org.apache.activemq.broker.region.Queue | ActiveMQ Broker[ex-msg2] Scheduler
Related bug filed by user: https://bugzilla.redhat.com/show_bug.cgi?id=1070635
*** Bug 1070635 has been marked as a duplicate of this bug. ***
Is there any information you'd like us to collect during an outage if it occurs again?
MCollective version 2.4 is being deployed in Sprint 2.0.42. Please reopen if the issue is seen again.
MCollective has been upgraded to 2.4 on the latest devenv, and after some testing no regression issues were found. Moving the bug to verified.