Bug 1044638

Summary: [new relic] Cannot validate input fact: Unknown validator: 'typecheck'.
Product: OpenShift Online Reporter: Jessica Forrester <jforrest>
Component: Containers Assignee: Jhon Honce <jhonce>
Status: CLOSED CURRENTRELEASE QA Contact: libra bugs <libra-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 1.x CC: bmeng, dmcphers, kwoodson, mmahut, mmcgrath, xtian
Target Milestone: --- Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-04-09 15:17:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1073161    
Attachments:
Description                        Flags
Sample error                       none
mco log from 312                   none
corresponding node log for 312     none

Description Jessica Forrester 2013-12-18 17:55:14 UTC
In PROD, ApplicationsController#create sometimes throws:  MCollective::UnknownRPCError: Cannot validate input fact: Unknown validator: 'typecheck'.

Stacktrace:
…/ruby193/root/usr/share/ruby/mcollective/rpc/
client.rb: 938:in `process_results_with_block'
…/ruby193/root/usr/share/ruby/mcollective/rpc/
client.rb: 855:in `block in call_agent'
…t/rh/ruby193/root/usr/share/ruby/mcollective/
client.rb: 156:in `block (2 levels) in req'
…t/rh/ruby193/root/usr/share/ruby/mcollective/
client.rb: 151:in `loop'
…t/rh/ruby193/root/usr/share/ruby/mcollective/
client.rb: 151:in `block in req'
         /opt/rh/ruby193/root/usr/share/ruby/
timeout.rb:  69:in `timeout'
…t/rh/ruby193/root/usr/share/ruby/mcollective/
client.rb: 148:in `req'
…/ruby193/root/usr/share/ruby/mcollective/rpc/
client.rb: 851:in `call_agent'
…/ruby193/root/usr/share/ruby/mcollective/rpc/
client.rb: 244:in `method_missing'
…b/openshift/
mcollective_application_container_proxy.rb:3213:in `block (2 levels) in rpc_get_fact'
         /opt/rh/ruby193/root/usr/share/ruby/
timeout.rb:  69:in `timeout'
…b/openshift/
mcollective_application_container_proxy.rb:3212:in `block in rpc_get_fact'
…b/openshift/
mcollective_application_container_proxy.rb:2326:in `rpc_exec'
…b/openshift/
mcollective_application_container_proxy.rb:3206:in `rpc_get_fact'
…b/openshift/
mcollective_application_container_proxy.rb:3021:in `rpc_find_available'
…b/openshift/
mcollective_application_container_proxy.rb:  72:in `find_available_impl'
…er-1.18.3/lib/openshift/
application_container_proxy.rb:  22:in `find_available'
…/openshift-origin-controller-1.18.3/app/models/
gear.rb:  97:in `reserve_uid'
…r-1.18.3/app/models/pending_ops/
reserve_gear_uid_op.rb:   8:in `execute'
…n-controller-1.18.3/app/models/
pending_app_op_group.rb: 108:in `block in execute'
…n-controller-1.18.3/app/models/
pending_app_op_group.rb:  97:in `each'
…n-controller-1.18.3/app/models/
pending_app_op_group.rb:  97:in `execute'
…ift-origin-controller-1.18.3/app/models/
application.rb:1511:in `run_jobs'
…ift-origin-controller-1.18.3/app/models/
application.rb: 667:in `block in add_features'
…ift-origin-controller-1.18.3/app/models/
application.rb:1581:in `run_in_application_lock'
…ift-origin-controller-1.18.3/app/models/
application.rb: 651:in `add_features'
…ift-origin-controller-1.18.3/app/models/
application.rb: 253:in `create_app'
…ller-1.18.3/app/controllers/
applications_controller.rb: 157:in `create'
…ntroller-1.18.3/lib/openshift/controller/
action_log.rb:  80:in `set_logged_request'
…sr/share/gems/gems/journey-1.0.4/lib/journey/
router.rb:  68:in `block in call'
…sr/share/gems/gems/journey-1.0.4/lib/journey/
router.rb:  56:in `each'
…sr/share/gems/gems/journey-1.0.4/lib/journey/
router.rb:  56:in `call'
…per-0.11.1/lib/mongo_mapper/middleware/
identity_map.rb:  10:in `call'
…oid-3.0.21/lib/rack/mongoid/middleware/
identity_map.rb:  34:in `block in call'
…e/gems/gems/mongoid-3.0.21/lib/mongoid/
unit_of_work.rb:  39:in `unit_of_work'
…oid-3.0.21/lib/rack/mongoid/middleware/
identity_map.rb:  34:in `call'
…3/root/usr/share/gems/gems/rack-1.4.1/lib/rack/
etag.rb:  23:in `call'
…/share/gems/gems/rack-1.4.1/lib/rack/
conditionalget.rb:  35:in `call'
…e/gems/gems/rack-1.4.1/lib/rack/session/abstract/
id.rb: 205:in `context'
…e/gems/gems/rack-1.4.1/lib/rack/session/abstract/
id.rb: 200:in `call'
…/share/gems/gems/rack-1.4.1/lib/rack/
methodoverride.rb:  21:in `call'
…oot/usr/share/gems/gems/rack-1.4.1/lib/rack/
runtime.rb:  17:in `call'
…3/root/usr/share/gems/gems/rack-1.4.1/lib/rack/
lock.rb:  15:in `call'
…are/gems/gems/rack-cache-1.2/lib/rack/cache/
context.rb: 136:in `forward'
…are/gems/gems/rack-cache-1.2/lib/rack/cache/
context.rb: 143:in `pass'
…are/gems/gems/rack-cache-1.2/lib/rack/cache/
context.rb: 155:in `invalidate'
…are/gems/gems/rack-cache-1.2/lib/rack/cache/
context.rb:  71:in `call!'
…are/gems/gems/rack-cache-1.2/lib/rack/cache/
context.rb:  51:in `call'
…r-3.0.21/lib/phusion_passenger/rack/
request_handler.rb:  97:in `process_request'
…0.21/lib/phusion_passenger/
abstract_request_handler.rb: 521:in `accept_and_process_next_request'
…0.21/lib/phusion_passenger/
abstract_request_handler.rb: 274:in `main_loop'
…0.21/lib/phusion_passenger/rack/
application_spawner.rb: 206:in `start_request_handler'
…0.21/lib/phusion_passenger/rack/
application_spawner.rb:  79:in `block in spawn_application'
…s/gems/passenger-3.0.21/lib/phusion_passenger/
utils.rb: 470:in `safe_fork'
…0.21/lib/phusion_passenger/rack/
application_spawner.rb:  64:in `spawn_application'
…assenger-3.0.21/lib/phusion_passenger/
spawn_manager.rb: 264:in `spawn_rack_application'
…assenger-3.0.21/lib/phusion_passenger/
spawn_manager.rb: 137:in `spawn_application'
…assenger-3.0.21/lib/phusion_passenger/
spawn_manager.rb: 275:in `handle_spawn_application'
…senger-3.0.21/lib/phusion_passenger/
abstract_server.rb: 357:in `server_main_loop'
…senger-3.0.21/lib/phusion_passenger/
abstract_server.rb: 206:in `start_synchronously'


Reference id for one occurrence: 32cd89ae595f22b97ce2566b6f1fd89b

Comment 2 Abhishek Gupta 2013-12-20 18:04:41 UTC
See bug 958559

Comment 3 Mike McGrath 2014-01-26 22:01:29 UTC
Moving to high; we believe this caused an outage over the weekend.  Adding another error from the logs in case it helps.

Comment 4 Mike McGrath 2014-01-26 22:02:17 UTC
Created attachment 855812 [details]
Sample error

Comment 5 Dan McPherson 2014-01-26 22:27:32 UTC
Mike,

  Do you have a corresponding error from a node?

Comment 6 Kenny Woodson 2014-01-28 21:06:42 UTC
Created attachment 856828 [details]
mco log from 312

Node 312 debug of mco log

Comment 7 Kenny Woodson 2014-01-28 21:07:44 UTC
Created attachment 856829 [details]
corresponding node log for 312

Comment 8 Kenny Woodson 2014-01-28 21:17:43 UTC
I was curious to see if anything was occurring on the activemq servers.  I was able to correlate a few of the mcollective restarts done by operations with these messages in the logs.

2014-01-28 09:54:43,591 | WARN  | Transport Connection to: tcp://10.180.134.221:15018 failed: java.io.IOException: Connection reset by peer | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ NIO Worker 6119819

2014-01-28 09:55:31,568 | WARN  | Transport Connection to: tcp://10.180.134.221:26544 failed: java.io.IOException: Connection reset by peer | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ NIO Worker 6119919
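
The correlation described above could be scripted roughly as follows. This is only a sketch, assuming the broker log lines keep the exact layout of the two warnings quoted here; the names `RESET_LINE`, `connection_resets`, and `near_restart?`, and the 120-second window, are hypothetical and not part of ActiveMQ or MCollective.

```ruby
require 'time'

# Matches broker warnings shaped like the two "Connection reset by peer"
# lines quoted in this comment (assumed layout, not an official format).
RESET_LINE = %r{\A(?<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} \| WARN\s+\| Transport Connection to: tcp://(?<host>[\d.]+):(?<port>\d+) failed: java\.io\.IOException: Connection reset by peer}

# Returns one hash per matching warning; non-matching lines are skipped.
def connection_resets(lines)
  lines.map do |line|
    m = RESET_LINE.match(line)
    next unless m
    { time: Time.strptime(m[:ts], '%Y-%m-%d %H:%M:%S'),
      host: m[:host],
      port: m[:port].to_i }
  end.compact
end

# Hypothetical helper: true when any reset falls within `window` seconds
# of a known mcollective restart time.
def near_restart?(resets, restart_time, window = 120)
  resets.any? { |r| (r[:time] - restart_time).abs <= window }
end
```

Feeding it the broker log lines and a list of restart times from the node would show which restarts line up with a connection reset; timestamps are parsed in the local time zone, so both sources need to use the same one.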


I'm not sure if these are related, but I've also seen some of these in the logs:

2014-01-28 06:57:13,046 | WARN  | Exception occurred processing: 
:statusmsgI"OK:EF:      data{:  factI"active_capacity;F:
content-type:text/plain; charset=UTF-8
valueI"
destina3.90625;T:hash"%468d51d3f78134e8891b1c33cbbed826at.com_32002
content-length:203

^H{
senderid""ex-c9-node47.prod.rhcloud.com...6e3162bbe4f7ollect.cloud.redhat.com_3145 Inactive for longer than 300000 ms - removing ... | org.apache.activemq.broker.region.Queue | ActiveMQ Broker[ex-msg2] Scheduler

Comment 10 Xiaoli Tian 2014-02-27 11:13:30 UTC
Related bug filed by user: https://bugzilla.redhat.com/show_bug.cgi?id=1070635

Comment 11 Dan McPherson 2014-02-27 14:14:07 UTC
*** Bug 1070635 has been marked as a duplicate of this bug. ***

Comment 13 Marek Mahut 2014-03-07 15:10:07 UTC
Is there any information you'd like us to collect during an outage if it occurs again?

Comment 14 Jhon Honce 2014-03-17 21:40:55 UTC
MCollective version 2.4 is being deployed in Sprint 2.0.42.  Please reopen if the issue is seen again.

Comment 15 Meng Bo 2014-03-18 09:59:16 UTC
MCollective has been upgraded to 2.4 on the latest devenv, and after some testing, no regressions were found.

Moving the bug to verified.