Bug 1479582

Summary: HTTP 429 Too Many Requests Error during Refresh of OpenShift Provider
Product: Red Hat CloudForms Management Engine Reporter: myoder
Component: ProvidersAssignee: Federico Simoncelli <fsimonce>
Status: CLOSED DUPLICATE QA Contact: Einat Pacifici <epacific>
Severity: high Docs Contact:
Priority: unspecified    
Version: 5.7.0CC: agrare, bazulay, cben, erich, gblomqui, iheim, jfrey, jhardy, myoder, obarenbo
Target Milestone: GA   
Target Release: cfme-future   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: container
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-30 13:11:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: Bug
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: Container Management Target Upstream Version:
Embargoed:

Description myoder 2017-08-08 21:17:43 UTC
Description of problem:

During a refresh of OCP provider, getting a 429 'Too Many Requests' error.

[----] I, [2017-08-03T03:15:15.975151 #22132:843140]  INFO -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::Refresher#refresh) Refreshing all targets...
[----] I, [2017-08-03T03:15:15.975276 #22132:843140]  INFO -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::Refresher#refresh) EMS: [CTC Core], id: [10000000000001] Refreshing targets for EMS...
[----] I, [2017-08-03T03:15:15.975368 #22132:843140]  INFO -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::Refresher#refresh) EMS: [CTC Core], id: [10000000000001]   ManageIQ::Providers::OpenshiftEnterprise::Containe
rManager [CTC Core] id [10000000000001]
[----] I, [2017-08-03T03:15:15.975470 #22132:843140]  INFO -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::Refresher#refresh_targets_for_ems) EMS: [CTC Core], id: [10000000000001] Refreshing target ManageIQ::Provider
s::OpenshiftEnterprise::ContainerManager [CTC Core] id [10000000000001]...
[----] I, [2017-08-03T03:15:15.975559 #22132:843140]  INFO -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager#with_provider_connection) Connecting through ManageIQ::Providers::OpenshiftEnterprise::ContainerManager: [CTC 
Core]
[----] E, [2017-08-03T03:15:16.670336 #22132:843140] ERROR -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::Refresher#refresh) EMS: [CTC Core], id: [10000000000001] Refresh failed
[----] E, [2017-08-03T03:15:16.670655 #22132:843140] ERROR -- : [KubeException]: 429 Too Many Requests  Method:[rescue in block in refresh]
[----] E, [2017-08-03T03:15:16.670791 #22132:843140] ERROR -- : /opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:107:in `rescue in handle_exception'
/opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:99:in `handle_exception'
/opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:438:in `fetch_entities'
/opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:428:in `load_entities'
/opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:111:in `discover'
/opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:78:in `method_missing'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresher_mixin.rb:16:in `block in fetch_entities'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresher_mixin.rb:14:in `each'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresher_mixin.rb:14:in `each_with_object'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresher_mixin.rb:14:in `fetch_entities'
/var/www/miq/vmdb/app/models/manageiq/providers/openshift/container_manager/refresher.rb:17:in `block in parse_legacy_inventory'
/var/www/miq/vmdb/app/models/ext_management_system.rb:362:in `with_provider_connection'
/var/www/miq/vmdb/app/models/manageiq/providers/openshift/container_manager/refresher.rb:16:in `parse_legacy_inventory'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:122:in `block in parse_targeted_inventory'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:122:in `parse_targeted_inventory'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:87:in `block in refresh_targets_for_ems'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:86:in `refresh_targets_for_ems'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:24:in `block (2 levels) in refresh'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:24:in `block in refresh'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:14:in `each'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:14:in `refresh'
/var/www/miq/vmdb/app/models/manageiq/providers/base_manager/refresher.rb:10:in `refresh'
/var/www/miq/vmdb/app/models/ems_refresh.rb:91:in `block in refresh'
/var/www/miq/vmdb/app/models/ems_refresh.rb:90:in `each'
/var/www/miq/vmdb/app/models/ems_refresh.rb:90:in `refresh'
/var/www/miq/vmdb/app/models/miq_queue.rb:347:in `block in deliver'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:343:in `deliver'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:106:in `deliver_queue_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loop'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `do_work'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:334:in `block in do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:153:in `run'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:128:in `start'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:343:in `block in start'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/var/www/miq/vmdb/app/models/miq_worker.rb:341:in `start'
/var/www/miq/vmdb/app/models/miq_worker.rb:270:in `start_worker'
/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:68:in `start_worker_for_ems'
/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:46:in `block in sync_workers'
/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:45:in `each'
/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:45:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:52:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:22:in `monitor_workers'
/var/www/miq/vmdb/app/models/miq_server.rb:346:in `block in monitor'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
/var/www/miq/vmdb/app/models/miq_server.rb:346:in `monitor'
/var/www/miq/vmdb/app/models/miq_server.rb:368:in `block (2 levels) in monitor_loop'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
/var/www/miq/vmdb/app/models/miq_server.rb:368:in `block in monitor_loop'
/var/www/miq/vmdb/app/models/miq_server.rb:367:in `loop'
/var/www/miq/vmdb/app/models/miq_server.rb:367:in `monitor_loop'
/var/www/miq/vmdb/app/models/miq_server.rb:250:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:65:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:92:in `start'
/var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>'
[----] E, [2017-08-03T03:15:16.670899 #22132:843140] ERROR -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::Refresher#refresh) EMS: [CTC Core], id: [10000000000001] Unable to perform refresh for the following targets:
[----] E, [2017-08-03T03:15:16.671046 #22132:843140] ERROR -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::Refresher#refresh)  --- ManageIQ::Providers::OpenshiftEnterprise::ContainerManager [CTC Core] id [10000000000
001]
[----] I, [2017-08-03T03:15:16.700872 #22132:843140]  INFO -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::Refresher#refresh) Refreshing all targets...Complete
[----] E, [2017-08-03T03:15:16.701124 #22132:843140] ERROR -- : MIQ(MiqQueue#deliver) Message id: [10000366715709], Error: [HTTP status code 429, 429 Too Many Requests]
[----] E, [2017-08-03T03:15:16.701282 #22132:843140] ERROR -- : [EmsRefresh::Refreshers::EmsRefresherMixin::PartialRefreshError]: HTTP status code 429, 429 Too Many Requests  Method:[rescue in deliver]
[----] E, [2017-08-03T03:15:16.701439 #22132:843140] ERROR -- : /var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:50:in `refresh'

/var/www/miq/vmdb/app/models/manageiq/providers/base_manager/refresher.rb:10:in `refresh'
/var/www/miq/vmdb/app/models/ems_refresh.rb:91:in `block in refresh'
/var/www/miq/vmdb/app/models/ems_refresh.rb:90:in `each'
/var/www/miq/vmdb/app/models/ems_refresh.rb:90:in `refresh'
/var/www/miq/vmdb/app/models/miq_queue.rb:347:in `block in deliver'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:343:in `deliver'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:106:in `deliver_queue_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loop'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `do_work'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:334:in `block in do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:153:in `run'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:128:in `start'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:343:in `block in start'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/var/www/miq/vmdb/app/models/miq_worker.rb:341:in `start'
/var/www/miq/vmdb/app/models/miq_worker.rb:270:in `start_worker'
/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:68:in `start_worker_for_ems'
/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:46:in `block in sync_workers'
/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:45:in `each'
/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:45:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:52:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:22:in `monitor_workers'
/var/www/miq/vmdb/app/models/miq_server.rb:346:in `block in monitor'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
/var/www/miq/vmdb/app/models/miq_server.rb:346:in `monitor'
/var/www/miq/vmdb/app/models/miq_server.rb:368:in `block (2 levels) in monitor_loop'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
/var/www/miq/vmdb/app/models/miq_server.rb:368:in `block in monitor_loop'
/var/www/miq/vmdb/app/models/miq_server.rb:367:in `loop'
/var/www/miq/vmdb/app/models/miq_server.rb:367:in `monitor_loop'
/var/www/miq/vmdb/app/models/miq_server.rb:250:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:65:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:92:in `start'
/var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>'

Version-Release number of selected component (if applicable):
CFME 5.7.1.3

How reproducible:
customer environment

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 5 Eric Rich 2017-08-11 14:08:38 UTC
This might be similar to https://bugzilla.redhat.com/show_bug.cgi?id=1472532

The fix could involve any of the following: https://bugzilla.redhat.com/show_bug.cgi?id=1472532#c1

Comment 6 Eric Rich 2017-08-11 14:36:22 UTC
(In reply to Eric Rich from comment #5)
See config changes posted with https://bugzilla.redhat.com/show_bug.cgi?id=1472532#c5

Comment 11 Beni Paskin-Cherniavsky 2017-08-16 12:10:20 UTC
I don't know if CFME refresh has backoff logic (none in the Openshift provider or client, there might be some in core refresh scheduling?).
But it's not doing that many requests — about 20 (serial) per refresh.
I only see 3 refreshes fail, 2 with ECONNREFUSED, 1 with 429 Too Many Requests.

However I see 169 start_event_monitor attempts failing ECONNREFUSED, retrying about once per second, which might contribute to load on Openshift API.

other errors:
- workers not starting because exceeded 60% of swap
- 5 times: RefreshWorker has not responded in 7215 seconds, restarting worker 
- MiqPriorityWorker 38 messages: timed out after 600 seconds
- tons of metrics collection errors - a mix of:
  - Hawkular metrics service unavailable: Connection reset by peer - SSL_connect
  - Hawkular metrics service unavailable: <html>...JBWEB000065: HTTP Status 403...Access to the specified resource has been forbidden.

(P.S. tip for searching the log: grep -v last_refresh_error)

=> I'd say CFME itself is not impacted – it tolerates the errors well by retrying – too well and ought to have some backoff.
In this specific case honoring 429 Retry-After header would not help, most errors are ECONNREFUSED (which might reflect Openshift being even more overloaded?).  Could do universal backoff on any error.

=> This appliance needs more RAM.  Being deep into swap can make it painfully slow and exacerbate timeouts...