Bug 1182581
Summary: | deployments run forever because foreman-tasks stops polling ("IOError" in foreman's production.log)
---|---
Product: | Red Hat OpenStack
Reporter: | Mike Burns <mburns>
Component: | ruby193-rubygem-dynflow
Assignee: | Ivan Necas <inecas>
Status: | CLOSED ERRATA
QA Contact: | Omri Hochman <ohochman>
Severity: | urgent
Priority: | urgent
Version: | unspecified
CC: | aberezin, ajeain, dmacpher, dmaley, inecas, jstransk, juwu, mburns, ohochman, rhos-maint, sasha, sclewis, sgordon, sputhenp, yeylon
Target Milestone: | ga
Target Release: | Installer
Hardware: | x86_64
OS: | Linux
Fixed In Version: | ruby193-rubygem-dynflow-0.7.3-3.el7ost
Doc Type: | Bug Fix
Doc Text: | When the foreman-tasks process was disconnected from the database, polling would stop and deployments would run forever. This fix corrects the code to reconnect in the event of a disconnect. If the reconnect fails, the processes restart. Deployments now continue despite momentary issues in database connectivity.
Clone Of: | 1173634
: | 1184630
Last Closed: | 2015-02-09 15:19:40 UTC
Type: | Bug
Bug Depends On: | 1173634
Bug Blocks: | 1177026, 1184630
Description
Mike Burns
2015-01-15 13:57:51 UTC
*** Bug 1182576 has been marked as a duplicate of this bug. ***

Relevant output from /var/log/foreman/production.log:

Started GET "/reports?search=eventful+%3D+true" for 10.10.50.242 at 2015-01-14 15:52:52 -0500
Processing by ReportsController#index as HTML
Parameters: {"search"=>"eventful = true"}
Rendered reports/_list.html.erb (15.9ms)
Rendered reports/index.html.erb within layouts/application (17.0ms)
Rendered common/_searchbar.html.erb (2.8ms)
Rendered home/_user_dropdown.html.erb (1.2ms)
Read fragment views/tabs_and_title_records-3 (0.1ms)
Rendered home/_topbar.html.erb (1.8ms)
Rendered layouts/base.html.erb (3.0ms)
Completed 200 OK in 32ms (Views: 24.3ms | ActiveRecord: 1.9ms)
stream closed (IOError)
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/adapters/postgres.rb:149:in `async_exec'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/adapters/postgres.rb:149:in `block in execute_query'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/database/logging.rb:33:in `log_yield'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/adapters/postgres.rb:149:in `execute_query'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/adapters/postgres.rb:136:in `block in execute'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/adapters/postgres.rb:115:in `check_disconnect_errors'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/adapters/postgres.rb:136:in `execute'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/adapters/postgres.rb:418:in `_execute'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/adapters/postgres.rb:246:in `block (2 levels) in execute'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/adapters/postgres.rb:430:in `check_database_errors'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/adapters/postgres.rb:246:in `block in execute'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/database/connecting.rb:236:in `block in synchronize'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/connection_pool/threaded.rb:104:in `hold'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/database/connecting.rb:236:in `synchronize'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/adapters/postgres.rb:246:in `execute'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/database/query.rb:79:in `execute_dui'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/dataset/actions.rb:861:in `execute_dui'
/opt/rh/ruby193/root/usr/share/gems/gems/sequel-3.45.0/lib/sequel/dataset/actions.rb:774:in `update'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/persistence_adapters/sequel.rb:108:in `save'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/persistence_adapters/sequel.rb:56:in `save_step'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/persistence.rb:46:in `save_step'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/execution_plan/steps/abstract.rb:62:in `save'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/action.rb:266:in `save_state'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/action.rb:435:in `execute_run'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/action.rb:230:in `execute'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:9:in `block (2 levels) in execute'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/execution_plan/steps/abstract.rb:152:in `call'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/execution_plan/steps/abstract.rb:152:in `with_meta_calculation'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:8:in `block in execute'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:22:in `open_action'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:7:in `execute'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/executors/parallel/worker.rb:20:in `block in on_message'
/opt/rh/ruby193/root/usr/share/gems/gems/algebrick-0.4.0/lib/algebrick.rb:859:in `block in assigns'
/opt/rh/ruby193/root/usr/share/gems/gems/algebrick-0.4.0/lib/algebrick.rb:858:in `tap'
/opt/rh/ruby193/root/usr/share/gems/gems/algebrick-0.4.0/lib/algebrick.rb:858:in `assigns'
/opt/rh/ruby193/root/usr/share/gems/gems/algebrick-0.4.0/lib/algebrick.rb:138:in `match_value'
/opt/rh/ruby193/root/usr/share/gems/gems/algebrick-0.4.0/lib/algebrick.rb:116:in `block in match'
/opt/rh/ruby193/root/usr/share/gems/gems/algebrick-0.4.0/lib/algebrick.rb:115:in `each'
/opt/rh/ruby193/root/usr/share/gems/gems/algebrick-0.4.0/lib/algebrick.rb:115:in `match'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/executors/parallel/worker.rb:17:in `on_message'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/micro_actor.rb:82:in `on_envelope'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/micro_actor.rb:72:in `receive'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/micro_actor.rb:99:in `block (2 levels) in run'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/micro_actor.rb:99:in `loop'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/micro_actor.rb:99:in `block in run'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/micro_actor.rb:99:in `catch'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/micro_actor.rb:99:in `run'
/opt/rh/ruby193/root/usr/share/gems/gems/dynflow-0.7.3/lib/dynflow/micro_actor.rb:13:in `block in initialize'

@dcleal -- can you get this the attention needed on the satellite/foreman side?

Reproduced on BM environment: Puppet didn't run after the OS was provisioned. A host was "waituntilready" in dynamic flow. Restarting the foreman-tasks service on that system manually caused Puppet to run on the controllers.

The latest occurrence was doing a "WaitUntilReady" step. This step polls the host for the SSH port to be open. The host was completely up, we could ssh in, etc. Workaround used: restart foreman-tasks service. Resume the deployment (resume dynflow). Host was immediately recognized as up and the deployment succeeded.

Reproduced again on a QE lab machine, this time a different action was running at the time of the exception - the PuppetRun action.

Created attachment 981563 [details]
foreman production.log with IOError
Created attachment 981564 [details]
sosreport
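
For readers unfamiliar with the "WaitUntilReady" step mentioned in the report, the kind of polling it describes (checking whether a host's SSH port accepts connections) might look roughly like the sketch below. This is not the installer's actual code: `port_open?` and `wait_until_ready` are hypothetical names, the attempt count and interval are made-up defaults, and `Socket.tcp` with `connect_timeout:` assumes Ruby 2.0 or newer rather than the ruby193 collection shown in the trace.

```ruby
require 'socket'

# Hypothetical helper: true if a TCP connection to host:port can be
# opened within connect_timeout seconds, false otherwise.
def port_open?(host, port, connect_timeout = 2)
  Socket.tcp(host, port, connect_timeout: connect_timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Hypothetical poller: checks the port up to max_attempts times,
# sleeping interval seconds between checks. Returns true as soon as
# the port is open, false if every attempt fails.
def wait_until_ready(host, port, max_attempts: 30, interval: 5)
  max_attempts.times do |attempt|
    return true if port_open?(host, port)
    sleep(interval) unless attempt == max_attempts - 1
  end
  false
end
```

A loop like this only makes progress while the process running it is alive. That matches the symptom in this report: the host was reachable over SSH, yet the step never completed until foreman-tasks was restarted.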
A set of patches to help with the issue: https://github.com/Dynflow/dynflow/pull/135

With these changes, it should:
1. Retry several times when a DB error occurs.
2. Terminate when the retries don't help: the foreman-tasks service will start the executor again automatically, switching the status of hung tasks to paused.
3. Let the executor finish as soon as the workers are done, without waiting for the whole task to finish.

Unable to reproduce using:
ruby193-rubygem-dynflow-0.7.3-3.el7ost.noarch
rhel-osp-installer-0.5.5-2.el7ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2015-0156.html
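
The retry-then-terminate behaviour described in the patch notes can be illustrated with a minimal sketch. It is not the code from the Dynflow pull request: `ConnectionLost` is a hypothetical stand-in for whatever error the persistence layer raises on a lost database connection, and `with_db_retry`, the retry count, and the delay are assumptions chosen for illustration.

```ruby
# Hypothetical error class standing in for a database disconnect.
class ConnectionLost < StandardError; end

# Runs the block, retrying up to max_retries times when ConnectionLost
# is raised. Once the retries are exhausted the error is re-raised so
# the caller can decide to terminate.
def with_db_retry(max_retries: 3, delay: 1)
  attempts = 0
  begin
    yield
  rescue ConnectionLost
    attempts += 1
    raise if attempts > max_retries
    sleep(delay)
    retry
  end
end

# Possible use inside a worker (save_step is a placeholder here):
#
#   begin
#     with_db_retry { save_step }
#   rescue ConnectionLost
#     exit(1) # give up and let the service manager restart the executor
#   end
```

Re-raising after the last retry, rather than swallowing the error, is what lets the process exit so it can be restarted, instead of staying alive with a dead connection and no longer polling.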