Bug 1183802 - rubygem-staypuft: PuppetRun action executes but no request sent to Foreman Proxy (causes staypuft deployment to hang).
Summary: rubygem-staypuft: PuppetRun action executes but no request sent to Foreman Pr...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: foreman-proxy
Version: 6.0 (Juno)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ga
Target Release: Installer
Assignee: Dominic Cleal
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-01-19 21:08 UTC by Omri Hochman
Modified: 2023-02-22 23:02 UTC (History)

Fixed In Version: ruby193-rubygem-pg
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1184633 (view as bug list)
Environment:
Last Closed: 2015-02-10 15:14:13 UTC
Target Upstream Version:
Embargoed:


Attachments
production.log (1.12 MB, text/plain), 2015-01-19 21:08 UTC, Omri Hochman
dynflow console (114.42 KB, text/html), 2015-01-20 14:55 UTC, Jiri Stransky
foreman proxy log (15.61 KB, text/plain), 2015-01-20 14:55 UTC, Jiri Stransky
dynflow_executor.output (152 bytes, text/plain), 2015-01-20 14:56 UTC, Jiri Stransky
sosreport (5.57 MB, application/x-xz), 2015-01-20 14:57 UTC, Jiri Stransky

Description Omri Hochman 2015-01-19 21:08:51 UTC
Created attachment 981615 [details]
production.log

rubygem-staypuft: puppet-agent is not triggered on 1 of the 3 controllers in an HA deployment (Pacemaker is unable to connect to pcsd on the target host).

* These look like the same symptoms we already saw in Bug 1173634.

Environment:
-------------
rhel-osp-installer-0.5.5-2.el7ost.noarch
foreman-1.6.0.49-4.el7ost.noarch
foreman-installer-1.6.0-0.2.RC1.el7ost.noarch
openstack-puppet-modules-2014.2.8-1.el7ost.noarch
puppet-3.6.2-2.el7.noarch
puppet-server-3.6.2-2.el7.noarch

Description:
------------- 
The HA-Neutron-GRE deployment hung indefinitely. Looking at the controller machines, it seems that puppet-agent was never triggered on 1 of the 3 controllers. As a result, pcs was not installed on that host, and Pacemaker (pcmk) on the other controllers could not connect to it:


/var/log/messages :
-------------------
(from one of the 2 controllers on which puppet-agent was triggered):
---------------------------------------------------------------------
Jan 19 14:40:15 maca25400702875 puppet-agent[10611]: (/Stage[main]/Pacemaker::Corosync/Exec[auth-successful-across-all-nodes]/returns) Error: unable to connect to pcsd on pcmk-maca25400702876
Jan 19 14:40:15 maca25400702875 puppet-agent[10611]: (/Stage[main]/Pacemaker::Corosync/Exec[auth-successful-across-all-nodes]/returns) Unable to connect to pcmk-maca25400702876 ([Errno 111] Connection refused)
Jan 19 14:40:15 maca25400702875 puppet-agent[10611]: /usr/sbin/pcs cluster auth pcmk-maca25400702876 pcmk-maca25400702877 pcmk-maca25400702875 -u hacluster -p CHANGEME --force returned 1 instead of one of [0]
 

No errors in production.log (file attached)
--------------------------------------------
dynflow view :
--------------
52: Actions::Staypuft::Host::PuppetRun (success) [ 0.14s / 0.14s ]
54: Actions::Staypuft::Host::ReportWait (success) [ 4668.06s / 10.29s ]
57: Actions::Staypuft::Host::PuppetRun (success) [ 0.02s / 0.02s ]
59: Actions::Staypuft::Host::ReportWait (suspended) [ 8152.41s / 18.29s ]
62: Actions::Staypuft::Host::PuppetRun (pending)

------------------------------------------------------------
 59: Actions::Staypuft::Host::ReportWait (suspended) [ 8152.41s / 18.29s ]
Started at: 2015-01-19 18:29:51 UTC
Ended at: 2015-01-19 20:45:44 UTC
Real time: 8152.41s
Execution time (excluding suspended state): 18.29s
Input:
---
host_id: 2
after: '2015-01-19T13:29:51-05:00'
current_user_id: 3
Output:
---
status: false
poll_attempts:
  total: 1625
  failed: 0
-------------------------------------------------------
62: Actions::Staypuft::Host::PuppetRun (pending)
Started at:
Ended at:
Real time: 0.00s
Execution time (excluding suspended state): 0.00s
Input:
---
host_id: 4
name: maca25400702875.example.com
current_user_id: 3
Output:
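
As a rough sanity check on the ReportWait figures for step 59 above, the sketch below recomputes the wall-clock duration from the Started/Ended timestamps and the average gap between polls. The variable names are illustrative only, and the even spacing of polls is an assumption; the dynflow view gives only the totals.

```python
from datetime import datetime

# Values copied from step 59 (ReportWait) of the dynflow view above.
started = datetime.strptime("2015-01-19 18:29:51", "%Y-%m-%d %H:%M:%S")
ended = datetime.strptime("2015-01-19 20:45:44", "%Y-%m-%d %H:%M:%S")
real_time_s = 8152.41
poll_attempts = 1625

# Wall-clock span between the two second-resolution timestamps.
elapsed_s = (ended - started).total_seconds()

# Average seconds per poll, ASSUMING the polls were evenly spaced.
avg_poll_gap_s = real_time_s / poll_attempts

print(f"elapsed from timestamps: {elapsed_s:.0f}s")
print(f"average gap between polls: {avg_poll_gap_s:.2f}s")
```

The timestamps span 8153 seconds (about 2 hours 16 minutes), which agrees with the reported real time of 8152.41s, and 1625 polls over that period works out to roughly one poll every 5 seconds. All of them returned status: false, consistent with a puppet report that never arrived from that host.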

Comment 2 Jiri Stransky 2015-01-20 14:53:11 UTC
The Dynflow console says that the PuppetRun action was executed for all 3 hosts, but it seems only 2 requests reached Foreman Proxy.

Comment 3 Jiri Stransky 2015-01-20 14:55:22 UTC
Created attachment 981862 [details]
dynflow console

Comment 4 Jiri Stransky 2015-01-20 14:55:47 UTC
Created attachment 981863 [details]
foreman proxy log

Comment 5 Jiri Stransky 2015-01-20 14:56:32 UTC
Created attachment 981865 [details]
dynflow_executor.output

Comment 6 Jiri Stransky 2015-01-20 14:57:22 UTC
Created attachment 981866 [details]
sosreport

Comment 7 Jiri Stransky 2015-01-20 15:47:08 UTC
Pull request to Foreman to log the exception message when an error occurs while triggering the puppet run. This should give us more information if the failure was actually caused by an exception in the `puppetrun!` method in Foreman.

https://github.com/theforeman/foreman/pull/2100
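
The idea behind that change can be illustrated with a minimal sketch. This is not the Foreman code (which is Ruby); it is a Python illustration under stated assumptions, and `trigger_puppet_run`, `send_request`, and the logger name are hypothetical. It shows a trigger function that returns false on failure while still recording the exception message, rather than swallowing it silently.

```python
import logging

logger = logging.getLogger("puppetrun_sketch")  # hypothetical logger name


def trigger_puppet_run(host_name, send_request):
    """Return True if the request was sent, False otherwise.

    `send_request` is a hypothetical callable standing in for the call
    to the proxy. On failure the exception message is logged, so a
    False result can be traced back to its cause.
    """
    try:
        send_request(host_name)
        return True
    except Exception as exc:
        # Without this log line the caller only ever sees False.
        logger.error("Unable to execute Puppet run on %s: %s", host_name, exc)
        return False
```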

Comment 8 Jiri Stransky 2015-01-20 18:11:32 UTC
Pull request to Staypuft to add the result of the `puppetrun!` call to the task output and to fail the task if `puppetrun!` fails.

https://github.com/theforeman/staypuft/pull/408
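
A minimal sketch of that behaviour, assuming a boolean result from the trigger call. This is a Python illustration, not the Staypuft action itself; `run_puppet_action`, `PuppetRunFailed`, and the `puppetrun_result` key are hypothetical names chosen for the example.

```python
class PuppetRunFailed(Exception):
    """Hypothetical error raised when the puppet run cannot be triggered."""


def run_puppet_action(host_name, puppetrun, output):
    """Record the trigger result in the task output and fail on a falsy result.

    `puppetrun` is a hypothetical callable that returns True or False,
    and `output` is a plain dict standing in for the task output.
    """
    result = puppetrun(host_name)
    output["puppetrun_result"] = result  # visible when inspecting the task
    if not result:
        # Fail here instead of reporting success and waiting for a
        # report that will never arrive.
        raise PuppetRunFailed(f"Puppet run could not be triggered on {host_name}")
    return output
```

With a check like this, a failed trigger would surface as a failed PuppetRun step rather than as a ReportWait step that stays suspended.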

Comment 10 Omri Hochman 2015-01-29 16:19:23 UTC
Unable to reproduce with:
ruby193-rubygem-pg-0.18.1-2.el7ost.x86_64
ruby193-rubygem-dynflow-0.7.3-3.el7ost.noarch
rhel-osp-installer-0.5.5-2.el7ost.noarch

Comment 11 Scott Lewis 2015-02-10 15:14:13 UTC
This bug has been closed as a part of the RHEL-OSP 6 general availability release. For details, see https://rhn.redhat.com/errata/rhel7-rhos-6-errata.html

