Created attachment 981615 [details]
production.log

rubygem-staypuft: puppet-agent is not being triggered on 1 out of 3 controllers in HA deployment (Pacemaker unable to connect to pcsd on target host)

* These appear to be the same symptoms we already saw in Bug 1173634.

Environment:
-------------
rhel-osp-installer-0.5.5-2.el7ost.noarch
foreman-1.6.0.49-4.el7ost.noarch
foreman-installer-1.6.0-0.2.RC1.el7ost.noarch
openstack-puppet-modules-2014.2.8-1.el7ost.noarch
puppet-3.6.2-2.el7.noarch
puppet-server-3.6.2-2.el7.noarch

Description:
-------------
The HA-Neutron-GRE deployment hung indefinitely. Looking at the controller machines, it seems that puppet-agent was never triggered on 1 of the 3 controllers. Because of this, pcs was not installed on that controller, and pcmk on the other controllers could not connect to it.

/var/log/messages :
-------------------
(from one of the 2 controllers where puppet-agent was triggered) :
-------------------------------------------------------------
Jan 19 14:40:15 maca25400702875 puppet-agent[10611]: (/Stage[main]/Pacemaker::Corosync/Exec[auth-successful-across-all-nodes]/returns) Error: unable to connect to pcsd on pcmk-maca25400702876
Jan 19 14:40:15 maca25400702875 puppet-agent[10611]: (/Stage[main]/Pacemaker::Corosync/Exec[auth-successful-across-all-nodes]/returns) Unable to connect to pcmk-maca25400702876 ([Errno 111] Connection refused)
Jan 19 14:40:15 maca25400702875 puppet-agent[10611]: /usr/sbin/pcs cluster auth pcmk-maca25400702876 pcmk-maca25400702877 pcmk-maca25400702875 -u hacluster -p CHANGEME --force returned 1 instead of one of [0]

No errors in production.log (file attached)
--------------------------------------------

dynflow view :
--------------
52: Actions::Staypuft::Host::PuppetRun (success) [ 0.14s / 0.14s ]
54: Actions::Staypuft::Host::ReportWait (success) [ 4668.06s / 10.29s ]
57: Actions::Staypuft::Host::PuppetRun (success) [ 0.02s / 0.02s ]
59: Actions::Staypuft::Host::ReportWait (suspended) [ 8152.41s / 18.29s ]
62: Actions::Staypuft::Host::PuppetRun (pending)

------------------------------------------------------------
59: Actions::Staypuft::Host::ReportWait (suspended) [ 8152.41s / 18.29s ]
Started at: 2015-01-19 18:29:51 UTC
Ended at: 2015-01-19 20:45:44 UTC
Real time: 8152.41s
Execution time (excluding suspended state): 18.29s
Input:
---
host_id: 2
after: '2015-01-19T13:29:51-05:00'
current_user_id: 3
Output:
---
status: false
poll_attempts:
  total: 1625
  failed: 0

-------------------------------------------------------
62: Actions::Staypuft::Host::PuppetRun (pending)
Started at:
Ended at:
Real time: 0.00s
Execution time (excluding suspended state): 0.00s
Input:
---
host_id: 4
name: maca25400702875.example.com
current_user_id: 3
Output:
The Dynflow console says that the PuppetRun action was executed for all 3 hosts, but it seems that only 2 requests reached the Foreman Proxy.
Created attachment 981862 [details] dynflow console
Created attachment 981863 [details] foreman proxy log
Created attachment 981865 [details] dynflow_executor.output
Created attachment 981866 [details] sosreport
Pull request to Foreman to log an exception message in case an error occurs during triggering the puppet run -- this should give us some more info in case the failure was actually caused by some exception in the `puppetrun!` method in Foreman. https://github.com/theforeman/foreman/pull/2100
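The idea behind this change can be sketched roughly as below. This is not the code from the pull request: `FailingProxy`, its `run_puppet` method, and the `trigger_puppet_run` helper are hypothetical names made up for illustration, and only Ruby's standard `Logger` and `StringIO` are real. The point is simply that the rescue branch records the exception message instead of returning a bare failure.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for a proxy client whose request fails.
class FailingProxy
  def run_puppet(hostname)
    raise IOError, "connection refused while contacting #{hostname}"
  end
end

# Hypothetical helper: trigger a puppet run and, if an exception is
# raised, log its class and message before reporting failure.
def trigger_puppet_run(proxy, hostname, logger)
  proxy.run_puppet(hostname)
  true
rescue StandardError => e
  logger.warn("Unable to trigger puppet run on #{hostname}: #{e.class}: #{e.message}")
  false
end

# Capture the log in memory so the message can be inspected.
log_io = StringIO.new
logger = Logger.new(log_io)
result = trigger_puppet_run(FailingProxy.new, 'controller3.example.com', logger)
```

With logging like this in place, a failure of the kind suspected here would leave a trace in the log rather than passing silently.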
Pull request to Staypuft to add the result of the `puppetrun!` call to the task output and fail the task if `puppetrun!` fails. https://github.com/theforeman/staypuft/pull/408
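A minimal sketch of that behaviour, under stated assumptions: it does not use the real Dynflow or Staypuft classes, and it assumes `puppetrun!` returns a true or false value. `Host`, `PuppetRunStep`, and `PuppetRunFailed` are hypothetical names used only for illustration.

```ruby
# Hypothetical error raised when the puppet run could not be triggered.
class PuppetRunFailed < StandardError; end

# Hypothetical stand-in for a host object. Assumption: `puppetrun!`
# returns true when the run was triggered and false otherwise.
Host = Struct.new(:name, :reachable) do
  def puppetrun!
    reachable
  end
end

# Hypothetical task step: stores the `puppetrun!` result in its output
# and fails (raises) when the result is false.
class PuppetRunStep
  attr_reader :output

  def initialize(host)
    @host = host
    @output = {}
  end

  def run
    output[:result] = @host.puppetrun!
    raise PuppetRunFailed, "puppet run was not triggered on #{@host.name}" unless output[:result]
    output
  end
end

# A host where the run is triggered: the step succeeds.
ok_step = PuppetRunStep.new(Host.new('controller1.example.com', true))
ok_step.run

# A host where the run is not triggered: the step fails visibly.
bad_step = PuppetRunStep.new(Host.new('controller3.example.com', false))
begin
  bad_step.run
  failed = false
rescue PuppetRunFailed => e
  failed = true
  failure_message = e.message
end
```

Under this design, a step like the PuppetRun for the third controller would show up as failed with its result recorded, instead of leaving a later ReportWait step polling indefinitely.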
Unable to reproduce with:
ruby193-rubygem-pg-0.18.1-2.el7ost.x86_64
ruby193-rubygem-dynflow-0.7.3-3.el7ost.noarch
rhel-osp-installer-0.5.5-2.el7ost.noarch
This bug has been closed as a part of the RHEL-OSP 6 general availability release. For details, see https://rhn.redhat.com/errata/rhel7-rhos-6-errata.html