Bug 1369532 - Intermittent puppet sync timeout installing RHV
Summary: Intermittent puppet sync timeout installing RHV
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - RHEV
Version: 1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 1.1
Assignee: Fabian von Feilitzsch
QA Contact: Dave Johnson
Docs Contact: Dan Macpherson
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-23 16:43 UTC by James Olin Oden
Modified: 2017-02-28 01:38 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-28 01:38:40 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2017:0335 | 0 | normal | SHIPPED_LIVE | Red Hat Quickstart Installer 1.1 | 2017-02-28 06:36:13 UTC

Description James Olin Oden 2016-08-23 16:43:58 UTC
Description of problem:

I was doing an RHV (self-hosted, 4 nodes) + CFME + OSE (1 controller + 2 workers) deployment. It failed while deploying RHV with the message:

   Puppet run for heuristic-algo4.b.b puppet run reported as out of sync for the last 10 polls - something may have gone wrong

However, when I looked in /var/log/messages* on the host that failed, there
were no puppet errors or failures.

Note also that I was using the custom naming scheme for the hosts.
This probably did not contribute to the problem, but it was the one thing
I did differently.

Version-Release number of selected component (if applicable):
QCI-1.0-RHEL-7-20160819.t.0

How reproducible:
Don't know.

Steps to Reproduce:
1. Do an RHV (self-hosted, 4 nodes) + CFME + OSE (1 controller + 2 workers) deployment using the custom naming scheme.

Actual results:
It failed while deploying a host with:

   Puppet run for heuristic-algo4.b.b puppet run reported as out of sync for the last 10 polls - something may have gone wrong

Expected results:
The deployment should succeed.

Additional info:

Comment 1 James Olin Oden 2016-08-23 16:48:04 UTC
We tried to resume the task, but it later timed out with the following error:

ERF42-7017 [Foreman::Exception]: You've reached the timeout set for this action. If the action is still ongoing, you can click on the "Resume Deployment" button to continue.

Comment 2 Tasos Papaioannou 2016-09-09 13:25:50 UTC
I am seeing this repeatedly on my deployments of RHV self-hosted. In my environment, the puppet run on the hypervisor system often takes an hour or more to complete. Resuming the Deploy Red Hat Virtualization task in dynflow after the puppet run completes, then resuming the Deploy task, will complete the deployment.

I think that either the poll_intervals and attempts_before_next_interval values in server/app/lib/actions/fusor/host/wait_for_puppet.rb need to be changed, or the Out of sync interval and Puppet interval settings in Administer > Settings > Puppet should be tweaked, or there should at least be a more user-friendly way to resume the deployment than going into multiple dynflow tasks and resuming them in the right order.
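
To make the trade-off concrete, here is a minimal Python sketch of how an escalating poll schedule could behave. It is not the code in wait_for_puppet.rb: the function names, the escalation rule (move to the next interval after a fixed number of attempts, then stay on the last one), and the numeric values are all assumptions chosen for illustration, borrowing only the two parameter names mentioned above.

```python
from typing import Sequence


def poll_interval(attempt: int,
                  poll_intervals: Sequence[float],
                  attempts_before_next_interval: int) -> float:
    """Seconds to wait before the given 0-based poll attempt.

    Assumed rule: every `attempts_before_next_interval` attempts the
    schedule moves to the next entry of `poll_intervals`; once the list
    is exhausted the last (longest) interval is reused.
    """
    if not poll_intervals or min(poll_intervals) <= 0:
        raise ValueError("poll_intervals must be non-empty and positive")
    if attempts_before_next_interval < 1:
        raise ValueError("attempts_before_next_interval must be >= 1")
    level = attempt // attempts_before_next_interval
    return poll_intervals[min(level, len(poll_intervals) - 1)]


def polls_needed(run_seconds: float,
                 poll_intervals: Sequence[float],
                 attempts_before_next_interval: int) -> int:
    """Number of polls issued before `run_seconds` have elapsed."""
    elapsed = 0.0
    attempt = 0
    while elapsed < run_seconds:
        elapsed += poll_interval(attempt, poll_intervals,
                                 attempts_before_next_interval)
        attempt += 1
    return attempt


# Hypothetical values, NOT the ones shipped in the product.
INTERVALS = [30, 60, 120]   # seconds
ATTEMPTS_PER_LEVEL = 5

# Time covered by the first ten polls under this schedule.
first_ten = sum(poll_interval(i, INTERVALS, ATTEMPTS_PER_LEVEL)
                for i in range(10))
print(first_ten)                                            # 450
# Polls required to outlast a one-hour puppet run.
print(polls_needed(3600, INTERVALS, ATTEMPTS_PER_LEVEL))    # 37
```

With these illustrative numbers, ten polls cover only 450 seconds, while a one-hour puppet run would need 37 polls. A check that gives up after ten out-of-sync polls would therefore fire long before the run finishes, unless the intervals are lengthened or the number of tolerated polls is raised.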

Comment 4 John Matthews 2016-10-14 17:08:00 UTC
Fixed during 1.1 development work

Comment 5 James Olin Oden 2016-10-14 18:15:29 UTC
This bug is now invalid because puppet is no longer part of our deployment. We use Ansible instead, so I'm just going to mark it as verified without any testing. If the Ansible code shows similar behavior, I'll file a new bug.

Comment 8 errata-xmlrpc 2017-02-28 01:38:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0335

