Bug 1369532

Summary: Intermittent puppet sync timeout installing RHV
Product: Red Hat Quickstart Cloud Installer
Reporter: James Olin Oden <joden>
Component: Installation - RHEV
Assignee: Fabian von Feilitzsch <fabian>
Status: CLOSED ERRATA
QA Contact: Dave Johnson <dajohnso>
Severity: medium
Docs Contact: Dan Macpherson <dmacpher>
Priority: unspecified
Version: 1.0
CC: bthurber, tpapaioa, tsanders
Target Milestone: ---
Keywords: Triaged
Target Release: 1.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-28 01:38:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description James Olin Oden 2016-08-23 16:43:58 UTC
Description of problem:

I was doing an RHV (self-hosted, 4 nodes) + CFME + OSE (1 controller + 2 workers) deployment. It failed while deploying RHV with this message:

   Puppet run for heuristic-algo4.b.b puppet run reported as out of sync for the last 10 polls - something may have gone wrong

However, when I looked in /var/log/messages* on the host that failed, there
were no puppet errors or failures.

Note that I was also using the custom naming scheme for the hosts.
This probably did not contribute to the problem, but it was the one thing
I did differently.

Version-Release number of selected component (if applicable):
QCI-1.0-RHEL-7-20160819.t.0

How reproducible:
Don't know.

Steps to Reproduce:
1. Do an RHV (self-hosted, 4 nodes) + CFME + OSE (1 controller + 2 workers) deployment using the custom naming scheme.

Actual results:
Deploying a host failed with:

   Puppet run for heuristic-algo4.b.b puppet run reported as out of sync for the last 10 polls - something may have gone wrong

Expected results:
The deployment succeeds.

Additional info:

Comment 1 James Olin Oden 2016-08-23 16:48:04 UTC
We tried to resume the task, but it later timed out with the following error:

ERF42-7017 [Foreman::Exception]: You've reached the timeout set for this action. If the action is still ongoing, you can click on the "Resume Deployment" button to continue.

Comment 2 Tasos Papaioannou 2016-09-09 13:25:50 UTC
I am seeing this repeatedly on my deployments of RHV self-hosted. In my environment, the puppet run on the hypervisor system often takes an hour or more to complete. Resuming the Deploy Red Hat Virtualization task in dynflow after the puppet run completes, then resuming the Deploy task, will complete the deployment.

I think one of three things is needed. Either the poll_intervals and attempts_before_next_interval values in server/app/lib/actions/fusor/host/wait_for_puppet.rb need to be changed, or the "Out of sync interval" and "Puppet interval" settings in Administer > Settings > Puppet should be tweaked, or there should at least be a more user-friendly way to resume the deployment than going into multiple dynflow tasks and resuming them in the right order.
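
To make the first option concrete, here is a minimal sketch of how a poll schedule built from those two settings bounds the total time a task waits before giving up. It is written in Python for illustration only. The escalation rule (move to the next interval after a fixed number of attempts, then stay on the last one) is an assumption about how the settings interact, and the numbers are hypothetical, not the values in wait_for_puppet.rb.

```python
from typing import Sequence


def poll_delay(attempt: int,
               poll_intervals: Sequence[float],
               attempts_before_next_interval: int) -> float:
    """Delay in seconds before the given 0-based poll attempt.

    Assumed rule: every `attempts_before_next_interval` attempts the
    schedule advances to the next interval, and it stays on the last
    interval once the list is exhausted.
    """
    level = attempt // attempts_before_next_interval
    return poll_intervals[min(level, len(poll_intervals) - 1)]


def total_wait(max_attempts: int,
               poll_intervals: Sequence[float],
               attempts_before_next_interval: int) -> float:
    """Total seconds spent waiting across `max_attempts` polls."""
    return sum(
        poll_delay(i, poll_intervals, attempts_before_next_interval)
        for i in range(max_attempts)
    )


if __name__ == "__main__":
    # Hypothetical schedule: intervals in seconds, 5 attempts per level.
    intervals = [60, 120, 300]
    budget = total_wait(30, intervals, 5)
    print(f"30 polls cover {budget / 60:.0f} minutes")
```

If a puppet run can take an hour or more, the wait budget computed this way would need to exceed that for the task to finish without a manual resume.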

Comment 4 John Matthews 2016-10-14 17:08:00 UTC
Fixed during 1.1 development work

Comment 5 James Olin Oden 2016-10-14 18:15:29 UTC
This bug is now invalid because puppet is no longer part of our deployment. Instead we use ansible, so I'm just going to mark this as verified without any testing. If the ansible code shows similar behavior I'll write a new bug.

Comment 8 errata-xmlrpc 2017-02-28 01:38:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0335