Bug 1242396

Summary: No logs available for stuck puppet runs
Product: Red Hat OpenStack
Reporter: Jiri Stransky <jstransk>
Component: openstack-tripleo-heat-templates
Assignee: Jiri Stransky <jstransk>
Status: CLOSED ERRATA
QA Contact: Alexander Chuzhoy <sasha>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.0 (Kilo)
CC: achernet, hbrock, jcoufal, jstransk, mandreou, mburns, mmagr, mtanino, nbarcet, ohochman, rhel-osp-director-maint, tsekiyam
Target Milestone: ga
Keywords: ZStream
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-0.8.7-5.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, the os-collect-config utility only printed Puppet logs after Puppet had finished running. As a consequence, Puppet logs were not available for Puppet runs that were in progress. With this update, logs for Puppet runs are available even when a Puppet run is in progress. They can be found in the /var/run/heat-config/deployed/ directory.
Story Points: ---
Clone Of:
: 1243884
Environment:
Last Closed: 2016-04-07 21:38:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1243884
Bug Blocks:

Description Jiri Stransky 2015-07-13 09:28:04 UTC
Description of problem:

When a puppet run gets stuck, we have no way of knowing what happened. It's caused by two things:

1. Puppet run output is printed into os-collect-config logs only when puppet has finished. If it doesn't finish, no output exists that we could use to guess why it didn't finish.

2. Even if we fix point 1, by default puppet itself only prints the steps it has finished performing, so a stuck step won't get printed at all. This can be changed by running puppet in debug mode, so that it prints every command it attempts to run on the system.
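
Point 1 comes down to when captured output reaches a log. The following is a minimal Python sketch of that difference, not the actual os-collect-config or heat-config hook code. The names `run_buffered` and `start_streaming` are hypothetical, and attaching the child's stdout/stderr directly to a file is just one possible way to make partial output visible.

```python
import subprocess


def run_buffered(cmd):
    """Collect all output in memory; nothing is visible until cmd exits."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return result.returncode, result.stdout.decode(errors="replace")


def start_streaming(cmd, log_path):
    """Start cmd with stdout/stderr attached to log_path; return the process.

    The child writes straight to the file, so whatever it has flushed so far
    can be read while it is still running (or stuck).
    """
    with open(log_path, "ab") as log_file:
        # The child gets its own copy of the file descriptor, so closing our
        # handle at the end of the with-block does not affect its output.
        return subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)
```

With `run_buffered`, a command that never returns yields no output at all. With `start_streaming`, the log file holds everything the command flushed before it hung.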

Comment 3 Jiri Stransky 2015-07-13 09:29:55 UTC
Martin Mágr has an initial patch for it; it hasn't passed upstream CI yet: https://review.openstack.org/#/c/188737/

Comment 4 Hugh Brock 2015-07-13 09:34:41 UTC
Requested blocker because this is going to be a major usability problem when we're trying to debug deployment failures in the field.

Comment 5 Jiri Stransky 2015-07-13 12:22:41 UTC
The CI jobs are failing upstream because the CI is broken. Trying to fix the CI by removing py26 jobs (which have been deprecated for some time). Pending feedback from Heat / OS infra folks. https://review.openstack.org/#/c/201105/

Comment 6 Mike Burns 2015-07-15 16:02:00 UTC
Removing the infra patch from tracking. We don't need that downstream.

Comment 7 Jiri Stransky 2015-07-16 12:09:08 UTC
I posted another patch which is needed to allow enabling puppet debug mode. It will only do its job together with mmagr's patch.

Comment 8 Mike Burns 2015-07-17 17:41:26 UTC
@jistr -- do we get logs from puppet if it completes?  Is this only an issue when puppet gets stuck or crashes or gets into a loop?

Comment 9 Jiri Stransky 2015-07-20 08:07:53 UTC
Yes, we get logs from puppet when it completes; the output is in the os-collect-config log. We also get logs when puppet fails on some step and skips the rest. We'd probably also get logs from a crashed puppet run, although that hasn't happened for me.

This is only an issue when a puppet run gets stuck (e.g. in a retry loop inside puppet, or a shell command executed via puppet gets stuck) -- then we don't know which action it was trying to perform.

Comment 10 Marios Andreou 2015-07-20 14:47:07 UTC
@jistr hey man, I tried to verify this today. I applied the 2 changes (heat-templates and tht) and it failed in Compute/Controller post deployment. The failure might be because of up/downstream issues (e.g. I cloned upstream tht, cherry-picked onto it, and used that for the roles), but I have 2 questions for when I revisit:

1. Can you think of an easy repro to validate this? (E.g. I have to induce a puppet failure, right? Then what do I check? Will it just be obvious in journalctl?)

2. After applying the heat-templates change, I shouldn't need to rebuild overcloud-full or anything, right?


thanks!

Comment 11 Jiri Stransky 2015-07-21 15:18:15 UTC
Not sure if you tested the latest heat-templates patch as it was uploaded by mmagr just about 1 hour before you posted this comment. It will need to be changed anyway though because it didn't pass some of the CI jobs yet, so maybe now is not the right time to test yet.

1. You should see puppet run log files in /var/run/heat-config/deployed. When you enable debugging (via ConfigDebug variable, or the environment file included in the t-h-t patch), then the logs will be more verbose.

If you want to test the "getting stuck" part, you can add an exec [1] to one of the puppet manifests in t-h-t that loops and fails, e.g. just "false" as the command, with a 10-second try_sleep and 360 tries. If you have debugging enabled, you should see in the log how puppet repeatedly tries to execute the "false" command.

2. You need to rebuild the image or replace the affected files in the image manually. (for t-h-t it's not necessary, for h-t it is, because we use image elements from there)

[1] https://docs.puppetlabs.com/references/4.2.latest/type.html#exec
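
The repro above can be pictured with a small Python stand-in for a retrying exec. This is not Puppet's implementation, and the log messages are invented for illustration; `run_with_retries` is a hypothetical name. It only sketches why attempt-level (debug) logging is what reveals a step that keeps failing.

```python
import logging
import subprocess
import time

LOG = logging.getLogger("exec_retry_sketch")


def run_with_retries(cmd, tries, try_sleep, sleep=time.sleep):
    """Run cmd until it exits 0 or `tries` attempts are used up.

    Each attempt is logged at DEBUG level; the final outcome is logged once,
    at INFO or ERROR level. Returns the last exit code.
    """
    returncode = None
    for attempt in range(1, tries + 1):
        LOG.debug("attempt %d/%d: %s", attempt, tries, " ".join(cmd))
        returncode = subprocess.call(cmd)
        if returncode == 0:
            break
        if attempt < tries:
            sleep(try_sleep)  # wait before the next attempt
    if returncode == 0:
        LOG.info("command succeeded: %s", " ".join(cmd))
    else:
        LOG.error("command failed after %d tries: %s", tries, " ".join(cmd))
    return returncode
```

In this sketch, `tries=360` and `try_sleep=10` mean 359 sleeps, so roughly an hour passes before anything is reported above debug level. Only the debug lines show which command is being retried in the meantime.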

Comment 12 Mike Burns 2015-08-31 19:07:16 UTC
Dropping the heat templates patch from this bug. It will be included in bug 1243884.

Comment 14 Jaromir Coufal 2015-11-30 12:06:42 UTC
This should already be part of OSP8, since it was merged in early October, correct?

Comment 17 Alexander Chuzhoy 2016-03-03 18:40:51 UTC
Verified:
Environment: openstack-tripleo-heat-templates-0.8.8-2.el7ost.noarch


During a deployment, I see that the puppet logs are being appended under /var/run/heat-config/deployed.
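
One way to repeat this check without watching the directory by hand is to compare file sizes at two points in time. This is a minimal Python sketch: `snapshot_sizes` and `grown_files` are hypothetical helpers, and nothing is assumed about how the files inside the directory are named.

```python
import os


def snapshot_sizes(directory):
    """Map each regular file in directory to its current size in bytes."""
    sizes = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
    return sizes


def grown_files(before, after):
    """Return the sorted names of files that are new or larger in `after`."""
    # A file missing from `before` counts as new, even if it is still empty.
    return sorted(
        name for name, size in after.items() if size > before.get(name, -1)
    )
```

On a node being deployed, one might call `snapshot_sizes("/var/run/heat-config/deployed")` twice a few seconds apart and pass both results to `grown_files`. A non-empty result suggests that logs are being written while the run is in progress.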

Comment 19 errata-xmlrpc 2016-04-07 21:38:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0604.html