Bug 1242396 - No logs available for stuck puppet runs
Summary: No logs available for stuck puppet runs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: Jiri Stransky
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On: 1243884
Blocks:
Reported: 2015-07-13 09:28 UTC by Jiri Stransky
Modified: 2016-04-07 21:38 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-0.8.7-5.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, the os-collect-config utility only printed Puppet logs after Puppet had finished running. As a consequence, Puppet logs were not available for Puppet runs that were in progress. With this update, logs for Puppet runs are available even when a Puppet run is in progress. They can be found in the /var/run/heat-config/deployed/ directory.
Clone Of:
Clones: 1243884
Environment:
Last Closed: 2016-04-07 21:38:20 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 202465 | 0 | None | MERGED | Allow enabling debug mode for config management (Puppet) | 2020-10-07 13:09:06 UTC
Red Hat Product Errata | RHEA-2016:0604 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 8 director Enhancement Advisory | 2016-04-08 01:03:56 UTC

Description Jiri Stransky 2015-07-13 09:28:04 UTC
Description of problem:

When a puppet run gets stuck, we have no way of knowing what happened. It's caused by two things:

1. Puppet run output is printed into os-collect-config logs only when puppet has finished. If it doesn't finish, no output exists that we could use to guess why it didn't finish.

2. Even if we fix point 1, puppet itself only prints out steps that it finished performing by default, so a stuck step won't get printed at all anyway. This can be changed by running puppet in debug mode, so that it prints every command it's attempting to run on the system.
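
To illustrate point 1, here is a minimal Python sketch of the difference between collecting a command's output in memory and writing it straight to a log file. It is not the os-collect-config or heat-config code; the function names run_buffered and run_streaming and the log file name are hypothetical, and a short Python one-liner stands in for the puppet run.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_buffered(cmd):
    """Collect all output in memory; nothing is visible until cmd exits."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return result.returncode, result.stdout


def run_streaming(cmd, log_path):
    """Send output straight to log_path so it can be read mid-run."""
    with open(log_path, "w") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for a puppet run (assumption): print one step, then exit.
    cmd = [sys.executable, "-c", "print('step 1 done', flush=True)"]
    log_file = Path(tempfile.mkdtemp()) / "puppet-run.log"  # hypothetical name
    print(run_buffered(cmd))
    print(run_streaming(cmd, log_file), log_file.read_text())
```

With the buffered variant, a command that never exits never returns any output to the caller. With the streaming variant, whatever the command has written so far can be read from the log file while it is still running.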

Comment 3 Jiri Stransky 2015-07-13 09:29:55 UTC
Martin Mágr has an initial patch for it, which hasn't passed upstream CI yet: https://review.openstack.org/#/c/188737/

Comment 4 Hugh Brock 2015-07-13 09:34:41 UTC
Requested blocker because this is going to be a major usability problem when we're trying to debug deployment failures in the field.

Comment 5 Jiri Stransky 2015-07-13 12:22:41 UTC
The CI jobs are failing upstream because the CI is broken. Trying to fix the CI by removing py26 jobs (which have been deprecated for some time). Pending feedback from Heat / OS infra folks. https://review.openstack.org/#/c/201105/

Comment 6 Mike Burns 2015-07-15 16:02:00 UTC
Removing the infra patch from tracking. We don't need that downstream.

Comment 7 Jiri Stransky 2015-07-16 12:09:08 UTC
I posted another patch, which is needed to allow enabling puppet debug mode. It will only do its job together with mmagr's patch.

Comment 8 Mike Burns 2015-07-17 17:41:26 UTC
@jistr -- do we get logs from puppet if it completes?  Is this only an issue when puppet gets stuck or crashes or gets into a loop?

Comment 9 Jiri Stransky 2015-07-20 08:07:53 UTC
Yes, we get logs from puppet when it completes; the output is in the os-collect-config log. We also get logs when puppet fails on some step and skips the rest. We'd probably also get logs from a crashed puppet run, although that hasn't happened for me.

This is only an issue when a puppet run gets stuck (e.g. in a retry loop inside puppet, or a shell command executed via puppet gets stuck) -- then we don't know which action it was trying to perform.

Comment 10 Marios Andreou 2015-07-20 14:47:07 UTC
@jistr hey man, I tried to verify this today. I applied the 2 changes (heat-templates and tht) and it failed in Compute/Controller post deployment. The failure might be because of up/downstream issues (e.g. I cloned upstream tht, cherry-picked onto it, and used that for the roles), but I have 2 questions for when I revisit:

1. Can you think of an easy repro to validate this? (E.g. I have to induce a puppet failure, right? Then check what? Will it just be obvious in journalctl?)

2. After applying the heat-templates change, I shouldn't need to rebuild overcloud-full or anything, right?


thanks!

Comment 11 Jiri Stransky 2015-07-21 15:18:15 UTC
Not sure if you tested the latest heat-templates patch as it was uploaded by mmagr just about 1 hour before you posted this comment. It will need to be changed anyway though because it didn't pass some of the CI jobs yet, so maybe now is not the right time to test yet.

1. You should see puppet run log files in /var/run/heat-config/deployed. When you enable debugging (via ConfigDebug variable, or the environment file included in the t-h-t patch), then the logs will be more verbose.

If you want to test the "getting stuck" part, you can add an exec [1] to one of the puppet manifests in t-h-t that loops and fails, e.g. just "false" as the command, with a try_sleep of 10 seconds and 360 tries. If you have debugging enabled, you should see in the log how puppet repeatedly tries to execute the "false" command.

2. You need to rebuild the image or replace the affected files in the image manually. (for t-h-t it's not necessary, for h-t it is, because we use image elements from there)

[1] https://docs.puppetlabs.com/references/4.2.latest/type.html#exec
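
For anyone who wants to see what that retry test looks like in a log before touching a manifest, here is a rough Python imitation. It is not puppet code and does not call puppet; it only mimics what the exec parameters tries and try_sleep do, logging each attempt at debug level. The function name run_with_retries and the log format are hypothetical, and a Python one-liner that exits with status 1 stands in for the "false" command.

```python
import logging
import subprocess
import sys
import time


def run_with_retries(cmd, tries, try_sleep, log):
    """Run cmd up to `tries` times, sleeping `try_sleep` seconds after a failure.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, tries + 1):
        log.debug("attempt %d/%d: executing %s", attempt, tries, cmd)
        if subprocess.call(cmd) == 0:
            return attempt
        if attempt < tries:
            time.sleep(try_sleep)
    return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Portable stand-in for the "false" command: always exits with status 1.
    always_fail = [sys.executable, "-c", "import sys; sys.exit(1)"]
    # Small values so the demo ends quickly; the comment above suggests
    # 360 tries with a try_sleep of 10 seconds to keep a run busy for an hour.
    print(run_with_retries(always_fail, tries=3, try_sleep=0.1,
                           log=logging.getLogger("retry-demo")))
```

Without the debug messages, a run like this produces no output until the last attempt has failed, which is the situation this bug describes.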

Comment 12 Mike Burns 2015-08-31 19:07:16 UTC
Dropping the heat templates patch from this bug. It will be included in bug 1243884.

Comment 14 Jaromir Coufal 2015-11-30 12:06:42 UTC
Should be already part of OSP8 since it was merged in early October, correct?

Comment 17 Alexander Chuzhoy 2016-03-03 18:40:51 UTC
Verified:
Environment: openstack-tripleo-heat-templates-0.8.8-2.el7ost.noarch


During a deployment, I see that the puppet logs are being appended under /var/run/heat-config/deployed.
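
A check like this can be scripted. The following is a minimal Python sketch, assuming only that the logs are regular files in that directory; it makes no assumption about their names. The helper newest_log_tail is hypothetical and simply returns the last lines of the most recently modified file.

```python
from pathlib import Path

DEPLOYED_DIR = "/var/run/heat-config/deployed"  # directory named in this bug


def newest_log_tail(directory, lines=20):
    """Return (path, last `lines` lines) of the most recently modified file."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    if not files:
        return None, []
    newest = max(files, key=lambda p: p.stat().st_mtime)
    text = newest.read_text(errors="replace")
    return newest, text.splitlines()[-lines:]


if __name__ == "__main__":
    if Path(DEPLOYED_DIR).is_dir():
        path, tail = newest_log_tail(DEPLOYED_DIR)
        print(path)
        print("\n".join(tail))
    else:
        print(f"{DEPLOYED_DIR} not found on this host")
```

Running it repeatedly during a deployment should show the tail changing as the puppet run progresses.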

Comment 19 errata-xmlrpc 2016-04-07 21:38:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0604.html

