1560478 – Embedded Ansible job_status .out files are not processed by logrotate

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Appliance
Sub Component:
Version:	5.9.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	GA
Target Release:	5.10.0
Assignee:	Nick Carboni
QA Contact:	Dmitry Misharov
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1565140

Reported:	2018-03-26 09:23 UTC by Peter McGowan
Modified:	2019-02-11 14:06 UTC
CC List:	7 users
Fixed In Version:	5.10.0.0
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1565140
Environment:
Last Closed:	2019-02-11 14:06:46 UTC
Category:	---
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Attachments
removed job service screenshot (57.43 KB, image/png) 2018-04-03 15:50 UTC, Nick Carboni

Description Peter McGowan 2018-03-26 09:23:17 UTC

Description of problem:
The embedded Ansible job output files /var/lib/awx/job_status/*.out are not handled by /etc/logrotate.d/miq_logs.conf. 

Version-Release number of selected component (if applicable):
5.9.0.22

How reproducible:
Every time

Steps to Reproduce:
1. Run a number of embedded Ansible jobs over the space of several days

Actual results:
Notice that the *.out files remain untouched in /var/lib/awx/job_status

Expected results:
The *.out files should be compressed/purged/rotated to stop the directory filling up

Additional info:
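
To make the growth described above measurable, here is a minimal read-only sketch that counts the files matching a pattern in a directory, totals their size, and reports how many are older than a given number of days. The helper name `job_output_summary` and the 120-day threshold in the demo are illustrative assumptions and not part of CloudForms. The demo runs against a temporary directory rather than /var/lib/awx/job_status.

```ruby
require "tmpdir"

# Summarize files matching `pattern` under `dir`: how many there are, how many
# bytes they use, and how many were last modified more than `older_than_days`
# days ago. Read-only: nothing is deleted, compressed or rotated.
# (Hypothetical helper for illustration only.)
def job_output_summary(dir, older_than_days:, pattern: "*.out", now: Time.now)
  cutoff = now - (older_than_days * 24 * 60 * 60)
  files  = Dir.glob(File.join(dir, pattern)).select { |f| File.file?(f) }

  {
    :count       => files.size,
    :total_bytes => files.sum { |f| File.size(f) },
    :stale       => files.count { |f| File.mtime(f) < cutoff }
  }
end

# Demonstrate against a throwaway directory instead of a real appliance.
Dir.mktmpdir do |dir|
  fresh = File.join(dir, "1.out")
  old   = File.join(dir, "2.out")
  File.write(fresh, "recent output")
  File.write(old, "old output")

  # Backdate the second file's modification time by 200 days.
  File.utime(Time.now, Time.now - (200 * 24 * 60 * 60), old)

  p job_output_summary(dir, older_than_days: 120)
end
```

Pointing the helper at the real job_status directory would only report numbers, which leaves cleanup to whichever mechanism is chosen for it.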

Comment 2 Joe Vlcek 2018-03-27 14:43:20 UTC

Peter,

Do you have an appliance that I can log in to that exhibits this behavior?
If so, could you send me the creds in a private comment/message?

Thank you!
JoeV

Comment 3 Joe Vlcek 2018-03-27 17:36:11 UTC

As discussed in our meeting with Nick Carboni, logrotate may not be the best way to manage growth of the embedded Ansible job_status .out files.

We will investigate.

JoeV

Comment 4 Nick Carboni 2018-03-27 21:46:28 UTC

After looking into this issue, we've seen that ansible-tower provides a mechanism to remove jobs after a certain number of days. This can be done either through a management job which runs on a schedule [1] or manually using a command line tool [2].

The effects of doing this are unclear. If we remove the job from tower using either of these mechanisms we need to be sure that the links between tower job objects, CF inventory, and service instances are in a sane state.

With this information we have two more questions to answer before attempting a solution:

1) How do we configure the management job using the ansible tower API so that we can have a similar solution for ManageIQ and CloudForms appliances as well as our OpenShift-based application?

2) What is the intended behavior of services and CF inventory when the referenced job is removed from ansible tower?

[1] http://docs.ansible.com/ansible-tower/latest/html/administration/management_jobs.html#removing-old-job-history
[2] http://docs.ansible.com/ansible-tower/latest/html/administration/tower-manage.html#cleanup-of-old-data

Comment 5 Nick Carboni 2018-04-02 21:14:52 UTC

Drew, can you answer any of these questions?

Do you know if these "management jobs" are included in the API? And if not, what happens in the following cases:

- output file is removed
  - I think we will fail to fetch the stdout for the job
- ansible job run is removed
- CF service (which ran a job) is removed
  - Will this remove the tower job (allowing us to do our own retention system)?

Comment 6 Drew Bomhof 2018-04-02 21:38:24 UTC

Nick,

1. Do you know if these "management jobs" are included in the API?

I know you can access the management jobs via the system_jobs endpoint.

There is a result_stdout attribute attached to each record, though it seems that data may be pulled from another table, or the .out files you're talking about.

Here is an example from the main_systemjob table:

awx=# select * from main_systemjob where unifiedjob_ptr_id = 276;
 unifiedjob_ptr_id |   job_type   |   extra_vars    | system_job_template_id
-------------------+--------------+-----------------+------------------------
               276 | cleanup_jobs | {"days": "120"} |                      1

I don't have any knowledge of the .out files, though I think you're right that removing them would take the stdout with it.

Same with the job run being removed - then the data is gone.

If the CF service is removed we delete the job template (both provision and retirement), but I do not think we remove jobs, if that is what you're asking?
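
As an illustration of the extra_vars column in the row above, the following sketch parses that JSON and derives the date before which job data would count as expired. Both helper names are hypothetical. The only thing taken from the table is that "days" is stored as a string and so needs an explicit conversion.

```ruby
require "json"
require "date"

# Return the retention period, in days, from a cleanup_jobs extra_vars JSON
# string such as '{"days": "120"}'. The value is stored as a string in the
# row shown above, so convert it explicitly. (Hypothetical helper.)
def retention_days(extra_vars_json)
  Integer(JSON.parse(extra_vars_json).fetch("days"))
end

# Date before which job data would be treated as expired. (Hypothetical helper.)
def retention_cutoff(extra_vars_json, today: Date.today)
  today - retention_days(extra_vars_json)
end

puts retention_days('{"days": "120"}')
puts retention_cutoff('{"days": "120"}', today: Date.new(2018, 4, 2))
```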

Comment 7 Nick Carboni 2018-04-03 13:56:51 UTC

> I do not think we remove jobs - if that is what you're asking?

Generally, yes.

If we have nothing internally which will remove jobs, then it sounds like the best path forward will be to alter that "days" extra var in the cleanup_jobs system job.

If we can do that over the API, it should allow us to create a setting for embedded ansible which will allow the users to determine how long they want to keep jobs for.
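
A rough sketch of what changing the "days" value over the API might look like, using only the Ruby standard library. It builds the request but does not send it. The host name, the resource path and the "extra_data" field name are assumptions made for illustration, and they should be checked against the Tower API version in use. This is not the ansible_tower_client implementation.

```ruby
require "json"
require "net/http"
require "uri"

# Build (but do not send) a PATCH request that sets the retention period.
# ASSUMPTIONS: base_url, resource_path and the "extra_data" field name are
# placeholders for illustration, not a confirmed Tower API contract.
def build_retention_request(base_url, resource_path, days)
  unless days.kind_of?(Integer) && days > 0
    raise ArgumentError, "days must be a positive integer"
  end

  uri = URI.join(base_url, resource_path)
  req = Net::HTTP::Patch.new(uri)
  req["Content-Type"] = "application/json"
  # The value is sent as a string, mirroring the {"days": "120"} row above.
  req.body = JSON.generate("extra_data" => {"days" => days.to_s})
  req
end

req = build_retention_request("https://tower.example.com", "/api/v2/schedules/1/", 30)
puts req.method
puts req.path
puts req.body
```

Validating the value before building the request keeps a bad user setting from reaching Tower at all.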

As a part of this we will likely need a cleaner solution for displaying services when the job is no longer present on the tower side. I think we get a pretty ugly error message as it stands today.

Comment 8 Nick Carboni 2018-04-03 15:50:41 UTC

Created attachment 1416800
removed job service screenshot

For reference I've added a screenshot of the service provision details when the job is removed from the ansible tower side.

Comment 9 Nick Carboni 2018-04-03 18:01:13 UTC

By default ansible tower runs a system job weekly which removes all job results older than 120 days. Are you not seeing this run or is this too much data to keep by default?

Based on the usage you've seen, what would be a good default value for us to configure at startup?

Comment 10 CFME Bot 2018-04-03 20:26:27 UTC

https://github.com/ansible/ansible_tower_client_ruby/pull/99

Comment 11 CFME Bot 2018-04-03 21:16:28 UTC

New commits detected on ansible/ansible_tower_client_ruby/master:

https://github.com/ManageIQ/ansible_tower_client/commit/7a60bc3ea6d15e2a9805c01ad3b81af4f03d5a76
commit 7a60bc3ea6d15e2a9805c01ad3b81af4f03d5a76
Author:     Nick Carboni <ncarboni>
AuthorDate: Tue Apr  3 15:22:57 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Tue Apr  3 15:22:57 2018 -0400

    Add system_job_templates

    This will allow us to alter job templates for the system jobs which
    handle things like removing old jobs

    https://bugzilla.redhat.com/show_bug.cgi?id=1560478

 lib/ansible_tower_client.rb | 1 +
 lib/ansible_tower_client/api.rb | 8 +
 lib/ansible_tower_client/base_models/system_job_template.rb | 4 +
 3 files changed, 13 insertions(+)


https://github.com/ManageIQ/ansible_tower_client/commit/db88792d36172b2e74da4e56ee7dba7c467aa270
commit db88792d36172b2e74da4e56ee7dba7c467aa270
Author:     Nick Carboni <ncarboni>
AuthorDate: Tue Apr  3 15:43:03 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Tue Apr  3 15:43:03 2018 -0400

    Add schedules

    Add the schedule base model and relate system job templates to the
    schedules.

    https://bugzilla.redhat.com/show_bug.cgi?id=1560478

 lib/ansible_tower_client.rb | 1 +
 lib/ansible_tower_client/api.rb | 8 +
 lib/ansible_tower_client/base_models/schedule.rb | 4 +
 lib/ansible_tower_client/base_models/system_job_template.rb | 3 +
 4 files changed, 16 insertions(+)

Comment 13 CFME Bot 2018-04-04 15:56:33 UTC

https://github.com/ansible/ansible_tower_client_ruby/pull/100

Comment 14 CFME Bot 2018-04-04 16:15:37 UTC

https://github.com/ManageIQ/manageiq/pull/17250

Comment 15 CFME Bot 2018-04-04 18:31:36 UTC

New commit detected on ansible/ansible_tower_client_ruby/master:

https://github.com/ManageIQ/ansible_tower_client/commit/f8c63b1dcc5a3833364e9e7a71700758092bec23
commit f8c63b1dcc5a3833364e9e7a71700758092bec23
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed Apr  4 10:58:20 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Wed Apr  4 10:58:20 2018 -0400

    Add SystemJobTemplate#launch and the SystemJob class

    This will allow us to trigger a system job template manually
    and return the created system job.

    https://bugzilla.redhat.com/show_bug.cgi?id=1560478

 lib/ansible_tower_client.rb | 1 +
 lib/ansible_tower_client/api.rb | 8 +
 lib/ansible_tower_client/base_models/system_job.rb | 4 +
 lib/ansible_tower_client/base_models/system_job_template.rb | 5 +
 spec/factories/responses.rb | 15 +-
 spec/support/mock_api/system_job_template.rb | 108 +
 spec/system_job_template_spec.rb | 18 +
 7 files changed, 152 insertions(+), 7 deletions(-)

Comment 16 CFME Bot 2018-04-05 18:31:19 UTC

New commit detected on ManageIQ/manageiq/master:

https://github.com/ManageIQ/manageiq/commit/9699ff2d328ad71ca41bb61a30af38f84823890b
commit 9699ff2d328ad71ca41bb61a30af38f84823890b
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed Apr  4 12:01:37 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Wed Apr  4 12:01:37 2018 -0400

    Add a setting for ansible job data retention

    This setting will control how many days of job data should be
    kept.

    When the setting is changed we call out to ansible tower to alter
    the setting for their weekly system job. If the user wants to run
    the job on-demand they can use the EmbeddedAnsible#run_job_data_retention
    method.

    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1560478

 app/models/embedded_ansible_worker/runner.rb | 1 +
 config/settings.yml | 1 +
 lib/embedded_ansible.rb | 14 +
 lib/vmdb/config/activator.rb | 7 +
 4 files changed, 23 insertions(+)


https://github.com/ManageIQ/manageiq/commit/7c3f31220585a628d79dab248a1c93a5710662ae
commit 7c3f31220585a628d79dab248a1c93a5710662ae
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed Apr  4 17:22:46 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Wed Apr  4 17:22:46 2018 -0400

    Set job data retention at runtime in EmbeddedAnsibleWorker::Runner#do_work

    We don't really want to do this in the Config::Activator because
    that will cause every worker to run through the same code as they
    sync their config.

    We keep a local cache (read: instance variable) of the last value
    we set in tower so that we're not constantly hitting the API to
    determine if the value changed.

    The EmbeddedAnsibleWorker can use ::Settings directly because it
    is threaded and the server process takes care of reloading it.

    https://bugzilla.redhat.com/show_bug.cgi?id=1560478

 app/models/embedded_ansible_worker/runner.rb | 9 +-
 lib/vmdb/config/activator.rb | 7 -
 spec/models/embedded_ansible_worker/runner_spec.rb | 13 +
 3 files changed, 21 insertions(+), 8 deletions(-)
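
The "local cache" idea in the second commit message can be sketched as follows: remember the last value pushed to Tower in an instance variable and skip the API call when the setting has not changed. `RetentionSyncer` and its block argument are hypothetical stand-ins and not the actual EmbeddedAnsibleWorker::Runner code.

```ruby
# Hypothetical stand-in for the worker's sync step: push the configured
# retention value only when it differs from the last value pushed.
class RetentionSyncer
  attr_reader :api_calls

  def initialize(&push)
    @push        = push # block that would call the Tower API
    @last_pushed = nil  # local cache of the last value we set
    @api_calls   = 0
  end

  # Intended to be called on every work-loop iteration with the current
  # setting value. Returns true if the value was pushed, false if skipped.
  def sync(current_days)
    return false if current_days == @last_pushed

    @push.call(current_days)
    @api_calls  += 1
    @last_pushed = current_days
    true
  end
end

pushed = []
syncer = RetentionSyncer.new { |days| pushed << days }

# Five loop iterations, but the setting only takes two distinct values in a row.
[120, 120, 120, 30, 30].each { |days| syncer.sync(days) }
p pushed
```

In this run only the first 120 and the first 30 reach the block, so five iterations result in two pushes.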

Comment 18 Dmitry Misharov 2018-06-26 14:19:28 UTC

Verified in 5.10.0.1.20180619163011_900fdc4. Advanced settings contain job_data_retention_days option.
