Description of problem:
The embedded Ansible job output files /var/lib/awx/job_status/*.out are not handled by /etc/logrotate.d/miq_logs.conf.

Version-Release number of selected component (if applicable):
5.9.0.22

How reproducible:
Every time

Steps to Reproduce:
1. Run a number of embedded Ansible jobs over the space of several days

Actual results:
The *.out files remain untouched in /var/lib/awx/job_status.

Expected results:
The *.out files should be compressed/purged/rotated to stop the directory filling up.

Additional info:
Peter,

Do you have an appliance that I can log in to that depicts this behavior? If so, could you send me the creds in a private comment/message?

Thank you!
JoeV
As discussed in our meeting with Nick Carboni, logrotate may not be the best way to manage growth of the embedded Ansible job_status .out files. We will investigate.

JoeV
After looking into this issue, we've seen that ansible-tower provides a mechanism to remove jobs after a certain number of days. This can be done either through a management job which runs on a schedule [1] or manually using a command line tool [2].

The effects of doing this are unclear. If we remove the job from tower using either of these mechanisms, we need to be sure that the links between tower job objects, CF inventory, and service instances are left in a sane state.

With this information we have two more questions to answer before attempting a solution:

1) How do we configure the management job using the ansible tower API so that we can have a similar solution for ManageIQ and CloudForms appliances as well as our OpenShift-based application?
2) What is the intended behavior of services and CF inventory when the referenced job is removed from ansible tower?

[1] http://docs.ansible.com/ansible-tower/latest/html/administration/management_jobs.html#removing-old-job-history
[2] http://docs.ansible.com/ansible-tower/latest/html/administration/tower-manage.html#cleanup-of-old-data
Drew, can you answer any of these questions? Do you know if these "management jobs" are included in the API? And if not, what happens in the following cases:

- output file is removed (I think we will fail to fetch the stdout for the job)
- ansible job run is removed
- CF service (which ran a job) is removed: will this remove the tower job (allowing us to build our own retention system)?
Nick,

1. Do you know if these "management jobs" are included in the API?

I know you can access the management jobs via the system_jobs endpoint. There is a result_stdout attribute attached to each record, though it seems that data may be pulled from another table, or from the .out files you're talking about.

Here is an example from the main_systemjob table:

awx=# select * from main_systemjob where unifiedjob_ptr_id = 276;
 unifiedjob_ptr_id |   job_type   |   extra_vars    | system_job_template_id
-------------------+--------------+-----------------+------------------------
               276 | cleanup_jobs | {"days": "120"} |                      1

I don't have any knowledge of the .out files, though I think you're right that removing them would cause the stdout to go with them. The same applies if the job run is removed: then the data is gone. If the CF service is removed we delete the job templates (both provision and retirement), but I do not think we remove jobs, if that is what you're asking.
> I do not think we remove jobs - if that is what you're asking?

Generally, yes. If we have nothing internally which will remove jobs, then it sounds like the best path forward will be to alter that "days" extra var in the cleanup_jobs system job. If we can do that over the API, it should allow us to create a setting for embedded ansible which lets users determine how long they want to keep jobs.

As a part of this we will likely need a cleaner solution for displaying services when the job is no longer present on the tower side. I think we get a pretty ugly error message as it stands today.
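As a rough sketch of the extra_vars change described above (the helper name is illustrative, not the actual ManageIQ code): the cleanup_jobs system job stores its retention window as a "days" key in a JSON extra_vars blob (see the main_systemjob row earlier in this thread), so updating the retention amounts to parsing, replacing, and re-serializing that value before PATCHing it back over the API.

```ruby
require 'json'

# Hypothetical helper: given the current extra_vars JSON of the
# cleanup_jobs system job template, return an updated JSON string
# with the retention window ("days") changed. Tower stores the value
# as a string, so we stringify the day count.
def updated_cleanup_extra_vars(current_json, days)
  vars = JSON.parse(current_json.to_s.empty? ? "{}" : current_json)
  vars["days"] = days.to_s
  JSON.dump(vars)
end

puts updated_cleanup_extra_vars('{"days": "120"}', 30)
# => {"days":"30"}
```

The resulting string would then be written to the template's extra_vars field via whatever API client is in use.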
Created attachment 1416800 [details]
removed job service screenshot

For reference, I've added a screenshot of the service provision details when the job is removed from the ansible tower side.
By default, ansible tower runs a system job weekly which removes all job results older than 120 days. Are you not seeing this run, or is this too much data to keep by default? Based on the usage you've seen, what would be a good default value for us to configure at startup?
https://github.com/ansible/ansible_tower_client_ruby/pull/99
New commits detected on ansible/ansible_tower_client_ruby/master:

https://github.com/ManageIQ/ansible_tower_client/commit/7a60bc3ea6d15e2a9805c01ad3b81af4f03d5a76

commit 7a60bc3ea6d15e2a9805c01ad3b81af4f03d5a76
Author:     Nick Carboni <ncarboni>
AuthorDate: Tue Apr 3 15:22:57 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Tue Apr 3 15:22:57 2018 -0400

    Add system_job_templates

    This will allow us to alter job templates for the system jobs which
    handle things like removing old jobs

    https://bugzilla.redhat.com/show_bug.cgi?id=1560478

 lib/ansible_tower_client.rb                                 | 1 +
 lib/ansible_tower_client/api.rb                             | 8 +
 lib/ansible_tower_client/base_models/system_job_template.rb | 4 +
 3 files changed, 13 insertions(+)

https://github.com/ManageIQ/ansible_tower_client/commit/db88792d36172b2e74da4e56ee7dba7c467aa270

commit db88792d36172b2e74da4e56ee7dba7c467aa270
Author:     Nick Carboni <ncarboni>
AuthorDate: Tue Apr 3 15:43:03 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Tue Apr 3 15:43:03 2018 -0400

    Add schedules

    Add the schedule base model and relate system job templates to the
    schedules.

    https://bugzilla.redhat.com/show_bug.cgi?id=1560478

 lib/ansible_tower_client.rb                                 | 1 +
 lib/ansible_tower_client/api.rb                             | 8 +
 lib/ansible_tower_client/base_models/schedule.rb            | 4 +
 lib/ansible_tower_client/base_models/system_job_template.rb | 3 +
 4 files changed, 16 insertions(+)
https://github.com/ansible/ansible_tower_client_ruby/pull/100
https://github.com/ManageIQ/manageiq/pull/17250
New commit detected on ansible/ansible_tower_client_ruby/master:

https://github.com/ManageIQ/ansible_tower_client/commit/f8c63b1dcc5a3833364e9e7a71700758092bec23

commit f8c63b1dcc5a3833364e9e7a71700758092bec23
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed Apr 4 10:58:20 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Wed Apr 4 10:58:20 2018 -0400

    Add SystemJobTemplate#launch and the SystemJob class

    This will allow us to trigger a system job template manually and
    return the created system job.

    https://bugzilla.redhat.com/show_bug.cgi?id=1560478

 lib/ansible_tower_client.rb                                 |   1 +
 lib/ansible_tower_client/api.rb                             |   8 +
 lib/ansible_tower_client/base_models/system_job.rb          |   4 +
 lib/ansible_tower_client/base_models/system_job_template.rb |   5 +
 spec/factories/responses.rb                                 |  15 +-
 spec/support/mock_api/system_job_template.rb                | 108 +
 spec/system_job_template_spec.rb                            |  18 +
 7 files changed, 152 insertions(+), 7 deletions(-)
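To make the shape of this new API concrete, here is a hedged, offline sketch of picking out the cleanup_jobs template and launching it on demand. SystemJobTemplateStub is a stand-in defined here so the example runs without a Tower appliance; it is not the gem's real class (the real #launch POSTs to the template's launch endpoint and returns the created system job).

```ruby
require 'json'

# Stand-in for the gem's SystemJobTemplate so this sketch runs offline.
# Here #launch just echoes the request back instead of hitting Tower.
SystemJobTemplateStub = Struct.new(:job_type) do
  def launch(extra_vars)
    {"job_type" => job_type, "extra_vars" => extra_vars}
  end
end

# A real connection would return a collection of system job templates;
# we fake two of the job types Tower ships with.
templates = [SystemJobTemplateStub.new("cleanup_activitystream"),
             SystemJobTemplateStub.new("cleanup_jobs")]

# Find the job-removal template and run it on demand with a 30 day window.
cleanup = templates.detect { |t| t.job_type == "cleanup_jobs" }
job = cleanup.launch(JSON.dump("days" => "30"))
puts job["extra_vars"]  # prints {"days":"30"}
```

Against a real appliance, the template collection would come from an authenticated AnsibleTowerClient connection rather than the stub above.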
New commit detected on ManageIQ/manageiq/master:

https://github.com/ManageIQ/manageiq/commit/9699ff2d328ad71ca41bb61a30af38f84823890b

commit 9699ff2d328ad71ca41bb61a30af38f84823890b
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed Apr 4 12:01:37 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Wed Apr 4 12:01:37 2018 -0400

    Add a setting for ansible job data retention

    This setting will control how many days of job data should be kept.

    When the setting is changed we call out to ansible tower to alter the
    setting for their weekly system job.

    If the user wants to run the job on-demand they can use the
    EmbeddedAnsible#run_job_data_retention method.

    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1560478

 app/models/embedded_ansible_worker/runner.rb |  1 +
 config/settings.yml                          |  1 +
 lib/embedded_ansible.rb                      | 14 +
 lib/vmdb/config/activator.rb                 |  7 +
 4 files changed, 23 insertions(+)

https://github.com/ManageIQ/manageiq/commit/7c3f31220585a628d79dab248a1c93a5710662ae

commit 7c3f31220585a628d79dab248a1c93a5710662ae
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed Apr 4 17:22:46 2018 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Wed Apr 4 17:22:46 2018 -0400

    Set job data retention at runtime in EmbeddedAnsibleWorker::Runner#do_work

    We don't really want to do this in the Config::Activator because that
    will cause every worker to run through the same code as they sync
    their config.

    We keep a local cache (read: instance variable) of the last value we
    set in tower so that we're not constantly hitting the API to
    determine if the value changed.

    The EmbeddedAnsibleWorker can use ::Settings directly because it is
    threaded and the server process takes care of reloading it.

    https://bugzilla.redhat.com/show_bug.cgi?id=1560478

 app/models/embedded_ansible_worker/runner.rb       |  9 +-
 lib/vmdb/config/activator.rb                       |  7 -
 spec/models/embedded_ansible_worker/runner_spec.rb | 13 +
 3 files changed, 21 insertions(+), 8 deletions(-)
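The local cache described in the second commit can be illustrated with a small sketch (illustrative only, not the actual Runner code): the worker remembers the last retention value it pushed to tower and only makes an API call when the configured value actually changes.

```ruby
# Illustrative sketch of the "only push when changed" caching idea.
# The block passed to the constructor stands in for the real API call
# that updates the cleanup_jobs retention window in tower.
class RetentionSyncer
  def initialize(&push_to_tower)
    @push = push_to_tower
    @last_set = nil  # instance-variable cache of the last value pushed
  end

  # Returns true if a call to tower was made, false if the cached
  # value matched and the call was skipped.
  def sync(days)
    return false if days == @last_set
    @push.call(days)
    @last_set = days
    true
  end
end

calls = []
syncer = RetentionSyncer.new { |d| calls << d }
syncer.sync(120)  # pushes 120
syncer.sync(120)  # no-op, value unchanged
syncer.sync(30)   # pushes 30
calls  # => [120, 30]
```

Each worker loop can call sync with the current ::Settings value; only two of the three calls above reach the (stand-in) API.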
Verified in 5.10.0.1.20180619163011_900fdc4. The advanced settings contain the job_data_retention_days option.