Bug 1800870

Summary: "Ansible report timeout" setting is ignored as report origin is not properly set
Product: Red Hat Satellite Reporter: Pavel Moravec <pmoravec>
Component: Ansible - Configuration ManagementAssignee: Ondřej Pražák <oprazak>
Status: CLOSED ERRATA QA Contact: Lukas Pramuk <lpramuk>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.7.0CC: avroy, egolov, hakon.gislason, jjeffers, jpasqual, kkinge, mhulan, mkalyat, mshira, oezr, oprazak, ptrivedi, ramathas.s, sadas, zhunting
Target Milestone: 6.9.0Keywords: Triaged
Target Release: Unused   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: tfm-rubygem-foreman_ansible-4.0.3.9-1,foreman-1.24.1.31-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1884033 (view as bug list) Environment:
Last Closed: 2021-04-21 13:12:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Pavel Moravec 2020-02-08 14:36:17 UTC
Description of problem:
"Ansible report timeout" settings should set the timeout of "Out of sync" host. But the setting does not seem to be honoured. Rather, _double_ value of "Out of sync interval" is used.


Version-Release number of selected component (if applicable):
Sat 6.6.1


How reproducible:
100%


Steps to Reproduce:
1. Assign an Ansible Role to a Host
2. Apply the Role(s) via REX to the Host and remember the time
3. Click to the Host details repeatedly
4. Meantime, modify "Ansible report timeout" (Ansible tab) and "Out of sync interval" (General tab) in Settings and check what settings makes the Host "Out of sync".


Actual results:
"Ansible report timeout" is ignored. Rather double value of "Out of sync interval" is applied.


Expected results:
"Ansible report timeout" to be applied/used.


Additional info:
I think there are two bugs in /usr/share/foreman/app/models/host_status/configuration_status.rb :

    def out_of_sync?
      if (host && !host.enabled?) || no_reports? || out_of_sync_disabled?
        false
      else
        !reported_at.nil? && reported_at < (Time.now.utc - expected_report_interval)
      end
    end

    def expected_report_interval
      (reported_origin_interval + default_report_interval).minutes
    end

    def reported_origin_interval
      if last_report.origin &&
         (interval = Setting[:"#{last_report.origin.downcase}_interval"])
        interval.to_i
      else
        default_report_interval
      end
    end

..

    def default_report_interval
      Setting[:outofsync_interval]
    end


Bug 1: why expected_origin_interval = reported_origin_interval + default_report_interval ? Why _adding_ default? That often means the default is counted *twice*

Bug 2: why default_report_interval loads outofsync_interval Setting, while "Ansible report timeout" should be loaded here? B'cos status.last_report.origin is nil while it should be 'ansible' (to load Setting['ansible_timeout']. Dunno where in code we forget to set it.

The Bug 2 also means that "out_of_sync_disabled?" is evaluated to false, since:

    def out_of_sync_disabled?
      if last_report.origin
        Setting[:"#{last_report.origin.downcase}_out_of_sync_disabled"]
      else
        false
      end
    end



Some supporting data from rake console from a reproducer with outofsync_interval set to 60mins:

irb(main):041:0> host = Host.find_by_name("pmoravec-rhel76.gsslab.brq2.redhat.com")
=> #<Host::Managed id: 5, name: "pmoravec-rhel76.gsslab.brq2.redhat.com", last_compile: "2020-02-08 12:17:14", last_report: "2020-02-08 12:18:00", updated_at: "2020-02-08 12:18:01", created_at: "2020-02-03 12:35:58", root_pass: nil, architecture_id: 1, operatingsystem_id: 1, environment_id: nil, ptable_id: nil, medium_id: nil, build: false, comment: "", disk: nil, installed_at: nil, model_id: 1, hostgroup_id: nil, owner_id: 4, owner_type: "User", enabled: true, puppet_ca_proxy_id: nil, managed: false, use_image: nil, image_file: nil, uuid: nil, compute_resource_id: nil, puppet_proxy_id: nil, certname: nil, image_id: nil, organization_id: 1, location_id: 2, type: "Host::Managed", otp: nil, realm_id: nil, compute_profile_id: nil, provision_method: nil, grub_pass: "", discovery_rule_id: nil, global_status: 2, lookup_value_matcher: "fqdn=pmoravec-rhel76.gsslab.brq2.redhat.com", pxe_loader: nil, initiated_at: nil, build_errors: nil, openscap_proxy_id: nil>
irb(main):042:0> status = HostStatus::Status.where(:host_id => host.id, :type => 'HostStatus::ConfigurationStatus').count
=> 1
irb(main):043:0> status = HostStatus::Status.where(:host_id => host.id, :type => 'HostStatus::ConfigurationStatus').first
=> #<HostStatus::ConfigurationStatus id: 67, type: "HostStatus::ConfigurationStatus", status: 352321536, host_id: 5, reported_at: "2020-02-08 12:18:00">
irb(main):044:0> status.last_report.origin
=> nil
irb(main):045:0> status.expected_report_interval
=> 120 minutes
irb(main):046:0> Setting[:outofsync_interval]
=> 60
irb(main):047:0>

Comment 3 Pavel Moravec 2020-02-08 14:56:40 UTC
FYI there is similar upstream bug https://projects.theforeman.org/issues/25463 that rather describes opposite problem..? (imho close #25463 or recycle it for this BZ)

Comment 4 Ondřej Ezr 2020-07-13 10:10:17 UTC
This is higly likely being the same issue as BZ#1825761 as if the report status is not reported, the interval is not taken from its source.

The adding of values is probably meant as valid_for_minutes + out_of_sync so it doesn't go out_of_sync right after the report is invalid. It should not fallback to the default then probably :)

Comment 5 Ondřej Ezr 2020-07-13 11:29:41 UTC
*** Bug 1825761 has been marked as a duplicate of this bug. ***

Comment 6 Bryan Kearney 2020-07-13 12:01:16 UTC
Upstream bug assigned to oprazak

Comment 7 Bryan Kearney 2020-07-13 12:01:20 UTC
Upstream bug assigned to oprazak

Comment 8 Bryan Kearney 2020-08-03 12:01:14 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/22804 has been resolved.

Comment 9 Ondřej Ezr 2020-08-03 19:45:00 UTC
This needs to pull in new version of the collection on the Capsule (eighter by updating ansible to upcoming release > 2.10.1) or manually.
The PR in question, that needs to get in https://github.com/theforeman/foreman-ansible-modules/pull/870#issuecomment-667656768

Comment 14 Lukas Pramuk 2020-10-22 11:41:20 UTC
FailedQA.

@Satellite 6.7.5 Snap1
tfm-rubygem-foreman_ansible-4.0.3.9-1.el7sat.noarch
foreman-1.24.1.31-1.el7sat.noarch

used reproducer described in comment#0:

>>> Host never gets out of sync now

>>> All ansible reports (from Play Roles, Ansible Command) have origin N/A

Comment 15 Ondřej Pražák 2020-10-22 13:27:09 UTC
Needs an update of Ansible. Alternatively, newer version of callback can be pulled in manually as mentioned in comment#9

Comment 16 Marek Hulan 2020-10-22 13:31:14 UTC
This must be tested with the callback from the new foreman-ansible-modules, this is not part of Satellite 6.7 as stated in comment 9 and 10. We can't backport the callback to older ansible 2.8 version.

In order to use the newer callback, do this

cd /usr/lib/python2.7/site-packages/ansible/plugins/callback/
mv foreman.py foreman.py.orig
wget https://raw.githubusercontent.com/theforeman/foreman-ansible-modules/develop/plugins/callback/foreman.py

and retest.

The origin should be set afterwards, therefore the status will be updated based on these Ansible reports.

This has been added to foreman-ansible-modules 1.1.0. 1.3.0 is published at automation hub if you want to get it from official source. I'm not sure, what exact version is built to newer ansible version. Moving back to ON_QA.

Comment 18 Evgeni Golov 2020-10-23 06:46:21 UTC
(In reply to Marek Hulan from comment #16)
> This must be tested with the callback from the new foreman-ansible-modules,
> this is not part of Satellite 6.7 as stated in comment 9 and 10. We can't
> backport the callback to older ansible 2.8 version.
> 
> In order to use the newer callback, do this
> 
> cd /usr/lib/python2.7/site-packages/ansible/plugins/callback/
> mv foreman.py foreman.py.orig
> wget
> https://raw.githubusercontent.com/theforeman/foreman-ansible-modules/develop/
> plugins/callback/foreman.py
> 
> and retest.

FWIW, this won't work as you can't use a collection callback on 2.8.
You'd have to "uncollection" it. (change the two theforeman.foreman.foreman names to "foreman").

Comment 22 Lukas Pramuk 2021-03-10 23:39:55 UTC
VERIFIED.

@Satellite 6.9.0 Snap16
foreman-2.3.1.14-1.el7sat.noarch
tfm-rubygem-foreman_ansible-6.1.1-1.el7sat.noarch
ansible-collection-redhat-satellite-2.0.1-1.el7sat.noarch
ansible-2.9.16-1.el7ae.noarch

by the manual reproducer based on description in https://bugzilla.redhat.com/show_bug.cgi?id=1884033#c0 and https://bugzilla.redhat.com/show_bug.cgi?id=1884033#c6:

1) Update foreman callback by installing ansible-collection-redhat-satellite and changing callback name from "foreman" to "redhat.satellite.foreman"

# satellite-maintain packages install ansible-collection-redhat-satellite
# sed -i 's/^callback_whitelist\s*=.*/callback_whitelist = redhat.satellite.foreman/' /etc/foreman-proxy/ansible.cfg

(expect this setting being overrun by an installer run)

>>> foreman callback is shipped by ansible so Satellite has no control over when it is updated
 
2) Change the Settings > General > Out of sync interval = 5 min , Ansible > Ansible report timeout = 10 mins
3) Assign an Ansible Role to a Host
4) Apply the Role(s) via REX to the Host

5) Refresh the Host details repeatedly to watch "Configuration" status turning from "Active" to "Out of sync"

>>> "Out of sync" status worn after "Out of sync interval" + "Ansible report timeout" time = 15 mins

6) Yet check the origin of the generated report by clicking [Reports] button at the host details page

>>> origin is set to "ansible" (A) icon, instead of "N/A"

>>> was hesitant to verify this BZ as the fix requires callback update which at the moment are manual steps since the foreman callback is currently shipped by ansible and not Satellite but I expect in near future "foreman" callback being replaced by "redhat.satellite.foreman" callback

Comment 25 errata-xmlrpc 2021-04-21 13:12:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.9 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1313