Bug 1647189 - Ansible remediations generated using openscap exit after any failure.
Summary: Ansible remediations generated using openscap exit after any failure.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: scap-security-guide
Version: 7.6
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Watson Yuuma Sato
QA Contact: Matus Marhefka
Docs Contact: Lenka Špačková
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-11-06 20:20 UTC by Ryan Mullett
Modified: 2023-09-07 19:30 UTC
CC List: 10 users

Fixed In Version: scap-security-guide-0.1.43-7.el7
Doc Type: Bug Fix
Doc Text:
.Ansible playbooks from the SCAP Security Guide no longer fail due to common errors

Ansible tasks included in the SCAP Security Guide content were previously unable to handle certain common cases, such as missing configuration files, non-existent files, or uninstalled packages. As a consequence, when using an Ansible playbook from the SCAP Security Guide or generated by the `oscap` command, the `ansible-playbook` command terminated with every error. With this update, the Ansible tasks have been updated to handle common cases, and Ansible playbooks from the SCAP Security Guide can be successfully executed even if common errors are encountered during the playbook execution.
Clone Of:
Environment:
Last Closed: 2019-08-06 13:04:08 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2019:2198 (last updated 2019-08-06 13:04:20 UTC)

Description Ryan Mullett 2018-11-06 20:20:06 UTC
Description of problem:

When generating an Ansible remediation with oscap, the resulting playbook halts as soon as any task fails (missing file, etc.). With openscap we usually expect that remediation will not be 100% successful and that some manual intervention may be required. In this case, however, the playbook stops on the first error, which costs more time than automating the remediation with any of the other available options. The other remediation options report in a summary what was not resolved, while the Ansible remediations simply stop running at the first failure.

Version-Release number of selected component (if applicable):

- Running the newest versions of Ansible, scap-security-guide, openscap, and openscap-scanner available:

-ansible-2.7.1-1.el7ae.noarch
-openscap-scanner-1.2.17-2.el7.x86_64
-openscap-1.2.17-2.el7.x86_64
-scap-security-guide-0.1.40-12.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Generate remediation playbook:
# oscap xccdf generate fix --profile xccdf_org.ssgproject.content_profile_stig-rhel7-disa --template urn:xccdf:fix:script:ansible /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml &> disatest1.yml

2. Attempt to remediate with that playbook:
# ansible-playbook -i "localhost," -c local disatest1.yml

Actual results:

- Playbook exits on error

[root@localhost ~]# ansible-playbook -i "localhost," -c local disatest1.yml 

PLAY [all] ********************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************************************************************
ok: [localhost]

TASK [Ensure rsh-server is removed] *******************************************************************************************************************************************************************************
ok: [localhost] => (item=rsh-server)

TASK [Ensure telnet-server is removed] ****************************************************************************************************************************************************************************
ok: [localhost] => (item=telnet-server)

TASK [Ensure ypserv is removed] ***********************************************************************************************************************************************************************************
ok: [localhost] => (item=ypserv)

TASK [Ensure tftp-server is removed] ******************************************************************************************************************************************************************************
ok: [localhost] => (item=tftp-server)

TASK [Ensure vsftpd is removed] ***********************************************************************************************************************************************************************************
ok: [localhost] => (item=vsftpd)

TASK [Ensure group owner 0 on /etc/cron.allow] ********************************************************************************************************************************************************************
failed: [localhost] (item=/etc/cron.allow) => {"changed": false, "item": "/etc/cron.allow", "msg": "file (/etc/cron.allow) is absent, cannot continue", "path": "/etc/cron.allow", "state": "absent"}
	to retry, use: --limit @/root/disatest1.retry

PLAY RECAP ********************************************************************************************************************************************************************************************************
localhost                  : ok=6    changed=0    unreachable=0    failed=1   
[root@localhost ~]#


Expected results:

- The playbook should continue running on error. In the following example I added "ignore_errors: yes" to the task that failed in the previous run. This allows the playbook to continue to the next task, which in turn fails and causes the playbook to exit again. I would expect that adding "ignore_errors: yes" to tasks may be an acceptable workaround, unless there is some unexpected consequence that I am not aware of.

[root@localhost ~]# ansible-playbook -i "localhost," -c local disatest1.yml 

PLAY [all] **********************************************************************************

TASK [Gathering Facts] **********************************************************************
ok: [localhost]

TASK [Ensure rsh-server is removed] *********************************************************
ok: [localhost] => (item=rsh-server)

TASK [Ensure telnet-server is removed] ******************************************************
ok: [localhost] => (item=telnet-server)

TASK [Ensure ypserv is removed] *************************************************************
ok: [localhost] => (item=ypserv)

TASK [Ensure tftp-server is removed] ********************************************************
ok: [localhost] => (item=tftp-server)

TASK [Ensure vsftpd is removed] *************************************************************
ok: [localhost] => (item=vsftpd)

TASK [Ensure group owner 0 on /etc/cron.allow] **********************************************
failed: [localhost] (item=/etc/cron.allow) => {"changed": false, "item": "/etc/cron.allow", "msg": "file (/etc/cron.allow) is absent, cannot continue", "path": "/etc/cron.allow", "state": "absent"}
...ignoring

TASK [Ensure owner 0 on /etc/cron.allow] ****************************************************
failed: [localhost] (item=/etc/cron.allow) => {"changed": false, "item": "/etc/cron.allow", "msg": "file (/etc/cron.allow) is absent, cannot continue", "path": "/etc/cron.allow", "state": "absent"}
	to retry, use: --limit @/root/disatest1.retry

PLAY RECAP **********************************************************************************
localhost                  : ok=7    changed=0    unreachable=0    failed=1   

[root@localhost ~]#


Additional info:
- The remediation generated in the above example should actually match /usr/share/scap-security-guide/ansible/ssg-rhel7-role-stig-rhel7-disa.xml, because no customizations were added. I confirmed with diff that there are only a few differences, none of which affect the results of testing.

Comment 2 Marek Haicman 2018-11-23 12:11:39 UTC
Hello Ryan, this behaviour is the default in Ansible, and trying to work around it would go against the spirit of the tool. What you perceive might be an issue in a particular rule of the Ansible playbooks, and that is definitely something we will look into.

I will close this one as not a bug, but please feel free to report any unexpected remediation failures as separate bugs.

Comment 3 Ryan Mullett 2018-11-23 21:43:14 UTC
Marek, I understand that it is the default behavior in Ansible. But my concern is that in the current iteration the playbooks are nearly useless compared to just using the --remediate flag, because as soon as the playbook hits any issue at all it exits completely.

I'm not sure what the solution is, but I'm curious about the reasoning for shipping this at all if the playbook exits as soon as we hit any error for a missing file, etc. This dramatically increases the time needed to remediate a system. I'm also concerned that whenever someone tries the Ansible remediations, they will realize how poorly they compare to the other automated options available in openscap and then open a support case about it.

Comment 4 Kyle Walker 2018-11-27 18:08:10 UTC
I'm going to side with Ryan on this one and request we take a look at the playbook template behaviour. From his example:

<snip>
TASK [Ensure group owner 0 on /etc/cron.allow] **********************************************
failed: [localhost] (item=/etc/cron.allow) => {"changed": false, "item": "/etc/cron.allow", "msg": "file (/etc/cron.allow) is absent, cannot continue", "path": "/etc/cron.allow", "state": "absent"}
...ignoring

TASK [Ensure owner 0 on /etc/cron.allow] ****************************************************
failed: [localhost] (item=/etc/cron.allow) => {"changed": false, "item": "/etc/cron.allow", "msg": "file (/etc/cron.allow) is absent, cannot continue", "path": "/etc/cron.allow", "state": "absent"}
	to retry, use: --limit @/root/disatest1.retry
<snip>

The first task ignores the error because the file is absent, and the play continues. The second one does not ignore the failure, so the play stops. There are a large number of strategies available in Ansible to avoid these types of issues when writing playbooks, and I would expect any generated playbooks to use similar strategies for non-fatal error conditions.

For example - From a generated playbook:

    - name: Ensure owner 0 on /etc/cron.allow
      file:
        path: "{{ item }}"
        owner: 0
      with_items:
        - /etc/cron.allow
      tags:
        - file_owner_cron_allow
        - medium_severity
        - configure_strategy
        - low_complexity
        - low_disruption
        - CCE-80378-3
        - NIST-800-53-AC-6
        - DISA-STIG-RHEL-07-021110

This results in:

    # ansible-playbook -c local -i localhost, --tag file_owner_cron_allow disatest1.yml

    PLAY [all] *****************************************************************************************
 
    TASK [Gathering Facts] *****************************************************************************
    ok: [localhost]
 
    TASK [Ensure owner 0 on /etc/cron.allow] ***********************************************************
    failed: [localhost] (item=/etc/cron.allow) => {"changed": false, "item": "/etc/cron.allow", "msg": "file (/etc/cron.allow) is absent, cannot continue", "path": "/etc/cron.allow", "state": "absent"}
            to retry, use: --limit @/home/vagrant/disatest1.retry
 
    PLAY RECAP *****************************************************************************************
    localhost                  : ok=1    changed=0    unreachable=0    failed=1

Versus:

    - name: Ensure group owner 0 on /etc/cron.allow
      file:
        path: "{{ item }}"
        group: 0
      with_items:
        - /etc/cron.allow
      tags:
        - file_groupowner_cron_allow
        - medium_severity
        - configure_strategy
        - low_complexity
        - low_disruption
        - CCE-80379-1
        - NIST-800-53-AC-6
        - DISA-STIG-RHEL-07-021120
      ignore_errors: yes

Results:

    # ansible-playbook -c local -i localhost, --tag file_owner_cron_allow disatest1.yml

    PLAY [all] *****************************************************************************************

    TASK [Gathering Facts] *****************************************************************************
    ok: [localhost]

    TASK [Ensure owner 0 on /etc/cron.allow] ***********************************************************
    failed: [localhost] (item=/etc/cron.allow) => {"changed": false, "item": "/etc/cron.allow", "msg": "file (/etc/cron.allow) is absent, cannot continue", "path": "/etc/cron.allow", "state": "absent"}
    ...ignoring

    TASK [XCCDF Value var_sssd_ldap_tls_ca_dir] ********************************************************
    ok: [localhost]

    TASK [XCCDF Value sshd_approved_macs] **************************************************************
    ok: [localhost]

    <snip>

This can be even more gracefully avoided with:

    - name: Check to see if /etc/cron.allow exists
      stat:
        path: "/etc/cron.allow"
      tags:
        - file_owner_cron_allow
      register: cron_allow

    - name: Ensure owner 0 on /etc/cron.allow
      file:
        path: "{{ item }}"
        owner: 0
      with_items:
        - /etc/cron.allow
      tags:
        - file_owner_cron_allow
        - medium_severity
        - configure_strategy
        - low_complexity
        - low_disruption
        - CCE-80378-3
        - NIST-800-53-AC-6
        - DISA-STIG-RHEL-07-021110
      when: cron_allow.stat.exists == True and cron_allow.stat.isreg == True

Results:

    # ansible-playbook -c local -i localhost, --tag file_owner_cron_allow disatest1.yml | head -25

    PLAY [all] *****************************************************************************************

    TASK [Gathering Facts] *****************************************************************************
    ok: [localhost]

    TASK [Check to see if /etc/cron.allow exists] ******************************************************
    ok: [localhost]

    TASK [Ensure owner 0 on /etc/cron.allow] ***********************************************************
    skipping: [localhost] => (item=/etc/cron.allow)

    TASK [XCCDF Value var_sssd_ldap_tls_ca_dir] ********************************************************
    ok: [localhost]

    TASK [XCCDF Value sshd_approved_macs] **************************************************************
    ok: [localhost]

    TASK [XCCDF Value sshd_idle_timeout_value] *********************************************************
    ok: [localhost]

    TASK [XCCDF Value inactivity_timeout_value] ********************************************************
    ok: [localhost]

    TASK [XCCDF Value rsyslog_remote_loghost_address] **************************************************
    <snip>

If we are going to go to the trouble of creating Ansible playbooks as a remediation method, we should make sure that the results are functional...
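[Editor's note: another option worth mentioning, shown here only as a sketch and not something the generated playbooks currently use, is Ansible's block/rescue error handling (available since Ansible 2.0). It lets a remediation task fail without aborting the play while still surfacing which rule needs manual attention:

    - name: Ensure owner 0 on /etc/cron.allow
      block:
        - file:
            path: /etc/cron.allow
            owner: 0
      rescue:
        - debug:
            msg: "Remediation failed for /etc/cron.allow, continuing with remaining tasks"
      tags:
        - file_owner_cron_allow

Compared to a bare "ignore_errors: yes", the rescue section gives a natural place to log or register the failure for the final summary.]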

- Kyle Walker

Comment 5 Watson Yuuma Sato 2019-03-19 08:42:38 UTC
The following patches fix errors in tasks in the OSPP profile:
- https://github.com/ComplianceAsCode/content/pull/4041
- https://github.com/ComplianceAsCode/content/pull/4036

And the following patches fix errors in tasks not selected in any profile:
- https://github.com/ComplianceAsCode/content/pull/4054
- https://github.com/ComplianceAsCode/content/pull/4109
- https://github.com/ComplianceAsCode/content/pull/4198

Comment 12 jspringe 2019-06-12 15:17:38 UTC
The primary issue here is that certain tasks don't account for the possibility of files being missing. They only expect that the file exists, even though that isn't a requirement of the guideline itself.

For example, if you don't have GNOME installed, the playbook will fail when checking or disabling GNOME settings. One method would be to disable that particular security guideline, but for an enterprise solution (where all, some, or none of the hosts could have that file), you'd want to cover the gap and account for any servers that do have $FILE. So the best method for handling this would be to perform an audit check prior to executing the task, if that security guideline is deemed applicable by their AO.
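[Editor's note: a hypothetical sketch of such an audit-first guard for the GNOME case. The package name and file path here are assumptions for illustration only; package_facts requires Ansible 2.5+. The idea is to gather package facts once and skip GNOME rules on hosts without GNOME:

    - name: Gather installed package facts
      package_facts:
        manager: rpm

    - name: Disable GNOME automatic login (example GNOME rule)
      lineinfile:
        path: /etc/gdm/custom.conf
        regexp: '^AutomaticLoginEnable='
        line: 'AutomaticLoginEnable=False'
      when: "'gdm' in ansible_facts.packages"

On hosts where the gdm package is absent, the rule task is reported as skipped rather than failed, matching the audit-first behavior described above.]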

- Josh

Comment 13 Watson Yuuma Sato 2019-06-26 14:15:47 UTC
Hello Josh,

That is what most of the Ansible remediations should be doing now.
If a file that should be configured according to a rule is not present, we skip the task that would configure the file.

Are you facing issues with Ansible remediations related to GNOME?
Does it also happen with scap-security-guide-0.1.43-7.el7?

Thanks

Comment 18 errata-xmlrpc 2019-08-06 13:04:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2198

