Bug 1566089

Summary: oscap --remediate hangs at autofs
Product: Red Hat Enterprise Linux 7 Reporter: Thomas Jones <redhat>
Component: openscapAssignee: Matus Marhefka <mmarhefk>
Status: CLOSED CANTFIX QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.5CC: jcerny, loren, mhaicman, openscap-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-18 09:41:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Exemplar SOS report none

Description Thomas Jones 2018-04-11 13:47:28 UTC
Description of problem:

Since updating to RHEL 7.5, the oscap utility fails when using remediation mode (--remediate flag)

Version-Release number of selected component (if applicable):

Command-line reports:

# oscap --version

OpenSCAP command line tool (oscap) 1.2.16

How reproducible:

Always:

Steps to Reproduce:
1. Launch new RHEL 7.5 AMI from AWS marketplace
2. Login to instance
3. Escalate privileges to `root` (via sudo)
4. Execute `oscap xccdf eval --remediate --profile xccdf_org.ssgproject.content_profile_C2S /root/scap/content/openscap/ssg-rhel7-ds.xml `

Actual results:

Hangs at:

```
Title   Disable the Automounter

Rule    xccdf_org.ssgproject.content_rule_service_autofs_disabled

Ident   CCE-27498-5
```

Expected results:

The utility runs to completion, noting actions and errors and fixing items.


Additional info:

Tried looking through `strace` output to see what/where it was hanging. Did not see anything that looked like it was "stuck", output simply stops rather than spinning on an operation.

Comment 2 Loren Gordon 2018-04-11 14:10:20 UTC
FWIW, here's the same command but using the SCAP content distributed by the scap-security-guide RPM, resulting in the same issue.

Run:

```
oscap xccdf eval --remediate --profile xccdf_org.ssgproject.content_profile_C2S /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml
```

Hangs in the same place:

```
Title   Disable the Automounter
Rule    xccdf_org.ssgproject.content_rule_service_autofs_disabled
Ident   CCE-27498-5
```

Comment 3 Matus Marhefka 2018-04-12 15:54:47 UTC
Hello,

I was unable to reproduce this issue. I suppose you are using datastream from scap-security-guide-0.1.36-7.el7.noarch? Can you try running remediation only for this single rule:

$ sudo oscap xccdf eval --remediate --profile xccdf_org.ssgproject.content_profile_C2S --rule xccdf_org.ssgproject.content_rule_service_autofs_disabled /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml

so we are 100% sure that previous rules/remediations do not have any influence in that hanging issue.
Also based on your reports it is not clear if the hanging issue appears during the scan or after the scan during remediation process.

When looking at the remediation in ssg-rhel7-ds.xml for this rule, it runs the following command:

systemctl disable autofs

Can you try to execute the following commands manually and report their output:
$ sudo systemctl status autofs
$ sudo systemctl disable autofs
$ sudo systemctl status autofs

Comment 4 Loren Gordon 2018-04-12 18:55:12 UTC
Thanks, yes, we are using the datastream from scap-security-guide-0.1.36-7.el7.noarch.

The hanging issue appears during the scan portion, before any remediation begins.

The issue occurs even just running the single rule, using the command you provided.

autofs doesn't look to be installed...

```
# systemctl status autofs
Unit autofs.service could not be found.
# rpm -q autofs
package autofs is not installed
```

I went ahead and installed autofs, but that did not resolve the problem. It was installed in the disabled state, so I enabled it and tried again. Still no dice. It was enabled but dead, so I started it. Continued to hang.

Still searching for a cause here...

Comment 5 Thomas Jones 2018-04-13 15:54:12 UTC
(In reply to Matus Marhefka from comment #3)
> Hello,
> 
> I was unable to reproduce this issue. I suppose you are using datastream
> from scap-security-guide-0.1.36-7.el7.noarch? Can you try running
> remediation only for this single rule:
> 
> $ sudo oscap xccdf eval --remediate --profile
> xccdf_org.ssgproject.content_profile_C2S --rule
> xccdf_org.ssgproject.content_rule_service_autofs_disabled
> /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml

Did some additional testing on a new test-rig. Looks like running `oscap xccdf eval --remediate --profile xccdf_org.ssgproject.content_profile_C2S --rule xccdf_org.ssgproject.content_rule_service_autofs_disabled /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml` doesn't so much hang/fail as it attempts to make a call-out to external content, pauses while that times out, but then ultimately returns:

```
# strace -D -f -ff -t -y -yy -o oscap-remediate.trace-$(date "+%Y%m%d%H%M") oscap xccdf eval --remediate --profile xccdf_org.ssgproject.content_profile_C2S --rule xccdf_org.ssgproject.content_rule_service_autofs_disabled /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml
WARNING: This content points out to the remote resources. Use `--fetch-remote-resources' option to download them.
WARNING: Skipping https://www.redhat.com/security/data/oval/com.redhat.rhsa-RHEL7.xml.bz2 file which is referenced from XCCDF content
Title   Disable the Automounter
Rule    xccdf_org.ssgproject.content_rule_service_autofs_disabled
Ident   CCE-27498-5
Result  pass


 --- Starting Remediation ---
WARNING: This content points out to the remote resources. Use `--fetch-remote-resources' option to download them.
WARNING: Skipping https://www.redhat.com/security/data/oval/com.redhat.rhsa-RHEL7.xml.bz2 file which is referenced from XCCDF content
```

Comment 6 Thomas Jones 2018-04-13 16:03:37 UTC
Note: new test rig does not have autofs installed. While *this* test-EC2 isn't failing on the requested, stand-alone oscap-task, going to try to attach an sos report any way (our test EC2s will generally align very closely to this current one).

Comment 7 Thomas Jones 2018-04-13 16:05:41 UTC
Created attachment 1421436 [details]
Exemplar SOS report

Capturing SOS for test-EC2 (our EC2s are auto-built so should be just about identical if we need to launch a new one for further testing-exercises)

Comment 8 Thomas Jones 2018-04-13 22:19:10 UTC
Did a lot more digging around, today. Looks like our issue's a bit more complicated than first blush.

The AMI we use is derived from the Red Hat published one. Specifically, to meet our customers' compliance-needs, the AMI needs to be partitioned and have FIPS-mode enabled. The derivation-method allows our customized AMI to use the RHUI entitlements that the Red Hat published upstream AMI provides.

We were initially testing against an RHEL 7.4 instance that _became_ (well, tried to, really) 7.5 when the remediation was run. Ultimately, it looks like there's an issue when moving from the 7.4 (March 2018) version of DBUS to 7.5's verion. During yum update, we started getting DBUS hangs and "quark 39" errors. 

Upon attempting a post-update reboot, the reboot process would take a very long time. In looking at the logs, the slowness was because a number of services were getting connection refused errors from DBUS and having to be timeout-killed. Upon finally coming back online, DBUS continued to be deranged: even attempting to do `timedatectl status` resulted in "connection refused" errors and /var/log/messages was logging NetworkManager-related DBUS errors a couple times per minute.

There didn't seem to be a recovery path for the test-EC2, so we terminated it and launched a new one. Prior to doing the `yum update`, we:

* installed the versionlock yum plugin
* added "exclude=dbus dbus-libs teamd libteam wpa_supplicant" to the /etc/yum.conf
* added openscap-1.2.14 and openscap-scanner-1.2.14 to /etc/yum/pluginconf.d/versionlock.list.

The actual issue seems to be the DBUS update when going 7.4 to 7.5. The openscap-scanner, itself, has a DBUS dependency that, absent version-locking it, will also cause an attempted update of DBUS.

After running yum update and rebooting, the original `openscap --remediate` that was hanging on prior testing EC2 instances succeeded.

Comment 9 Thomas Jones 2018-04-13 22:19:20 UTC
Did a lot more digging around, today. Looks like our issue's a bit more complicated than first blush.

The AMI we use is derived from the Red Hat published one. Specifically, to meet our customers' compliance-needs, the AMI needs to be partitioned and have FIPS-mode enabled. The derivation-method allows our customized AMI to use the RHUI entitlements that the Red Hat published upstream AMI provides.

We were initially testing against an RHEL 7.4 instance that _became_ (well, tried to, really) 7.5 when the remediation was run. Ultimately, it looks like there's an issue when moving from the 7.4 (March 2018) version of DBUS to 7.5's verion. During yum update, we started getting DBUS hangs and "quark 39" errors. 

Upon attempting a post-update reboot, the reboot process would take a very long time. In looking at the logs, the slowness was because a number of services were getting connection refused errors from DBUS and having to be timeout-killed. Upon finally coming back online, DBUS continued to be deranged: even attempting to do `timedatectl status` resulted in "connection refused" errors and /var/log/messages was logging NetworkManager-related DBUS errors a couple times per minute.

There didn't seem to be a recovery path for the test-EC2, so we terminated it and launched a new one. Prior to doing the `yum update`, we:

* installed the versionlock yum plugin
* added "exclude=dbus dbus-libs teamd libteam wpa_supplicant" to the /etc/yum.conf
* added openscap-1.2.14 and openscap-scanner-1.2.14 to /etc/yum/pluginconf.d/versionlock.list.

The actual issue seems to be the DBUS update when going 7.4 to 7.5. The openscap-scanner, itself, has a DBUS dependency that, absent version-locking it, will also cause an attempted update of DBUS.

After running yum update and rebooting, the original `openscap --remediate` that was hanging on prior testing EC2 instances succeeded.

Comment 10 Matus Marhefka 2018-04-16 10:00:11 UTC
Hello Thomas,

thank you for a detailed report. Have you also tried if the problem occurs on the fresh RHEL-7.5 installation or your situation requires upgrade from 7.4 to 7.5?

Anyway, I think we should either report a new bug against dbus or change this bug's component to dbus. Do you agree?

Comment 11 Thomas Jones 2018-04-17 10:54:17 UTC
The situation only seems to trigger on the 7.4 to 7.5 migration. The two specific scenarios we've observed are:

* Updating an instance built from a 7.4 AMI to the most recent patch set
* Attempting to create a "downstream" 7.5 AMI from a 7.4 upstream AMI

The latter is mostly a matter of being inconvenient: we have to update our Packer jobs' starting AMI ID rather than being able to use an arbitrary AMI as we are able to do with our RHEL 6 AMI-generation routines.

The former is the more critical problem as, right now, it seems like the tenants we provide tailored AMIs for may need to be told "you have two choices: re-deploy from the new, 7.5 AMI; or, use the following multi-step, manual procedure to version-pin your DBUS so that you execute your normal `yum update` procedures".

Given that it's apparent this is an issue with DBUS — possibly our particular method of (ab)using it — and that the apparent `oscap` issues were symptomatic of DBUS issues rather than `oscap` issues, it likely makes sense to close this issue in favor of opening a DBUS issue.

Do you need me to open the issue or can you open the issue (backreferencing this one)?

Comment 12 Matus Marhefka 2018-04-18 09:41:09 UTC
Yes, please open the new issue for DBUS. I am closing this one, but you can still reference it in the new issue.