Bug 999903 - aborting openscap results in RPMDB corruption
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: openscap
Version: 6.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Daniel Kopeček
QA Contact: Lukas "krteknet" Novy
Reported: 2013-08-22 06:57 EDT by Sergio Freire
Modified: 2013-11-21 04:43 EST (History)
Fixed In Version: openscap-0.9.12-1.el6
Doc Type: Bug Fix
Last Closed: 2013-11-21 04:43:42 EST
Type: Bug
Attachments:
patch to open the RPMDB in readonly (2.29 KB, patch), 2013-08-22 12:17 EDT, Sergio Freire

Description Sergio Freire 2013-08-22 06:57:33 EDT
Description of problem:
openscap apparently opens the RPMDB in write mode whenever it evaluates rpminfo/rpmverify OVAL checks, although it should only open it read-only. Because the database is opened in write mode, aborting a scan (which is tempting, since the process can take some time) corrupts the RPMDB.

Version-Release number of selected component (if applicable):
0.9.11

How reproducible:
always

Steps to Reproduce:
1. start oscap eval (xccdf or oval)
2. cancel using control+c or by sending SIGINT
3. consult the RPM db, using rpm -qa for example

Actual results:
RPM db gets corrupted 

Expected results:
RPM db should not be corrupted.


Additional info:
Comment 2 Tomas Heinrich 2013-08-22 11:36:53 EDT
I'm not convinced that this is a bug in openscap.

I conclude there are stale locks on the db because the scan was abruptly terminated. This shouldn't be caused by a RW access alone and needs to be resolved manually[1] (e.g. by running $ rm /var/lib/rpm/__*).

I've tried to change the access mode to RO by calling rpmtsSetDBMode() (which is likely a desired change anyway) but this alone didn't prevent the issue from occurring.

This fallout from interrupting a running scan is pretty inconvenient, and perhaps the application should take steps to clean up the db state where possible, but this still won't cover non-graceful termination.

[1] http://rpm.org/wiki/Docs/RpmRecovery
Comment 3 Sergio Freire 2013-08-22 12:17:52 EDT
Created attachment 789268 [details]
patch to open the RPMDB in readonly

probably not the best patch but it works
Comment 4 Sergio Freire 2013-08-22 12:20:13 EDT
The openscap probes depend on rpmtsInitIterator() to open the "ts".
If you open the "ts" read-only beforehand, using rpmtsOpenDB, the ts is already open and rpmtsInitIterator will simply reuse it, read-only.
See the previous attachment.
Comment 5 Tomas Heinrich 2013-08-23 09:21:03 EDT
Thanks for the patch. I agree that the db should be opened readonly and we should get this changed upstream soon.

But: the patch you've provided didn't prevent the symptoms as far as I can see.

I can see that rpmtsInitIterator() calls rpmtsOpenDB(ts, ts->dbmode) and I've tried to set the mode to RO with rpmtsSetDBMode() after the ts is created. This didn't help either.

I'm told that the readonly locks should be cleaned up automatically so either of the attempted changes should have helped (assuming we didn't screw something up in the patches).

I'd like to emphasize that the issue you're seeing is not reliably reproducible. I had to put the system under some load and interrupt the tool very soon after it started printing results to see the symptoms. Maybe you just didn't have enough "luck" after patching.

The issue is even more puzzling because there should already be some code handling sudden interrupts to oscap. I'll have to do some digging and ask somebody more knowledgeable about librpm.

Thanks so far.
Comment 6 Tomas Heinrich 2013-08-27 09:04:08 EDT
I have some new information:

The default mode should already be RO. What makes you think it is opened RW?

But even for RO, when running as root, the corruption happens if the probe is suddenly terminated. In the case of hitting ctrl-c, this should be properly handled and needs to be fixed.

You didn't provide the output you're getting in step 3. after the corruption happens so I'm only guessing your symptoms are the same as mine.
Comment 7 Sergio Freire 2013-08-27 13:07:23 EDT
Well, about it being opened in RW it was just a guess, since the applied patch seemed to resolve the issue. Although oscap runs as root, it should use RPMDB only in RO. The error in fact does not occur always; in a slow VM it occurs almost always, though. oscap was aborted using control+c.
here's the output

[root@seis64 ~]# oscap xccdf eval --profile production-base PTIN-rhel-xccdf.xml
Title   Ensure /tmp Located On Separate Partition
Rule    partition_for_tmp
Ident   CCE-26435-8
Result  fail

Title   Ensure /var/log Located On Separate Partition
Rule    partition_for_var_log
Ident   CCE-26215-4
Result  pass

Title   Ensure /home Located On Separate Partition
Rule    partition_for_home
Ident   CCE-26557-9
Result  pass

Title   Ensure Red Hat GPG Key Installed
Rule    ensure_redhat_gpgkey_installed
Ident   CCE-26506-6
Result  pass

Title   Verify File Hashes with RPM
Rule    rpm_verify_hashes
Ident   CCE-27223-7
Result  ^C
[root@seis64 ~]# rpm -qi openscap
rpmdb: Thread/process 13300/140619998598912 failed: Thread died in Berkeley DB library
error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db3 -  (-30974)
error: cannot open Packages database in /var/lib/rpm
rpmdb: Thread/process 13300/140619998598912 failed: Thread died in Berkeley DB library
error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm
package openscap is not installed
[root@seis64 ~]#
Comment 8 Tomas Heinrich 2013-08-28 04:54:39 EDT
Thanks, that's the same error that I see. We'll need to look into handling the C-c case cleanly.
Comment 9 Steve Grubb 2013-08-30 09:18:36 EDT
Perhaps oscap should set a signal handler and issue a command to probes to terminate cleanly?
Comment 10 Tomas Heinrich 2013-08-30 10:19:25 EDT
IIRC, there should already be code for that, but something is apparently amiss.

librpm has its own way of handling signals, which could possibly interfere with our handlers, but after a quick glance over the code this doesn't seem to be the case.
Comment 11 Daniel Kopeček 2013-09-03 08:27:39 EDT
(In reply to Steve Grubb from comment #9)
> Perhaps oscap should set a signal handler and issue a command to probes to
> terminate cleanly?

There is a signal handler implemented as a thread that calls sigwaitinfo. Signals in all other threads are blocked. On Linux, we use prctl(PR_SET_PDEATHSIG, SIGTERM) in probes to make sure that they get a shutdown signal if the parent dies and doesn't close the comm. channels to them explicitly.
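To illustrate the pattern described above, here is a minimal sketch of a dedicated signal thread built on sigwaitinfo, with the signals blocked everywhere else. It is not the openscap source: the names shutdown_requested, signal_thread, start_signal_thread and request_parent_death_signal are made up for this example, and only the POSIX/Linux calls themselves are real.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Hypothetical flag that worker threads would poll to learn about shutdown. */
static atomic_int shutdown_requested = 0;

/* Dedicated thread: waits synchronously for the signals listed in *arg. */
static void *signal_thread(void *arg)
{
    const sigset_t *set = arg;
    siginfo_t info;

    for (;;) {
        int signo = sigwaitinfo(set, &info);
        if (signo == SIGINT || signo == SIGTERM) {
            atomic_store(&shutdown_requested, 1);
            return NULL;
        }
        /* -1 (e.g. EINTR) or an unexpected signal: keep waiting. */
    }
}

/*
 * Block SIGINT/SIGTERM in the calling thread, then start the signal thread.
 * Threads created afterwards inherit the blocked mask, so only the signal
 * thread ever consumes these signals. *set must outlive the thread.
 */
static int start_signal_thread(pthread_t *tid, sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGINT);
    sigaddset(set, SIGTERM);
    if (pthread_sigmask(SIG_BLOCK, set, NULL) != 0)
        return -1;
    return pthread_create(tid, NULL, signal_thread, set) == 0 ? 0 : -1;
}

/* Linux only: ask the kernel to send SIGTERM to this process if its parent dies. */
static int request_parent_death_signal(void)
{
    return prctl(PR_SET_PDEATHSIG, SIGTERM);
}
```

In this sketch a probe would call start_signal_thread() before creating any other thread and then request_parent_death_signal(), so that the death of the parent is reported through the same signal thread.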

However, if you hit ^C during the evaluation of an object that touches the RPM database, the associated probe_main() thread in the probe process gets cancelled by the shutdown procedure (using pthread_cancel). It looks like the DB corruption is caused by DB iterators created inside the probe_main() thread that were not correctly destroyed because the thread was killed.

I've tried to add pthread_setcancelstate calls to create a critical section around the code that uses these iterators and it seems that it fixes the problem.
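A rough sketch of such a critical section follows. It is not the actual patch: simulated_db_work() is a hypothetical stand-in for the code that walks the RPM iterators, and the semaphore exists only so the demonstration can cancel the thread while it is inside the section.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <time.h>

static sem_t entered;                 /* posted once the worker is inside the section */
static atomic_int work_completed = 0;

/* Hypothetical stand-in for the code that uses the RPM DB iterators. */
static void simulated_db_work(void)
{
    struct timespec delay = { 0, 200 * 1000 * 1000 };  /* 200 ms */

    sem_post(&entered);
    nanosleep(&delay, NULL);  /* a cancellation point, but cancellation is disabled */
    atomic_store(&work_completed, 1);
}

static void *worker(void *arg)
{
    int old_state, ignored;

    (void)arg;
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    simulated_db_work();                 /* cannot be cut short by pthread_cancel */
    pthread_setcancelstate(old_state, &ignored);
    pthread_testcancel();                /* act on a cancel request that arrived meanwhile */
    return NULL;
}

/* Returns 1 if the work ran to completion although the thread was cancelled mid-section. */
static int cancel_during_critical_section(void)
{
    pthread_t tid;
    void *result;

    sem_init(&entered, 0, 0);
    pthread_create(&tid, NULL, worker, NULL);
    sem_wait(&entered);                  /* worker is now inside the section */
    pthread_cancel(tid);                 /* request stays pending until re-enabled */
    pthread_join(tid, &result);
    sem_destroy(&entered);
    return result == PTHREAD_CANCELED && atomic_load(&work_completed) == 1;
}
```

The cancel request is not lost; it is only deferred until the state is restored, which is why the thread still ends as cancelled after the work has finished.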

I've also tried to use rpmtsDBVerify() and rpmtsDBRebuild() in the probe_fini() function (which we call after we kill the probe_main() thread and before we exit()) but that didn't do anything. rpmCheckTerminate() didn't help either.

The last option I can think of is using pthread_cleanup_push to register a cleanup function for the opened iterators.
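For comparison, the cleanup-handler option could look roughly like the sketch below. The struct fake_iterator type and release_iterator() are hypothetical placeholders for a real RPM iterator and its free function; nothing here is taken from the openscap or librpm sources.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for an open RPM DB iterator. */
struct fake_iterator {
    atomic_int open;
};

static struct fake_iterator iter;
static sem_t iterating;               /* posted once the worker has started iterating */

/* Cleanup handler: runs when the thread is cancelled (or on cleanup_pop(1)). */
static void release_iterator(void *arg)
{
    struct fake_iterator *it = arg;
    atomic_store(&it->open, 0);
}

static void *probe_worker(void *arg)
{
    struct fake_iterator *it = arg;
    struct timespec delay = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    int i;

    atomic_store(&it->open, 1);
    pthread_cleanup_push(release_iterator, it);
    sem_post(&iterating);
    for (i = 0; i < 1000; i++)
        nanosleep(&delay, NULL);       /* cancellation point inside the "iteration" */
    pthread_cleanup_pop(1);            /* normal path: release the iterator too */
    return NULL;
}

/* Returns 1 if cancelling the worker still released the iterator. */
static int cancel_and_check_cleanup(void)
{
    pthread_t tid;
    void *result;

    sem_init(&iterating, 0, 0);
    pthread_create(&tid, NULL, probe_worker, &iter);
    sem_wait(&iterating);
    pthread_cancel(tid);
    pthread_join(tid, &result);
    sem_destroy(&iterating);
    return result == PTHREAD_CANCELED && atomic_load(&iter.open) == 0;
}
```

Unlike the critical-section approach, this lets the thread be cancelled promptly and relies on the handler to release whatever was open at that moment.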
Comment 12 Steve Grubb 2013-09-03 14:44:30 EDT
I was looking in the oscap source and could not find any calls to sigaction() to install a signal handler (SIGTERM and SIGCHLD). What I was thinking was this: oscap catches the signal and calls a shutdown function in the library, which sends SIGTERM or a communication message that says shutdown; the probe wraps things up safely and quits; the oscap program then reaps the child processes and exits.
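A bare-bones sketch of that flow, for a single-threaded parent, might look like this. The function names (install_stop_handlers, spawn_probe, stop_and_reap) are invented for illustration and the "probe" is just a forked child that waits for a stop signal; it is not how oscap or its probes are actually structured.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t stop_requested = 0;

static void on_stop_signal(int signo)
{
    (void)signo;
    stop_requested = 1;               /* only set a flag; real work happens outside */
}

/* Install the same handler for SIGINT and SIGTERM via sigaction(). */
static int install_stop_handlers(void)
{
    struct sigaction sa;

    sa.sa_handler = on_stop_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical probe: a child that waits until it is asked to stop, then exits 0. */
static pid_t spawn_probe(void)
{
    sigset_t block, old, wait_mask;
    pid_t pid;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigaddset(&block, SIGTERM);
    /* Block before fork so the child cannot be hit before its handlers exist. */
    sigprocmask(SIG_BLOCK, &block, &old);

    pid = fork();
    if (pid == 0) {
        if (install_stop_handlers() != 0)
            _exit(1);
        stop_requested = 0;           /* the child starts with a clean flag */
        wait_mask = old;
        sigdelset(&wait_mask, SIGINT);
        sigdelset(&wait_mask, SIGTERM);
        while (!stop_requested)
            sigsuspend(&wait_mask);   /* atomically unblock and wait */
        /* A real probe would release its DB handles here. */
        _exit(0);
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}

/* Ask the probe to shut down and reap it; returns its exit status or -1. */
static int stop_and_reap(pid_t child)
{
    int status;

    if (kill(child, SIGTERM) != 0)
        return -1;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent would call install_stop_handlers() at startup and, once stop_requested is set, call stop_and_reap() for each probe it started before exiting itself.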
Comment 20 errata-xmlrpc 2013-11-21 04:43:42 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1590.html
