999903 – aborting openscap results in RPMDB corruption

Summary: aborting openscap results in RPMDB corruption

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	openscap
Sub Component:
Version:	6.4
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Daniel Kopeček
QA Contact:	Lukas "krteknet" Novy
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-08-22 10:57 UTC by Sergio Freire
Modified:	2013-11-21 09:43 UTC
CC List:	10 users
Fixed In Version:	openscap-0.9.12-1.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-11-21 09:43:42 UTC
Target Upstream Version:
Embargoed:

Attachments
patch to open the RPMDB in readonly (2.29 KB, patch) 2013-08-22 16:17 UTC, Sergio Freire

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2013:1590	0	normal	SHIPPED_LIVE	openscap bug fix and enhancement update	2013-11-20 21:39:32 UTC

Description Sergio Freire 2013-08-22 10:57:33 UTC

Description of problem:
openscap apparently opens the RPMDB in write mode whenever it evaluates rpminfo/rpmverify OVAL checks; it should only open it read-only. Because the database is opened in write mode, aborting a scan (which can take some time) corrupts the RPMDB.

Version-Release number of selected component (if applicable):
0.9.11

How reproducible:
always

Steps to Reproduce:
1. start oscap eval (xccdf or oval)
2. cancel using control+c or by sending SIGINT
3. consult the RPM db, using rpm -qa for example

Actual results:
RPM db gets corrupted.

Expected results:
RPM db should not be corrupted.


Additional info:

Comment 2 Tomas Heinrich 2013-08-22 15:36:53 UTC

I'm not convinced that this is a bug in openscap.

I conclude there are stale locks on the db because the scan was abruptly terminated. This shouldn't be caused by a RW access alone and needs to be resolved manually[1] (e.g. by running $ rm /var/lib/rpm/__*).

I've tried to change the access mode to RO by calling rpmtsSetDBMode() (which is likely a desired change anyway) but this alone didn't prevent the issue from occurring.

This fallout of interrupting a running scan is pretty inconvenient, and perhaps the application should take steps to clean up the db state if possible, but this still won't cover non-graceful termination.

[1] http://rpm.org/wiki/Docs/RpmRecovery

Comment 3 Sergio Freire 2013-08-22 16:17:52 UTC

Created attachment 789268
patch to open the RPMDB in readonly

Probably not the best patch, but it works.

Comment 4 Sergio Freire 2013-08-22 16:20:13 UTC

The openscap probes depend on rpmtsInitIterator() to open the "ts".
If you open the "ts" read-only beforehand using rpmtsOpenDB(), rpmtsInitIterator() will simply reuse the already opened ts, in read-only mode.
See the previous attachment.

Comment 5 Tomas Heinrich 2013-08-23 13:21:03 UTC

Thanks for the patch. I agree that the db should be opened readonly and we should get this changed upstream soon.

But: the patch you've provided didn't prevent the symptoms as far as I can see.

I can see that rpmtsInitIterator() calls rpmtsOpenDB(ts, ts->dbmode) and I've tried to set the mode to RO with rpmtsSetDBMode() after the ts is created. This didn't help either.

I'm told that the readonly locks should be cleaned up automatically so either of the attempted changes should have helped (assuming we didn't screw something up in the patches).

I'd like to emphasize that the issue you're seeing is not reliably reproducible. I had to put the system under some load and interrupt the tool very soon after it started printing the results to see the symptoms. Maybe you just didn't have enough "luck" after patching.

The issue is even more puzzling because there should already be some code handling sudden interrupts to oscap. I'll have to do some digging around and ask somebody more knowledgeable about librpm.

Thanks so far.

Comment 6 Tomas Heinrich 2013-08-27 13:04:08 UTC

I have some new information:

The default mode should already be RO. What makes you think it is opened RW?

But even for RO, when running as root, the corruption happens if the probe is suddenly terminated. In the case of hitting ctrl-c, this should be properly handled and needs to be fixed.

You didn't provide the output you're getting in step 3. after the corruption happens so I'm only guessing your symptoms are the same as mine.

Comment 7 Sergio Freire 2013-08-27 17:07:23 UTC

Well, about it being opened in RW it was just a guess, since the applied patch seemed to resolve the issue. Although oscap runs as root, it should use RPMDB only in RO. The error in fact does not occur always; in a slow VM it occurs almost always, though. oscap was aborted using control+c.
here's the output

[root@seis64 ~]# oscap xccdf eval --profile production-base PTIN-rhel-xccdf.xml
Title   Ensure /tmp Located On Separate Partition
Rule    partition_for_tmp
Ident   CCE-26435-8
Result  fail

Title   Ensure /var/log Located On Separate Partition
Rule    partition_for_var_log
Ident   CCE-26215-4
Result  pass

Title   Ensure /home Located On Separate Partition
Rule    partition_for_home
Ident   CCE-26557-9
Result  pass

Title   Ensure Red Hat GPG Key Installed
Rule    ensure_redhat_gpgkey_installed
Ident   CCE-26506-6
Result  pass

Title   Verify File Hashes with RPM
Rule    rpm_verify_hashes
Ident   CCE-27223-7
Result  ^C
[root@seis64 ~]# rpm -qi openscap
rpmdb: Thread/process 13300/140619998598912 failed: Thread died in Berkeley DB library
error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db3 -  (-30974)
error: cannot open Packages database in /var/lib/rpm
rpmdb: Thread/process 13300/140619998598912 failed: Thread died in Berkeley DB library
error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm
package openscap is not installed
[root@seis64 ~]#

Comment 8 Tomas Heinrich 2013-08-28 08:54:39 UTC

Thanks, that's the same error that I see. We'll need to look into handling the C-c case cleanly.

Comment 9 Steve Grubb 2013-08-30 13:18:36 UTC

Perhaps oscap should set a signal handler and issue a command to probes to terminate cleanly?

Comment 10 Tomas Heinrich 2013-08-30 14:19:25 UTC

IIRC, there should already be such code, but something is apparently amiss.

librpm has its own way of handling signals, which could possibly interfere with our handlers, but after a quick glance over the code this doesn't seem to be the case.

Comment 11 Daniel Kopeček 2013-09-03 12:27:39 UTC

(In reply to Steve Grubb from comment #9)
> Perhaps oscap should set a signal handler and issue a command to probes to
> terminate cleanly?

There is a signal handler implemented as a thread that calls sigwaitinfo. Signals in all other threads are blocked. On Linux, we use prctl(PR_SET_PDEATHSIG, SIGTERM) in probes to make sure that they get a shutdown signal if the parent dies and doesn't close the comm. channels to them explicitly.
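
For readers unfamiliar with this pattern, here is a minimal sketch of a dedicated signal thread built on sigwaitinfo(), with the signals blocked everywhere else. It is not the openscap code: the names shutdown_signal, signal_thread, request_parent_death_signal and demo_signal_thread are hypothetical, and the kill() call only stands in for a user pressing ^C.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Hypothetical name: records which signal asked for shutdown. */
static volatile sig_atomic_t shutdown_signal = 0;

/* Dedicated signal thread: the only place where signals are consumed. */
static void *signal_thread(void *arg)
{
    const sigset_t *set = arg;
    siginfo_t info;
    int signo;

    assert(set != NULL);
    do {
        signo = sigwaitinfo(set, &info);  /* sleeps until a signal is pending */
    } while (signo == -1 && errno == EINTR);

    if (signo > 0)
        shutdown_signal = signo;          /* a real tool would begin shutdown here */
    return NULL;
}

/* For a child process: ask the kernel to send SIGTERM when the parent dies. */
int request_parent_death_signal(void)
{
#ifdef __linux__
    return prctl(PR_SET_PDEATHSIG, SIGTERM);
#else
    return -1;                            /* PR_SET_PDEATHSIG is Linux-specific */
#endif
}

/* Blocks SIGINT/SIGTERM, starts the signal thread, then sends `signo` to the
 * process as a stand-in for ^C. Returns the signal the thread received. */
int demo_signal_thread(int signo)
{
    sigset_t set;
    pthread_t tid;

    shutdown_signal = 0;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);

    /* Threads created after this call inherit the blocked mask. */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, signal_thread, &set) != 0)
        return -1;

    kill(getpid(), signo);
    pthread_join(tid, NULL);
    return shutdown_signal;
}
```

Because the signal is blocked in every thread, it simply stays pending until the signal thread picks it up, so no asynchronous handler ever interrupts code that is in the middle of a database operation.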

However, if you hit ^C during the evaluation of an object which touches the RPM database, then the associated probe_main() thread in the probe process gets cancelled by the shutdown procedure (using pthread_cancel). It looks like the DB corruption is caused by DB iterators that were created inside the probe_main() thread and not correctly destroyed because the thread was killed.

I've tried to add pthread_setcancelstate calls to create a critical section around the code that uses these iterators and it seems that it fixes the problem.
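
As an illustration of the idea (not the actual patch), the sketch below disables cancellation around a section that holds a resource, so a pthread_cancel() issued in the middle stays pending until the resource has been released. The counter open_iterators is a hypothetical stand-in for librpm iterators, and the two atomic flags with their spin loops exist only to make the demo deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int open_iterators;  /* stand-in for open librpm iterators */
static atomic_int in_section;      /* worker -> main: iterator is open */
static atomic_int cancel_sent;     /* main -> worker: cancel was requested */

static void *probe_worker(void *arg)
{
    int oldstate;
    (void)arg;

    /* Enter the critical section: cancel requests are held pending. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
    assert(oldstate == PTHREAD_CANCEL_ENABLE);  /* default for new threads */

    atomic_fetch_add(&open_iterators, 1);       /* "create" the iterator */
    atomic_store(&in_section, 1);
    while (!atomic_load(&cancel_sent))          /* demo only: wait for the cancel */
        sched_yield();
    atomic_fetch_sub(&open_iterators, 1);       /* "free" the iterator */

    /* Leave the critical section and act on the pending request. */
    pthread_setcancelstate(oldstate, NULL);
    pthread_testcancel();
    return NULL;                                /* not reached when cancelled */
}

/* Returns the number of iterators left open after the worker was cancelled
 * (0 is the desired outcome), or a negative value on error. */
int demo_critical_section(void)
{
    pthread_t tid;
    void *result = NULL;

    atomic_store(&open_iterators, 0);
    atomic_store(&in_section, 0);
    atomic_store(&cancel_sent, 0);

    if (pthread_create(&tid, NULL, probe_worker, NULL) != 0)
        return -1;
    while (!atomic_load(&in_section))
        sched_yield();

    pthread_cancel(tid);                        /* arrives inside the section */
    atomic_store(&cancel_sent, 1);
    pthread_join(tid, &result);

    if (result != PTHREAD_CANCELED)
        return -2;
    return atomic_load(&open_iterators);
}
```

The thread is still cancelled, only later: the request takes effect at pthread_testcancel(), after the resource has been released.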

I've also tried to use rpmtsDBVerify() and rpmtsDBRebuild() in the probe_fini() function (which we call after we kill the probe_main() thread and before we exit()) but that didn't do anything. rpmCheckTerminate() didn't help either.

The last option I can think of is using pthread_cleanup_push to register a cleanup function for the opened iterators.
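
A rough sketch of that alternative, under the same assumptions as before: open_iterators is a hypothetical stand-in for librpm iterators, close_iterator is an invented cleanup function, and the nanosleep() loop merely plays the role of long-running work that contains cancellation points.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

static atomic_int open_iterators;  /* stand-in for open librpm iterators */
static atomic_int worker_ready;    /* worker -> main: handler is registered */

/* Cleanup handler: runs on cancellation, on pthread_exit(), or on pop(1). */
static void close_iterator(void *arg)
{
    (void)arg;
    assert(atomic_load(&open_iterators) > 0);
    atomic_fetch_sub(&open_iterators, 1);
}

static void *probe_worker(void *arg)
{
    struct timespec nap = { 0, 50 * 1000 * 1000 };  /* 50 ms */
    int i;
    (void)arg;

    atomic_fetch_add(&open_iterators, 1);           /* "create" the iterator */
    pthread_cleanup_push(close_iterator, NULL);

    atomic_store(&worker_ready, 1);
    for (i = 0; i < 200; i++)
        nanosleep(&nap, NULL);                      /* cancellation point */

    pthread_cleanup_pop(1);                         /* normal path: also clean up */
    return NULL;
}

/* Returns the number of iterators left open after the worker was cancelled
 * (0 is the desired outcome), or a negative value on error. */
int demo_cleanup_handler(void)
{
    pthread_t tid;
    void *result = NULL;

    atomic_store(&open_iterators, 0);
    atomic_store(&worker_ready, 0);

    if (pthread_create(&tid, NULL, probe_worker, NULL) != 0)
        return -1;
    while (!atomic_load(&worker_ready))             /* demo only */
        sched_yield();

    pthread_cancel(tid);
    pthread_join(tid, &result);

    if (result != PTHREAD_CANCELED)
        return -2;
    return atomic_load(&open_iterators);
}
```

Compared with disabling cancellation, this keeps the thread cancellable during long operations, at the cost of having to register a handler for every resource that must not leak.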

Comment 12 Steve Grubb 2013-09-03 18:44:30 UTC

I looked through the oscap source and could not find any calls to sigaction() that install a signal handler (for SIGTERM and SIGCHLD). What I had in mind was this: oscap catches the signal and calls a shutdown function in the library; the library sends SIGTERM, or a message over the communication channel, telling the probes to shut down; each probe wraps things up safely and quits; and oscap then reaps the child processes and exits.

Comment 13 Daniel Kopeček 2013-09-10 16:15:39 UTC

Fixed upstream:
 https://git.fedorahosted.org/cgit/openscap.git/commit/?id=b648d6752a7a804d02c75138ea5e6069edb1e30c
 https://git.fedorahosted.org/cgit/openscap.git/commit/?id=180552c8f85b52c75757b6200a4a42bbb6312551

Comment 20 errata-xmlrpc 2013-11-21 09:43:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1590.html
