Bug 999903
| Summary: | aborting openscap results in RPMDB corruption | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Sergio Freire <sergio-s-freire> | ||||
| Component: | openscap | Assignee: | Daniel Kopeček <dkopecek> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Lukas "krteknet" Novy <lnovy> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 6.4 | CC: | dkopecek, ebenes, lnovy, openscap-maint, plautrba, pvrabec, sergio-s-freire, sgrubb, slukasik, theinric | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | openscap-0.9.12-1.el6 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-11-21 09:43:42 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Description
Sergio Freire
2013-08-22 10:57:33 UTC
I'm not convinced that this is a bug in OpenSCAP. I conclude that there are stale locks on the db because the scan was abruptly terminated. This shouldn't be caused by RW access alone, and it needs to be resolved manually [1] (e.g. by running `rm /var/lib/rpm/__*`). I've tried to change the access mode to RO by calling rpmtsSetDBMode() (which is likely a desirable change anyway), but this alone didn't prevent the issue from occurring. This fallout from interrupting a running scan is pretty inconvenient, and perhaps the application should take steps to clean up the db state where possible, but that still won't cover non-graceful termination.

[1] http://rpm.org/wiki/Docs/RpmRecovery

Created attachment 789268
patch to open the RPMDB in readonly

Probably not the best patch, but it works.
The openscap probes depend on rpmtsInitIterator() to open the "ts". If you open the "ts" readonly before calling rpmtsOpenDB, it will open the ts, and rpmtsInitIterator will simply reuse it, in readonly mode. See the previous attachment.

Thanks for the patch. I agree that the db should be opened readonly, and we should get this changed upstream soon. But the patch you've provided didn't prevent the symptoms as far as I can see. I can see that rpmtsInitIterator() calls rpmtsOpenDB(ts, ts->dbmode), and I've tried to set the mode to RO with rpmtsSetDBMode() after the ts is created. This didn't help either. I'm told that readonly locks should be cleaned up automatically, so either of the attempted changes should have helped (assuming we didn't screw something up in the patches).

I'd like to emphasize that the issue you're seeing is not reliably reproducible. I had to put the system under some load and interrupt the tool very early after it started printing the results to see the symptoms. Maybe you just didn't have enough "luck" after patching. The issue is even more puzzling because there should already be some code handling sudden interrupts to oscap. I'll have to do some digging and ask somebody more knowledgeable about librpm. Thanks so far.

I have some new information: the default mode should already be RO. What makes you think it is opened RW? But even with RO, when running as root, the corruption happens if the probe is suddenly terminated. Hitting Ctrl-C should be handled properly, and this needs to be fixed. You didn't provide the output you get in step 3 after the corruption happens, so I'm only guessing that your symptoms are the same as mine.

As for it being opened RW, that was just a guess, since the applied patch seemed to resolve the issue. Although oscap runs as root, it should use the RPMDB only in RO mode. The error does not occur every time, but in a slow VM it occurs almost always. oscap was aborted using Ctrl-C.
Here's the output:

    [root@seis64 ~]# oscap xccdf eval --profile production-base PTIN-rhel-xccdf.xml
    Title   Ensure /tmp Located On Separate Partition
    Rule    partition_for_tmp
    Ident   CCE-26435-8
    Result  fail
    Title   Ensure /var/log Located On Separate Partition
    Rule    partition_for_var_log
    Ident   CCE-26215-4
    Result  pass
    Title   Ensure /home Located On Separate Partition
    Rule    partition_for_home
    Ident   CCE-26557-9
    Result  pass
    Title   Ensure Red Hat GPG Key Installed
    Rule    ensure_redhat_gpgkey_installed
    Ident   CCE-26506-6
    Result  pass
    Title   Verify File Hashes with RPM
    Rule    rpm_verify_hashes
    Ident   CCE-27223-7
    Result  ^C
    [root@seis64 ~]# rpm -qi openscap
    rpmdb: Thread/process 13300/140619998598912 failed: Thread died in Berkeley DB library
    error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
    error: cannot open Packages index using db3 - (-30974)
    error: cannot open Packages database in /var/lib/rpm
    rpmdb: Thread/process 13300/140619998598912 failed: Thread died in Berkeley DB library
    error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
    error: cannot open Packages database in /var/lib/rpm
    package openscap is not installed
    [root@seis64 ~]#

Thanks, that's the same error that I see. We'll need to look into handling the C-c case cleanly.

Perhaps oscap should set a signal handler and issue a command to probes to terminate cleanly?

IIRC, there should already be such code, but something is apparently amiss. librpm has its own way of handling signals, which could possibly interfere with our handlers, but after a quick glance over the code this doesn't seem to be the case.

(In reply to Steve Grubb from comment #9)
> Perhaps oscap should set a signal handler and issue a command to probes to
> terminate cleanly?

There is a signal handler implemented as a thread that calls sigwaitinfo(). Signals are blocked in all other threads.
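The arrangement described in the reply above (one thread blocked in sigwaitinfo(), with the signals blocked in every other thread) can be sketched with plain POSIX calls. This is a minimal illustration under stated assumptions, not OpenSCAP's actual code: signal_thread() and demo_ctrl_c() are hypothetical names, and the user's ^C is simulated with kill().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <unistd.h>

/* Signals owned by the dedicated signal-handling thread (SIGINT is what
 * ^C in the terminal delivers). */
static sigset_t handled_signals;

/* The only place these signals are consumed: block in sigwaitinfo() until
 * one arrives and hand its number back through the thread's exit value.
 * Because the signal is received synchronously, this thread could go on
 * to run ordinary (not async-signal-safe) shutdown code. */
static void *signal_thread(void *arg)
{
    siginfo_t info;
    int sig;

    (void)arg;
    do {
        sig = sigwaitinfo(&handled_signals, &info);
    } while (sig < 0); /* interrupted (EINTR): wait again */

    return (void *)(intptr_t)sig;
}

/* Hypothetical driver: blocks SIGINT/SIGTERM in the calling thread, starts
 * the signal thread, simulates a ^C and returns the signal number the
 * signal thread received, or -1 if setup failed. */
int demo_ctrl_c(void)
{
    sigset_t old_mask;
    pthread_t tid;
    void *result;

    sigemptyset(&handled_signals);
    sigaddset(&handled_signals, SIGINT);
    sigaddset(&handled_signals, SIGTERM);

    /* Block the signals before creating the thread: new threads inherit
     * the mask, so the pending signal can only be taken by sigwaitinfo(). */
    if (pthread_sigmask(SIG_BLOCK, &handled_signals, &old_mask) != 0)
        return -1;
    if (pthread_create(&tid, NULL, signal_thread, NULL) != 0) {
        pthread_sigmask(SIG_SETMASK, &old_mask, NULL);
        return -1;
    }

    kill(getpid(), SIGINT); /* stands in for the user pressing ^C */

    if (pthread_join(tid, &result) != 0)
        return -1;
    pthread_sigmask(SIG_SETMASK, &old_mask, NULL); /* restore caller's mask */
    return (int)(intptr_t)result;
}
```

Receiving the signal cleanly is only half of the story: what the signal thread triggers next (here, nothing) decides whether worker threads get a chance to release resources such as open RPM database iterators.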
On Linux, we use prctl(PR_SET_PDEATHSIG, SIGTERM) in the probes to make sure that they get a shutdown signal if the parent dies without explicitly closing the communication channels to them. However, if you hit ^C during the evaluation of an object that touches the RPM database, the associated probe_main() thread in the probe process gets cancelled by the shutdown procedure (using pthread_cancel). It looks like the DB corruption is caused by DB iterators created inside the probe_main() thread that were not correctly destroyed because the thread was killed. I've tried adding pthread_setcancelstate calls to create a critical section around the code that uses these iterators, and it seems to fix the problem. I've also tried calling rpmtsDBVerify() and rpmtsDBRebuild() in the probe_fini() function (which we call after we kill the probe_main() thread and before we exit()), but that didn't do anything. rpmCheckTerminate() didn't help either. The last option I can think of is using pthread_cleanup_push to register a cleanup function for the opened iterators.

I was looking in the oscap source and could not find any calls to sigaction() to install a signal handler (for SIGTERM and SIGCHLD). What I was thinking was that oscap would catch the signal and call a shutdown function in the library; the library would send SIGTERM or a communication message that says shutdown; the probe would wrap things up safely and quit; and the oscap program would reap the child processes and exit.

Fixed upstream:
https://git.fedorahosted.org/cgit/openscap.git/commit/?id=b648d6752a7a804d02c75138ea5e6069edb1e30c
https://git.fedorahosted.org/cgit/openscap.git/commit/?id=180552c8f85b52c75757b6200a4a42bbb6312551

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-1590.html
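The workaround reported earlier in the thread, disabling thread cancellation for the stretch of code that holds RPM database iterators, can be sketched with plain POSIX threads. This is a minimal sketch, not the upstream fix: iterator_open(), iterator_free(), probe_worker() and demo_cancel() are hypothetical stand-ins for the librpm iterator calls (such as rpmtsInitIterator()) and for probe_main(), and an "iterator" is modelled as a simple open counter so that a leak shows up as a non-zero value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for RPM database iterators: a count of iterators
 * currently open. Only the worker thread touches it until pthread_join()
 * returns, so no extra locking is needed. */
static int open_iterators = 0;

static void iterator_open(void) { open_iterators++; }
static void iterator_free(void) { open_iterators--; }

/* Sleep helper; nanosleep() is a cancellation point. */
static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Hypothetical worker in the role of probe_main(): repeatedly opens an
 * iterator, works with it and frees it. Cancellation is disabled while an
 * iterator is open, so a pthread_cancel() arriving in that window stays
 * pending until the iterator has been released. */
static void *probe_worker(void *arg)
{
    int oldstate, ignored;

    (void)arg;
    for (;;) {
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
        iterator_open();
        sleep_ms(1);            /* cancellation point, but cancellation is off */
        iterator_free();
        pthread_setcancelstate(oldstate, &ignored);

        /* Re-enabling does not act on a pending cancel by itself; this
         * explicit cancellation point does, outside the critical section. */
        pthread_testcancel();
    }
    return NULL; /* not reached */
}

/* Starts the worker, cancels it mid-loop (as the ^C shutdown path does
 * with probe_main()) and returns how many iterators were left open. */
int demo_cancel(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, probe_worker, NULL) != 0)
        return -1;
    sleep_ms(20);
    pthread_cancel(tid);
    pthread_join(tid, NULL);
    return open_iterators;
}
```

Without the two pthread_setcancelstate() calls, the cancel could take effect inside sleep_ms() while an iterator is open, and demo_cancel() could return 1, the analogue of the leaked iterators described in the thread. The other option mentioned there, pthread_cleanup_push(), keeps cancellation enabled and instead registers a handler (paired with pthread_cleanup_pop()) that frees the open iterator if the thread is cancelled.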