Red Hat Bugzilla – Bug 1132459
Need more correct cleanup behaviour when /var/spool/abrt is full
Last modified: 2016-11-03 23:05:00 EDT
Description of problem:

The original description, taken from #sfdc case 01176977:

----
Aug 19 11:25:22 el1191 abrtd: Directory 'ccpp-2014-08-19-11:25:19-2206' creation detected
Aug 19 11:25:22 el1191 abrtd: Size of '/var/spool/abrt' >= 1000 MB, deleting 'ccpp-2014-08-19-11:24:35-56940'
Aug 19 11:25:22 el1191 abrt[4235]: /var/spool/abrt is 1348804852 bytes (more than 1279MiB), deleting 'ccpp-2014-08-19-11:24:35-56940'
----

I can imagine that when abrtd removes directories because the size exceeds a specific threshold while abrt processes are still working with those files and directories, the abrt scripts fail at different locations depending on timing. I understand that there is a mechanism to prevent the filesystem from getting full, but I think the abrt Python scripts should not generate unhandled exceptions that cause new crashes. It might be as simple as catching these specific exceptions and logging the event to syslog before exiting with an error and an error message. That would not be a dramatic change to what is happening right now, with the added value that these events are logged and no new crashes are initiated.

Version-Release number of selected component (if applicable):
abrt-2.0.8-21.el6.x86_64
abrt-addon-ccpp-2.0.8-21.el6.x86_64
abrt-addon-kerneloops-2.0.8-21.el6.x86_64
abrt-addon-python-2.0.8-21.el6.x86_64
abrt-cli-2.0.8-21.el6.x86_64
abrt-libs-2.0.8-21.el6.x86_64
abrt-tui-2.0.8-21.el6.x86_64

How reproducible:
Always, when the size of the /var/spool/abrt directory is bigger than the predefined threshold.

Steps to Reproduce:
1. Create enough crashes that the directory size exceeds the predefined threshold.
2. Crash an application again.

Actual results:
The abrt process tries to delete the current crash dump directory and run sosreport at the same time.

Expected results:
If cleanup is needed for that directory, delete the oldest crash dump, not the latest one.
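The two requests in the description (delete the oldest crash dump rather than the one just created, and have hook scripts log a vanished directory instead of crashing) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the abrt implementation: the names `dir_size`, `trim_spool`, `run_hook` and the `preserve` parameter are hypothetical, and directory age is approximated by the directory's modification time.

```python
import os
import shutil
import syslog


def dir_size(path):
    """Return the total size in bytes of the files found under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # The file vanished while we were walking; skip it.
                pass
    return total


def trim_spool(spool, max_bytes, preserve=None):
    """Delete the oldest problem directories until spool fits in max_bytes.

    preserve is the base name of the directory that was just created;
    it is never deleted. Returns the deleted base names, oldest first.
    """
    entries = []
    for name in os.listdir(spool):
        path = os.path.join(spool, name)
        if name == preserve or not os.path.isdir(path):
            continue
        try:
            entries.append((os.stat(path).st_mtime, name))
        except OSError:
            continue  # removed by someone else in the meantime
    entries.sort()  # smallest mtime (oldest) first

    deleted = []
    size = dir_size(spool)
    for _mtime, name in entries:
        if size <= max_bytes:
            break
        path = os.path.join(spool, name)
        size -= dir_size(path)
        shutil.rmtree(path, ignore_errors=True)
        deleted.append(name)
    return deleted


def run_hook(action, problem_dir, log=None):
    """Run action(problem_dir); if the directory is gone, log and return 1.

    log is a hypothetical override used for testing; by default the
    message goes to syslog.
    """
    try:
        action(problem_dir)
    except OSError as exc:
        message = "cannot process '%s': %s" % (problem_dir, exc)
        if log is None:
            syslog.syslog(syslog.LOG_ERR, message)
        else:
            log(message)
        return 1
    return 0
```

With this shape, a caller would pass the base name of the newly detected directory as `preserve`, so that cleanup can only remove older dumps, and would use the return value of `run_hook` as the process exit status.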
I have developed a test that reproduces the unwanted behavior, and I found that this bug is still not fixed: https://github.com/abrt/abrt/commit/4837a0a1aeb8c78b4bc4e07326de48daf8e63f8b
Fixed upstream: https://github.com/abrt/abrt/pull/1164
Created attachment 1186746 [details] Patch 1/2: daemon: trigger dump location cleanup after detection
Created attachment 1186747 [details] Patch 2/2: handle-event: stop creating post-create lock
*** Bug 1369433 has been marked as a duplicate of this bug. ***
There is a minor problem with the patches. If abrt-server receives a path that is not in canonical form, abrtd fails to remove the corresponding process from its queue when the relevant problem directory is removed because of the size limit.
An upstream commit fixes the problem with canonical paths: https://github.com/abrt/abrt/pull/1176/commits/d0a35daa628652aa83e7b890051b32bd35402ec8
Created attachment 1195193 [details] Patch 1/1: daemon: send base names from abrt-server to abrtd
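The canonical-path problem and the base-name approach named in the patch title can be illustrated with a small Python sketch. It is not the abrtd code (which is written in C); the `PostCreateQueue` class and the `queue_key` helper are hypothetical names used only to show why keying the queue on the directory's base name makes lookups independent of how the path was spelled.

```python
import os


def queue_key(path):
    """Reduce a problem directory path to its base name.

    normpath collapses duplicate and trailing slashes and '..' segments,
    so differently spelled paths to one directory yield the same key.
    """
    return os.path.basename(os.path.normpath(path))


class PostCreateQueue:
    """Pending processes keyed by problem directory base name."""

    def __init__(self):
        self._pending = {}

    def add(self, path, pid):
        """Register the process handling the directory at path."""
        self._pending[queue_key(path)] = pid

    def remove(self, path):
        """Drop the entry for path and return its pid, or None."""
        return self._pending.pop(queue_key(path), None)


queue = PostCreateQueue()
# The sender used a non-canonical spelling of the path.
queue.add("/var/spool/abrt//ccpp-2014-08-19-11:25:19-2206/", 4235)
# Cleanup later refers to the same directory by its canonical path.
removed_pid = queue.remove("/var/spool/abrt/ccpp-2014-08-19-11:25:19-2206")
```

Had the queue been keyed on the full path string, the `remove` call above would not have found the entry, which mirrors the failure described in the comment about non-canonical paths.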
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2307.html