Bug 902398 - no space left on the device / EXT4 inode depletion caused by flooding abrtd with too frequent abrt-dump-oops
Summary: no space left on the device / EXT4 inode depletion caused by flooding abrtd w...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: abrt
Version: 18
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Assignee: Denys Vlasenko
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-01-21 15:15 UTC by Jaromír Cápík
Modified: 2016-02-01 01:57 UTC (History)

Fixed In Version: gnome-abrt-0.3.1-1.fc18
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-10-02 06:28:54 UTC
Type: Bug
Embargoed:


Attachments
example backtrace (3.85 KB, text/plain)
2013-01-21 15:31 UTC, Jaromír Cápík
cat messages | grep abrt | grep "Jan 21 10" > /tmp/messages-grep-abrt.txt (468.72 KB, text/plain)
2013-01-21 15:33 UTC, Jaromír Cápík

Description Jaromír Cápík 2013-01-21 15:15:54 UTC
Description of problem:
Hello. I recently hit a "no space left on device" error caused by ext4 inode depletion. Almost 100 000 directories with names starting with "oops-2013-01-20" had been created in /var/spool/abrt. I asked Jakub Filak to analyze the problem directly on my system. He flooded abrtd by running abrt-dump-oops 1000 times in a 'for' loop. /var/log/messages shows that abrtd did not process the directories at all during the flooding; it started only after it had logged the creation of the last oops directory.
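
A minimal sketch of how the symptom above could be checked on an affected machine. The helper name count_problem_dirs is hypothetical; /var/spool/abrt and the "oops-2013-01-20" prefix come from the description, and the -mindepth/-maxdepth options assume GNU or BSD find.

```shell
# Count the directories directly under a spool directory whose names
# start with a given prefix.
#   $1: spool directory (the report mentions /var/spool/abrt)
#   $2: name prefix (the report mentions "oops-2013-01-20")
count_problem_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -name "$2*" | wc -l | tr -d ' '
}

# Usage on an affected machine:
#   df -i /var/spool/abrt        # IUse% close to 100% points at inode depletion
#   count_problem_dirs /var/spool/abrt oops-2013-01-20
```

With inode depletion, df -h can still report free blocks while df -i reports no free inodes, which is why file creation fails with "no space left on device".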


Version-Release number of selected component (if applicable):
abrt-2.0.20-1.fc18.armv7hl

How reproducible:
always

Steps to Reproduce:
while : ; do cat backtrace | abrt-dump-oops -D; done
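
The loop above never terminates, so it will eventually exhaust inodes on the test machine. A bounded variant, matching the 1000 runs mentioned in the description, could look like the sketch below. The helper name run_n_times is hypothetical; the abrt-dump-oops -D invocation and the file name "backtrace" are taken from the reproducer.

```shell
# Run a command a fixed number of times.
#   $1: iteration count; remaining arguments: the command to run.
run_n_times() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"
        i=$((i + 1))
    done
}

# Usage on a disposable test machine (assumes a file named "backtrace"
# in the current directory):
#   run_n_times 1000 sh -c 'cat backtrace | abrt-dump-oops -D'
```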

Comment 1 Jaromír Cápík 2013-01-21 15:25:29 UTC
NOTE: I'm checking the CPU utilization, and abrtd seems to consume the whole CPU while it detects new 'oops' directories.

Comment 2 Jaromír Cápík 2013-01-21 15:31:24 UTC
Created attachment 684372 [details]
example backtrace

Comment 3 Jaromír Cápík 2013-01-21 15:33:29 UTC
Created attachment 684373 [details]
cat  messages | grep abrt | grep "Jan 21 10" > /tmp/messages-grep-abrt.txt

Comment 4 Jaromír Cápík 2013-01-24 14:08:10 UTC
The following two files demonstrate what happens in GUI (xfce) ... it becomes unusable:

http://jcapik.fedorapeople.org/files/abrt/Bug_902398/20130124_001.jpg

http://jcapik.fedorapeople.org/files/abrt/Bug_902398/20130124_002.mp4

Comment 5 Jaromír Cápík 2013-01-24 14:23:46 UTC
example koopses:

http://jcapik.fedorapeople.org/files/abrt/Bug_902398/oops.tgz

Comment 6 Jakub Filak 2013-01-24 14:39:07 UTC
(In reply to comment #4)
> The following two files demonstrate what happens in GUI (xfce) ... it
> becomes unusable:
> 
> http://jcapik.fedorapeople.org/files/abrt/Bug_902398/20130124_001.jpg
> 
> http://jcapik.fedorapeople.org/files/abrt/Bug_902398/20130124_002.mp4

Thank you for taking the time to provide these additional files. You have correctly recognized that this bug may flood your desktop with ABRT notifications, but that is a different issue and it happens only in particular circumstances (e.g. when abrt can't record the kernel's package version data). If you have time and want to help us, please file a new bugzilla ticket.

Comment 7 Jaromír Cápík 2013-01-24 16:27:56 UTC
Well. I thought both issues could have a common solution. Something like interval-based throttling could solve the problem. I don't know the internal abrt design well, but I believe that limiting the number of allowed abrt-dump-oops executions to, say, 10 per minute would help. Maybe the throttling needs to be implemented in a different layer in order to stop the notifications from flooding the screen. In that case abrtd would have enough time to process all the oops directories and to find duplicates. You can't expect users to report more than 10 issues produced at the same time, but you can expect that users want to continue using their workstation, and that's impossible when abrt generates tons of notifications, eats the whole CPU and makes the computer slow as hell. In that case any attempt to process as many backtraces as possible makes no sense, and you should ignore them because you can't reliably process them for lack of resources.
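
To make the suggestion concrete, here is a rough sketch of a fixed-window limiter in shell. It is only an illustration of the idea and not abrt code: the function name throttle_allow and the state-file format are hypothetical, and date +%s assumes GNU or BSD date.

```shell
# Fixed-window rate limiter (hypothetical helper, not part of abrt).
# Returns 0 (allow) for at most $2 calls per $3-second window, else 1.
#   $1: state file holding "window_start count"
#   $2: maximum number of allowed events per window
#   $3: window length in seconds
#   $4: current time in epoch seconds (optional, defaults to now)
throttle_allow() {
    state=$1
    limit=$2
    window=$3
    now=${4:-$(date +%s)}
    start=0
    count=0
    if [ -s "$state" ]; then
        read -r start count < "$state" || true
    fi
    # Start a fresh window once the previous one has expired.
    if [ $((now - start)) -ge "$window" ]; then
        start=$now
        count=0
    fi
    if [ "$count" -lt "$limit" ]; then
        echo "$start $((count + 1))" > "$state"
        return 0
    fi
    return 1
}

# Usage (10 events per 60 seconds, as suggested above):
#   if throttle_allow /tmp/oops-throttle.state 10 60; then
#       cat backtrace | abrt-dump-oops -D
#   fi
```

Events over the limit are simply dropped here, which fits the point that losing some reports is preferable to an unusable machine.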

Comment 8 Denys Vlasenko 2013-08-20 11:36:57 UTC
It looks like the kernel is flooding the log with wrong, or just over-eager, WARNINGs. It may be related to bug 888388.

I watched your video and abrt behavior indeed looks bad.

I checked the abrt source that was current circa January 2013: it generated at most 16 oops reports per second, which is about what we see in your video.

Current code generates at most 6 oopses per second, and will pause for one additional second for every oops it sees above six.

This should be better than January 2013 behavior, but maybe it's still too aggressive.

How about we add a 1-second sleep after each problem dir creation (for each of the six we create per log scan)?

This will effectively reduce reporting rate to 1 report/sec.
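
A rough shell model of that proposal, to show the arithmetic: with a cap of six problem dirs per scan and a one-second sleep after each, a full scan takes at least six seconds, i.e. about 1 report/sec. This is not the abrt implementation; process_scan and handle_item are hypothetical names, and dropping the items beyond the cap is an assumption of this sketch.

```shell
# Placeholder for "create a problem dir" (hypothetical hook).
handle_item() {
    echo "created problem dir for: $1"
}

# Handle at most $1 items per scan, sleeping $2 seconds after each one.
# Remaining arguments are the items found in this scan.
process_scan() {
    max=$1
    delay=$2
    shift 2
    handled=0
    for item in "$@"; do
        if [ "$handled" -ge "$max" ]; then
            break
        fi
        handle_item "$item"
        handled=$((handled + 1))
        sleep "$delay"
    done
    # Assumption of this sketch: items beyond the cap are dropped.
    echo "skipped: $(($# - handled))"
}

# Usage: eight oopses in one scan, cap of six, one-second sleep each:
#   process_scan 6 1 a b c d e f g h
```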

> You can't expect users to report more than 10 issues produced at the same time, but you can expect that users want to continue using their workstation, and that's impossible when abrt generates tons of notifications, eats the whole CPU and makes the computer slow as hell.

I fully agree. In fact, I have been saying this all along: making abrt try to report every problem is counter-productive. A "report flood" is much worse than missing some problems.

Comment 9 Jaromír Cápík 2013-08-20 15:08:27 UTC
Hi Denys.
I'm ok with that. One per second is an acceptable rate.

Comment 10 Denys Vlasenko 2013-08-23 12:14:31 UTC
Patches are ready and are under review in abrt ML.

Comment 11 Denys Vlasenko 2013-08-27 12:00:19 UTC
Fixed in abrt git:

commit e45bd71678032333820d47d8f3730b33f2b7690b
Author: Denys Vlasenko <dvlasenk>
Date:   Wed Aug 21 15:20:33 2013 +0200

    abrt-dump-oops: add -t option which slows down problem creation. rhbz#902398.

    Use this option in watch logger services.

Comment 12 Fedora Update System 2013-09-13 13:17:22 UTC
gnome-abrt-0.3.1-1.fc18,abrt-2.1.7-1.fc18,libreport-2.1.7-1.fc18,satyr-0.9-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/gnome-abrt-0.3.1-1.fc18,abrt-2.1.7-1.fc18,libreport-2.1.7-1.fc18,satyr-0.9-1.fc18

Comment 13 Fedora Update System 2013-09-14 02:28:50 UTC
Package gnome-abrt-0.3.1-1.fc18, abrt-2.1.7-1.fc18, libreport-2.1.7-1.fc18, satyr-0.9-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing gnome-abrt-0.3.1-1.fc18 abrt-2.1.7-1.fc18 libreport-2.1.7-1.fc18 satyr-0.9-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-16676/gnome-abrt-0.3.1-1.fc18,abrt-2.1.7-1.fc18,libreport-2.1.7-1.fc18,satyr-0.9-1.fc18
then log in and leave karma (feedback).

Comment 14 Fedora Update System 2013-10-02 06:28:54 UTC
gnome-abrt-0.3.1-1.fc18, abrt-2.1.7-1.fc18, libreport-2.1.7-1.fc18, satyr-0.9-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

