Bug 2177153 - MaxCrashReportsSize claims to be 5000 MB but is 1000 MB
Summary: MaxCrashReportsSize claims to be 5000 MB but is 1000 MB
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: abrt
Version: 38
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: abrt
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F38FinalBlocker
 
Reported: 2023-03-10 08:15 UTC by lnie
Modified: 2023-04-01 00:17 UTC
CC: 15 users

Fixed In Version: abrt-2.16.1-1.fc38
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-04-01 00:17:18 UTC
Type: Bug
Embargoed:


Attachments
screencast (735.73 KB, video/webm) - 2023-03-10 08:15 UTC, lnie
journal (362.79 KB, text/plain) - 2023-03-10 08:16 UTC, lnie
screencast (1021.41 KB, video/webm) - 2023-03-16 01:34 UTC, lnie
journal (318.33 KB, text/plain) - 2023-03-16 01:35 UTC, lnie
screenshot (136.84 KB, image/png) - 2023-03-20 04:51 UTC, lnie
screenshot (173.75 KB, image/png) - 2023-03-20 04:53 UTC, lnie
journal for screenshot2 (748.97 KB, text/plain) - 2023-03-20 04:54 UTC, lnie
reproduction screencast on a newly created machine (max size is the default 5 GB) (2.37 MB, video/webm) - 2023-03-20 07:00 UTC, lnie


Links:
GitHub: abrt/abrt pull request 1633 (open) - Fix integer overflow for MaxCrashReportsSize (last updated 2023-03-27 08:01:29 UTC)

Description lnie 2023-03-10 08:15:28 UTC
Created attachment 1949495 [details]
screencast

//EDIT: Please see comment 25 for the discovered root cause of this issue.

Description of problem:
As shown in the attached screencast, a coreutils problem and a gnome-shell problem are listed, but after I triggered one known gnome-shell crash, the coreutils problem was erased, and the count for the gnome-shell crash is 1 when it should be 2.

Version-Release number of selected component (if applicable):
gnome-abrt-1.4.2-4.fc38.x86_64


How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 lnie 2023-03-10 08:16:00 UTC
Created attachment 1949496 [details]
journal

Comment 2 Fedora Blocker Bugs Application 2023-03-14 13:40:10 UTC
Proposed as a Blocker for 38-final by Fedora user lnie using the blocker tracking app because:

It violates this criterion:
For all release-blocking desktop / arch combinations, the following applications must start successfully and withstand a basic functionality test.

Plus: erasing data sounds like a serious problem.

Comment 3 Kamil Páral 2023-03-14 14:42:47 UTC
$ grep abrt journal.txt | grep -i size
Mar 10 02:59:57 fedora abrtd[693]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ccpp-2023-03-10-02:33:24.72823-10404'
Mar 10 03:00:49 fedora abrtd[693]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ccpp-2023-03-10-02:59:56.614799-13965'
Mar 10 03:04:20 fedora abrtd[693]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ccpp-2023-03-10-03:00:48.754161-12450'

Have you tried increasing the default 5 GB size for crashes to something bigger? The older reports seem to be removed because your /var/spool/abrt is oversize. That's expected behavior.

Comment 4 Michael Catanzaro 2023-03-14 16:34:06 UTC
I think this fails the basic functionality test, but it's not new behavior: ABRT has worked like this for a decade or longer.

It's reasonable for ABRT to clear old core dumps after some size limit is exceeded, but it's not reasonable for it to completely delete all data from old reports and then hide the old report from the ABRT UI without informing the user that older reports will be disappearing. That's just confusing.

Comment 5 lnie 2023-03-15 03:00:28 UTC
(In reply to Kamil Páral from comment #3)
> $ grep abrt journal.txt | grep -i size
> Mar 10 02:59:57 fedora abrtd[693]: Size of '/var/spool/abrt' >= 5000 MB
> (MaxCrashReportsSize), deleting old directory
> 'ccpp-2023-03-10-02:33:24.72823-10404'
> Mar 10 03:00:49 fedora abrtd[693]: Size of '/var/spool/abrt' >= 5000 MB
> (MaxCrashReportsSize), deleting old directory
> 'ccpp-2023-03-10-02:59:56.614799-13965'
> Mar 10 03:04:20 fedora abrtd[693]: Size of '/var/spool/abrt' >= 5000 MB
> (MaxCrashReportsSize), deleting old directory
> 'ccpp-2023-03-10-03:00:48.754161-12450'
> 
> Have you tried to increase the default size of 5GB for crashes to something
> bigger? The older reports seem to be removed because your /var/spool/abrt is
> oversize. That's expected behavior.

Nope, but just like Michael said, I don't think it's reasonable for it to completely delete *ALL* data from old reports, as users may not get a chance to report the bugs before the bug data is erased. And I do think that if the old data was created recently, like within one day, users should be informed that it is going to be deleted when the contents of that path are oversize.

> I think this fails the basic functionality test, but it's not new behavior: ABRT has worked like this for a decade or longer.

I haven't seen this before, I mean, all the old data being deleted at once.

Comment 6 Kamil Páral 2023-03-15 10:14:35 UTC
> I don't think it's reasonable for it to completely delete *ALL* data from old reports

But it's not doing that. I just tested it. I set a limit of 2 GB and slowly crashed applications one by one, seeing /var/spool/abrt grow. Any time it hits the max size, it deletes one directory (or more, if it needs more space). It doesn't delete all of them. It seems to pick the largest directories first. So it's behaving quite reasonably, I think.

I'm not sure why in Lili's case the directory is oversize at the very moment a new crash happens. Lili, look into the directory, is there something weird occupying a lot of space? Is there some directory present which isn't shown in abrt gui or abrt-cli output?

Or perhaps when the whole session crashed including firefox, the crash data were so large? Try to crash just a gnome-calculator or something similarly small, it should behave as expected. Can you confirm?

Comment 7 lnie 2023-03-16 01:33:24 UTC
>  Lili, look into the directory, is there something weird occupying a lot of space? Is there some directory present which isn't shown in abrt gui or abrt-cli output?
 
I have reinstalled the test system to test other cases, but I remember clearly that there was only one directory left in /var/spool/abrt/.

I think your crash-data-so-large hypothesis is reasonable, so I tried to prove it. But as you can see from the new screencast I attached, the crash data is only about 800 MB, and during that process I found that today's firefox bug data was erased, while the totem crash data created several days ago was kept.
I reported several gnome-abrt bugs last week, and I do feel that this version of gnome-abrt is not very robust.

Comment 8 lnie 2023-03-16 01:34:08 UTC
Created attachment 1951159 [details]
screencast

Comment 9 lnie 2023-03-16 01:35:16 UTC
Created attachment 1951160 [details]
journal

Comment 10 Kamil Páral 2023-03-16 10:35:48 UTC
It's hard to judge when you don't show the directory sizes (the `ls -l` output doesn't display directory sizes, you need to use `du` for that).

Comment 11 lnie 2023-03-17 03:17:33 UTC
Here is the output of the du command; I didn't know that gnome-abrt calculates the actually used space.
Just in case, I crashed firefox in the same way as shown in the screencast, and the data size is only about 363 MB, way smaller than 5 GB:

[root@fedora abrt]# du -h 
4.0K	./ccpp-2023-03-10-09:02:27.163881-4767/.libreport
142M	./ccpp-2023-03-10-09:02:27.163881-4767
4.0K	./ccpp-2023-03-16-08:28:03.826891-1756/.libreport
69M	./ccpp-2023-03-16-08:28:03.826891-1756
4.0K	./ccpp-2023-03-16-11:31:31.634567-3081/.libreport
363M	./ccpp-2023-03-16-11:31:31.634567-3081
572M	.
[root@fedora abrt]# cat ccpp-2023-03-16-11\:31\:31.634567-3081/cmdline 
[root@fedora abrt]# cat ccpp-2023-03-16-08:28:03.826891-1756/cmdline
/usr/libexec/gnome-shell-calendar-server[root@fedora abrt]#

Comment 12 Michael Catanzaro 2023-03-17 14:16:23 UTC
Proposal: crashes should remain visible in gnome-abrt UI for at least 6 months after the crash occurs.

It's OK to free disk space by removing core dumps or big logs, even though that makes it impossible to report a crash. It's not OK to hide the crash from the gnome-abrt UI when this happens. That's really confusing and makes it impossible to use gnome-abrt to track crashes.

In my experience, ABRT rarely ever shows more than one crash at a time.

Comment 13 Kamil Páral 2023-03-17 14:54:55 UTC
It's interesting how different our experience is. I very often have 10-20 crashes displayed in ABRT. It's true that I've increased the default 5 GB max size to 10 GB, but that's just doubling the size, and our experiences differ by a larger factor.

Comment 14 Michael Catanzaro 2023-03-17 16:11:51 UTC
So according to coredumpctl I had 8 crashes yesterday and today. All still have core dumps (the big part) present in coredumpctl. Let's see how big they are:

$ coredumpctl info 24683 | grep Size
  Size on Disk: 14.4M
$ coredumpctl info 24861 | grep Size
  Size on Disk: 14.3M
$ coredumpctl info 27318 | grep Size
  Size on Disk: 9.3M
$ coredumpctl info 2721 | grep Size
  Size on Disk: 8.2M
$ coredumpctl info 34581 | grep Size
  Size on Disk: 23.2M
$ coredumpctl info 67917 | grep Size
  Size on Disk: 9.9M
$ coredumpctl info 92457 | grep Size
  Size on Disk: 9.0M
$ coredumpctl info 91138 | grep Size
  Size on Disk: 6.7M

So I don't know what all ABRT is storing, but the cumulative size of these core dumps in coredumpctl is only 95 MB.

Now, six of these crashes were not packaged executables, so I guess it's reasonable for ABRT to not display them. Two of them were packaged by Fedora: /usr/libexec/xdg-desktop-portal-gnome first, and /usr/bin/gnome-text-editor second. I see only the gnome-text-editor crash in gnome-abrt, so I assume the xdg-desktop-portal-gnome crash has already been deleted by ABRT as if it had never existed. (There's not really any way to know whether ABRT ever processed it, is there? I just assume it was processed and then deleted.)

It's reasonable to delete large data if needed, but not to delete all evidence that the crash occurred.

Comment 15 lnie 2023-03-20 04:51:09 UTC
To test Kamil's many-individual-crashes hypothesis mentioned in the blocker ticket, I set the max size to 30 GB and reproduced the firefox and gnome-shell crashes I mentioned in comment 7, and found that the firefox crash data is NOT deleted after the whole-session crash, but the total crash data is only 1.1 GB (screenshot1), much smaller than the default 5 GB.
So I don't think abrt should delete the firefox crash data in the default 5 GB situation.
During that process I also ran into a situation similar to the one Michael mentioned in comment 14: with the max size set to 30 GB, abrt starts deleting data when there are more than 5 crashes (screenshot2).
Actually, abrt deletes the firefox crash data when the third (or maybe fourth) crash is created.

Comment 16 lnie 2023-03-20 04:51:53 UTC
Created attachment 1951895 [details]
screenshot

Comment 17 lnie 2023-03-20 04:53:05 UTC
Created attachment 1951896 [details]
screenshot

Comment 18 lnie 2023-03-20 04:54:09 UTC
Created attachment 1951898 [details]
journal for screenshot2

Comment 19 lnie 2023-03-20 06:58:16 UTC
The issue where abrt starts to delete data while there is plenty of space left is 100% reproducible for me; would you please confirm it, Kamil? In the no-gnome-shell-crash situation, abrt does delete crash data one or two directories at a time, just as you said in comment 6, but if you look into the /var/spool/abrt directory you will find that it starts deleting data while there is still a lot of space left.

Comment 20 lnie 2023-03-20 07:00:16 UTC
Created attachment 1951916 [details]
reproduce screencast on a newly created machine(maxsize is the default 5G)

Comment 21 Kamil Páral 2023-03-20 11:03:37 UTC
Hey @msrb, can you please have a look at what's going on here?

Comment 22 Adam Williamson 2023-03-20 15:46:58 UTC
aren't reports *moved* from /var/spool/abrt to...somewhere else...quite quickly after creation? I wonder if there's some problem with that on lnie's system?

Comment 23 Adam Williamson 2023-03-20 23:44:43 UTC
Discussed at 2023-03-20 blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2023-03-20/f38-blocker-review.2023-03-20-16.00.html . We agreed that, for now, this is rejected as a blocker as there doesn't seem to be clear reproduction of a sufficiently serious problem to violate the release criteria. If further testing provides a better indication that there's a real problem here that may hit multiple users, we can reconsider the decision.

Comment 24 lnie 2023-03-21 07:35:58 UTC
>aren't reports *moved* from /var/spool/abrt to...somewhere else...quite quickly after creation? 

I don't think that's what happens, but just in case, I searched for the disappeared ccpp dir: no, it's not moved but deleted.

>I wonder if there's some problem with that on lnie's system?

I tested on two local bare-metal machines with Fedora-Workstation-Live-x86_64-38-20230318.n.0.iso installed; it's 100% reproducible.

Here are the reproduction steps:
1)pkill -SEGV firefox 
2)pkill -SEGV sleep
3)pkill -SEGV nautilus
4)pkill -SEGV gnome-photos
5)pkill -SEGV gnome-calendar
6)pkill -SEGV gnome-software
7)pkill -SEGV gnome-clocks
You can have the apps crash in whatever order you like, of course.
You will see the firefox crash data removed at step 3) or maybe 4).
Starting from step 5), you will find that abrt starts to delete crash data in the default-size or large-size situations mentioned below.

We do have several problems here:

With the default max size or a large one (say more than 25 GB; I didn't try smaller values to avoid the argument about there not being enough space):
1) abrt starts to delete data when there is tons of space left, and the old crash data plus the new crash data are still way smaller than the max size.
2) abrt deletes *all* the data if the new crash data is larger than about 1 GB (I don't know the exact number; the gnome-photos crash data on my t490s system is sometimes 1.2 GB, and I'm able to reproduce this issue every time with that crash). I think that may be the original issue I ran into.

In the small-size situation (say 1 GB or 2 GB) abrt seems to work well, just as mentioned in comment 6, but even then there are a couple of smaller problems:
3) abrt will delete the newest crash data if its size is larger than all the other crash data (and the new plus old data exceed the max size), which means users will not be able to report the new issue.
4) Okay, this is actually a little similar to the above issue: abrt will delete the largest old crash data even if it was created later than the second largest one and is only a little bigger, say by 60 MB. I'm not sure about this one, but I don't think abrt should delete the second oldest data instead of the oldest one.

I do think it is a big problem when users try to keep more crash data by setting the max size larger but get the opposite result.

Adam, Kamil, and maybe others, would you please confirm? Thanks.

Hi @msrb, would you please check? Thanks.

Comment 25 Kamil Páral 2023-03-21 12:51:14 UTC
OK, I believe I've found the problem. In an out-of-the-box scenario, where /etc/abrt/abrt.conf doesn't set any value explicitly, MaxCrashReportsSize is supposed to be 5000 MB, but it actually is 1000 MB, even though the abrtd printouts in the journal still claim 5000 MB.

I can consistently reproduce it by crashing one program after another: the overall size of /var/spool/abrt rises towards the 1000 MB limit, but when it would exceed this value, abrtd starts to delete older crashes in order to keep it under 1000 MB. There is some complex logic determining which directory to delete [1], but in practice it usually deletes the largest one, and if that's not enough, another one, and so on, until everything fits under 1000 MB again.

But if I edit /etc/abrt/abrt.conf and explicitly add:
MaxCrashReportsSize = 2000

then my /var/spool/abrt can grow to 1.9 GB, and only then does some directory get deleted, as expected. If the overall size is e.g. 1.5 GB and I comment MaxCrashReportsSize back out of abrt.conf and cause another crash (an extremely small one, e.g. crashing `sleep` or `cat`), abrtd immediately removes as many dirs as needed to get under 1000 MB.

So the actual default is clearly not the documented 5000 MB.

For the record, my testing was done in a VM with a 20 GB disk containing 14 GB of free space, in case it matters.


[1] https://github.com/abrt/libreport/blob/865db9ecd8f85e4ac50c87c63ee2062546ec74e8/src/lib/dirsize.c#L122
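
A minimal, self-contained sketch of that largest-first trimming, modeling the spool in memory with the du sizes from comment 11 (an illustration of the observed policy, not the actual selection code from [1]):

#include <stdio.h>

struct dump_dir { const char *name; unsigned long long size_mb; };

int main(void)
{
    struct dump_dir dirs[] = {
        { "ccpp-2023-03-10-09:02:27", 142 },
        { "ccpp-2023-03-16-08:28:03",  69 },
        { "ccpp-2023-03-16-11:31:31", 363 },
    };
    int n = 3;
    unsigned long long max_mb = 500;  /* pretend limit for the demo */

    for (;;) {
        unsigned long long total = 0;
        int largest = -1;
        for (int i = 0; i < n; i++) {
            total += dirs[i].size_mb;
            if (largest < 0 || dirs[i].size_mb > dirs[largest].size_mb)
                largest = i;
        }
        if (total < max_mb || largest < 0)
            break;                     /* fits under the limit again */
        printf("deleting %s (%llu MB)\n",
               dirs[largest].name, dirs[largest].size_mb);
        dirs[largest] = dirs[--n];     /* drop it from the model */
    }
    return 0;
}

With these sizes and a 500 MB limit, only the 363 MB directory gets deleted before the spool fits again, matching the one-directory-at-a-time behavior described in comment 6.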

Comment 26 Kamil Páral 2023-03-21 13:11:12 UTC
I can reproduce the same problem on my desktop with 160 GB of free space, so this is not caused by limited space in a VM.

PLEASE NOTE: Make sure to restart abrtd.service after editing abrt.conf, otherwise the changes are sometimes not detected.

Reproposing for a blocker discussion. 1 GB of space for crashes is really quite low; some crashes consume 300-600 MB, especially web browsers or apps that rely on web tech. It might happen that only 2 crashes fit in that space, or just 1, and the second one will cause the first to be deleted. If the whole session crashes, it might happen that follow-up app crashes (due to the environment disappearing) push out the primary crash. We can then really discuss whether this violates basic app functionality.

The easiest fix here is to provide the value in abrt.conf until the root cause is found and fixed.
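
That stopgap would look something like this (assuming the documented 5000 MB value; the key and syntax are the same as the MaxCrashReportsSize line shown above):

# /etc/abrt/abrt.conf
# State the intended default explicitly so the broken built-in value
# is never used; restart abrtd.service afterwards (see the note above).
MaxCrashReportsSize = 5000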

Comment 27 lnie 2023-03-21 13:14:35 UTC
Thanks, Kamil, for confirming this ^^. TBH, I also considered the possibility that the default value is much smaller than documented, but the next second I dismissed it myself, as I had set the max size to 30 GB on a VM with 120 GB of disk space when I reproduced the issue I mentioned in comment 15.
Just to be clear, I reproduced the four issues I mentioned in comment 24 on a bare-metal machine which has more than 200 GB of disk space.

Comment 28 Michal Srb 2023-03-22 10:20:18 UTC
Thanks for the thorough analysis everyone!

I just experienced this strange behavior on my laptop as well. The default value of 5000 MB is hardcoded in the source code, but it doesn't seem to be properly honored. I suspect that the problem might be an integer overflow somewhere (5 GB in bytes doesn't fit into a 32-bit unsigned integer). But that is just a wild guess. I will take a closer look later this week ;)
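
A minimal, self-contained demonstration of that suspicion (an assumption about the mechanism, not the actual abrt code; signed overflow is formally undefined behavior in C, but on common x86_64 builds it wraps modulo 2^32):

#include <stdio.h>

int main(void)
{
    int max_mb = 5000;                  /* the documented default, in MB */
    int wrapped = max_mb * 1024 * 1024; /* 32-bit multiply wraps here */
    long long correct = (long long)max_mb * 1024 * 1024;

    printf("wrapped: %d bytes\n", wrapped);   /* 947912704 (~948 MB) */
    printf("correct: %lld bytes\n", correct); /* 5242880000 */
    return 0;
}

The wrapped result, 947912704 bytes, is just under 1 GB, which is consistent with the ~1000 MB ceiling measured in comment 25.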

Comment 29 Jonathan Haas 2023-03-24 10:32:14 UTC
That sounds likely. Here are some locations where the 5000 (or whatever the setting is) is multiplied by (1024*1024): 

https://github.com/abrt/abrt/blob/ca4e79b1e2e1016e282e5d27bea0ce8c31f814d7/src/daemon/abrtd.c#L176
https://github.com/abrt/abrt/blob/ca4e79b1e2e1016e282e5d27bea0ce8c31f814d7/src/dbus/abrt-dbus.c#L621

which can probably overflow, as there are only ints involved.
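
If the multiplication is indeed the culprit, the usual remedy is to widen one operand before multiplying so the arithmetic happens in 64 bits. A sketch of that pattern (an assumption for illustration; see the pull request linked above for the change abrt actually merged):

#include <stdint.h>
#include <stdio.h>

/* Cast first, multiply second: with the cast, the whole expression is
 * evaluated as uint64_t, so 5000 MB converts to bytes without wrapping.
 * Without it, (5000 * 1024 * 1024) overflows a 32-bit int. */
static uint64_t mb_to_bytes(unsigned int mb)
{
    return (uint64_t)mb * 1024 * 1024;
}

int main(void)
{
    printf("%llu\n", (unsigned long long)mb_to_bytes(5000)); /* 5242880000 */
    return 0;
}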

Comment 30 Michal Srb 2023-03-27 08:03:13 UTC
(In reply to Jonathan Haas from comment #29)
> That sounds likely. Here are some locations where the 5000 (or whatever the
> setting is) is multiplied by (1024*1024): 
> 
> https://github.com/abrt/abrt/blob/ca4e79b1e2e1016e282e5d27bea0ce8c31f814d7/
> src/daemon/abrtd.c#L176
> https://github.com/abrt/abrt/blob/ca4e79b1e2e1016e282e5d27bea0ce8c31f814d7/
> src/dbus/abrt-dbus.c#L621
> 
> which can probably overflow, as there are only ints involved.

Yep. Spot on. I opened a pull request.

Comment 31 Geoffrey Marr 2023-03-28 05:20:35 UTC
Discussed during the 2023-03-27 blocker review meeting: [0]

The decision to delay the classification of this as a blocker bug was made as we don't have a clear enough vote here. We'll delay the decision in the hopes of getting more votes and/or a fix.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2023-03-27/f38-blocker-review.2023-03-27-16.00.txt

Comment 32 okokbb 2023-03-28 21:38:59 UTC
(Comment hidden as spam.)
Comment 33 Fedora Update System 2023-03-30 10:12:35 UTC
FEDORA-2023-eb09dd6406 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-eb09dd6406

Comment 34 Kamil Páral 2023-03-30 10:53:39 UTC
(In reply to Fedora Update System from comment #33)
> FEDORA-2023-eb09dd6406 has been submitted as an update to Fedora 38.
> https://bodhi.fedoraproject.org/updates/FEDORA-2023-eb09dd6406

This fixes the problem; the allowed disk space is now actually 5 GB.

Comment 35 Fedora Update System 2023-03-31 01:46:16 UTC
FEDORA-2023-eb09dd6406 has been pushed to the Fedora 38 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-eb09dd6406

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 36 Fedora Update System 2023-04-01 00:17:18 UTC
FEDORA-2023-eb09dd6406 has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make a note of it in this bug report.

