Bug 1043670

Summary: abrt-plugin-ccpp uses inefficient journalctl grepping to find 'relevant' log messages, can take up a lot of CPU time
Product: [Fedora] Fedora Reporter: Amit Shah <amit.shah>
Component: abrtAssignee: abrt <abrt-devel-list>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: abrt-devel-list, amit, awilliam, dvlasenk, hub+rhbz, iprikryl, johannbg, kparal, lnykryn, mclasen, msekleta, msuchy, peljasz, plautrba, samuel-rhbugs, systemd-maint, vondruch, vpavlin, zbyszek, zman0900
Target Milestone: ---Keywords: FutureFeature
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-06-30 13:58:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
patch against abrt git master to try and make this more efficient (*untested*) none

Description Amit Shah 2013-12-16 21:09:53 UTC
Description of problem:

I'm running systemd-208-8.fc20.x86_64.

My last reboot on the laptop was 10 days ago.  In that time, I have had several abrt bugs pop up, and several SELINUX AVCs as well.  I keep suspending / resuming the laptop, which allows me to have this much uptime.  However, that also means the logs from the current session keep growing a lot.

Recently, I've been having at least one bug trigger every suspend/resume cycle, which means on every resume, journalctl takes up 100% of CPU time for a long while as ABRT greps through the logs for the most recent crash.

Can something be done to avoid this scenario?  Like the -b switch, a 'current + 1 session' switch, which only includes information across the suspend/resume cycle is dumped, which should be sufficient for both, sealert as well as ABRT?

Comment 1 Zbigniew Jędrzejewski-Szmek 2013-12-17 04:40:26 UTC
We provide --after-cursor= and --show-cursor. Together they can be used to avoid reading the same messages twice. Tentatively reassigning to abrt.

Comment 2 Jan Pokorný [poki] 2014-01-02 21:05:50 UTC
Might be related: [bug 1035359]

Comment 3 Adam Williamson 2014-07-15 02:18:50 UTC
Robert Day has been reporting something similar on test@ in recent days:

https://lists.fedoraproject.org/pipermail/test/2014-July/121940.html
https://lists.fedoraproject.org/pipermail/test/2014-July/121960.html

I took a look at what exactly abrt is doing here, and it seems to be being extremely inefficient. The relevant commands are in /etc/libreport/events.d/ccpp_event.conf , the code looks like this:

uid=`cat uid` &&
                log="[System Logs]:\n" &&
                log=$log`journalctl -b --system | grep -F -e "$base_executable" | tail -99` &&
                log=$log"\n[User Logs]:\n" &&
                log=$log`journalctl _UID="$uid" -b | grep -F -e "$base_executable" | tail -99` &&
                log=`echo -e "$log"`

it really isn't necessary to read the entire journal for the current boot and then grep for the executable name! 'man systemd.journal-fields' lists a whole bunch of 'TRUSTED JOURNAL FIELDS', which you can use as filters for journalctl output: for instance, 'journalctl -b _SYSTEMD_UNIT=foo.service' shows only log messages from the service foo.service (in the current boot). The one we could obviously use here is _PID:

journalctl -b --system _PID=$pid
journalctl -b _UID=$uid _PID=$pid

obviously you'd have to set $pid by reading the file 'pid' from the problem report. So, something like the attached patch (untested, please check!) should I think do the trick.

Comment 4 Adam Williamson 2014-07-15 02:21:39 UTC
Created attachment 918047 [details]
patch against abrt git master to try and make this more efficient (*untested*)

Comment 5 Adam Williamson 2014-07-15 02:31:53 UTC
hrm, looks like that doesn't catch as much stuff as just grepping for the executable name, unfortunately:

[adamw@adam abrt (master %)]$ sudo journalctl -b _PID=11452
-- Logs begin at Tue 2014-02-25 16:20:11 PST, end at Mon 2014-07-14 19:31:09 PDT
[adamw@adam abrt (master %)]$ sudo journalctl -b | grep gcm-viewer
Jul 14 10:15:12 adam.happyassassin.net org.gnome.ControlCenter.SearchProvider[2127]: (gcm-viewer:11452): GLib-GIO-CRITICAL **: g_file_new_for_path: assertion 'path != NULL' failed
Jul 14 10:15:12 adam.happyassassin.net org.gnome.ControlCenter.SearchProvider[2127]: (gcm-viewer:11452): libcolord-CRITICAL **: cd_icc_load_file: assertion 'G_IS_FILE (file)' failed
Jul 14 10:15:12 adam.happyassassin.net org.gnome.ControlCenter.SearchProvider[2127]: (gcm-viewer:11452): GLib-GObject-CRITICAL **: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
Jul 14 10:15:12 adam.happyassassin.net kernel: gcm-viewer[11452]: segfault at 8 ip 00000000004095fb sp 00007fff03d1cac0 error 4 in gcm-viewer[400000+14000]
Jul 14 10:15:13 adam.happyassassin.net abrt-hook-ccpp[11455]: Saved core dump of pid 11452 (/usr/bin/gcm-viewer) to /var/tmp/abrt/ccpp-2014-07-14-10:15:13-11452 (28209152 bytes)

obviously those useful messages didn't actually come *from* PID 11452. Drat. Still, there has to be a more efficient way of doing this.

Comment 6 Zbigniew Jędrzejewski-Szmek 2014-07-15 13:44:03 UTC
The problem here is that gnome logging is all mixed into a single stream. This is something to work on, but I don't think there's an easy solution.

I want to add a journalctl switch to filter by user session. Something like --boot, but it would filter by the logind sessions of a user. I haven't figured out the details, e.g. how to deal with overlapping sessions. But it might help here.

Comment 7 Adam Williamson 2014-07-15 14:51:37 UTC
isn't there a more efficient way to parse the journal than literally grepping the output of journalctl? Maybe not, I guess it just *feels* like there should be :)

Comment 8 Zbigniew Jędrzejewski-Szmek 2014-07-15 14:54:28 UTC
(In reply to Adam Williamson (Red Hat) from comment #7)
> isn't there a more efficient way to parse the journal than literally
> grepping the output of journalctl? Maybe not, I guess it just *feels* like
> there should be :)
Yes, there is. You can use the metadata, i.e. the additional fields. But once you lose the metadata, there's no going back.

Comment 9 Matthias Clasen 2014-07-16 17:56:06 UTC
(In reply to Adam Williamson (Red Hat) from comment #7)
> isn't there a more efficient way to parse the journal than literally
> grepping the output of journalctl? Maybe not, I guess it just *feels* like
> there should be :)

Not if you insist that your logging backend is a line of shell in a config file...

Comment 10 Zbigniew Jędrzejewski-Szmek 2014-07-16 20:52:34 UTC
*** Bug 1120405 has been marked as a duplicate of this bug. ***

Comment 11 Jakub Filak 2014-10-23 06:10:20 UTC
Maybe we can add --since=$START_DATE to journalctl command line arguments. ABRT can parse /proc/$pid/stat and compute the start date from starttime field (22). But, I am not sure whether it would help.

We definitely have to sort this out somehow, because the grepping does not work either. It does not find Xorg log lines because those lines look like the following: 

Oct 23 01:19:18 fedora-rawhide gdm-Xorg-:0[23015]:X.Org X Server 1.16.1

and ABRT greps 'Xorg.bin'. (Adam's patch fixes this issue).

Comment 12 Fedora End Of Life 2015-05-29 10:00:30 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Fedora End Of Life 2016-11-24 11:05:18 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 14 Adam Williamson 2016-11-24 22:43:28 UTC
This seems to have changed a bit in current abrt. It's still doing journal grepping and it's still kinda horrible, but it now only looks at the last 3 minutes, I think:

                user_log=`journalctl -b --since=-3m -n 99 _COMM="$base_executable" _UID="$uid"` &&
                test -n "$user_log" && printf "User Logs:\n%s\n" "$user_log" >$DUMP_DIR/var_log_messages
                # Do not use '&&' here because if $user_log is the empty string
                # then the script does not continue to get the system logs
                {
                    # Remove the line below if you don't mind sharing data from the
                    # system logs with unprivileged users -> bugzilla.redhat.com/1212868
                    false &&
                    system_log=$log`journalctl -b --since=-3m --system -n 99 _COMM="$base_executable"` &&
                    test -n "$system_log" && printf "System Logs:\n%s\n" "$system_log" >>$DUMP_DIR/var_log_messages

so we could probably close this? (of course, this seems like weird code, since people frequently don't report crashes within three minutes of them occurring...)

Comment 15 Samuel Sieb 2016-11-25 06:48:24 UTC
Isn't that called by abrt when the crash happens and stored for later?  The problem was that when abrt was triggered, this grepping would happen and the computer slows to a crawl during that time.

Comment 16 Zbigniew Jędrzejewski-Szmek 2016-11-25 14:49:45 UTC
Let's keep this open until some fix appears.

Comment 17 Miroslav Suchý 2020-06-30 13:58:13 UTC
ccpp hook has been removed. This has been resolved by the age of this ticket.