The way we are collecting information about a crash can cause the system to hang by consuming too many resources.

There are a number of problems here:

1. The process consumes too much CPU.
2. The process consumes too much memory and frequently causes the system to swap.
3. The process takes far too long, and the user is notified many minutes after the problem occurred. This is not acceptable for user-visible crashes.
4. In some cases the entire system becomes unresponsive due to swapping; the system is then likely to be powered off, and data loss may occur.

Top output:

29205 root      20   0  857448   1652   1316 R 14.9  0.0   0:31.32 journalctl
   61 root      20   0       0      0      0 D 10.6  0.0   1:57.78 kswapd0

PS output:

ps -eaf|grep 29204
root     29204 29143  0 13:06 ?        00:00:00 /bin/sh -c
if grep '^TracerPid:[[:space:]]*[123456789]' proc_pid_status >/dev/null 2>&1; then
    # We see 'TracerPid: <nonzero>' in /proc/PID/status.
    # Process is ptraced (gdb, strace, ltrace).
    # Debuggers have a wide variety of bugs where they leak SIGTRAP
    # to the traced process and nuke it. Ignore this crash.
    echo "The crashed process was ptraced - not saving the crash"
    exit 1 # abrt will remove the problem directory
fi
if grep -q ^ABRT_IGNORE_ALL=1 environ \
|| grep -q ^ABRT_IGNORE_CCPP=1 environ \
; then
    echo "ABRT_IGNORE variable is 1 - not saving the crash"
    # abrtd will delete the problem directory when we exit nonzero:
    exit 1
fi
# Try generating a backtrace; if it fails we can still use
# the hash generated by abrt-action-analyze-c
##satyr migration:
#satyr abrt-create-core-stacktrace "$DUMP_DIR"
abrt-action-generate-core-backtrace
# Run GDB plugin to see if crash looks exploitable
abrt-action-analyze-vulnerability
# Generate hash
abrt-action-analyze-c &&
abrt-action-list-dsos -m maps -o dso_list &&
(
    # Try to save relevant log lines.
    # Can't do it as analyzer step, non-root can't read log.
    executable=`cat executable` &&
    base_executable=${executable##*/} &&
    # Test if the current version of journalctl has the --system switch
    journalctl --system -n1 >/dev/null
    if [ $? -ne 0 ]; then
        # It's not an error if /var/log/messages isn't readable:
        test -f /var/log/messages || exit 0
        test -r /var/log/messages || exit 0
        log=`grep -F -e "$base_executable" /var/log/messages | tail -99`
    else
        uid=`cat uid` &&
        log="[System Logs]:\n" &&
        log=$log`journalctl -b --system | grep -F -e "$base_executable" | tail -99` &&
        log=$log"\n[User Logs]:\n" &&
        log=$log`journalctl _UID="$uid" -b | grep -F -e "$base_executable" | tail -99` &&
        log=`echo -e "$log"`
    fi
    if test -n "$log"; then
        printf "%s\n" "$log" >var_log_messages
        # echo "Element 'var_log_messages' saved"
    fi
)

root     29205 29204 18 13:06 ?        00:00:44 journalctl _UID=0 -b
root     29206 29204  0 13:06 ?        00:00:00 grep -F -e plymouth
root     29207 29204  0 13:06 ?        00:00:00 tail -99
duplicate of bug 1015922 ?
Doesn't look like a dup of that to me. More like this being terribly inefficient: https://github.com/abrt/abrt/blob/master/src/plugins/ccpp_event.conf
(In reply to Jon McCann from comment #0)
> The way we are collecting information about a crash can cause the system to
> hang by consuming too many resources.
>
> There are a number of problems here.
>
> 1. The process consumes too much CPU
> 2. The process consumes too much memory and frequently causes the system to
> swap
> 3. The process takes far too long and the user is notified many minutes
> after the problem occurred. This is not acceptable for user visible crashes.

Saving the coredump takes most of the processing time. ABRT can display the popup before the dump is complete, but the user won't be able to report it until the dump is complete; would that really be better from the UX perspective?

> 4. In some cases the entire system becomes unresponsive due to swapping and
> the system is likely to be powered off and data loss may occur.

Can you be more specific? What application crashed and made abrt behave like that?
Saving the coredump may very well take a lot of time. But what we are doing, grepping through a dump of the entire journal, is grossly inefficient, and we should certainly try to fix it. There are APIs to search the journal directly; this should probably be done from C.

In these situations I am seeing journalctl and the ccpp_event.conf script consuming most of the CPU in the output of top.
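As a rough illustration of what "search the journal directly" could look like even at the CLI level (the C API offers the same filtering via sd_journal_add_match()): journalctl accepts field matches such as _COMM=<name>, so the journal itself selects matching entries instead of the script decoding and grepping everything. This is a hypothetical sketch, not the actual ccpp_event.conf code; the example path and the command -v guard are assumptions to keep the snippet runnable anywhere.

```shell
# Hypothetical replacement for the "grep the whole journal" step.
executable=/usr/bin/plymouthd         # normally: executable=`cat executable`
base_executable=${executable##*/}     # strip the directory part -> "plymouthd"
comm=`printf '%.15s' "$base_executable"`  # _COMM holds at most 15 characters

if command -v journalctl >/dev/null 2>&1; then
    # Field match: only entries whose command name equals $comm are read,
    # instead of dumping the full journal through grep.
    log=`journalctl -b -n 99 _COMM="$comm" 2>/dev/null`
fi
```

The key difference from the current script is that filtering happens inside journald's indexed lookup rather than in a grep pipeline over the entire boot log.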
And I don't think that ABRT needs to search through the whole journal. How about just the last 24 hours?
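A time-bounded variant of the two queries in ccpp_event.conf might look like the sketch below. This is hypothetical, and it assumes a journalctl new enough to accept relative --since values such as -24h; the uid and base_executable values are placeholders for what the script normally reads from the dump directory.

```shell
# Hypothetical: cap both journal queries at the last 24 hours instead of
# searching the whole journal.
uid=0                       # normally: uid=`cat uid`
base_executable=plymouthd   # normally derived from the 'executable' element

if command -v journalctl >/dev/null 2>&1; then
    syslog=`journalctl -b --system --since=-24h 2>/dev/null \
            | grep -F -e "$base_executable" | tail -99`
    userlog=`journalctl -b _UID="$uid" --since=-24h 2>/dev/null \
             | grep -F -e "$base_executable" | tail -99`
fi
```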
Hello, there is an ongoing discussion of the "grepping" problem on bug #1043670. Please post your comments and suggestions there.
This message is a reminder that Fedora 20 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue, and we are sorry that we were not able to fix it before Fedora 20 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to this bug being closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.