Bug 1252382

Summary: abrt-addon-kernel-oops keeps attempting to report the same oops
Product: [Fedora] Fedora
Reporter: Berend De Schouwer <berend>
Component: abrt
Assignee: abrt <abrt-devel-list>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 22
CC: abrt-devel-list, berend, dvlasenk, frank, iprikryl, jfilak, mhabrnal, michal.toman, mmilata
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-07-19 17:28:19 UTC
Type: Bug

Description Berend De Schouwer 2015-08-11 09:42:37 UTC
Description of problem:

abrt-dump-journal-oops keeps reporting the same old oops.  It appears to be stuck in an infinite loop.

It keeps logging the oops and writing files to /var/spool/abrt/, which abrtd then deletes (I've set DropNotReportableOopses).


Version-Release number of selected component (if applicable):

abrt-addon-kerneloops-2.6.1-2.fc22.x86_64


How reproducible:

Dunno.

The oops was generated by kernel 4.1.3-201, but I've since reverted to 4.0.8-300 to try to stop the infinite loop (which didn't help).

The oops doesn't contain enough information, which is why abrtd keeps deleting the reports.


Steps to Reproduce:
1. generate oops (?)
2. wait a bit for abrt-dump-journal-oops to start

Actual results:

Report oops once to /var/spool/abrt/


Expected results:

Infinite oops-es.


Additional info:

It started with 4.1.3-201.  oops*/kernel for new reports still says 4.1.3-201.  

I have rebooted into 4.0.8-300 to check if they were real oops-es.  The reports in /var/spool/abrt still say 4.1.3-201.

I have removed /var/lib/abrt/abrt-dump-journal-oops.state to see if it would stop, but that doesn't help.

I have saved a copy of abrt-dump-journal-oops.state.  A new one is created with exactly the same information (broken journal cursor?)
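
The comparison between the saved copy and the recreated state file can be scripted. The following is only a minimal sketch: the function name state_unchanged is hypothetical, and it simply does a byte-for-byte comparison of two files without interpreting the journal cursor stored inside.

```python
import filecmp
from pathlib import Path


def state_unchanged(saved_copy, current):
    """Return True if both files exist and have identical contents.

    `state_unchanged` is a hypothetical helper name used for illustration;
    it does not parse the journal cursor, it only compares bytes.
    """
    saved_copy, current = Path(saved_copy), Path(current)
    if not (saved_copy.is_file() and current.is_file()):
        return False
    # shallow=False forces a content comparison instead of trusting stat data.
    return filecmp.cmp(saved_copy, current, shallow=False)
```

For example, state_unchanged("/root/abrt-dump-journal-oops.state.saved", "/var/lib/abrt/abrt-dump-journal-oops.state") would return True in the situation described here; the path of the saved copy is an assumption.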

Running abrt-dump-journal-oops manually keeps reporting two oops-es.  Even with -v-v-v it never says: delete.

I've looked for Oops-es using
journalctl SYSLOG_IDENTIFIER=kernel
and I can't find any.

There are several thousand of these.  All for a kernel paging failure in exactly the same place.  All with minimal information (no stacktrace, backtrace, or anything else.)

On reboot, abrtd tries to run several thousand abrt-handle-event processes simultaneously (until I set DropNotReportableOopses)

I've now run 'systemctl disable abrt-oops' to prevent this from continuing, but that's not a long-term solution.

Comment 1 Berend De Schouwer 2015-08-11 09:44:43 UTC
Sorry:


Actual results:

Infinite oops-es.


Expected results:

Report oops once to /var/spool/abrt/

Comment 2 Berend De Schouwer 2015-08-11 09:45:18 UTC
"watch cat abrt-dump-journal-oops.state" never changes.

Comment 3 Jakub Filak 2015-08-11 10:55:46 UTC
Can you please stop the abrt-oops service, remove /var/lib/abrt/abrt-dump-journal-oops.state, and start the abrt-oops service again? The tool should then start processing the journal from the end, so it should skip all the previous kernel oopses. Please report the results here.
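
The three steps requested above (stop, remove the state file, start) could be scripted roughly as follows. This is a sketch rather than an official abrt procedure: the helper names reset_commands and reset_oops_state are hypothetical, the service name and state file path are the ones quoted in this report, and running it for real would need root privileges.

```python
import subprocess
from pathlib import Path

# Path and service name as quoted in this bug report.
STATE_FILE = Path("/var/lib/abrt/abrt-dump-journal-oops.state")
SERVICE = "abrt-oops"


def reset_commands(service=SERVICE):
    """Return the systemctl invocations for the stop and start steps."""
    return [["systemctl", "stop", service], ["systemctl", "start", service]]


def reset_oops_state(state_file=STATE_FILE, service=SERVICE, run=subprocess.run):
    """Stop the service, delete the state file, then start the service.

    `run` is injectable so the sequence can be exercised without touching
    systemd. Hypothetical helper, not part of abrt.
    """
    stop_cmd, start_cmd = reset_commands(service)
    run(stop_cmd, check=True)
    # Remove the saved journal cursor; ignore it if already gone (Python 3.8+).
    Path(state_file).unlink(missing_ok=True)
    run(start_cmd, check=True)
```

Calling reset_oops_state() as root performs the sequence once; whether the recreated state file then points at the end of the journal is exactly what this bug puts in question.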

Comment 4 Berend De Schouwer 2015-08-11 13:12:05 UTC
It creates an identical abrt-dump-journal-oops.state (same contents).

It continues reporting oops-es.

They're still from the previous kernel.  abrt-server tries to match it with kernel-core-booted.

OK, a few hours later, it has stopped.

Comment 5 Berend De Schouwer 2015-08-12 07:32:14 UTC
Possibly related:

journalctl doesn't go to the end of the log.  Paging doesn't reach the end.  Possibly the journal files are corrupt.

Snippet:

-- Logs begin at Sun 2015-05-31 15:18:16 SAST, end at Wed 2015-08-12 09:20:45 SAST. --
May 31 15:18:16 localhost.localdomain systemd[1645]: Reached target Sockets.
May 31 15:18:16 localhost.localdomain systemd[1645]: Starting Sockets.

but paging to the end using less (journalctl by default uses less):

Jul 22 14:44:06 sieve-deschouwer-co-za gdb[26608]: detected unhandled Python exception
Jul 22 14:44:20 sieve-deschouwer-co-za gnome-session[2094]: (gnome-settings-daemon:2228): GnomeDesktop-WARNING **: Error setting property 'PowerSaveMode' on interface org.gnome.Mutter.Displa

No August...


journalctl -f or -e do show August.

/var/log/journal indicates the journal was rotated yesterday, and has files for August.

There is a file that was rotated at Jul 22 14:45, so I guess that journalctl can't read past the rotation.
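
To see where the rotation boundaries fall, the journal files can be listed in order of modification time. The sketch below assumes the default persistent location /var/log/journal mentioned above; the function name journal_files_by_mtime is hypothetical, and the code only looks at file names and timestamps, not at journal contents.

```python
from pathlib import Path


def journal_files_by_mtime(journal_dir="/var/log/journal"):
    """Return journal files under journal_dir, oldest modification first.

    Hypothetical helper: matches names ending in '.journal' or '.journal~'
    and returns an empty list if the directory does not exist.
    """
    root = Path(journal_dir)
    if not root.is_dir():
        return []
    files = [
        p for p in root.rglob("*")
        if p.is_file() and p.name.endswith((".journal", ".journal~"))
    ]
    return sorted(files, key=lambda p: p.stat().st_mtime)
```

A file whose modification time matches the point where paging stops (Jul 22 14:45 in this case) would be the first one to inspect. journalctl also has a --verify option that checks journal files for internal consistency.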

Comment 6 Frank Crawford 2015-12-25 00:11:37 UTC
I've also been bitten by this issue a few times.  Investigating the latest one, it seemed to be related to a kernel oops, which eventually hung the machine and forced me to do a hard reset.

The actual oops was in a journal file ending with ".journal~", which indicates that it wasn't closed properly, and so I guess the cursor position couldn't be set properly for it.

The only way I could stop the continual reports was to remove the journal file with the oops entry.

This was on F23 with abrt-2.7.1-1.fc23.x86_64 and abrt-addon-kerneloops-2.7.1-1.fc23.x86_64.
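
Journal files that were not closed properly, as described in this comment, can be located by their ".journal~" suffix. The following is a minimal sketch under the assumption that the journal lives in /var/log/journal; find_unclean_journals is a hypothetical name, and the function only lists candidates without deleting anything.

```python
from pathlib import Path


def find_unclean_journals(journal_dir="/var/log/journal"):
    """Return files under journal_dir whose names end in '.journal~'.

    Hypothetical helper for illustration; it lists files only and returns
    an empty list if the directory does not exist.
    """
    root = Path(journal_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.journal~") if p.is_file())
```

Each returned path is only a candidate for the file holding the repeated oops; which one to move aside or remove still has to be decided by hand.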

Comment 7 Fedora End Of Life 2016-07-19 17:28:19 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.