Bug 562696 - Keeps repeating dump of kerneloops in /var/cache/abrt
Summary: Keeps repeating dump of kerneloops in /var/cache/abrt
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: abrt
Version: 12
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Jiri Moskovcak
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-02-08 02:09 UTC by Joergen Thomsen
Modified: 2015-02-01 22:51 UTC (History)
CC List: 9 users

Fixed In Version: abrt-1.0.7-1.fc12
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-08-03 14:15:07 UTC
Type: ---
Embargoed:


Attachments
/var/log/messages (2.18 MB, text/plain), 2010-02-10 16:43 UTC, Joergen Thomsen
first trace (17.56 KB, text/plain), 2010-02-11 19:49 UTC, Joergen Thomsen
second trace (18.84 KB, text/plain), 2010-02-11 19:49 UTC, Joergen Thomsen

Description Joergen Thomsen 2010-02-08 02:09:13 UTC
Description of problem:

There are 3 kerneloops in /var/log/messages
Every 120 seconds they are entered into /var/cache/abrt as new dumps.
The only difference is the time.
I have uploaded them with the GUI. They are apparently identified as identical and uploaded, but they keep reappearing as new dumps.

Version-Release number of selected component (if applicable):
1.0.4-1.fc12

How reproducible:

It keeps happening after several reboots, also with another kernel version.


Comment 1 Denys Vlasenko 2010-02-08 10:28:40 UTC
I think you can blame me on this, but on the bright side, it is fixed (again) now. Please try abrt-1.0.6 from

http://koji.fedoraproject.org/koji/buildinfo?buildID=154271

Please also attach your /var/log/messages to this bug.

Comment 2 Joergen Thomsen 2010-02-10 16:43:58 UTC
Created attachment 390053
/var/log/messages

This file makes abrtd generate thousands of identical reports in /var/cache/abrt

Comment 3 Joergen Thomsen 2010-02-10 16:46:00 UTC
I have installed abrt-1.0.6 from the testing repository. It started by deleting the reports, but now I have 3900 reports in /var/cache/abrt!

Comment 4 Denys Vlasenko 2010-02-10 17:18:05 UTC
You can delete /var/cache/abrt; abrtd will recreate it on restart.

However, our oops dup detector in CAnalyzerKerneloops::GetLocalUUID() is really *way* too simplistic:

        unsigned len = oops.length();
        unsigned hash = len;
        for (unsigned i = 0; i < len; i++)
        {
                hash = ((hash << 5) ^ (hash >> 27)) ^ oops[i];
        }
        hash &= 0x7FFFFFFF;

Looking at your oops, it's obvious that the code above will generate a different hash for each new oops, because the pid, addresses, etc. will be different:

BUG: Dentry ffff88006037e0c0{i=2,n=/} still in use (3) [unmount of cifs cifs]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:669!
invalid opcode: 0000 [#3] SMP.
last sysfs file: /sys/module/hwmon_vid/refcnt
CPU 1.
Modules linked in: cpufreq_ondemand cifs nls_utf8 fuse sunrpc powernow_k8 freq_table ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 microcode ext2 uinput snd_hd
Pid: 3259, comm: umount.cifs Tainted: G      D    2.6.31.12-174.2.3.fc12.x86_64 #1 GA-MA790GPT-UD3H
RIP: 0010:[<ffffffff8110c78c>]  [<ffffffff8110c78c>] shrink_dcache_for_umount_subtree+0x135/0x1f9
RSP: 0018:ffff88019e913e08  EFLAGS: 00010292
RAX: 0000000000000054 RBX: ffff88006037e0c0 RCX: 00000000000037a2
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88019e913e38 R08: 0000000000000000 R09: ffff88019d93b2e0
R10: ffffffff814569e0 R11: ffff88019d93b2e0 R12: 0000000000000302
R13: ffff88006037e0c0 R14: ffff88011ddd6c60 R15: ffff88019e913f18
FS:  00007fae82f80700(0000) GS:ffff880028051000(0000) knlGS:00000000f77496c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fae82ad9090 CR3: 0000000195539000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount.cifs (pid: 3259, threadinfo ffff88019e912000, task ffff88011439af00)
Stack:
ffff880084cb3288 ffffffff81201614 ffff880084cb3000 ffffffffa034e550
 ffff880084cb3000 ffffffff816f9160 ffff88019e913e58 ffffffff8110c88c
 ffffffff8166e9c0 ffff880084cb3000 ffff88019e913e78 ffffffff810fea44
Call Trace:
[<ffffffff81201614>] ? __down_read_trylock+0x46/0x4e
[<ffffffff8110c88c>] shrink_dcache_for_umount+0x3c/0x4c
[<ffffffff810fea44>] generic_shutdown_super+0x1f/0xc9
[<ffffffff810feb43>] kill_anon_super+0x16/0x4f
[<ffffffff810ff248>] deactivate_super+0x56/0x6e
[<ffffffff81111d3c>] mntput_no_expire+0xb4/0xec
[<ffffffff8111230f>] sys_umount+0x2de/0x30d
[<ffffffff81104575>] ? path_put+0x22/0x27
[<ffffffff81011cf2>] system_call_fastpath+0x16/0x1b
Code: 50 30 4c 8b 0a 31 d2 48 85 f6 74 04 48 8b 56 40 48 05 88 02 00 00 48 89 de 48 c7 c7 6e cc 57 81 48 89 04 24 31 c0 e8 42 ed 30 00 <0f> 0b eb fe 4c 8b 6b 28 4c 39 eb 75 05
RIP  [<ffffffff8110c78c>] shrink_dcache_for_umount_subtree+0x135/0x1f9
RSP <ffff88019e913e08>
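To make the point about the hash concrete, here is a minimal standalone sketch. `oops_hash` copies the loop quoted above (with a fixed 32-bit type); `normalize_oops` is a hypothetical helper invented for this illustration and is not abrt's code or the algorithm planned for later releases. It only shows that the raw hash changes when the pid changes, while hashing a text with pids and long hex values masked out does not. It does not address the case where the very same log text is counted repeatedly.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cstdint>
#include <regex>
#include <string>

// Same rotate-and-XOR loop as in the snippet above, on a fixed 32-bit type.
std::uint32_t oops_hash(const std::string& oops)
{
    std::uint32_t hash = static_cast<std::uint32_t>(oops.length());
    for (std::string::size_type i = 0; i < oops.length(); i++)
    {
        hash = ((hash << 5) ^ (hash >> 27)) ^ oops[i];
    }
    return hash & 0x7FFFFFFF;
}

// Hypothetical normalization (an assumption, not abrt's algorithm):
// mask runs of 8 or more hex digits and the number after "Pid: ".
std::string normalize_oops(const std::string& oops)
{
    static const std::regex hex_run("[0-9a-fA-F]{8,}");
    static const std::regex pid("[Pp]id: [0-9]+");
    std::string out = std::regex_replace(oops, hex_run, "ADDR");
    return std::regex_replace(out, pid, "pid: N");
}
```

With two excerpts that differ only in the pid (3259 vs. 3260), `oops_hash` returns different values, whereas `oops_hash(normalize_oops(...))` returns the same value even if an address differs as well. The regular expressions are deliberately crude and would also mask unrelated long hex strings.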

Comment 5 Joergen Thomsen 2010-02-10 21:26:33 UTC
Well, since the files in the different kerneloops directories were identical, even the simplistic check should be enough to avoid generating new reports from the same error in the logfile. There must be a bug that prevents the 'new' oops from actually being compared against the old ones.

Comment 6 Joergen Thomsen 2010-02-10 21:29:39 UTC
Please note that the 4000 reports were generated from only 3 messages in /var/log/messages.

Comment 7 Denys Vlasenko 2010-02-11 12:23:44 UTC
I did copy the file you provided to my /var/log/messages, and for me, abrtd created 3 oopses. It does not create more oopses as time passes.

I believe the fix for the bug where abrtd files new oopses at every log scan pass was applied to git in this commit:

commit 640af192338643b3c9e6fbe0304726e951239c2b
Author: Denys Vlasenko <vda.linux>
Date:   Wed Nov 11 19:32:19 2009 +0100

    KerneloopsSysLog: fix breakage in code which detects abrt marker

and 1.0.6 is much newer than this.

If you do observe new oopses being filed at every log scan pass (i.e. every 2 mins or so), please run "killall abrtd", then "abrtd -dvvv 2>&1 | tee LOG", wait for a few kerneloops scans (you will see the corresponding messages appearing), then ^C it and attach resulting LOG file to this bug.

Comment 8 Joergen Thomsen 2010-02-11 19:45:39 UTC
well, hard to make it run constantly now

abrtd: Loaded .conf
abrtd: can't load '/usr/lib64/abrt/lib.so': /usr/lib64/abrt/lib.so: cannot open shared object file: No such file or directory

Comment 9 Joergen Thomsen 2010-02-11 19:49:27 UTC
Created attachment 390330
first trace

Comment 10 Joergen Thomsen 2010-02-11 19:49:50 UTC
Created attachment 390332
second trace

Comment 11 Jiri Moskovcak 2010-02-11 20:06:07 UTC
This should be fixed in 1.0.6, which is already in the stable repo, so I'd recommend updating to the latest version.

Jirka

Comment 12 Joergen Thomsen 2010-02-11 21:06:44 UTC
Which I already did.

154112  3 feb 16:09 /usr/sbin/abrtd

That version should immediately be withdrawn.

abrtd: Loaded .conf
abrtd: can't load '/usr/lib64/abrt/lib.so': /usr/lib64/abrt/lib.so: cannot open shared object file: No such file or directory

Comment 13 Jiri Moskovcak 2010-02-11 21:19:11 UTC
You're right, it's fixed after 1.0.6 :-/ The only known workaround for this is to remove the contents of /var/cache/abrt.

Jirka.

Comment 14 Joergen Thomsen 2010-02-11 21:31:45 UTC
With 1.0.6 there is no problem. It terminates itself after 120 seconds due to the missing lib.so file, so only one set of duplicates is generated ;-)

Comment 15 Fedora Update System 2010-02-15 14:08:31 UTC
abrt-1.0.7-1.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/abrt-1.0.7-1.fc12

Comment 16 Fedora Update System 2010-02-18 22:32:16 UTC
abrt-1.0.7-1.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update abrt'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-1598

Comment 17 Joergen Thomsen 2010-02-21 20:02:09 UTC
It now repeatedly increments the bug count in the database instead of creating directories. It still does not recognize old bugs.

Comment 18 Denys Vlasenko 2010-02-22 13:35:28 UTC
abrt-1.0.9 will have a different oops hashing algorithm, making it less prone to generating different hashes for similar oopses.

Comment 19 Joergen Thomsen 2010-02-22 22:08:16 UTC
Well, I think more attention should be paid to why the ONE and SAME oops is not recognized. We are not talking about similar oopses here. We are talking about the same oops, listed only ONCE, being counted over and over again. The current algorithm seems OK for handling that.

Comment 20 Fedora Update System 2010-02-23 05:38:19 UTC
abrt-1.0.7-1.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 21 Jiri Moskovcak 2010-02-23 09:04:57 UTC
Reopening, since there seems to be a problem with the oops marker: we detect the same oops repeatedly even if it is listed in /var/log/messages only once.
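The behaviour being asked for (report an oops once, then ignore it on later scan passes) can be sketched as follows. This is a minimal illustration, not abrtd's marker mechanism: the class name `SeenOopses` and its method are hypothetical, and the state is kept only in memory, so it would not survive a daemon restart or reboot the way a marker or a persistent database entry could.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: remembers every oops text already handed out
// and returns only the ones not seen in an earlier scan pass.
class SeenOopses
{
public:
    std::vector<std::string> filter_new(const std::vector<std::string>& oopses)
    {
        std::vector<std::string> fresh;
        for (const std::string& oops : oopses)
        {
            // insert() reports through .second whether the text was new
            if (seen_.insert(oops).second)
            {
                fresh.push_back(oops);
            }
        }
        return fresh;
    }

private:
    std::unordered_set<std::string> seen_;
};
```

Scanning the same three oopses twice yields three entries on the first pass and none on the second; appending a fourth oops to the log yields exactly that one on the next pass.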

Comment 22 Denys Vlasenko 2010-02-23 13:14:11 UTC
(In reply to comment #19)
> Well, I think more attention should be paid to why the ONE and SAME oops is not
> recognized. We are not talking about similar oopses here. We are talking about
> the same oops, listed only ONCE, being counted over and over again. The current
> algorithm seems OK for handling that.

Yes. I want to do this. But (as I already said):

I did copy the file you provided to my /var/log/messages, and for me, abrtd
created 3 oopses. It does not create more oopses as time passes.

I am looking at your "abrtd -dvvv" logs. I see that you were bitten by the dreaded "can't load '/usr/lib64/abrt/lib.so'" bug.

This is caused by abrtd trying to load a plugin with the name "" (empty string).
I still do not know what triggers it, but we have already added code which makes abrtd survive it (and log a lot more information about this bug, which will hopefully allow it to be caught).

Can you retest with a newer version? For F12, the latest is:

http://koji.fedoraproject.org/koji/buildinfo?buildID=156498

Comment 23 Joergen Thomsen 2010-06-21 09:05:45 UTC
abrt-1.0.9-2.fc12.x86_64

It keeps counting the kernel oops as a new crash, showing the red icon in GNOME.

In /var/cache/abrt (count has now reached 15 from only one oops)

-rw-r--r-- 1 root root 8192 21 jun 10:38 abrt-db
drwxr-x--- 2 abrt root 4096 21 jun 01:17 kerneloops-1277075799-1
-rw------- 1 root root   13 20 jun 19:18 last-ccpp

kerneloops-1277075799-1 remains unchanged (it has been reported)

Comment 24 Joergen Thomsen 2010-06-21 16:57:23 UTC
-rw-r--r-- 1 root root 8192 21 jun 18:40 abrt-db

We have now reached 26 and still counting.
Amazing how difficult it seems to be to insert

if A = A then do nothing;

;-)

Comment 25 Jiri Moskovcak 2010-07-26 12:33:28 UTC
Hi,
do you still see this problem on your machine with the latest abrt?

J.

Comment 26 Joergen Thomsen 2010-08-03 14:12:33 UTC
version 1.1.1-2.fc12

I haven't seen this problem recently

Comment 27 Jiri Moskovcak 2010-08-03 14:15:07 UTC
(In reply to comment #26)
> version 1.1.1-2.fc12
> 
> I haven't seen this problem recently    

Thank you, closing.

