Bug 956730

Summary: [abrt] BUG: unable to handle kernel NULL pointer dereference at (null)
Product: [Fedora] Fedora
Reporter: Rodney <bugzilla>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 18
CC: bugzilla, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard: abrt_hash:99bb16dc588d13820b558ebaf679d563888dae0d
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-27 16:09:50 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
File: dmesg (flags: none)

Description Rodney 2013-04-25 13:48:57 UTC
Description of problem:
This happened while running Apache Nutch (a Java program).  In this particular
case, the solrindex command was being executed.

I cannot reproduce this problem at will; however, nutch usually seems to
trigger it within 24 hours.

Java from java-1.7.0-openjdk is being used here.  I just installed the latest
version of that package this morning (after this particular instance of this
problem), but the problem has continued through a few recent versions of the
java package.

When this problem happens, it hangs one of the Linux processes running an
instance of the JVM.  I don't know whether it's useful information, but
the overall procedure involves starting up a JVM, executing for a while
(usually no more than 10 or 15 minutes), shutting down the JVM, and repeating.
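
The start/run/stop cycle described above can be sketched as a small harness. This is an illustrative sketch only: the function names, the cycle count, and the timeout are hypothetical and are not what the reporter actually ran. It reruns a command and checks the kernel log for the oops banner after each run.

```python
import subprocess

# Banner text taken from the oops in this report.
OOPS_BANNER = "BUG: unable to handle kernel NULL pointer dereference"


def kernel_log_has_oops(log_text):
    """Return True if the log text contains the oops banner from this report."""
    return OOPS_BANNER in log_text


def read_dmesg():
    """Return the kernel ring buffer as text (dmesg may require privileges)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout


def run_cycles(command, cycles, read_kernel_log=read_dmesg, timeout_seconds=None):
    """Run `command` up to `cycles` times, checking the kernel log after each run.

    Returns the 1-based cycle at which the banner was first seen, or None.
    """
    for cycle in range(1, cycles + 1):
        try:
            # Each iteration starts a fresh process (a fresh JVM in the report).
            subprocess.run(command, check=False, timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            # Best effort only: a process wedged inside the kernel may not be killable.
            pass
        if kernel_log_has_oops(read_kernel_log()):
            return cycle
    return None
```

A call such as `run_cycles(nutch_command, cycles=200, timeout_seconds=1800)` would approximate the workload, where `nutch_command` is a placeholder for the real Nutch invocation. The kernel log should be cleared or checked beforehand, since an older oops already in the buffer would be reported on the first cycle.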

This is possibly getting way off topic, but Nutch is capable of running in a
Hadoop cluster.  I have not explicitly configured any such thing, and I do not
know how closely the default configuration resembles a Hadoop configuration,
or whether that is even relevant to this problem.

This instance of Linux is being run as a guest on a Fedora 17 host.  The
entire host has locked up completely a few times since I've been running Nutch
as well, but that doesn't usually happen when this kernel bug happens, so I
don't know whether it's related.

Additional info:
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8118f4f4>] do_huge_pmd_wp_page+0x684/0xc20
PGD 11893d067 PUD 119242067 PMD 0 
Oops: 0000 [#1] SMP 
Modules linked in: ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd i2c_piix4 soundcore virtio_balloon i2c_core virtio_net microcode virtio_blk
CPU 1 
Pid: 23373, comm: java Not tainted 3.8.8-202.fc18.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffff8118f4f4>]  [<ffffffff8118f4f4>] do_huge_pmd_wp_page+0x684/0xc20
RSP: 0018:ffff880117dd7cd8  EFLAGS: 00010246
RAX: ffff880110d07000 RBX: ffff8801167a8000 RCX: 0000000000a131fa
RDX: 0000000000a131f9 RSI: 00000000000000d0 RDI: ffff880110d07000
RBP: ffff880117dd7d68 R08: 0000000000016c20 R09: 00007f42a2b25000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880117834e60
R13: ffff880110d07000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f42a4dfd700(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000118a32000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process java (pid: 23373, threadinfo ffff880117dd6000, task ffff8801194b8000)
 ffff880117dd7d08 00007f42a2b25000 ffff880118a2b8a8 ffff880110d07000
 ffff880118a2b8a8 ffff880110d07000 00007f42a2b25000 ffffffff8118ff2d
 8000000031c000a5 00007f42a2a00000 00007f42a38aae88 ffffea00045828c0
Call Trace:
 [<ffffffff8118ff2d>] ? do_huge_pmd_anonymous_page+0x34d/0x450
 [<ffffffff8115f17e>] handle_mm_fault+0x17e/0x650
 [<ffffffff81093f93>] ? try_to_wake_up+0x203/0x2d0
 [<ffffffff81657181>] __do_page_fault+0x181/0x4f0
 [<ffffffff816574fe>] do_page_fault+0xe/0x10
 [<ffffffff81656c85>] do_async_page_fault+0x35/0x90
 [<ffffffff81653b48>] async_page_fault+0x28/0x30
Code: c0 48 89 45 98 4c 8b 8d 78 ff ff ff 0f 84 f7 04 00 00 48 8b 7d 98 45 31 d2 4c 89 75 90 4c 89 4d a0 45 89 d6 48 89 7d 88 49 89 fd <4d> 8b 07 48 8b 4d a0 31 f6 4c 89 e2 bf da 00 82 00 49 c1 e8 37 
RIP  [<ffffffff8118f4f4>] do_huge_pmd_wp_page+0x684/0xc20
 RSP <ffff880117dd7cd8>
CR2: 0000000000000000
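
The "Code:" line in the dump above marks the byte at the faulting RIP with angle brackets, so the faulting instruction can be picked out mechanically. The following is a minimal sketch with hypothetical helper names; it handles only the single instruction form seen here (a REX.W-prefixed 0x8B load with a bare base register) and is not a general disassembler.

```python
# 64-bit general-purpose register names in x86-64 encoding order.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def faulting_bytes(code_line):
    """Split an oops 'Code:' line into (bytes_before_fault, bytes_from_fault)."""
    tokens = code_line.split(":", 1)[1].split()
    # The byte at the faulting instruction pointer is written as <xx>.
    fault_index = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    values = [int(tok.strip("<>"), 16) for tok in tokens]
    return values[:fault_index], values[fault_index:]


def decode_simple_load(insn):
    """Decode `REX.W 8B /r` with a bare [base] operand.

    Returns (destination_register, base_register), or None for any other form.
    """
    if len(insn) < 3:
        return None
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    # 0x48..0x4F is a REX prefix with the W (64-bit operand) bit set.
    if (rex & 0xF8) != 0x48 or opcode != 0x8B:
        return None
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm in (4, 5):
        return None  # SIB, displacement, and register forms are out of scope
    dest = REG64[reg | (((rex >> 2) & 1) << 3)]  # REX.R extends the reg field
    base = REG64[rm | ((rex & 1) << 3)]          # REX.B extends the r/m field
    return dest, base
```

Applied to the line above, the bytes from the marker onward begin `4d 8b 07`, which this sketch decodes as a load into r8 from the address held in r15. R15 is zero in the register dump and CR2 is zero, so the two agree on which pointer was NULL. That identifies where the NULL pointer was read, not why it was NULL.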

Comment 1 Rodney 2013-04-25 13:49:01 UTC
Created attachment 739905 [details]
File: dmesg

Comment 2 Dave Jones 2013-04-25 23:02:06 UTC
We've seen a few reports similar to this trace, with no real explanation for it yet.

Is there anything sensitive in the VM that prevents you from sharing it?
If not, any chance you could prepare a slimmed-down virt image and put it up somewhere so I could try to reproduce this?

Comment 3 Rodney 2013-04-26 14:18:24 UTC
I can probably prepare a smaller image.  I'll get back to you when I have something.

Comment 4 Rodney 2013-05-28 15:58:00 UTC
If there's still interest, I finally have a smaller image.  It's about 1 GB compressed.  The problem seemed to go away when I cut further than that.

I don't have an internet-accessible place to put this long term, but could probably stash it in an S3 bucket for a short period.

Comment 5 Justin M. Forbes 2013-10-18 21:04:12 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.
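
Whether a running kernel is "3.11.4-101.fc18 or newer" is easy to misjudge by eye, because 3.8.8 sorts after 3.11.4 when compared as plain strings. A minimal sketch of a numeric comparison follows; the helper names are hypothetical and it assumes Fedora-style release strings of the form `X.Y.Z-N...`.

```python
import re


def kernel_version_key(release):
    """Turn e.g. '3.11.4-101.fc18' into (3, 11, 4, 101) for numeric comparison."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def is_at_least(running, required):
    """Return True if the `running` release is the `required` release or newer."""
    return kernel_version_key(running) >= kernel_version_key(required)
```

For example, `is_at_least(platform.release(), "3.11.4-101.fc18")` (with `import platform`) would check the currently running kernel.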

Comment 6 Justin M. Forbes 2013-11-27 16:09:50 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  

It has been over a month since we asked you to test the 3.11 kernel updates and let us know if your issue has been resolved or is still a problem. When this happened, the bug was set to needinfo.  Because the needinfo is still set, we assume either this is no longer a problem, or you cannot provide additional information to help us resolve the issue.  As a result we are closing with insufficient data. If this is still a problem, we apologize; feel free to reopen the bug and provide more information so that we can work towards a resolution.

If you experience different issues, please open a new bug report for those.

Comment 7 Rodney 2014-06-16 14:01:42 UTC
I offered a VM, but got no response.  At this point, this is all probably irrelevant, so I'm happy to have this bug closed, but I recently started getting notifications about information being required, so I'm trying to eliminate those.  If this response isn't sufficient, what do I need to do?