Bug 924916

Summary: [abrt]: BUG: unable to handle kernel paging request at ffff87ffffffffff
Product: [Fedora] Fedora
Reporter: Ian Pilcher <ipilcher>
Component: kernel
Assignee: Marcelo Tosatti <mtosatti>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 20
CC: dgilbert, esalvati, gansalmon, ipilcher, itamar, jonathan, jpeeler, kernel-maint, krzysztof.taraszka, madhu.chinakonda, robert.keersse
Keywords: Reopened
Hardware: x86_64
OS: Unspecified
Whiteboard: abrt_hash:3e418626a9dee12133e33f64869ea02387e48f89
Fixed In Version: kernel-3.12.7-300.fc20
Doc Type: Bug Fix
Last Closed: 2014-01-14 08:34:34 UTC
Attachments: kerneloops (Flags: none)

Description Ian Pilcher 2013-03-22 19:54:52 UTC
Additional info:
BUG: unable to handle kernel paging request at ffff87ffffffffff
IP: [<ffffffffa0295641>] __direct_map.isra.104+0xa1/0x210 [kvm]
PGD 0 
Oops: 0000 [#1] SMP 
Modules linked in: bnep bluetooth rfkill ip6table_filter ip6_tables ebtable_nat ebtables openvswitch xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack w83627ehf hwmon_vid xfs btrfs libcrc32c zlib_deflate snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd soundcore vhost_net iTCO_wdt tun iTCO_vendor_support macvtap coretemp macvlan e1000e kvm_intel lpc_ich i2c_i801 kvm mei mfd_core serio_raw uinput microcode nfsd auth_rpcgss nfs_acl lockd sunrpc raid1 i915 video crc32c_intel i2c_algo_bit drm_kms_helper firewire_ohci ghash_clmulni_intel drm firewire_core crc_itu_t i2c_core
CPU 0 
Pid: 4221, comm: qemu-kvm Not tainted 3.7.9-201.fc18.x86_64 #1                  /DQ67SW
RIP: 0010:[<ffffffffa0295641>]  [<ffffffffa0295641>] __direct_map.isra.104+0xa1/0x210 [kvm]
RSP: 0018:ffff8807f70bbb58  EFLAGS: 00010293
RAX: 0000000000000004 RBX: 000ffffffffff000 RCX: 0000000000000027
RDX: 00000000d2012000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: ffff8807f70bbc08 R08: ffff87ffffffffff R09: 00000000001167f2
R10: ffff87ffffffffff R11: 0000000000000000 R12: ffff8807f9108000
R13: 0000000000000001 R14: ffff880000000000 R15: 0000000000000001
FS:  00007ff202205700(0000) GS:ffff88083e200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff87ffffffffff CR3: 00000007f9200000 CR4: 00000000000427e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-kvm (pid: 4221, threadinfo ffff8807f70ba000, task ffff8807e34bc560)
Stack:
 0000000000000000 0000000000000001 ffff8807f70bbb78 ffffffffa0275550
 ffff8807f70bbb88 ffffffffa0275579 00ff8807f70bbc08 00000000001167f2
 0000000200000001 00000000000d2012 ffff8807f70bbbd8 00000000a02750a1
Call Trace:
 [<ffffffffa0275550>] ? __gfn_to_pfn+0x60/0x70 [kvm]
 [<ffffffffa0275579>] ? gfn_to_pfn_prot+0x19/0x20 [kvm]
 [<ffffffffa0295eb0>] tdp_page_fault+0x1d0/0x210 [kvm]
 [<ffffffffa028ff61>] kvm_mmu_page_fault+0x31/0x100 [kvm]
 [<ffffffffa02044a6>] handle_ept_violation+0x66/0x120 [kvm_intel]
 [<ffffffffa020966c>] vmx_handle_exit+0xcc/0x780 [kvm_intel]
 [<ffffffff81096365>] ? sched_c
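The faulting address is worth a closer look. In this oops R14 holds ffff880000000000, the start of the x86_64 direct map (PAGE_OFFSET) on 3.x kernels, and CR2 is exactly one byte below it. Since __va(pa) is pa + PAGE_OFFSET, an all-ones "physical address" lands precisely on ffff87ffffffffff. The sketch below only checks that arithmetic; reading the input as an all-ones "invalid" sentinel is an interpretation of the registers, not something this report confirms, and va_of is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: reproduce the faulting address with 64-bit two's-complement shell arithmetic.
# Assumption: PAGE_OFFSET (start of the x86_64 direct map on 3.x kernels) is
# 0xffff880000000000, the value seen in R14 above. It is written as a negative
# number because that constant does not fit in a signed 64-bit integer, and not
# every shell handles oversized literals the same way.
PAGE_OFFSET=$(( -0x780000000000 ))   # same 64-bit pattern as 0xffff880000000000

# Hypothetical helper: __va(pa) = pa + PAGE_OFFSET, printed as unsigned 64-bit hex.
va_of() {
  printf '%016x\n' "$(( PAGE_OFFSET + $1 ))"
}

va_of 0    # ffff880000000000: the direct-map base
va_of -1   # ffff87ffffffffff: -1 is the all-ones pattern, giving the CR2 value
```

The same value appears as CR2 in every report in this bug, which is what ties the later comments to this one.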

Comment 1 Jeff Peeler 2013-04-10 16:02:26 UTC
Created attachment 733776 [details]
kerneloops

I'm experiencing the same issue on 3.8.5-201. I suspect it has to do with having nested kvm enabled, as I've never had any problems previously. If there's something I can do to collect more debugging information, let me know. It seems to occur every few days.

Comment 2 Marcelo Tosatti 2013-06-04 02:49:36 UTC
(In reply to Jeff Peeler from comment #1)
> Created attachment 733776 [details]
> kerneloops
> 
> I'm experiencing the same issue on 3.8.5-201. I suspect it has to do with
> having nested kvm enabled, as I've never had any problems previously. If
> there's something I can do to collect more debugging information, let me
> know. It seems to occur every few days.

Jeff,

Please install the kernel-debug package, boot into that kernel, and enable the kernel boot option:

slub_debug=ZFPU

Then try to reproduce the problem.

(Yes, it's probably related to nested virtualization; see https://bugzilla.redhat.com/show_bug.cgi?id=885497.)
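The request above can be sketched as follows for a Fedora host of this era (yum and grubby). In the slub_debug flag string, F enables sanity checks, Z red zoning, P poisoning and U alloc/free user tracking, so a corrupted or already-freed slab object is more likely to be reported before it causes an oops like the one in the description. Both function names are hypothetical, and --update-kernel=ALL adds the option to every installed kernel, not only the debug one.

```sh
#!/bin/sh
# Hypothetical helper (run as root): install the debug kernel and add
# slub_debug=ZFPU to the command line of every installed kernel.
enable_slub_debug() {
  yum install -y kernel-debug
  grubby --update-kernel=ALL --args="slub_debug=ZFPU"
  # Then reboot and select the debug kernel in the boot menu.
}

# Hypothetical helper: after rebooting, succeed if slub_debug=ZFPU is active.
# Checks the given command line, or /proc/cmdline when no argument is passed.
has_slub_debug() {
  cmdline=${1:-$(cat /proc/cmdline)}
  case " $cmdline " in
    *" slub_debug=ZFPU "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Running has_slub_debug with no argument after the reboot confirms the option reached the running kernel.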

Comment 3 Marcelo Tosatti 2013-06-04 03:19:27 UTC
*** Bug 911381 has been marked as a duplicate of this bug. ***

Comment 4 Edson Dino Salvati 2013-06-27 19:12:20 UTC
I had the same problem, except the fact that I'm running plain simple unfancy qemu-kvm virtual machines. The problem seems to have stopped since I installed the kernel-debug package and enabled the slub_debug=ZFPU option.
Anyway, if the system happens to hang, what should I send you?

Comment 5 Marcelo Tosatti 2013-07-01 04:36:37 UTC
(In reply to Edson Salvati from comment #4)
> I had the same problem, except the fact that I'm running plain simple
> unfancy qemu-kvm virtual machines. The problem seems to have stopped since I
> installed the kernel-debug package and enabled the slub_debug=ZFPU option.
> Anyway, if the system happens to hang, what should I send you?

Edson,

You should see additional information from the SLUB debugging code in dmesg before the crash.

It is useful to configure netconsole to make sure those messages are not lost.
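A minimal netconsole setup along those lines might look like this. The module parameter has the form [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr], and an empty MAC means broadcast. The IP addresses, interface name and port are placeholders, and netconsole_param is a hypothetical helper that only builds the string.

```sh
#!/bin/sh
# Hypothetical helper: build a netconsole= module parameter.
#   $1 = local (source) IP, $2 = local interface, $3 = receiver IP,
#   $4 = receiver UDP port (defaults to 6666)
# Source port and receiver MAC are left empty so the kernel defaults apply.
netconsole_param() {
  printf 'netconsole=@%s/%s,%s@%s/\n' "$1" "$2" "${4:-6666}" "$3"
}

# On the KVM host, as root (placeholder addresses from the documentation ranges):
#   modprobe netconsole "$(netconsole_param 192.0.2.10 eth0 192.0.2.20)"
# On the receiving machine, listen on that UDP port, for example:
#   nc -u -l 6666      # OpenBSD netcat / ncat; traditional netcat needs -p 6666
```

Because the messages leave the host as they are printed, the SLUB reports survive even if the machine hangs before syslog flushes them to disk.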

Comment 6 Justin M. Forbes 2013-10-18 21:17:41 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.

Comment 7 Ian Pilcher 2013-10-21 23:31:58 UTC
I haven't personally seen this for a while.  I'm going to close this.  If anyone out there is still seeing this error, add a comment to this bug, and I'll re-open (although F18 is getting close to EOL, so I'm not sure there's that much point in doing so).

Comment 8 Krzysztof Taraszka 2013-10-28 07:04:30 UTC
Oct 28 05:00:03 vmcore1ng kernel: [354678.609985] IP: [<ffffffffa0371140>] __direct_map+0xa0/0x220 [kvm]
Oct 28 05:00:03 vmcore1ng kernel: [354678.610337] PGD 0 
Oct 28 05:00:03 vmcore1ng kernel: [354678.610461] Oops: 0000 [#1] SMP 
Oct 28 05:00:03 vmcore1ng kernel: [354678.610654] Modules linked in: ip6table_filter ip6_tables ebtable_nat xt_nat ebtables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack sunrpc 8021q mrp garp stp llc openvswitch gre binfmt_misc vhost_net macvtap macvlan vhost tun kvm_intel kvm iTCO_wdt iTCO_vendor_support mperf zfs(POF) zcommon(POF) znvpair(POF) zavl(POF) zunicode(POF) spl(OF) zlib_deflate serio_raw sb_edac edac_core i2c_i801 lpc_ich mfd_core mei_me mei ioatdma igb dca ptp pps_core raid1(F) isci(F) libsas(F) scsi_transport_sas(F) wmi(F) mgag200(F) ttm(F) drm_kms_helper(F) drm(F) i2c_algo_bit(F) i2c_core(F)
Oct 28 05:00:03 vmcore1ng kernel: [354678.614009] CPU: 7 PID: 2400 Comm: qemu-kvm Tainted: PF          O 3.11.4-201.el6.x86_64 #1
Oct 28 05:00:03 vmcore1ng kernel: [354678.614513] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 2.0a 04/19/2013
Oct 28 05:00:03 vmcore1ng kernel: [354678.615146] task: ffff88081b01db80 ti: ffff88045b224000 task.ti: ffff88045b224000
Oct 28 05:00:03 vmcore1ng kernel: [354678.615591] RIP: 0010:[<ffffffffa0371140>]  [<ffffffffa0371140>] __direct_map+0xa0/0x220 [kvm]
Oct 28 05:00:03 vmcore1ng kernel: [354678.624993] RSP: 0018:ffff88045b225b08  EFLAGS: 00010293
Oct 28 05:00:03 vmcore1ng kernel: [354678.634252] RAX: ffff87ffffffffff RBX: ffff880000000000 RCX: ffff87ffffffffff
Oct 28 05:00:03 vmcore1ng kernel: [354678.643720] RDX: 00000001f1998000 RSI: ffff8807c89d8000 RDI: 0000000000000004
Oct 28 05:00:03 vmcore1ng kernel: [354678.653305] RBP: ffff88045b225ba8 R08: 0000000000000001 R09: 00000000001f1998
Oct 28 05:00:03 vmcore1ng kernel: [354678.662684] R10: 57ff969838d9f700 R11: 0000000000000000 R12: ffff8807c89d8000
Oct 28 05:00:03 vmcore1ng kernel: [354678.671823] R13: 0000000000000001 R14: 000ffffffffff000 R15: 0000000000000001
Oct 28 05:00:03 vmcore1ng kernel: [354678.681126] FS:  00007fc120891700(0000) GS:ffff88107fce0000(0000) knlGS:0000000000000000
Oct 28 05:00:03 vmcore1ng kernel: [354678.690489] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 28 05:00:03 vmcore1ng kernel: [354678.699951] CR2: ffff87ffffffffff CR3: 00000002d0d88000 CR4: 00000000000427e0
Oct 28 05:00:03 vmcore1ng kernel: [354678.709743] Stack:
Oct 28 05:00:03 vmcore1ng kernel: [354678.719542]  ffff88045b225b18 ffffffffa034f789 ffff88045b225ba8 ffffffffa036f18a
Oct 28 05:00:03 vmcore1ng kernel: [354678.730000]  ffff88045b225b48 00ffffff816664a6 0000000000000000 00000000001f1998
Oct 28 05:00:03 vmcore1ng kernel: [354678.740561]  ffff88045b225b78 00000001f1998000 ffffffffffffffff ffff87ffffffffff
Oct 28 05:00:03 vmcore1ng kernel: [354678.751351] Call Trace:
Oct 28 05:00:03 vmcore1ng kernel: [354678.762115]  [<ffffffffa034f789>] ? gfn_to_pfn_prot+0x19/0x20 [kvm]
Oct 28 05:00:03 vmcore1ng kernel: [354678.773018]  [<ffffffffa036f18a>] ? try_async_pf+0x16a/0x220 [kvm]
Oct 28 05:00:03 vmcore1ng kernel: [354678.783769]  [<ffffffffa03714a5>] tdp_page_fault+0x1e5/0x230 [kvm]
Oct 28 05:00:03 vmcore1ng kernel: [354678.794421]  [<ffffffffa036dd21>] kvm_mmu_page_fault+0x31/0x100 [kvm]
Oct 28 05:00:03 vmcore1ng kernel: [354678.805169]  [<ffffffffa03d85ae>] handle_ept_violation+0x7e/0x150 [kvm_intel]
Oct 28 05:00:03 vmcore1ng kernel: [354678.815852]  [<ffffffffa03dcaa6>] vmx_handle_exit+0x106/0x8a0 [kvm_intel]
Oct 28 05:00:03 vmcore1ng kernel: [354678.826196]  [<ffffffffa03d43d0>] ? move_msr_up+0x70/0x70 [kvm_intel]
Oct 28 05:00:03 vmcore1ng kernel: [354678.836243]  [<ffffffffa03666ee>] kvm_arch_vcpu_ioctl_run+0xaae/0x1200 [kvm]
Oct 28 05:00:03 vmcore1ng kernel: [354678.846124]  [<ffffffff8109b65e>] ? try_to_wake_up+0xce/0x290
Oct 28 05:00:03 vmcore1ng kernel: [354678.855762]  [<ffffffffa03d6a66>] ? __vmx_load_host_state.part.16+0x116/0x140 [kvm_intel]
Oct 28 05:00:03 vmcore1ng kernel: [354678.865279]  [<ffffffffa034fd3a>] kvm_vcpu_ioctl+0x41a/0x5e0 [kvm]
Oct 28 05:00:03 vmcore1ng kernel: [354678.874724]  [<ffffffff811eefd2>] ? fsnotify+0x1d2/0x2b0
Oct 28 05:00:03 vmcore1ng kernel: [354678.884048]  [<ffffffff81094258>] ? __wake_up_locked_key+0x18/0x20
Oct 28 05:00:03 vmcore1ng kernel: [354678.893058]  [<ffffffff811f8463>] ? eventfd_write+0xd3/0x1c0
Oct 28 05:00:03 vmcore1ng kernel: [354678.901865]  [<ffffffff811c1f1b>] do_vfs_ioctl+0x8b/0x4e0
Oct 28 05:00:03 vmcore1ng kernel: [354678.910580]  [<ffffffffa035b014>] ? kvm_on_user_return+0x74/0x80 [kvm]
Oct 28 05:00:03 vmcore1ng kernel: [354678.919150]  [<ffffffff811c2401>] SyS_ioctl+0x91/0xb0
Oct 28 05:00:03 vmcore1ng kernel: [354678.927505]  [<ffffffff810139d7>] ? do_notify_resume+0x67/0xc0
Oct 28 05:00:03 vmcore1ng kernel: [354678.935835]  [<ffffffff81671619>] system_call_fastpath+0x16/0x1b
Oct 28 05:00:03 vmcore1ng kernel: [354678.944091] Code: 89 d0 48 d3 e8 48 89 d9 48 03 4d b0 25 ff 01 00 00 41 39 fd 89 45 c4 48 8d 04 c1 48 89 45 b8 74 67 66 2e 0f 1f 84 00 00 00 00 00 <48> 8b 08 f6 c1 01 0f 84 0c 01 00 00 48 8b 35 3d cb 02 00 48 21 
Oct 28 05:00:03 vmcore1ng kernel: [354678.970758] RIP  [<ffffffffa0371140>] __direct_map+0xa0/0x220 [kvm]
Oct 28 05:00:03 vmcore1ng kernel: [354678.979497]  RSP <ffff88045b225b08>
Oct 28 05:00:03 vmcore1ng kernel: [354678.987837] CR2: ffff87ffffffffff
Oct 28 05:00:03 vmcore1ng kernel: [354679.030288] ---[ end trace 1b3b031337e11615 ]---

Hi, unfortunately this happened this morning on one of my servers. I took the 3.11.4-201 FC19 kernel, compiled it, and booted it under EL6 (I wanted nested virtualization). Nested virtualization is enabled.
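This syslog oops has the same signature as the original report: the same faulting function (__direct_map, which gcc emitted as an .isra clone in some builds) and the same CR2. Its Code: line also shows the faulting bytes 48 8b 08, which decode to mov (%rax),%rcx, with RAX holding that same ffff87ffffffffff. When comparing reports like this, for example before marking duplicates, a small filter that extracts those two fields can help. The awk sketch below is a hypothetical helper, not part of abrt; it takes the first "IP:" symbol (without its offset) and the first CR2 value, with or without syslog prefixes.

```sh
#!/bin/sh
# Hypothetical helper: print the faulting function and CR2 from an oops read on stdin.
# Works on raw dmesg output and on /var/log/messages lines with syslog prefixes.
oops_signature() {
  awk '
    # First "IP: [<addr>] symbol+off/len" line: keep the symbol, drop the offset.
    !have_ip && /(^| )IP: \[</ {
      for (i = 1; i < NF; i++)
        if ($i == "IP:") { ip = $(i + 2); sub(/[+].*/, "", ip); have_ip = 1; break }
    }
    # First "CR2: <addr>" field: the virtual address that faulted.
    !have_cr2 && /CR2: / {
      for (i = 1; i < NF; i++)
        if ($i == "CR2:") { cr2 = $(i + 1); have_cr2 = 1; break }
    }
    END { printf "ip=%s cr2=%s\n", ip, cr2 }
  '
}

# Example: grep 'kernel:' /var/log/messages | oops_signature
```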

Comment 9 Dr. David Alan Gilbert 2013-12-12 16:00:58 UTC
Looks like I just triggered this on nested kvm on f20 (3.11.10-301.fc20.x86_64) with a nested guest.
Only two lines to make it to the log were:
Dec 12 15:37:31 localhost.localdomain kernel: BUG: unable to handle kernel paging request at ffff87ffffffffff
Dec 12 15:37:31 localhost.localdomain kernel: IP: [<ffffffffa04709dd>] __direct_map.isra.108+0x9d/0x1f0 [kvm]

it triggered as I did a shutdown on the L2 guest.

Comment 10 Marcelo Tosatti 2013-12-12 19:25:00 UTC
(In reply to Dr. David Alan Gilbert from comment #9)
> Looks like I just triggered this on nested kvm on f20
> (3.11.10-301.fc20.x86_64) with a nested guest.
> Only two lines to make it to the log were:
> Dec 12 15:37:31 localhost.localdomain kernel: BUG: unable to handle kernel
> paging request at ffff87ffffffffff
> Dec 12 15:37:31 localhost.localdomain kernel: IP: [<ffffffffa04709dd>]
> __direct_map.isra.108+0x9d/0x1f0 [kvm]
> 
> it triggered as I did a shutdown on the L2 guest.

David,

Can you please:

1) describe the conditions under which the bug was reproduced;
2) enable the debugging options noted in comment #2 (on the host) and attempt to reproduce?

Thanks

Comment 11 Dr. David Alan Gilbert 2013-12-12 19:40:36 UTC
Marcelo:
  The setup I've got is:

The host (T530 ThinkPad, i7-3520M) is running an up-to-date F20 (KDE desktop)
with the usual mix of browsers and normal desktop stuff.

Nesting is enabled when the kvm-intel module is loaded.
nmi_watchdog is disabled on the host by:
echo 0 > /proc/sys/kernel/nmi_watchdog
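For reference, host-side settings like these could be applied and checked roughly as follows. kvm_intel's nested parameter is a boolean, so it normally reads back from sysfs as Y or N; nested_enabled is a hypothetical helper that also accepts 1 in case a kernel reports it numerically.

```sh
#!/bin/sh
# On the L0 host, as root (all guests stopped, since the module must be reloaded):
#   modprobe -r kvm_intel && modprobe kvm_intel nested=1
# or persistently, via a file in /etc/modprobe.d/ containing:
#   options kvm_intel nested=1
# The NMI watchdog change shown above can also be written as:
#   sysctl -w kernel.nmi_watchdog=0

# Hypothetical helper: succeed if nesting is enabled. Reads the given value, or
# /sys/module/kvm_intel/parameters/nested when no argument is passed.
nested_enabled() {
  value=${1:-$(cat /sys/module/kvm_intel/parameters/nested)}
  case $value in
    Y|y|1) return 0 ;;
    *) return 1 ;;
  esac
}
```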

l1 is run with the command:
qemu-system-x86_64 -machine pc-i440fx-1.6,accel=kvm -m 2048 -smp 2 -drive id=image,file=/home/vmimages/littlefed20.img -nographic -netdev user,id=unet,hostfwd=tcp::2022-:22,hostfwd=tcp::2023-:2022 -device virtio-net,netdev=unet -cpu SandyBridge,+vmx

from a terminal as my normal user.

I then ssh into l1 and run:
qemu-system-x86_64 -machine pc-i440fx-1.6,accel=kvm -m 128 -smp 2 -drive id=image,file=images/littlefed20.img -nographic -netdev user,id=unet,hostfwd=tcp::2022-:22 -device virtio-net,netdev=unet -cpu SandyBridge

giving the l2 I can ssh into.
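Because l1 is started with -cpu SandyBridge,+vmx, the vmx flag should be visible inside l1, and kvm_intel there (and therefore accel=kvm for the l2 qemu) depends on it. A quick check inside l1 could look like this; has_vmx is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the CPU flags line advertises VMX.
# Checks the given flags line, or the first "flags" line of /proc/cpuinfo.
has_vmx() {
  flags=${1:-$(grep -m1 '^flags' /proc/cpuinfo)}
  case " $flags " in
    *" vmx "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Inside l1:  has_vmx && echo "VMX exposed; l2 can use accel=kvm"
```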

The 'littlefed20.img' is a qcow2 image installed from an F20 beta-x86-64 netinstall selecting the 'minimal' option.

I've booted that stack perhaps a dozen times now, and it triggered the oops only once.
The oops was triggered when I issued a shutdown -h now on the L2.

Both the L0 and L1 have some moans about unhandled rdmsr's.

I'll try and reproduce it.

Dave

Comment 12 Dr. David Alan Gilbert 2013-12-13 20:02:39 UTC
I've tried 10+ cycles and it's not failed again, so I guess it's a good heisenbug.
I'm running with the debug kernel and slub_debug=ZFPU, so maybe it'll spot something over the next few weeks.

Comment 13 Marcelo Tosatti 2013-12-20 22:11:13 UTC
Patch posted:

http://www.spinics.net/lists/kvm/msg98909.html

Comment 14 Fedora End Of Life 2013-12-21 12:24:05 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 15 Josh Boyer 2014-01-06 13:20:01 UTC
Patch applied.

Comment 16 Fedora Update System 2014-01-11 16:17:01 UTC
kernel-3.12.7-300.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.12.7-300.fc20

Comment 17 Fedora Update System 2014-01-11 16:20:22 UTC
kernel-3.12.7-200.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.12.7-200.fc19

Comment 18 Fedora Update System 2014-01-12 04:59:55 UTC
Package kernel-3.12.7-200.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.12.7-200.fc19'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-0684/kernel-3.12.7-200.fc19
then log in and leave karma (feedback).
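After rebooting into the update, one way to confirm that the running kernel is at least the fixed build is to compare uname -r against the Fixed In Version (kernel-3.12.7-300.fc20 for Fedora 20, kernel-3.12.7-200.fc19 for Fedora 19). The sketch below is hypothetical: it compares only the upstream version and release number, ignoring the dist tag and architecture, so it is meaningful only within one Fedora release, and it assumes a sort that supports -V, such as GNU coreutils.

```sh
#!/bin/sh
# Hypothetical helper: reduce a kernel release such as 3.12.7-300.fc20.x86_64
# (or 3.12.7-300.fc20.x86_64+debug) to its version-release part, 3.12.7-300.
kernel_vr() {
  printf '%s\n' "$1" | sed -E 's/^([0-9][0-9.]*-[0-9]+).*/\1/'
}

# Hypothetical helper: succeed if kernel $1 is the same as or newer than kernel $2.
kernel_has_fix() {
  cur=$(kernel_vr "$1")
  fixed=$(kernel_vr "$2")
  # sort -V orders version strings; if "fixed" sorts first, "cur" is not older.
  oldest=$(printf '%s\n%s\n' "$fixed" "$cur" | sort -V | head -n 1)
  [ "$oldest" = "$fixed" ]
}

# Example on Fedora 20:  kernel_has_fix "$(uname -r)" 3.12.7-300.fc20 && echo fixed
```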

Comment 19 Fedora Update System 2014-01-14 08:34:34 UTC
kernel-3.12.7-200.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 20 Fedora Update System 2014-01-14 08:37:47 UTC
kernel-3.12.7-300.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.