Bug 1047892 - [3.13.0-0.rc6.git0.1.fc21.x86_64+debug] Kernel stack trace: "kernel BUG at mm/page_alloc.c:2788"
Summary: [3.13.0-0.rc6.git0.1.fc21.x86_64+debug] Kernel stack trace: "kernel BUG at mm/page_alloc.c:2788"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Marcelo Tosatti
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-01-02 14:12 UTC by Kashyap Chamarthy
Modified: 2014-06-12 01:45 UTC
CC List: 10 users

Fixed In Version: kernel-3.12.7-300.fc20
Clone Of:
Environment:
Last Closed: 2014-01-14 08:34:26 UTC
Type: Bug
Embargoed:


Attachments
Complete stdout of dmesg (66.98 KB, text/plain), attached 2014-01-02 14:16 UTC by Kashyap Chamarthy
Complete stdout of dmidecode (14.28 KB, text/plain), attached 2014-01-02 14:17 UTC by Kashyap Chamarthy


Links
System: Linux Kernel  ID: 68051  Private: 0  Priority: None  Status: None  Summary: None  Last Updated: Never

Description Kashyap Chamarthy 2014-01-02 14:12:06 UTC
Description of problem
----------------------

Booting into the 3.13.0-0.rc6.git0.1.fc21.x86_64+debug kernel results in the stack trace below (the complete trace follows further down):

========
Jan  2 13:17:59 tesla kernel: [54532.561299] ------------[ cut here ]------------
Jan  2 13:17:59 tesla kernel: [54532.561342] kernel BUG at mm/page_alloc.c:2788!
Jan  2 13:17:59 tesla kernel: [54532.561357] invalid opcode: 0000 [#1] SMP 
========


Version info
-------------

  $ uname -r; rpm -q qemu-system-x86
  3.13.0-0.rc6.git0.1.fc21.x86_64+debug
  qemu-system-x86-1.7.0-3.fc21.x86_64


Stack trace from logs
---------------------

========
.
.
.
Jan  2 13:17:59 tesla kernel: [54532.561299] ------------[ cut here ]------------
Jan  2 13:17:59 tesla kernel: [54532.561342] kernel BUG at mm/page_alloc.c:2788!
Jan  2 13:17:59 tesla kernel: [54532.561357] invalid opcode: 0000 [#1] SMP 
Jan  2 13:17:59 tesla kernel: [54532.561374] Modules linked in: cdc_acm usb_storage vfat fat mmc_block vhost_net vhost macvtap macvlan xt_CHECKSUM tun bridge stp llc ebtable_nat fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables openvswitch vxlan ip_tunnel gre libcrc32c dm_crypt iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel microcode snd_hda_codec_hdmi snd_hda_codec_conexant uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core serio_raw videodev media arc4 btusb iwldvm bluetooth mac80211 snd_hda_intel snd_hda_codec iwlwifi snd_hwdep snd_seq sdhci_pci lpc_ich sdhci snd_seq_device i2c_i801 mfd_core mmc_core cfg80211 snd_pcm e1000e ptp pps_core snd_page_alloc mei_me shpchp snd_timer mei wmi thinkpad_acpi snd soundcore rfkill binfmt_misc uinput i915 i2c_algo_bit drm_kms_helper drm i2c_core video
Jan  2 13:17:59 tesla kernel: [54532.561783] CPU: 0 PID: 3158 Comm: qemu-system-x86 Not tainted 3.13.0-0.rc6.git0.1.fc21.x86_64+debug #1
Jan  2 13:17:59 tesla kernel: [54532.561817] Hardware name: LENOVO 4291IQ1/4291IQ1, BIOS 8DET63WW (1.33 ) 07/19/2012
Jan  2 13:17:59 tesla kernel: [54532.561840] task: ffff8801e2f325f0 ti: ffff8801df3fe000 task.ti: ffff8801df3fe000
Jan  2 13:17:59 tesla kernel: [54532.561863] RIP: 0010:[<ffffffff8118a21d>]  [<ffffffff8118a21d>] free_pages+0x6d/0x70
Jan  2 13:17:59 tesla kernel: [54532.561891] RSP: 0018:ffff8801df3ffb78  EFLAGS: 00010246
Jan  2 13:17:59 tesla kernel: [54532.561907] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000001
Jan  2 13:17:59 tesla kernel: [54532.561929] RDX: 6b6b6b6beb6b6b6b RSI: 0000000000000000 RDI: 6b6be36b6b6b6b6b
Jan  2 13:17:59 tesla kernel: [54532.561950] RBP: ffff8801df3ffb88 R08: ffffea0002edce60 R09: 0000000000000000
Jan  2 13:17:59 tesla kernel: [54532.561971] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
Jan  2 13:17:59 tesla kernel: [54532.561993] R13: ffff8801d79002c0 R14: ffff8801d79002d8 R15: ffff8801d79030c0
Jan  2 13:17:59 tesla kernel: [54532.562014] FS:  00007ffc1d663700(0000) GS:ffff880214c00000(0000) knlGS:0000000000000000
Jan  2 13:17:59 tesla kernel: [54532.562038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan  2 13:17:59 tesla kernel: [54532.562055] CR2: 00007fdf33f94100 CR3: 0000000001c0c000 CR4: 00000000000427e0
Jan  2 13:17:59 tesla kernel: [54532.562076] Stack:
Jan  2 13:17:59 tesla kernel: [54532.562084]  00000000df3ffb88 ffff880009390638 ffff8801df3ffba0 ffffffffa0766f66
Jan  2 13:17:59 tesla kernel: [54532.562113]  ffff8800af0d8000 ffff8801df3ffbb8 ffffffffa076dea3 ffff8800af0d8000
Jan  2 13:17:59 tesla kernel: [54532.562140]  ffff8801df3ffbd0 ffffffffa05c2700 ffff8801d7900000 ffff8801df3ffc20
Jan  2 13:17:59 tesla kernel: [54532.562167] Call Trace:
Jan  2 13:17:59 tesla kernel: [54532.562180]  [<ffffffffa0766f66>] free_loaded_vmcs+0x26/0x30 [kvm_intel]
Jan  2 13:17:59 tesla kernel: [54532.562203]  [<ffffffffa076dea3>] vmx_free_vcpu+0x33/0x70 [kvm_intel]
Jan  2 13:17:59 tesla kernel: [54532.562232]  [<ffffffffa05c2700>] kvm_arch_vcpu_free+0x50/0x60 [kvm]
Jan  2 13:17:59 tesla kernel: [54532.562259]  [<ffffffffa05c3312>] kvm_arch_destroy_vm+0x102/0x240 [kvm]
Jan  2 13:17:59 tesla kernel: [54532.562281]  [<ffffffff810ed77d>] ? synchronize_srcu+0x1d/0x20
Jan  2 13:17:59 tesla kernel: [54532.562300]  [<ffffffff811d3a8e>] ? mmu_notifier_unregister+0xee/0x130
Jan  2 13:17:59 tesla kernel: [54532.562331]  [<ffffffffa05a8be1>] kvm_put_kvm+0xe1/0x190 [kvm]
Jan  2 13:17:59 tesla kernel: [54532.562353]  [<ffffffffa05a8cc8>] kvm_vcpu_release+0x18/0x20 [kvm]
Jan  2 13:17:59 tesla kernel: [54532.562372]  [<ffffffff811ff525>] __fput+0xf5/0x2c0
Jan  2 13:17:59 tesla kernel: [54532.562388]  [<ffffffff811ff73e>] ____fput+0xe/0x10
Jan  2 13:17:59 tesla kernel: [54532.562403]  [<ffffffff8109ca04>] task_work_run+0xb4/0xe0
Jan  2 13:17:59 tesla kernel: [54532.562421]  [<ffffffff81077664>] do_exit+0x2e4/0xcf0
Jan  2 13:17:59 tesla kernel: [54532.562437]  [<ffffffff810780fc>] do_group_exit+0x4c/0xc0
Jan  2 13:17:59 tesla kernel: [54532.562454]  [<ffffffff8108ae11>] get_signal_to_deliver+0x2d1/0x930
Jan  2 13:17:59 tesla kernel: [54532.562474]  [<ffffffff81019518>] do_signal+0x48/0x610
Jan  2 13:17:59 tesla kernel: [54532.562491]  [<ffffffff811018a6>] ? do_futex+0xe6/0xd00
Jan  2 13:17:59 tesla kernel: [54532.562509]  [<ffffffff810ab2df>] ? finish_task_switch+0x3f/0x120
Jan  2 13:17:59 tesla kernel: [54532.562528]  [<ffffffff810d1c8d>] ? trace_hardirqs_on+0xd/0x10
Jan  2 13:17:59 tesla kernel: [54532.563423]  [<ffffffff81019b50>] do_notify_resume+0x70/0xa0
Jan  2 13:17:59 tesla kernel: [54532.564274]  [<ffffffff81771562>] int_signal+0x12/0x17
Jan  2 13:17:59 tesla kernel: [54532.565134] Code: 00 00 00 ea ff ff 48 01 d3 48 0f 42 05 0d 8e a8 00 48 01 c3 48 c1 eb 0c 48 c1 e3 06 48 01 df e8 2a ff ff ff 48 83 c4 08 5b 5d c3 <0f> 0b 90 66 66 66 66 90 55 48 85 ff 48 89 e5 41 55 49 89 fd 41 
Jan  2 13:17:59 tesla kernel: [54532.567034] RIP  [<ffffffff8118a21d>] free_pages+0x6d/0x70
Jan  2 13:17:59 tesla kernel: [54532.567926]  RSP <ffff8801df3ffb78>
Jan  2 13:17:59 tesla kernel: [54532.572384] ---[ end trace ae34a7cb7620f953 ]---
Jan  2 13:17:59 tesla kernel: [54532.572389] Fixing recursive fault but reboot is needed!
========
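
For reference, the check that appears to fire at mm/page_alloc.c:2788 in 3.13-era kernels is the address sanity check in free_pages(), sketched below from the upstream source of that era (the exact line number varies between releases). VM_BUG_ON() only becomes a hard BUG when CONFIG_DEBUG_VM is set, which the +debug kernel flavour enables:

    /* mm/page_alloc.c, approximate 3.13-era code */
    void free_pages(unsigned long addr, unsigned int order)
    {
            if (addr != 0) {
                    /* Trips if addr is not a valid directly-mapped kernel address */
                    VM_BUG_ON(!virt_addr_valid((void *)addr));
                    __free_pages(virt_to_page((void *)addr), order);
            }
    }

The 0x6b bytes filling RBX/RDX/RDI above are the slab poison value written into freed objects when slab debugging is enabled, so the address handed to free_pages() was apparently read from memory that had already been freed, consistent with a use-after-free somewhere in the KVM/VMX vCPU teardown path shown in the call trace.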


Additional info
----------------

If it matters, I built QEMU from Fedora master git:

    $ sudo yum install fedpkg  -y
    $ sudo yum-builddep qemu
    $ fedpkg clone -B -a qemu
    $ cd qemu/master
    $ git log | head -1
    commit c4896d008b4e71e0cdbc505d8ff8849f830ac531
    $ fedpkg local
    $ sudo yum localupdate x86_64/*

Comment 1 Kashyap Chamarthy 2014-01-02 14:16:27 UTC
Created attachment 844569 [details]
Complete stdout of dmesg

Comment 2 Kashyap Chamarthy 2014-01-02 14:17:08 UTC
Created attachment 844570 [details]
Complete stdout of dmidecode

Comment 3 Marcelo Tosatti 2014-01-02 17:25:10 UTC
Kashyap Chamarthy,

1) Which sort of guest was executing when this issue occurred?
2) Is it reproducible?

Comment 4 Kashyap Chamarthy 2014-01-02 18:20:19 UTC
(In reply to Marcelo Tosatti from comment #3)
> Kashyap Chamarthy,
> 
> 1) Which sort of guest was executing when this issue occurred?

Fedora 20 guest.

> 2) Is it reproducible?

Cannot say definitively. I reported this on the first occurrence. I'd have to reboot my laptop again to see if I can reproduce this.

I should add some more context: I noticed this issue when I found an un-killable defunct qemu process (I ended up with it when I tried to force power-off the guest via `virsh`), and I had to reboot the host.

  $ virsh destroy ostack-compute 
  error: Failed to destroy domain 3
  error: Failed to terminate process 3152 with SIGKILL: Device or resource busy

NOTE: The disk image is *not* on an NFS mount; it's on local disk:

 $ virsh domblklist ostack-compute
  Target     Source
  ------------------------------------------------
  vda        /home/kashyap/vmimages/ostack-compute.qcow2


Trying to destroy the guest again with LIBVIRT_DEBUG enabled, this is what I see in a loop:
=============
.
.
.
2014-01-02 12:42:54.012+0000: 27462: debug : virKeepAliveCheckMessage:395 : Got keepalive request from client 0x7f4dfaca75d0
2014-01-02 12:42:54.012+0000: 27462: debug : virNetMessageNew:44 : msg=0x7f4dfaca7770 tracked=0
2014-01-02 12:42:54.012+0000: 27462: debug : virNetMessageEncodePayloadEmpty:479 : Encode length as 28
2014-01-02 12:42:54.012+0000: 27462: debug : virKeepAliveMessage:101 : Sending keepalive response to client 0x7f4dfaca75d0
2014-01-02 12:42:54.012+0000: 27462: debug : virKeepAliveMessage:104 : RPC_KEEPALIVE_SEND: ka=0x7f4dfaca78c0 client=0x7f4dfaca75d0 prog=1801807216 vers=1 proc=2
2014-01-02 12:42:54.012+0000: 27462: debug : virNetClientQueueNonBlocking:1926 : RPC_CLIENT_MSG_TX_QUEUE: client=0x7f4dfaca75d0 len=28 prog=1801807216 vers=1 proc=2 type=2 status=0 serial=0
2014-01-02 12:42:54.012+0000: 27462: debug : virNetClientCallNew:1905 : New call 0x7f4dfaca7ae0: msg=0x7f4dfaca7770, expectReply=0, nonBlock=1
2014-01-02 12:42:54.012+0000: 27462: debug : virNetMessageClear:55 : msg=0x7f4dfaca7638 nfds=0
2014-01-02 12:42:54.012+0000: 27462: debug : virNetClientIOEventLoopRemoveDone:1379 : Removing completed call 0x7f4dfaca7ae0
2014-01-02 12:42:59.172+0000: 27462: debug : virNetMessageDecodeLength:149 : Got length, now need 28 total (24 more)
2014-01-02 12:42:59.172+0000: 27462: debug : virNetClientCallDispatch:1123 : RPC_CLIENT_MSG_RX: client=0x7f4dfaca75d0 len=28 prog=1801807216 vers=1 proc=1 type=2 status=0 serial=0
2014-01-02 12:42:59.172+0000: 27462: debug : virKeepAliveCheckMessage:374 : ka=0x7f4dfaca78c0, client=0x7f4dfaca75d0, msg=0x7f4dfaca7638
2014-01-02 12:42:59.172+0000: 27462: debug : virKeepAliveCheckMessage:391 : RPC_KEEPALIVE_RECEIVED: ka=0x7f4dfaca78c0 client=0x7f4dfaca75d0 prog=1801807216 vers=1 proc=1
.
.
.
=============

And:

  $ ps -ef | grep qemu
  qemu      3152     1 30  2013 ?        13:30:42 [qemu-system-x86] <defunct>

  $ pstree 3152
  qemu-system-x86───{qemu-system-x86}
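
(A diagnostic that might help here, assuming the debug kernel has CONFIG_STACKTRACE enabled: the kernel-side stack of the leftover thread can be read from procfs as root, to confirm where it is wedged. The PID below is the one from the ps output above.)

  $ sudo sh -c 'for t in /proc/3152/task/*; do echo "== $t"; cat "$t"/stack; done'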


Libvirt & QEMU version on the host:

  $ rpm -q libvirt-daemon-kvm qemu-system-x86
  libvirt-daemon-kvm-1.2.0-1.fc21.x86_64
  qemu-system-x86-1.7.0-3.fc21.x86_64

Comment 5 Marcelo Tosatti 2014-01-02 20:16:32 UTC
Are you using nested virtualization?

Comment 6 Kashyap Chamarthy 2014-01-02 21:04:05 UTC
(In reply to Marcelo Tosatti from comment #5)
> Are you using nested virtualization?

Yes.

  $ modinfo kvm_intel | grep -i nested
  parm:           nested:bool
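
(modinfo only shows that the parameter exists; the effective setting can be read from sysfs with the command below, which prints Y when nesting is enabled:)

  $ cat /sys/module/kvm_intel/parameters/nested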


And, just to be clear:

 - The crash was seen on reboot of the *bare-metal* host, which has nested
   virtualization enabled.
 - The guest in question (ostack-compute.qcow2) from comment 4 has the host CPU
   passed through to it, i.e. I used:

     <cpu mode='host-passthrough'>
     </cpu>

   for the guest (ostack-compute.qcow2)

Comment 7 Marcelo Tosatti 2014-01-02 21:59:53 UTC
(In reply to Kashyap Chamarthy from comment #6)
> (In reply to Marcelo Tosatti from comment #5)
> > Are you using nested virtualization?
> 
> Yes.
> 
>   $ modinfo kvm_intel | grep -i nested
>   parm:           nested:bool
> 
> 
> And, just to be clear:
> 
>  - The crash was seen on reboot of *bare-metal* host which has Nested
> enabled.
>  - The said guest (ostack-compute.qcow2) in comment 4 has host CPU 
>    to the guest, i.e. I used:
> 
>      <cpu mode='host-passthrough'>
>      </cpu>
> 
>    for the guest (ostack-compute.qcow2)

Ok, are you actually using nested guests in the ostack-compute.qcow2 guest? TIA

Comment 8 Kashyap Chamarthy 2014-01-03 08:55:53 UTC
(In reply to Marcelo Tosatti from comment #7)
> (In reply to Kashyap Chamarthy from comment #6)
> > (In reply to Marcelo Tosatti from comment #5)
> > > Are you using nested virtualization?
> > 
> > Yes.
> > 
> >   $ modinfo kvm_intel | grep -i nested
> >   parm:           nested:bool
> > 
> > 
> > And, just to be clear:
> > 
> >  - The crash was seen on reboot of *bare-metal* host which has Nested
> > enabled.
> >  - The said guest (ostack-compute.qcow2) in comment 4 has host CPU 
> >    to the guest, i.e. I used:
> > 
> >      <cpu mode='host-passthrough'>
> >      </cpu>
> > 
> >    for the guest (ostack-compute.qcow2)
> 
> Ok, are you actually using nested guests in the ostack-compute.qcow2 guest?

Yes. ostack-compute.qcow2 is the Compute host of an OpenStack set-up and runs Nova instances (CirrOS and Fedora). If you prefer, I can post the QEMU command lines and libvirt XMLs of L1 and L2.

Comment 9 Marcelo Tosatti 2014-01-03 21:15:08 UTC
Patch posted: http://article.gmane.org/gmane.comp.emulators.kvm.devel/117837

Comment 10 Josh Boyer 2014-01-06 19:21:19 UTC
Patch applied.  Thanks all!

Comment 11 Fedora Update System 2014-01-11 16:16:50 UTC
kernel-3.12.7-300.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.12.7-300.fc20

Comment 12 Fedora Update System 2014-01-11 16:20:16 UTC
kernel-3.12.7-200.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.12.7-200.fc19

Comment 13 Fedora Update System 2014-01-12 04:59:43 UTC
Package kernel-3.12.7-200.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.12.7-200.fc19'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-0684/kernel-3.12.7-200.fc19
then log in and leave karma (feedback).

Comment 14 Fedora Update System 2014-01-14 08:34:26 UTC
kernel-3.12.7-200.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 15 Fedora Update System 2014-01-14 08:37:40 UTC
kernel-3.12.7-300.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

