Bug 818630 - kernel BUG at include/linux/mm.h:402
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Duplicates: 824722
Reported: 2012-05-03 11:18 EDT by Nick Byrne
Modified: 2012-11-14 10:24 EST (History)
Last Closed: 2012-11-14 10:24:14 EST
Type: Bug
Description Nick Byrne 2012-05-03 11:18:43 EDT
Description of problem:

[72329.911729] kernel BUG at include/linux/mm.h:402!
[72329.911786] invalid opcode: 0000 [#1] SMP 
[72329.911843] CPU 0 
[72329.911868] Modules linked in: ppdev parport_pc parport be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm bnep ib_cm iw_cm ib_sa bluetooth ib_mad ib_core rfkill ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm shpchp edac_core tulip k10temp uinput serio_raw edac_mce_amd snd_timer snd r8169 soundcore sp5100_tco snd_page_alloc microcode mii i2c_piix4 ata_generic pata_acpi pata_atiixp firewire_ohci firewire_core crc_itu_t pata_via radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: nf_defrag_ipv4]
[72329.912003] 
[72329.912003] Pid: 3181, comm: scons Not tainted 3.3.2-6.fc16.x86_64 #1 BIOSTAR Group TA890GXE/TA890GXE
[72329.912003] RIP: 0010:[<ffffffff815ec0f2>]  [<ffffffff815ec0f2>] get_page.part.5+0x4/0x6
[72329.912003] RSP: 0018:ffff8801163ebbf0  EFLAGS: 00010246
[72329.912003] RAX: 0000000000000000 RBX: 800000009ddfd045 RCX: 800000009dffc045
[72329.912003] RDX: ffffea0002777f40 RSI: 0000000001865000 RDI: 800000009ddfd045
[72329.912003] RBP: ffff8801163ebbf0 R08: ffff8800cf00ee70 R09: 0000000000016a80
[72329.912003] R10: ffff88011ffec700 R11: 0000000000000024 R12: ffff8800cfb26328
[72329.912003] R13: 0000000001865000 R14: 0000000000000008 R15: ffff8800cdf28328
[72329.912003] FS:  00007f040cdd2700(0000) GS:ffff88011fc00000(0000) knlGS:00000000f64e3b40
[72329.912003] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[72329.912003] CR2: 00007f040cd61c30 CR3: 0000000118c7f000 CR4: 00000000000006f0
[72329.912003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[72329.912003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[72329.912003] Process scons (pid: 3181, threadinfo ffff8801163ea000, task ffff8800cfb74590)
[72329.912003] Stack:
[72329.912003]  ffff8801163ebcc0 ffffffff81145301 ffff88011ffedc08 0000000000000000
[72329.912003]  ffffffff812c131d ffffea0000000000 ffff88011187e390 ffff8801154a4790
[72329.912003]  ffff8800cfb74590 ffff8801128ba060 ffff8800cdf2a060 ffff8801163ebfd8
[72329.912003] Call Trace:
[72329.912003]  [<ffffffff81145301>] copy_pte_range+0x411/0x500
[72329.912003]  [<ffffffff812c131d>] ? cpumask_any_but+0x2d/0x40
[72329.912003]  [<ffffffff81147fb3>] copy_page_range+0x2d3/0x490
[72329.912003]  [<ffffffff81055377>] dup_mm+0x347/0x650
[72329.912003]  [<ffffffff810564a5>] copy_process+0xde5/0x14b0
[72329.912003]  [<ffffffff81056cba>] do_fork+0xfa/0x390
[72329.912003]  [<ffffffff815f2a34>] ? __schedule+0x3c4/0x7b0
[72329.912003]  [<ffffffff8101d6e8>] sys_clone+0x28/0x30
[72329.912003]  [<ffffffff815fc4f3>] stub_clone+0x13/0x20
[72329.912003]  [<ffffffff815fc1e9>] ? system_call_fastpath+0x16/0x1b
[72329.912003] Code: bf f1 81 48 85 f6 74 11 83 e2 7f 48 c1 e2 05 48 01 f2 74 05 f6 02 02 75 02 5d c3 48 89 f8 48 c1 e0 06 48 29 c8 eb f2 55 48 89 e5 <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
[72329.912003] RIP  [<ffffffff815ec0f2>] get_page.part.5+0x4/0x6
[72329.912003]  RSP <ffff8801163ebbf0>

How reproducible:

Occurs apparently at random. The system stays accessible (e.g. over an ssh session), but when I shut down I got a kernel panic. I'll update with details of that when it happens again.

Comment 1 Dave Jones 2012-05-07 15:05:06 EDT
It has tripped a check in the VM code that should never fire.
Can you try running memtest86 for a while to rule out hardware problems?
Comment 2 Nick Byrne 2012-05-07 15:41:20 EDT
Will do ASAP, probably tomorrow.
Comment 3 Nick Byrne 2012-05-08 12:46:01 EDT
I ran memtest today for 1 1/2 hours with no errors.
Comment 4 Dave Jones 2012-05-08 14:34:01 EDT
Can you try with the kernel-debug package? That has even more debug features enabled, which might give some other clues.

Was this machine doing anything unusual before the oops, such as hibernate/suspend?
Comment 5 Nick Byrne 2012-05-14 14:07:29 EDT
Apologies for the late reply. In answer to your questions: I'll install the kernel-debug package, and the machine wasn't doing anything unusual; it's just a desktop that I use as a build machine.

I've no idea if it's related (probably not), but I got this today whilst cross-compiling openssl:

[520690.563951] BUG: Bad page map in process as  pte:20d3c067 pmd:cc062067
[520690.563958] page:ffffea0000834f00 count:2 mapcount:-1 mapping:ffff88002a85d378 index:0x4f
[520690.563962] page flags: 0x2000000000087c(referenced|uptodate|dirty|lru|active|private)
[520690.563971] addr:00000000f7665000 vm_flags:00100077 anon_vma:ffff8800a0ee4360 mapping:          (null) index:f7665
[520690.563977] Pid: 24764, comm: as Not tainted 3.1.1-2.fc16.x86_64 #1
[520690.563979] Call Trace:
[520690.563988]  [<ffffffff810f9727>] print_bad_pte+0x1a3/0x1bc
[520690.563993]  [<ffffffff810fb242>] unmap_vmas+0x4a8/0x6a6
[520690.563999]  [<ffffffff81100b79>] exit_mmap+0xa8/0x100
[520690.564023]  [<ffffffff81055662>] mmput+0x68/0xd6
[520690.564028]  [<ffffffff8105ad5a>] exit_mm+0x136/0x143
[520690.564034]  [<ffffffff814b719b>] ? _raw_spin_lock_irq+0x1c/0x1e
[520690.564042]  [<ffffffff8105afd8>] do_exit+0x271/0x764
[520690.564047]  [<ffffffff81100612>] ? do_munmap+0x2f2/0x30b
[520690.564055]  [<ffffffff8105b750>] do_group_exit+0x7a/0xa2
[520690.564063]  [<ffffffff8105b78f>] sys_exit_group+0x17/0x17
[520690.564070]  [<ffffffff814bfff0>] cstar_dispatch+0x7/0x2e
[520690.564075] Disabling lock debugging due to kernel taint
[522777.083649] sh[1785] general protection ip:304647fee5 sp:7fff78a95bf0 error:0 in libc-2.14.90.so[3046400000+1ad000]
Comment 6 Josh Boyer 2012-05-14 14:49:10 EDT
(In reply to comment #5)
> Apologies for late reply. In answer to your questions, i'll install
> kernel-debug  pkg and the machine wasn't doing anything unusual, it's just a
> desktop that i use as a build machine.
> 
> I've no idea if related, probably not, but i got this today whilst cross
> compiling openssl - 
> 
> [520690.563951] BUG: Bad page map in process as  pte:20d3c067 pmd:cc062067
> [520690.563958] page:ffffea0000834f00 count:2 mapcount:-1
> mapping:ffff88002a85d378 index:0x4f
> [520690.563962] page flags:
> 0x2000000000087c(referenced|uptodate|dirty|lru|active|private)
> [520690.563971] addr:00000000f7665000 vm_flags:00100077
> anon_vma:ffff8800a0ee4360 mapping:          (null) index:f7665
> [520690.563977] Pid: 24764, comm: as Not tainted 3.1.1-2.fc16.x86_64 #1

That's an entirely different (and stale) kernel.
Comment 7 Nick Byrne 2012-05-14 14:58:23 EDT
Indeed it is. It seems that's what booted after I ran the memtest you suggested; I had to run memtest-setup and grub2-mkconfig to enable it. I'll revert and try again, as the original problem hasn't happened since (whilst running 3.1.1).
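
One way to catch the stale-kernel situation described above is to compare the running release with the newest installed one. The helpers below are a sketch and not part of the original report: `newest_release` and `running_is_newest` are made-up names, and GNU `sort -V` only approximates RPM's own version ordering.

```bash
# Hypothetical helper: read kernel release strings on stdin (one per line)
# and print the highest one. GNU sort's version ordering is used here as an
# approximation of RPM's comparison rules.
newest_release() {
    sort -V | tail -n 1
}

# Hypothetical helper: succeed only if the running release ($1) is the
# newest of the releases supplied on stdin.
running_is_newest() {
    local running="$1"
    [ "$(newest_release)" = "$running" ]
}
```

On an RPM-based system this could be fed with something like `rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' | running_is_newest "$(uname -r)" || echo "booted an older kernel"`.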
Comment 8 f92809 2012-05-14 23:52:49 EDT
Just thought I'd add my two cents, as
kernel BUG at include/linux/mm.h:402!
seems to be attracting more attention, particularly with VMware in use. As comments 4 and 5 in the parent bug to this one, https://bugzilla.redhat.com/show_bug.cgi?id=805984, indicate, it happens with other applications as well.

I'm particularly interested, however, in this problem showing up with VMware Workstation 8.0.x, although bug 805984 has been closed as being VMware's problem. Unfortunately, over in VMware-land the problem is described as specific to Fedora 16. Examples include
http://communities.vmware.com/thread/401778
http://communities.vmware.com/message/2040785#2040785
and comment 17. in
http://weltall.heliohost.org/wordpress/2011/11/09/vmware-workstation-8-x-player-4-x-virtualbox-fix-for-linux-3-2/

As for myself, I've commented in the 2040785 thread above, donating the trace below to it. Unfortunately, with the Fedora/Red Hat responses stating this is a VMware problem and the VMware world believing it is clearly a Fedora issue, we (on occasion) happy users of both have been stuck in the middle for the last few months without a solution. I've tried my VMs created in FC15 with the 3.x kernels for FC16 since February with no joy, using all the relevant patches to build VMware Workstation to run them. I'm no kernel whiz, but in frustration I did build custom kernels of a couple of 3.3.x versions to see if I could get closer to a solution. No joy.

If another bug needs to be spun off for this I'm fine with that.  I'm just concerned that what happened to 805984 will happen to a new one as well.

Note that this issue appears to be hardware dependent. I copied a couple of my VMs from my bigger box over to a small box with 8 GB memory and a quad Intel processor, using the same versions of VMware Workstation (with needed patches) and the FC16 kernels: no problems. My bigger box has a Tyan S8230GM4NR board, dual Opteron 6128s, 64 GB memory (DDR3 1333MHz ECC), and a 64 GB SSD for the host OS, accessing 18 TB of RAIDed SATA2 disk space for VMs and data. My latest trace from the latter is included below. I've since gotten the same results with the 3.3.5-2.fc16.x86_64 kernel.

May  9 17:55:25 itgs-server01 kernel: [77022.886121] ------------[ cut here ]------------
May  9 17:55:25 itgs-server01 kernel: [77022.886127] kernel BUG at include/linux/mm.h:402!
May  9 17:55:25 itgs-server01 kernel: [77022.886131] invalid opcode: 0000 [#1] SMP
May  9 17:55:25 itgs-server01 kernel: [77022.886135] CPU 1
May  9 17:55:25 itgs-server01 kernel: [77022.886137] Modules linked in: vmnet(O) vsock(O) vmci(O) vmmon(O) parport_pc tcp_lp fuse ppdev lp parport bnep bluetooth rfkill ip6t_REJECT ip6t_ipv6header nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter nf_conntrack_ftp ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack raid0 usblp joydev snd_hda_codec_hdmi snd_hda_intel snd_virtuoso snd_oxygen_lib snd_hda_codec snd_mpu401_uart snd_hwdep snd_rawmidi sp5100_tco snd_seq microcode snd_pcm serio_raw i2c_piix4 snd_seq_device amd64_edac_mod e1000e snd_timer igb edac_core snd edac_mce_amd fglrx(PO) snd_page_alloc soundcore i2c_core k10temp amd_iommu_v2 dca nfsd lockd nfs_acl auth_rpcgss sunrpc uinput ata_generic pata_acpi usb_storage pata_atiixp [last unloaded: vmnet]
May  9 17:55:25 itgs-server01 kernel: [77022.886203]
May  9 17:55:25 itgs-server01 kernel: [77022.886207] Pid: 22997, comm: vmware-vmx Tainted: P           O 3.3.4-3.fc16.x86_64 #1 empty empty/S8230
May  9 17:55:25 itgs-server01 kernel: [77022.886214] RIP: 0010:[<ffffffffa06692d8>]  [<ffffffffa06692d8>] get_page.part.0+0x4/0xd2c [vmmon]
May  9 17:55:25 itgs-server01 kernel: [77022.886229] RSP: 0018:ffff880ee807bd38  EFLAGS: 00010246
May  9 17:55:25 itgs-server01 kernel: [77022.886232] RAX: 0000000000000000 RBX: ffffea000fc61d00 RCX: 0000000000000000
May  9 17:55:25 itgs-server01 kernel: [77022.886235] RDX: ffffea000fc61d40 RSI: ffffea000fc61d00 RDI: ffff8803f1875000
May  9 17:55:25 itgs-server01 kernel: [77022.886238] RBP: ffff880ee807bd38 R08: 0000000000430000 R09: 00000000002eb280
May  9 17:55:25 itgs-server01 kernel: [77022.886241] R10: 0000000000000003 R11: ffff880ae7d90a50 R12: ffff880407113738
May  9 17:55:25 itgs-server01 kernel: [77022.886244] R13: 0000000000000004 R14: 0000000000000003 R15: 0000000000000001
May  9 17:55:25 itgs-server01 kernel: [77022.886248] FS:  00007f51a7e19740(0000) GS:ffff88041fc40000(0000) knlGS:0000000000000000
May  9 17:55:25 itgs-server01 kernel: [77022.886251] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  9 17:55:25 itgs-server01 kernel: [77022.886254] CR2: 0000000000987720 CR3: 00000004070b2000 CR4: 00000000000006e0
May  9 17:55:25 itgs-server01 kernel: [77022.886257] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May  9 17:55:25 itgs-server01 kernel: [77022.886261] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May  9 17:55:25 itgs-server01 kernel: [77022.886264] Process vmware-vmx (pid: 22997, threadinfo ffff880ee807a000, task ffff880f19cdc590)
May  9 17:55:25 itgs-server01 kernel: [77022.886267] Stack:
May  9 17:55:25 itgs-server01 kernel: [77022.886269]  ffff880ee807bdc8 ffffffffa065fa28 0000000000000001 0000000000000000
May  9 17:55:25 itgs-server01 kernel: [77022.886280]  ffff880ae650adc0 ffff880407113710 0000000000000003 0000000000000004
May  9 17:55:25 itgs-server01 kernel: [77022.886286]  0000000400000003 ffff880407113700 00000002000200d2 0000000000000000
May  9 17:55:25 itgs-server01 kernel: [77022.886292] Call Trace:
May  9 17:55:25 itgs-server01 kernel: [77022.886300]  [<ffffffffa065fa28>] LinuxDriverMmap+0x2a8/0x2d0 [vmmon]
May  9 17:55:25 itgs-server01 kernel: [77022.886308]  [<ffffffff8114cc89>] mmap_region+0x369/0x510
May  9 17:55:25 itgs-server01 kernel: [77022.886314]  [<ffffffff8114d178>] do_mmap_pgoff+0x348/0x360
May  9 17:55:25 itgs-server01 kernel: [77022.886320]  [<ffffffff8114d256>] sys_mmap_pgoff+0xc6/0x230
May  9 17:55:25 itgs-server01 kernel: [77022.886325]  [<ffffffff8106efc0>] ? sys_setresuid+0x140/0x160
May  9 17:55:25 itgs-server01 kernel: [77022.886332]  [<ffffffff810189e2>] sys_mmap+0x22/0x30
May  9 17:55:25 itgs-server01 kernel: [77022.886337]  [<ffffffff815fbee9>] system_call_fastpath+0x16/0x1b
May  9 17:55:25 itgs-server01 kernel: [77022.886340] Code: c7 c7 7a b8 66 a0 e8 48 9f ff ff 31 c0 e9 82 fc ff ff c7 83 78 04 00 00 00 00 00 00 66 b8 00 e0 e9 eb fb ff ff 00 00 55 48 89 e5 <0f> 0b bf f2 ff ff ff e9 f4 9d ff ff 00 00 00 00 00 00 00 00 00
May  9 17:55:25 itgs-server01 kernel: [77022.886379] RIP  [<ffffffffa06692d8>] get_page.part.0+0x4/0xd2c [vmmon]
May  9 17:55:25 itgs-server01 kernel: [77022.886386]  RSP <ffff880ee807bd38>
May  9 17:55:25 itgs-server01 kernel: [77022.886390] ---[ end trace f3774dccb61daab7 ]---
Comment 9 Dave Jones 2012-05-15 00:35:33 EDT
The reason that this only shows up in Fedora is that, unlike other distros, we enable DEBUG_VM, which introduces a small performance impact with the benefit of showing up memory problems faster. Other distributions have this same problem, but they remain silent about it.

In the problem Nick reported here, he's found a way to trigger the check without vmware being loaded. This is (hopefully) something we can do something about.

There is no guarantee that this is the same root cause as the vmware case, but it is interesting that this is the first untainted report we've had so far that I recall.
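
Whether a report is tainted can be read from the `Pid:` line of each oops, and the failing check from the `kernel BUG at` line. The following is a minimal sketch for pulling both out of a saved log; `summarize_oops` is a hypothetical helper name, and the patterns assume the 3.x oops layout shown in this report.

```bash
# Hypothetical helper: read kernel log text on stdin and print, for each oops,
# the BUG location, the faulting command, and whether the kernel was tainted.
summarize_oops() {
    awk '
        /kernel BUG at/ {
            # Keep only the "file:line" part between "kernel BUG at " and "!".
            sub(/.*kernel BUG at /, ""); sub(/!.*/, "")
            print "bug: " $0
        }
        /Pid: [0-9]+, comm: / {
            # "Not tainted" appears verbatim on clean kernels; anything else
            # on this line is treated as tainted.
            tainted = ($0 ~ /Not tainted/) ? "no" : "yes"
            match($0, /comm: [^ ]+/)
            print "comm: " substr($0, RSTART + 6, RLENGTH - 6)
            print "tainted: " tainted
        }
    '
}
```

It can be run as `dmesg | summarize_oops` or `summarize_oops < /var/log/messages`. For the first trace in this bug it reports `scons` as untainted, while the `vmware-vmx` trace in comment 8 comes out as tainted.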
Comment 10 f92809 2012-05-15 00:53:38 EDT
Hi Dave,

Good to see this is being actively tracked and there is some hope for the vmware case being resolved.   If it isn't in the next month or so, I'll drop FC16 and go back to FC15 until the smoke clears.

BTW - I'd tried running the vmware workstation on a 3.3.4 kernel built with the 
rpmbuild -bb --without debug ...
option to see if this would disable DEBUG_VM and get me going until a real solution came along, even if side effects occurred.  I gather it doesn't since I still had the same problem.
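
Whether a given build really has DEBUG_VM switched off can be checked from its installed config file rather than inferred from behaviour. A minimal sketch, assuming the usual `/boot/config-<release>` location; `debug_vm_state` is a hypothetical helper name.

```bash
# Hypothetical helper: report whether CONFIG_DEBUG_VM is enabled in a kernel
# config file. Defaults to the config of the running kernel.
debug_vm_state() {
    local config="${1:-/boot/config-$(uname -r)}"
    if grep -q '^CONFIG_DEBUG_VM=y$' "$config"; then
        echo enabled
    elif grep -q '^# CONFIG_DEBUG_VM is not set$' "$config"; then
        echo disabled
    else
        # Option not mentioned at all (or the file is not a kernel config).
        echo unknown
    fi
}
```

Running `debug_vm_state` with no argument inspects the booted kernel; passing a path inspects another build.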
Comment 11 Eric Smith 2012-05-16 18:53:01 EDT
I've experienced this bug on F16 with various 3.3.x x86_64 kernels on a system with an AMD Athlon II X3 450 processor, but not on F16 with the same kernels on a system with an Intel Core i3 M330 processor.
Comment 12 f92809 2012-05-19 16:32:35 EDT
Interesting: over in VMware community land, this reply from brouhaha was added a couple of days ago to http://communities.vmware.com/message/2040785#2040785 (thread "Fails with a Kernel bug on FC16", in VMware Workstation Technology Preview 2012):

I get the same crash in mm.h:402 on Fedora 16 with their 3.3.0-4, 3.3.2-6, and 3.3.5-2 x86_64 kernels on an AMD processor, but I do not get that crash with the same distribution and kernels on another system with an Intel processor.
Comment 13 infinality 2012-05-23 08:47:08 EDT
I'm running into this bug on F17 x86_64 with an AMD Phenom(tm) 8650 Triple-Core Processor.  I get "kernel BUG at include/linux/mm.h:402!" on vmware-vmx, and I have to do a hard reboot in order to continue using my system.
Comment 14 f92809 2012-05-23 16:31:32 EDT
Just as a stab in the dark, I tried updating the microcode using the information over on bug 814101

su -c 'yum update --enablerepo=updates-testing microcode_ctl-1.17-24.fc16'

to see if this would help. No luck, but I do have a new patch level reported now in /var/log/messages

messages:May 21 01:51:15 itgs-server01 kernel: [   11.625677] microcode: CPU10: new patch_level=0x010000d9

From the information so far showing up on the net, this bug hits many, but not all, AMD CPUs from FC16 early kernels on through FC17.  Apparently 22xx Opterons are avoiding the problem, while 24xx Opterons share the same fate as other AMDs.  Interesting.
Comment 15 Dave Jones 2012-05-24 10:25:03 EDT
*** Bug 824722 has been marked as a duplicate of this bug. ***
Comment 16 Per Nystrom 2012-05-28 17:12:04 EDT
For whatever it's worth: I'm running Fedora 16 on an AMD 9850 quad core system.  The kernel that came with the original spin, 3.1.0-7.fc16.x86_64, works just fine with VMware Workstation 8.0.x.

However, every kernel update since that one produces "kernel BUG at include/linux/mm.h:402!" when I try to start a VM guest.
Comment 17 infinality 2012-06-01 22:48:15 EDT
I guess I'd be interested in knowing whether this is something that Fedora devs are willing to pursue and deal with, regardless of blame.  If not, we will have to bark up a different tree.  At this point I'd just like to know where we stand, as I can't load the VMs that I've needed to for the past month or so.  

It may not be the right answer, but if Fedora devs can create a workaround or exception, it would be extremely appreciated by the AMD Opteron/Phenom community.  :)
Comment 18 Eric Brunson 2012-06-02 14:23:29 EDT
Just a "me too", I can confirm the bug running several of the test cases above on an AMD Phenom II X3 720.
Comment 19 f92809 2012-06-04 00:37:23 EDT
Well, seeing no clean solution coming in the near future, I bit the bullet and followed the instructions (mostly) at 
http://fedoraproject.org/wiki/Building_a_custom_kernel
to build a 3.3.7.1 custom Fedora kernel with DEBUG_VM deactivated via menuconfig. The result is that the custom kernel works just fine with Workstation 8.0.x on my AMD Opteron 6128s. So here are your options, in increasing desirability and decreasing probability, if you've had the same problem with your AMD CPUs:
(a) Do as I have done with your favorite 3.x.x.x kernel
(b) Wait for Fedora to provide a kernel with DEBUG_VM turned off, as is done with the other distros (see Dave Jones' comment above)
(c) Wait until the kernelmeisters correct the issue with generic 3.x kernels for AMD CPUs with DEBUG_VM turned on. Hello, AMD contributors to the kernel?
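
For option (a), the config change itself is a one-line edit. The sketch below makes it non-interactively instead of through menuconfig; `disable_debug_vm` is a hypothetical helper name, and the edited file still has to be fed through the normal rebuild steps from the wiki page. As noted in comment 9, this silences the check rather than fixing whatever trips it.

```bash
# Hypothetical helper: turn CONFIG_DEBUG_VM off in the kernel config file
# given as $1. All other lines are left untouched.
disable_debug_vm() {
    local config="$1"
    sed 's/^CONFIG_DEBUG_VM=y$/# CONFIG_DEBUG_VM is not set/' "$config" > "$config.new" &&
        mv "$config.new" "$config"
}
```

Usage would be `disable_debug_vm .config` in the kernel build tree, followed by the usual config refresh and rebuild.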
Comment 20 Stu 2012-06-13 05:45:12 EDT
Hello All

I too am seeing an issue in which a VirtualBox VM just aborts. The only message I can find in the logs is below. However, I must say that my CPU is an Intel i7.

Jun 13 10:30:51 work16 kernel: [10883.463873] VirtualBox[2726] general protection ip:30f567e117 sp:7fd91c020c20 error:0 in libc-2.14.90.so[30f5600000+1ad000]

This is an intermittent error but I am sure that it will not be too long before I experience fs corruption in my VM.

Regards

Stu

[root@work16 HardDisks]# uname -a
Linux work16 3.3.7-1.fc16.x86_64 #1 SMP Tue May 22 13:59:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux


[root@work16 HardDisks]# lspci
00:00.0 Host bridge: Intel Corporation Core Processor DRAM Controller (rev 02)
00:01.0 PCI bridge: Intel Corporation Core Processor PCI Express x16 Root Port (rev 02)
00:16.0 Communication controller: Intel Corporation 5 Series/3400 Series Chipset HECI Controller (rev 06)
00:16.3 Serial controller: Intel Corporation 5 Series/3400 Series Chipset KT Controller (rev 06)
00:19.0 Ethernet controller: Intel Corporation 82577LM Gigabit Network Connection (rev 05)
00:1a.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1b.0 Audio device: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio (rev 05)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 05)
00:1c.1 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 (rev 05)
00:1c.2 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 3 (rev 05)
00:1c.3 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 4 (rev 05)
00:1d.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation Mobile 5 Series Chipset LPC Interface Controller (rev 05)
00:1f.2 RAID bus controller: Intel Corporation Mobile 82801 SATA RAID Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller (rev 05)
00:1f.6 Signal processing controller: Intel Corporation 5 Series/3400 Series Chipset Thermal Subsystem (rev 05)
01:00.0 VGA compatible controller: nVidia Corporation GT218 [NVS 3100M] (rev a2)
01:00.1 Audio device: nVidia Corporation High Definition Audio Controller (rev a1)
03:00.0 Network controller: Intel Corporation Centrino Advanced-N 6200 (rev 35)
04:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 03)
04:00.4 FireWire (IEEE 1394): Ricoh Co Ltd FireWire Host Controller (rev 03)
3f:00.0 Host bridge: Intel Corporation Core Processor QuickPath Architecture Generic Non-core Registers (rev 02)
3f:00.1 Host bridge: Intel Corporation Core Processor QuickPath Architecture System Address Decoder (rev 02)
3f:02.0 Host bridge: Intel Corporation Core Processor QPI Link 0 (rev 02)
3f:02.1 Host bridge: Intel Corporation Core Processor QPI Physical 0 (rev 02)
3f:02.2 Host bridge: Intel Corporation Core Processor Reserved (rev 02)
3f:02.3 Host bridge: Intel Corporation Core Processor Reserved (rev 02)
Comment 21 Eric Smith 2012-06-13 10:58:10 EDT
Stu, this appears to me to be entirely unrelated to the problem reported in this bug. I suggest opening a new bug report for it.
Comment 22 Per Nystrom 2012-06-17 11:34:34 EDT
(In reply to comment #19)
> Well, seeing no clean solution coming in the near future, I bit the bullet
> and followed the instructions (mostly) at 
> http://fedoraproject.org/wiki/Building_a_custom_kernel
> to build a 3.3.7.1 custom fedora kernel with DEBUG_VM deactivated via
> menuconfig.  The results are the custom kernel works just fine with
> Workstation 8.0.x with my AMD Opteron 6128s.  So here are your options, in
> increasing desirability and decreasing probability, if you've had the same
> problem with your AMD CPUs:
> (a) Do as I have done with your favorite 3.x.x.x kernel
> (b) Wait for fedora to provide a kernel with DEBUG_VM turned off as is done
> with the other distros (see Dave Jones comment above)
> (c) Wait until the kernelmeisters correct the issue with generic 3.x kernels
> for AMD CPUS with DEBUG_VM turned on.  Hello, AMD contributors to the kernel?

I tried this with the source for kernel-3.3.8-1.fc16.x86_64 and still got the same kernel oops when starting a VMware guest.  Turning off CONFIG_DEBUG_VM had no apparent effect on my system.

# grep CONFIG_DEBUG_VM /boot/config-`uname -r`
# CONFIG_DEBUG_VM is not set

# grep "model name" /proc/cpuinfo
model name	: AMD Phenom(tm) 9850 Quad-Core Processor
model name	: AMD Phenom(tm) 9850 Quad-Core Processor
model name	: AMD Phenom(tm) 9850 Quad-Core Processor
model name	: AMD Phenom(tm) 9850 Quad-Core Processor
Comment 23 f92809 2012-06-17 19:44:43 EDT
Hi Per,

Just to be certain, did you follow all the steps to rebuild your kernel as described in the referenced URL?   Simply deactivating the CONFIG_DEBUG_VM switch will not do it for you unless you follow up by rebuilding the kernel with the resulting .config file. 

If you did, then something weirder than the DEBUG_VM switch being on is biting us, which was thought to be the main difference between fedora 16 3.x kernels and those used by other distros.  Note that I just started using workstation 8.0.4 (and the usual patch files) today with my 3.3.7.1 kernel built as described earlier.   It is working very well.
Comment 24 Per Nystrom 2012-06-17 22:18:44 EDT
(In reply to comment #23)
> Hi Per,
> 
> Just to be certain, did you follow all the steps to rebuild your kernel as
> described in the referenced URL?   Simply deactivating the CONFIG_DEBUG_VM
> switch will not do it for you unless you follow up by rebuilding the kernel
> with the resulting .config file. 
> 
> If you did, then something weirder than the DEBUG_VM switch being on is
> biting us, which was thought to be the main difference between fedora 16 3.x
> kernels and those used by other distros.  Note that I just started using
> workstation 8.0.4 (and the usual patch files) today with my 3.3.7.1 kernel
> built as described earlier.   It is working very well.

f92809,

Since it was just one parameter to change, I only loosely followed the instructions under Configure Kernel Options.  In particular, I did not use make menuconfig or make xconfig, instead just editing the .config file by hand and copying it to ~/rpmbuild/SOURCES/

As noted in my earlier comment, I did verify that the resultant kernel has the option switched off:

# grep CONFIG_DEBUG_VM /boot/config-`uname -r`
# CONFIG_DEBUG_VM is not set
Comment 25 f92809 2012-06-17 23:17:09 EDT
Hi Per,

If the DEBUG_VM parameter was switched off, then the usual error should not occur at line 402 in mm.h, as the offending debug macro on that line,
VM_BUG_ON(atomic_read(&page->_count) <= 0);
should not be activated.

I strongly recommend that you follow the steps presented in the URL through "Install the New Kernel" to create your appropriately renamed custom kernel without taking shortcuts.  You can leave out compiling the firmware and the debug kernel to save time as you won't need either of these.

Best Regards,
Johnny
Comment 26 Per Nystrom 2012-06-18 00:11:15 EDT
(In reply to comment #25)
> Hi Per,
> 
> If the DEBUG_VM parameter was switched off, then the usual error should not
> occur at line 402 in mm.h as the offending debug macro on that line -	
> VM_BUG_ON(atomic_read(&page->_count) <= 0); - should not be activated.  
> 
> I strongly recommend that you follow the steps presented in the URL through
> "Install the New Kernel" to create your appropriately renamed custom kernel
> without taking shortcuts.  You can leave out compiling the firmware and the
> debug kernel to save time as you won't need either of these.
> 
> Best Regards,
> Johnny

Johnny,

This was totally my bad.  After retracing my steps, I realized that I had not recompiled the VMware modules using the local kernel-devel and kernel-headers RPMs I had generated.  VMware was still trying to use the modules I had built earlier on the stock 3.3.8-1 packages, which of course still had the CONFIG_DEBUG_VM=y set.

Anyway, I can confirm that building with CONFIG_DEBUG_VM switched off solves the kernel oops problem.  Sorry for the confusion!

-Per
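
A stale out-of-tree module of this kind can sometimes be spotted by comparing the module's `vermagic` string with the running kernel release. This is only a sketch: `module_matches_kernel` is a hypothetical helper name, and the check cannot help when the custom kernel keeps exactly the same release string as the stock one, which is one reason to rename custom builds.

```bash
# Hypothetical helper: succeed if a module's vermagic string ($1) was built
# for the given kernel release ($2). The first whitespace-separated field of
# vermagic is the kernel release the module was compiled against.
module_matches_kernel() {
    local vermagic="$1" release="$2"
    [ "${vermagic%% *}" = "$release" ]
}
```

For the VMware case above this could be called as `module_matches_kernel "$(modinfo -F vermagic vmmon)" "$(uname -r)" || echo "rebuild the VMware modules"`.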
Comment 27 f92809 2012-06-18 21:58:42 EDT
Hi Per,

Good to hear!  Glad you are able to use your VMs again as well.  Note that Mike over in the VMware community thread on this same topic, at 
http://communities.vmware.com/message/2040785#2040785 
had some success at the VMware level by doing what he thought turned off the AMD-V capabilities.   This would be much faster than a kernel recompile, but may have other side-effects as he pointed out.
Comment 28 infinality 2012-06-18 22:45:45 EDT
Confirmed that the workaround here works for me!
http://communities.vmware.com/message/2040785#2040785
Comment 29 Per Nystrom 2012-06-19 01:04:58 EDT
(In reply to comment #27)
> Hi Per,
> 
> Good to hear!  Glad you are able to use your VMs again as well.  Note that
> Mike over in the VMware community thread on this same topic, at 
> http://communities.vmware.com/message/2040785#2040785 
> had some success at the VMware level by doing what he thought turned off the
> AMD-V capabilities.   This would be much faster than a kernel recompile, but
> may have other side-effects as he pointed out.

I saw that and thought about trying it, but wouldn't turning off hardware assisted virtualization result in a performance hit?

By the way, I had been using my VMs all along on the 3.1.0-7.fc16.x86_64 kernel that came in the Fedora 16 media.  The good news is that now I can use the latest kernel and still run my VMs.  Thanks!

-Per
Comment 30 Eric Smith 2012-06-19 01:25:49 EDT
> I saw that and thought about trying it, but wouldn't turning off
> hardware assisted virtualization result in a performance hit?

Well, maybe, but not as much of a performance hit as when the VM doesn't work at all.

The performance benefit of hardware virtualization assistance in VMware isn't huge.
Comment 31 infinality 2012-06-19 08:37:24 EDT
(In reply to comment #30)
> > I saw that and thought about trying it, but wouldn't turning off
> > hardware assisted virtualization result in a performance hit?
> 
> Well, maybe, but not as much of a performance hit as when the VM doesn't
> work at all.
> 

Hah!  :)
Comment 32 Alberto Gonzalez 2012-07-23 05:47:04 EDT
I can confirm that this bug still persists in Fedora 17 with the latest kernel, 3.4.6-2.fc17.x86_64, running on an AMD CPU.
In my experience, this problem was introduced in Fedora 16 with kernel version 3.2.9-2.fc16.x86_64.
Comment 33 Charles Bradshaw 2012-08-01 10:19:51 EDT
I don't know if this is the same or maybe a different manifestation. The crash occurs at random, although consistently during the first desktop login of the day. It can be avoided by a startup in rescue mode followed by ^D.

$ cat /proc/version
Linux version 3.4.6-2.fc17.i686.PAE (mockbuild@x86-01.phx2.fedoraproject.org) (gcc version 4.7.0 20120507 (Red Hat 4.7.0-5) (GCC) ) #1 SMP Thu Jul 19 21:49:03 UTC 2012

/var/log/messages contains this snippet:
Aug  1 14:14:56 dell-laptop kernel: [  151.099733] ------------[ cut here ]------------
Aug  1 14:14:56 dell-laptop kernel: [  151.099814] kernel BUG at include/linux/mm.h:277!
Aug  1 14:14:56 dell-laptop kernel: [  151.099875] invalid opcode: 0000 [#1] SMP 
Aug  1 14:14:56 dell-laptop kernel: [  151.099932] Modules linked in: fuse ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc snd_usb_audio snd_usbmidi_lib snd_hwdep snd_rawmidi arc4 snd_intel8x0 snd_intel8x0m uvcvideo snd_ac97_codec snd_seq_device videobuf2_vmalloc videobuf2_memops videobuf2_core videodev ath5k ath ac97_bus snd_pcm media mac80211 snd_page_alloc ppdev parport_pc snd_timer iTCO_wdt 3c59x dell_laptop mii parport cfg80211 snd rfkill soundcore iTCO_vendor_support dcdbas microcode yenta_socket video radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
Aug  1 14:14:56 dell-laptop kernel: [  151.100031] 
Aug  1 14:14:56 dell-laptop kernel: [  151.100031] Pid: 632, comm: Xorg Not tainted 3.4.6-2.fc17.i686.PAE #1 Dell Computer Corporation Latitude C640                   /      
Aug  1 14:14:56 dell-laptop kernel: [  151.100031] EIP: 0060:[<c0944a8d>] EFLAGS: 00213246 CPU: 0
Aug  1 14:14:56 dell-laptop kernel: [  151.100031] EIP is at put_page_testzero.part.7+0x3/0x5
Aug  1 14:14:56 dell-laptop kernel: [  151.100031] EAX: f6060140 EBX: f6060140 ECX: 00000002 EDX: 00000000
Aug  1 14:14:56 dell-laptop kernel: [  151.100031] ESI: f6060140 EDI: b5648000 EBP: f3699e10 ESP: f3699e10
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Aug  1 14:14:56 dell-laptop kernel: [  151.100031] CR0: 80050033 CR2: b55f1000 CR3: 36e20000 CR4: 000007f0
Aug  1 14:14:56 dell-laptop kernel: [  151.100031] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Aug  1 14:14:56 dell-laptop kernel: [  151.100031] DR6: ffff0ff0 DR7: 00000400
Aug  1 14:14:56 dell-laptop kernel: [  151.100031] Process Xorg (pid: 632, ti=f3698000 task=f3613240 task.ti=f3698000)
Aug  1 14:14:56 dell-laptop kernel: [  151.100031] Stack:
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  f3699e18 c050449c f3699e24 c05292a2 f3699f0c f3699e34 c0518267 ec616240
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  f6060140 f3699ec4 c0519689 0000a067 00000000 f3699f20 f8184d2a 00000001
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  00000000 00000000 00100073 b5693fff b5694000 f6e20010 00000000 b5693fff
Aug  1 14:14:56 dell-laptop kernel: [  151.100031] Call Trace:
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  [<c050449c>] put_page+0x3c/0x50
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  [<c05292a2>] free_page_and_swap_cache+0x22/0x50
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  [<c0518267>] __tlb_remove_page+0x47/0xa0
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  [<c0519689>] unmap_single_vma+0x419/0x700
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  [<f8184d2a>] ? drm_ioctl+0x3fa/0x480 [drm]
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  [<c051a100>] unmap_vmas+0x50/0x90
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  [<c051e7ed>] unmap_region+0x7d/0xf0
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  [<c051f149>] ? __split_vma+0xf9/0x1d0
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  [<c051f71e>] do_munmap+0x20e/0x2c0
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  [<c051f80d>] vm_munmap+0x3d/0x60
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  [<c0520e2d>] sys_munmap+0x1d/0x20
Aug  1 14:14:56 dell-laptop kernel: [  151.100031]  [<c095309f>] sysenter_do_call+0x12/0x28

Just prior to the crash.
Comment 34 Dave Jones 2012-10-23 11:34:24 EDT
# Mass update to all open bugs.

Kernel 3.6.2-1.fc16 has just been pushed to updates.
This update is a significant rebase from the previous version.

Please retest with this kernel, and let us know if your problem has been fixed.

In the event that you have upgraded to a newer release and the bug you reported
is still present, please change the version field to the newest release you have
encountered the issue with.  Before doing so, please ensure you are testing the
latest kernel update in that release and attach any new and relevant information
you may have gathered.

If you are not the original bug reporter and you still experience this bug,
please file a new report, as it is possible that you may be seeing a
different problem. 
(Please don't clone this bug; a fresh bug referencing this bug in the comment is sufficient.)
Comment 35 Justin M. Forbes 2012-11-14 10:24:14 EST
With no response, we are closing this bug under the assumption that it is no longer an issue. If you still experience this bug, please feel free to reopen the bug report.
