| Summary: | [abrt] kernel: [1136667.935710] kernel BUG at mm/huge_memory.c:1368!: TAINTED -------D | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Elliott Sales de Andrade <quantum.analyst> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 15 | CC: | gansalmon, itamar, jlmagee, jonathan, kernel-maint, madhu.chinakonda |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Unspecified | ||
| Whiteboard: | abrt_hash:b883c5068f48831ca166b8e8dd4be016977c77b0 | ||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-04-11 16:30:42 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Elliott Sales de Andrade
2011-10-20 21:29:36 UTC
Appears to be the same issue. The system degraded to the point of being unusable over the following two hours.
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.391322] mapcount 0 page_mapcount 1
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.391536] ------------[ cut here ]------------
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.391730] kernel BUG at mm/huge_memory.c:1368!
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.391923] invalid opcode: 0000 [#1] SMP
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.392117] CPU 0
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.392125] Modules linked in: ipt_MASQUERADE iptable_mangle iptable_nat nf_nat iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 fuse ebtable_nat ebtables xt_CHECKSUM vhost_net macvtap macvlan tun bridge stp llc bonding ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ses enclosure dcdbas microcode serio_raw ghes hed joydev i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core igb dca virtio_net kvm_intel kvm ipv6 raid1 megaraid_sas [last unloaded: nf_defrag_ipv4]
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.393764]
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.393946] Pid: 4583, comm: postgres Not tainted 2.6.40.4-5.fc15.x86_64 #1 Dell PowerEdge C2100/0P19C9
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.394340] RIP: 0010:[<ffffffff8111d662>] [<ffffffff8111d662>] split_huge_page+0x181/0x5ac
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.394726] RSP: 0000:ffff88234d035788 EFLAGS: 00010297
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.394921] RAX: 0000000000000001 RBX: ffffea00011db000 RCX: 0000000000003dfe
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.395296] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.395669] RBP: ffff88234d035818 R08: 0000000000000000 R09: 0000000000000000
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.396044] R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff880fce11cfd0
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.396417] R13: fffffffffffffff2 R14: ffff88234f9402d0 R15: ffff88234f9402d0
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.396790] FS: 00007f74dd5a87e0(0000) GS:ffff88243f200000(0000) knlGS:0000000000000000
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.397167] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.397363] CR2: 0000000006e00000 CR3: 0000000e25634000 CR4: 00000000000026e0
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.397736] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.398109] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.398484] Process postgres (pid: 4583, threadinfo ffff88234d034000, task ffff881450c81730)
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.398862] Stack:
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.399046] ffff88234d035798 ffffffff81101034 ffff88234d035848 ffff88234f940300
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.399428] 0000000000000003 4000000000000001 ffff88234f940300 000001004a8e2020
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.399810] 000000004d035910 ffff88234f9402e0 0000000000000000 ffff88234d035b78
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.400194] Call Trace:
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.400384] [<ffffffff81101034>] ? page_unlock_anon_vma+0x15/0x17
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.400585] [<ffffffff81106653>] add_to_swap+0x3f/0x88
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.400784] [<ffffffff810e9b4d>] shrink_page_list+0x22a/0x6f4
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.400985] [<ffffffff81120b3a>] ? mem_cgroup_del_lru+0x1d/0x21
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.401186] [<ffffffff810f24ff>] ? __mod_zone_page_state+0x45/0x4f
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.401387] [<ffffffff810e8c4a>] ? update_isolated_counts+0x139/0x157
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.401589] [<ffffffff810ea42f>] shrink_inactive_list+0x230/0x399
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.401789] [<ffffffff810e3b37>] ? determine_dirtyable_memory+0x1a/0x23
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.401992] [<ffffffff810eac17>] shrink_zone+0x3cf/0x50c
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.402190] [<ffffffff810eb0bf>] do_try_to_free_pages+0x10c/0x34e
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.402393] [<ffffffff810e2594>] ? get_page_from_freelist+0x60b/0x64e
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.402595] [<ffffffff810eb582>] try_to_free_pages+0xad/0x100
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.402794] [<ffffffff810e2c16>] __alloc_pages_nodemask+0x4d2/0x736
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.402997] [<ffffffff8110f8bb>] alloc_pages_vma+0xf5/0xfa
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.403194] [<ffffffff8111dcac>] do_huge_pmd_anonymous_page+0xbf/0x26c
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.403396] [<ffffffff810f654f>] ? pmd_offset+0x19/0x3f
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.403592] [<ffffffff810f98ae>] handle_mm_fault+0x120/0x1db
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.403792] [<ffffffff8148b3cd>] do_page_fault+0x354/0x39b
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.403990] [<ffffffff81042cb3>] ? set_next_entity+0x45/0x97
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.404190] [<ffffffff81008842>] ? __switch_to+0x20e/0x220
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.404390] [<ffffffff8123bce0>] ? rb_insert_color+0xb8/0xe1
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.404588] [<ffffffff8104522f>] ? finish_task_switch+0x49/0xb7
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.404803] [<ffffffffa0085a48>] ? kvm_on_user_return+0x65/0x6d [kvm]
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.405003] [<ffffffff810d9d7c>] ? fire_user_return_notifiers+0x2d/0x39
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.405204] [<ffffffff81488755>] page_fault+0x25/0x30
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.405399] Code: 0c 4d 89 fe ff c0 39 45 b4 74 16 8b 53 0c 8b 75 b4 48 c7 c7 a9 55 7b 81 31 c0 ff c2 e8 92 1e 36 00 8b 43 0c ff c0 39 45 b4 74 02 <0f> 0b 4c 8b 2b 4c 8b 7b 20 4c 89 e8 49 c1 ed 35 41 83 e5 03 48
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.406107] RIP [<ffffffff8111d662>] split_huge_page+0x181/0x5ac
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.406306] RSP <ffff88234d035788>
Nov 16 21:08:06 tebreckvm10 kernel: [4321072.406937] ---[ end trace dc6935e840e8825a ]---
Please raise the priority and severity to urgent.

Please try the 2.6.41.1-1 kernel update.

Will close this bug in another week if there's no response to the request to try a 2.6.41 kernel.

Is there a specific update in 2.6.41 that you believe addresses the issue? Or are we all shooting in the dark? We cannot upgrade until we position some new hardware in a couple of weeks. This occurred once in three months on one of six servers supporting similar workloads, so even after we upgrade, it will be difficult to declare victory.

(In reply to comment #5)
> Is there a specific update in 2.6.41 that you believe addresses the issue? Or
> are we all shooting in the dark? We cannot upgrade until we position some new
> hardware in a couple of weeks. This occurred once in 3 months on one of six
> servers supporting similar workloads. so even after we upgrade, it will be
> difficult to declare victory.

It's an entirely new kernel release that contains many fixes across the board, including the MM subsystem. While it is indeed a small shot in the dark, it is not an unreasonable one. More importantly, F15 has moved to 2.6.41, so there will be no more 2.6.40.x updates. If this still needs fixing, it needs fixing in 2.6.41.

I think this was fixed in 1c641e84719429bbfe62a95ed3545ee7fe24408f upstream. 2.6.42.9-2.fc15 and newer should have this fixed.