Bug 646384 - kernel BUG at mm/migrate.c:113!
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Andrea Arcangeli
QA Contact: Caspar Zhang
Keywords: Regression, ZStream
Depends On: Rhel6KvmTier1
Blocks: 647391
Reported: 2010-10-25 05:42 EDT by Qian Cai
Modified: 2013-07-03 03:27 EDT
Fixed In Version: kernel-2.6.32-81.el6
Doc Type: Bug Fix
Doc Text:
Running certain workload tests on a Non-Uniform Memory Architecture (NUMA) system could cause kernel panic at mm/migrate.c:113. This was due to a false positive BUG_ON. With this update, the false positive BUG_ON has been removed.
Last Closed: 2011-05-19 08:01:44 EDT

External Trackers:
Tracker: Red Hat Product Errata
ID: RHSA-2011:0542
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update
Last Updated: 2011-05-19 07:58:07 EDT

Description Qian Cai 2010-10-25 05:42:48 EDT
Description of problem:
kernel BUG at mm/migrate.c:113!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu63/cache/index2/shared_cpu_map
CPU 0 
Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT iptable_filter ip_tables bridge stp llc kvm_intel kvm autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core sg igb dca ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ahci megaraid_sas dm_mod [last unloaded: microcode]

Pid: 28103, comm: largepages15 Tainted: G        W  ----------------  2.6.32-76.el6.test.x86_64 #1 QSSC-S4R
RIP: 0010:[<ffffffff8115b4ea>]  [<ffffffff8115b4ea>] remove_migration_pte+0x20a/0x2f0
RSP: 0000:ffff88105d7c99a8  EFLAGS: 00010246
RAX: 8000000937e000e5 RBX: ffff880c6cc2cdc0 RCX: ffffea000732f1e0
RDX: ffff880bf61d9000 RSI: ffff8809021d2d40 RDI: 0000000000000000
RBP: ffff88105d7c9a08 R08: 00003ffffffff000 R09: ffff880000000000
R10: ffffc00000000fff R11: ffff880894b0edf0 R12: 00007ffff7ce4000
R13: ffff8809021d2d40 R14: ffffea000f717138 R15: ffffffff8115b2e0
FS:  00007ffff7ff1700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ffff7df1000 CR3: 000000086bc54000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process largepages15 (pid: 28103, threadinfo ffff88105d7c8000, task ffff88105b2c2080)
Stack:
 0000000000000000 00003ffffffff000 ffff88046c6891b8 ffffc00000000fff
<0> ffffea000000001e ffffea000f5075c0 800000020e8e4045 ffff880c6c8c52b8
<0> ffffea000f717138 ffffea000732f1e0 ffff880c6c8cd4d8 ffffffff8115b2e0
Call Trace:
 [<ffffffff8115b2e0>] ? remove_migration_pte+0x0/0x2f0
 [<ffffffff8113e8fe>] rmap_walk+0x16e/0x1c0
 [<ffffffff8115b892>] ? migrate_page_copy+0x102/0x1c0
 [<ffffffff8115c08d>] migrate_pages+0x48d/0x5d0
 [<ffffffff81152750>] ? compaction_alloc+0x0/0x370
 [<ffffffff811521ac>] compact_zone+0x4ec/0x630
 [<ffffffff81152591>] compact_zone_order+0xa1/0xe0
 [<ffffffff811526db>] try_to_compact_pages+0x10b/0x180
 [<ffffffff8111e6cc>] __alloc_pages_nodemask+0x55c/0x810
 [<ffffffff811505f4>] alloc_pages_vma+0x84/0x110
 [<ffffffff8113f1c0>] ? anon_vma_prepare+0x30/0x160
 [<ffffffff81167995>] do_huge_pmd_anonymous_page+0x135/0x340
 [<ffffffff811365b5>] handle_mm_fault+0x245/0x2b0
 [<ffffffff814cd8d3>] do_page_fault+0x123/0x3a0
 [<ffffffff814cb345>] page_fault+0x25/0x30
Code: 48 09 c6 48 89 f2 48 c1 ea 3b 83 fa 1e 74 24 83 fa 1f 74 1f 48 8b 45 c8 66 ff 00 66 66 90 e9 06 ff ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b 0f 1f 40 00 eb fa 48 b8 ff ff ff ff ff ff ff 07 48 21 c6 
RIP  [<ffffffff8115b4ea>] remove_migration_pte+0x20a/0x2f0
 RSP <ffff88105d7c99a8>

Version-Release number of selected component (if applicable):
kernel from RHBZ#622327#c81.

How reproducible:
unknown

Steps to Reproduce:
1. Prepare a NUMA system (reproduced on a Nehalem-EX system).
2. Run the threaded_memtest, oom, kernelbuild, and kvm workloads together.
3. Take the reproducer from RHBZ#642570, modify largepages15.c to use KSM, and start 60 instances:
# for i in `seq 1 60`; do ./largepages15 & done
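
The source of largepages15.c is not included in this report, so the following is only a hypothetical sketch of what "modify the reproducer to use KSM" could mean: map anonymous memory, mark it mergeable with madvise(MADV_MERGEABLE), and fill every page with identical content. The helper name map_mergeable and its parameters are assumptions for illustration, not taken from the real reproducer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper, NOT taken from largepages15.c.
 * Maps `len` bytes of private anonymous memory, asks KSM to consider
 * the range for merging, and writes the same byte to every page so the
 * pages have identical content. Returns NULL if mmap fails. */
static void *map_mergeable(size_t len, unsigned char fill)
{
    assert(len > 0);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

#ifdef MADV_MERGEABLE
    /* Advisory only: the call fails if the kernel was built without
     * KSM support, which this sketch deliberately ignores. */
    (void)madvise(p, len, MADV_MERGEABLE);
#endif

    memset(p, fill, len);   /* fault in every page with identical data */
    return p;
}
```

For the marked range to actually be scanned, KSM also has to be switched on (for example by writing 1 to /sys/kernel/mm/ksm/run). Whether this sketch triggers the reported BUG is not established; the report lists reproducibility as unknown.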
  
Actual results:
panic

Expected results:
No panic.

Additional info:
Unfortunately, kdump did not work in this case, so no vmcore was captured.
Comment 3 Andrea Arcangeli 2010-10-25 13:39:23 EDT
Fix posted to rhkernel-list with Message-ID: <20101025173439.GM910@random.random>

I removed the false positive BUG_ON and introduced one new VM_BUG_ON in a place related to the s/!pmd_present/pmd_none/ change. In the build that I will provide to QA, the new VM_BUG_ON will be converted to BUG_ON so that it gets exercised.
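
The patch itself is not shown in this report. As background on why swapping !pmd_present for pmd_none is not a no-op, here is a toy user-space model showing that the two predicates are not complements: an entry can hold data while its present bit is clear. All model_* names and the single-bit layout are assumptions for illustration; the kernel's real pmd helpers test more bits than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not kernel code: bit 0 is the only flag that
 * model_pmd_present() looks at. All model_* names are hypothetical. */
#define MODEL_PRESENT UINT64_C(0x1)

typedef uint64_t model_pmd_t;

/* "none": the slot is completely empty. */
static bool model_pmd_none(model_pmd_t pmd)
{
    return pmd == 0;
}

/* "present": the present bit is set. */
static bool model_pmd_present(model_pmd_t pmd)
{
    return (pmd & MODEL_PRESENT) != 0;
}

/* True for an entry that holds data but is marked not present.
 * This is the state on which the two predicates disagree. */
static bool model_pmd_populated_not_present(model_pmd_t pmd)
{
    /* An entry can never be both empty and present. */
    assert(!(model_pmd_none(pmd) && model_pmd_present(pmd)));
    return !model_pmd_none(pmd) && !model_pmd_present(pmd);
}
```

In this model, a check written as !model_pmd_present() fires for both empty and populated-but-not-present entries, while model_pmd_none() fires only for empty ones.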

The build system I use has a disk-full problem; as soon as it's fixed I'll provide a build with the patch included. Thanks!
Comment 4 Andrea Arcangeli 2010-10-25 14:11:58 EDT
Build with fix in comment #3 included (with VM_BUG_ON converted to BUG_ON) here:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2850419
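
Converting VM_BUG_ON to BUG_ON matters for a test build because VM_BUG_ON only performs its check when the kernel is built with CONFIG_DEBUG_VM, whereas BUG_ON always checks. The user-space model below sketches that difference; the MODEL_* macros and check_both are hypothetical stand-ins that count hits instead of halting, and they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

static int model_bug_hits;   /* number of checks that fired */

/* Always-on check: stands in for BUG_ON (counts instead of halting). */
#define MODEL_BUG_ON(cond) do { if (cond) model_bug_hits++; } while (0)

/* Debug-only check: stands in for VM_BUG_ON. It is active only when
 * MODEL_DEBUG_VM is defined, mirroring the role of CONFIG_DEBUG_VM. */
#ifdef MODEL_DEBUG_VM
#define MODEL_VM_BUG_ON(cond) MODEL_BUG_ON(cond)
#else
#define MODEL_VM_BUG_ON(cond) do { } while (0)
#endif

/* Runs both kinds of check on the same condition and returns how many
 * of them fired. */
static int check_both(bool broken)
{
    assert(model_bug_hits >= 0);
    model_bug_hits = 0;
    MODEL_VM_BUG_ON(broken);   /* silent unless MODEL_DEBUG_VM is set */
    MODEL_BUG_ON(broken);      /* always evaluated */
    return model_bug_hits;
}
```

Compiled without -DMODEL_DEBUG_VM, only the always-on check can fire, which is why a debug-only assertion would go unexercised in a build that lacks the debug option.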
Comment 5 RHEL Product and Program Management 2010-10-26 06:49:26 EDT
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.
Comment 12 Aristeu Rozanski 2010-11-12 14:14:19 EST
Patch(es) available on kernel-2.6.32-82.el6
Comment 18 Martin Prpič 2011-05-09 08:21:56 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Running certain workload tests on a Non-Uniform Memory Architecture (NUMA) system could cause kernel panic at mm/migrate.c:113. This was due to a false positive BUG_ON. With this update, the false positive BUG_ON has been removed.
Comment 19 errata-xmlrpc 2011-05-19 08:01:44 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
