Bug 620874

Summary: kernel BUG at mm/huge_memory.c:1269!
Product: Red Hat Enterprise Linux 6 Reporter: Alex Williamson <alex.williamson>
Component: kernel Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED DUPLICATE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: low    
Version: 6.0 CC: aarcange, alex.williamson, riel
Target Milestone: rc Keywords: RHELNAK
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2010-09-02 16:45:17 UTC Type: ---
Attachments:
Description    Flags
panic log      none
panic          none

Description Alex Williamson 2010-08-03 16:46:31 UTC
Created attachment 436326
panic log

Description of problem:
kernel panic, see attachment

Version-Release number of selected component (if applicable):
kernel-2.6.32-54.el6.x86_64

How reproducible:
unknown

Steps to Reproduce:
1. unknown, happened while shutting down kvm guest
  
Actual results:
host panic

Expected results:
no panic

Additional info:

Comment 2 RHEL Program Management 2010-08-03 17:08:05 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 RHEL Program Management 2010-08-18 21:25:40 UTC
Thank you for your bug report. This issue was evaluated for inclusion
in the current release of Red Hat Enterprise Linux. Unfortunately, we
are unable to address this request in the current release. Because we
are in the final stage of Red Hat Enterprise Linux 6 development, only
significant, release-blocking issues involving serious regressions and
data corruption can be considered.

If you believe this issue meets the release blocking criteria as
defined and communicated to you by your Red Hat Support representative,
please ask your representative to file this issue as a blocker for the
current release. Otherwise, ask that it be evaluated for inclusion in
the next minor release of Red Hat Enterprise Linux.

Comment 4 Alex Williamson 2010-09-01 21:19:10 UTC
Created attachment 442499
panic

Same panic on .70

Comment 6 Alex Williamson 2010-09-01 21:35:39 UTC
Before the most recent oops, I see:

mapcount 2 page_mapcount 3
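
For anyone sifting through a longer console log for lines like the one above, a minimal Python sketch follows. It is not part of the original report: the function name `find_mapcount_mismatches` and the sample text are hypothetical, and the pattern assumes the message has exactly the form quoted in this comment.

```python
import re

# Matches the line quoted above, e.g. "mapcount 2 page_mapcount 3".
# The lookbehind stops the "mapcount" inside "page_mapcount" from being
# taken as the first field. The exact message format is an assumption.
MAPCOUNT_RE = re.compile(r"(?<![a-z_])mapcount (\d+) page_mapcount (\d+)")


def find_mapcount_mismatches(log_text):
    """Return (line_number, mapcount, page_mapcount) for each line where the two counts differ."""
    mismatches = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        match = MAPCOUNT_RE.search(line)
        if match is None:
            continue
        mapcount = int(match.group(1))
        page_mapcount = int(match.group(2))
        if mapcount != page_mapcount:
            mismatches.append((lineno, mapcount, page_mapcount))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample; a real run would read the saved console log instead.
    sample = "some earlier message\nmapcount 2 page_mapcount 3\n"
    print(find_mapcount_mismatches(sample))
```

Reporting the line number makes it easy to copy the lines just before the mismatch into the bug along with the oops itself.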

Comment 7 Alex Williamson 2010-09-01 22:47:08 UTC
The VM I was running was using a patched upstream qemu-kvm with the following options:

-enable-kvm -m 2000 -smp 2,sockets=2,cores=1,threads=1 -name rhel6vm -uuid 1e3f234d-338f-e438-1d43-14393856409c -nodefconfig -nodefaults -monitor stdio -rtc base=utc -boot c -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/var/lib/libvirt/images/VMs/rhel6vm.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -usb -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -netdev tap,script=/home/alwillia/bin/br0-ifup,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:5f:78:73,bus=pci.0,addr=0x3,txtimer=1 -vnc :1 -no-kvm-irqchip -no-kvm-pit

The host system is a 12G, single-socket Xeon W3520 (Nehalem class).

Comment 8 Andrea Arcangeli 2010-09-02 16:44:49 UTC
Hello,

I built a kernel that includes the fix for bug 627591. I also added some debug code specific to this bug report, so if this happens again we'll know more about what's going on, but you'll have to include the output before the oops too (it'll be printed as one line before the page_mapcount %d mapcount %d line).

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2729732

Please use this build until you can reproduce the problem again. I've absolutely never seen this myself, and I only recall one report, from CAI Qian in bug 622327, where one stack trace is identical to yours. I thought 622327 was bad hardware. It's also worth comparing the CPUs and systems you're both using, to be sure it's not the same CPU that leads to bad results, considering THP is stressing bits of the CPU that normally wouldn't be stressed to this extent.

It can probably be explained by the slab RCU race condition fixed by Hugh. I'm not 100% sure though, so we need more testing with the above build to be sure. If it's that bug, you need to stress the slab a lot.

I'll mark this as a duplicate of 622327, because that bug also has a second, different stack trace within page_lock_anon_vma context (the very function patched by the fix for bug 627591).

Comment 9 Andrea Arcangeli 2010-09-02 16:45:17 UTC

*** This bug has been marked as a duplicate of bug 622327 ***

Comment 10 Andrea Arcangeli 2010-09-02 18:02:46 UTC
The debug code failed to build on some architectures, so I've now submitted a new build, otherwise the same as in comment #8:

https://brewweb.devel.redhat.com/taskinfo?taskID=2729988