Bug 433286
Summary: [ext4dev] Unable to handle kernel paging request at 0xffff810055481000
Product: [Fedora] Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Status: CLOSED RAWHIDE
Severity: low
Priority: low
Reporter: Joachim Frieben <jfrieben>
Assignee: Eric Sandeen <esandeen>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: cebbert, davej, jarod, jonstanley
Doc Type: Bug Fix
Last Closed: 2008-02-20 21:42:15 UTC

Description
Joachim Frieben
2008-02-18 12:32:03 UTC
Created attachment 295163
Kernel oops section in "messages" for kernel 2.6.25-0.50.rc2.fc9

Reassigning to Eric.

Joachim, thanks for the new bug. I don't see that this is at all similar to bug 428329... what am I missing? I'll look into this.
-Eric

(In reply to comment #3)
There is this same "BUG: unable to handle kernel paging request at virtual address .." message and some "ext3" related entries. Maybe that is rather generic, so just forget about that part of my bug report.

Does it seem to matter which rpm you're rebuilding? Jarod has been doing mock builds on ext4 w/o trouble, and I just did a quick test of an e2fsprogs rebuild under 2.6.25-rc1 on ext4dev w/o problems. So...

Most of my building on top of ext4 has been under 2.6.24.x rawhide kernels. This morning I did a simple scp of a file onto the ext4 volume, now running 2.6.25-rc1-git2 or so, and I believe I hit the exact same oops. My oops output:
http://people.redhat.com/jwilson/misc/ext4-go-boom.txt

Ok, I can hit this too, on a "vanilla" 2.6.25-rc2 kernel from rawhide. On my own 2.6.25-rc1 kernel it seems I don't hit it; testing "vanilla" 2.6.25-rc1 from rawhide now. I suppose I'll go look at the oopsing function, too :)

Created attachment 295297
my oops
Here's an oops I hit with all the mballoc stuff uninlined; shows a bit more
about how we got there:
ext4dev:ext4_mb_regular_allocator
ext4dev:ext4_mb_simple_scan_group
find_next_zero_bit

So, this is a little weird. This is the mballoc code trying to find bits set in a bitmap at the end of a page. Some test code that does similar things as far as the bitmap testing goes:

    unsigned long *p;
    unsigned long *p2;
    unsigned long bit;

    p = kzalloc(8192, GFP_KERNEL);
    /* set first 4k to 1's, no 0 'til 4098 bytes in */
    memset(p, 0xFFFFFFFF, 4098);
    p2 = (unsigned long *)((char *)p + 4092);
    printk("p at %p, p2 at %p\n", p, p2);
    /* search within 16 bits (2 bytes) from p2 for a zero */
    bit = find_next_zero_bit(p2, 16, 0);
    printk("found 0 bit at offset %lu\n", bit);

... and this finds bit 48... which means it has gone into the next page. We asked to search 16 bits (2 bytes) at 4 bytes from the end of the page, but it continued into the next. This (walking off the page) is what was causing the oops I think. This is not the behavior I expect from find_next_zero_bit, so I'm confused here.

The semantics of find_next_zero_bit() are not very well defined. It is assuming it can access at least one unsigned long starting at offset though, that's for sure.

For what it's worth, the generic implementation seems to behave differently, and just returns 16 ("no zeros found") like I'd expect. I'm not sure offhand if ext4 or find_next_zero_bit needs to be fixed here; also not quite sure why this was working up 'til now. I'll see if we can work around it in ext4 for starters, at least.

Joy. ext4 used to work around it already, but it was taken out because it wasn't aesthetically pleasing, or something. I've committed a patch to rawhide which should fix this (along with a few other issues). Next kernel build should have it...

Thanks for pointing it out,
-Eric

kernel-2.6.25-0.61.rc2.git4.fc9 works fine again. Thanks!
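
For anyone reading this later: the bounded behavior expected from find_next_zero_bit in the test above can be sketched in user space roughly as follows. This is only an illustration, not the kernel implementation and not the patch that went into rawhide. The names bounded_next_zero_bit and demo_search are made up for this sketch, and it assumes the little-endian bit layout of x86_64 (bit i lives in byte i / 8, position i % 8). Because it scans byte by byte, it never reads past the last byte covered by the requested size.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space helper, NOT the kernel's find_next_zero_bit().
 * Scans bits [offset, size) of the bitmap at addr one bit at a time and
 * returns the index of the first zero bit, or size if there is none.
 * Assumes little-endian bit numbering: bit i is in byte i / 8, at i % 8.
 */
static size_t bounded_next_zero_bit(const unsigned char *addr,
                                    size_t size, size_t offset)
{
    for (size_t i = offset; i < size; i++) {
        if (!(addr[i / 8] & (1u << (i % 8))))
            return i;   /* first zero bit inside the requested range */
    }
    return size;        /* no zero bit found within size bits */
}

/*
 * Mirrors the layout of the test above: 4098 bytes of ones followed by
 * zeros, with the search starting 4092 bytes into the buffer.
 */
static size_t demo_search(size_t nbits)
{
    static unsigned char buf[8192];

    memset(buf, 0, sizeof(buf));
    memset(buf, 0xFF, 4098);
    return bounded_next_zero_bit(buf + 4092, nbits, 0);
}
```

With this layout, demo_search(16) returns 16 ("no zeros found"), matching what the generic implementation was reported to do. A zero bit only shows up at offset 48 when the caller actually asks for a range that large, e.g. demo_search(64).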