Red Hat Bugzilla – Bug 433286
[ext4dev] Unable to handle kernel paging request at 0xffff810055481000
Last modified: 2008-02-21 09:56:34 EST
Description of problem:
It happens frequently that while building some RPM package as an ordinary
user in /home, a kernel oops is reported in the shell. Afterwards, it's
impossible to enter any new command in the shell. New terminal windows
remain blank. Likewise, it's impossible to log in as an ordinary user
after switching to a virtual console.
It is still possible to log in as "root", probably because in this case
the home directory is not located in /home, which is the only ext4dev
volume on this system.
Furthermore, it's impossible to reboot/shutdown the system by any normal
means. It will acknowledge the request but simply sit there afterwards.
It's actually necessary to cut the power to shut it down.
Version-Release number of selected component (if applicable):
How reproducible:
Most of the time.
Steps to Reproduce:
1. Build some RPM package as ordinary user in /home.
During compilation, the system beeps and reports a kernel oops.
Compilation completes successfully.
Symptoms are similar to those of bug 428329. The currently installed
compiler collection is gcc-4.3.0-0.9. If I remember correctly, this
issue showed up some time earlier this month.
Created attachment 295163
Kernel oops section in "messages" for kernel 2.6.25-0.50.rc2.fc9
Reassigning to Eric.
Joachim, thanks for the new bug. I don't see that this is at all similar to bug
428329... what am I missing?
I'll look into this.
(In reply to comment #3)
There is the same "BUG: unable to handle kernel paging request at
virtual address .." message and some "ext3" related entries. That may be
rather generic, so just forget about that part of my bug report.
Does it seem to matter which rpm you're rebuilding?
Jarod has been doing mock builds on ext4 w/o trouble, and I just did a quick
test of an e2fsprogs rebuild under 2.6.25-rc1 on ext4dev w/o problems.
So... Most of my building on top of ext4 has been under 2.6.24.x rawhide
kernels. This morning, a simple scp of a file onto the ext4 volume, now running
2.6.25-rc1-git2 or so, and I believe I hit the exact same oops.
My oops output:
Ok, I can hit this too. On a "vanilla" 2.6.25-rc2 kernel from rawhide. On my
own 2.6.25-rc1 kernel, seems I don't hit it, testing "vanilla" 2.6.25-rc1 from
I suppose I'll go look at the oopsing function, too :)
Created attachment 295297
Here's an oops I hit with all the mballoc stuff uninlined; shows a bit more
about how we got there:
So, this is a little weird.
This is the mballoc code trying to find bits set in a bitmap at the end of a page.
Some test code that does similar things as far as the bitmap testing goes:
unsigned long *p;
unsigned long *p2;
unsigned long bit;
p = kzalloc(8192, GFP_KERNEL);
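/* set the first 4098 bytes to all 1's, so there is no 0 bit 'til 4098 bytes in */
/* memset() only uses the low byte of its value argument, so 0xFF is enough */
memset(p, 0xFF, 4098);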
p2 = (unsigned long *)((char *)p + 4092);
printk("p at %p, p2 at %p\n", p, p2);
/* search within 16 bits (2 bytes) from p2 for a zero */
bit = find_next_zero_bit(p2, 16, 0);
printk("found 0 bit at offset %lu\n", bit);
... and this finds bit 48, i.e. 6 bytes past p2, at byte offset 4098, which
means it has gone into the next page.
We asked it to search 16 bits (2 bytes) starting 4 bytes from the end of the
page, but it continued into the next one. Walking off the page like this is
what was causing the oops, I think. This is not the behavior I expect from
find_next_zero_bit, so I'm confused here.
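The overrun described above can be approximated in userspace. This is only a sketch: unbounded_first_zero() and run_overrun_demo() are hypothetical names, not kernel functions, and the code assumes a little-endian machine and a search that loads one full 64-bit word without clamping its answer to the requested size. It is not the kernel's actual implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a word-at-a-time search: it loads one full
 * 64-bit word at addr and reports the first zero bit in that word,
 * without clamping the answer to size_bits.
 */
static unsigned long unbounded_first_zero(const unsigned char *addr,
                                          unsigned long size_bits)
{
    uint64_t word;
    unsigned long bit;

    assert(size_bits > 0 && size_bits <= 64);

    /* Reads 8 bytes even when the caller only asked about 16 bits. */
    memcpy(&word, addr, sizeof(word));

    for (bit = 0; bit < 64; bit++)
        if (!((word >> bit) & 1))
            return bit;         /* may be >= size_bits */

    return size_bits;           /* no zero bit in the whole word */
}

/* Userspace version of the test: two "pages" in one 8192-byte buffer. */
static unsigned long run_overrun_demo(void)
{
    static unsigned char buf[8192];

    memset(buf, 0, sizeof(buf));
    memset(buf, 0xFF, 4098);    /* first zero bit is 4098 bytes in */

    /* Ask for 16 bits, starting 4 bytes before the 4096-byte boundary. */
    return unbounded_first_zero(buf + 4092, 16);
}
```

Under those assumptions run_overrun_demo() returns 48, the same value the kernel test printed: the 8-byte load covers bytes 4092 through 4099, so the first zero it sees is at byte 4098, on the second page.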
The semantics of find_next_zero_bit() are not very well defined. It clearly
assumes it can access at least one full unsigned long starting at offset,
though, whatever size it was given.
For what it's worth, the generic implementation seems to behave differently, and
just returns 16 ("no zeros found") like I'd expect.
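For comparison, here is a minimal sketch of the bounded behavior, i.e. a search that never looks past the requested size and returns size when no zero bit is found. bounded_next_zero_bit() and run_bounded_demo() are hypothetical names, and the bit-by-bit loop is written for clarity; it is not the kernel's generic implementation, only an illustration of the contract it appears to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Bit n lives in byte n / 8, at position n % 8 (little-endian bit order).
 * Only bytes that hold bits below size_bits are ever read.
 */
static unsigned long bounded_next_zero_bit(const unsigned char *addr,
                                           unsigned long size_bits,
                                           unsigned long start_bit)
{
    unsigned long bit;

    assert(addr != NULL);

    for (bit = start_bit; bit < size_bits; bit++)
        if (!(addr[bit / 8] & (1u << (bit % 8))))
            return bit;

    return size_bits;           /* "no zeros found" */
}

/* Same buffer layout as the test: 4098 bytes of 1's, then zeros. */
static unsigned long run_bounded_demo(unsigned long size_bits)
{
    static unsigned char buf[8192];

    memset(buf, 0, sizeof(buf));
    memset(buf, 0xFF, 4098);

    return bounded_next_zero_bit(buf + 4092, size_bits, 0);
}
```

With this version run_bounded_demo(16) returns 16, and the zero at bit 48 is only reported when the caller actually asks for a range that includes it, e.g. run_bounded_demo(64).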
I'm not sure offhand if ext4 or find_next_zero_bit needs to be fixed here; also
not quite sure why this was working up 'til now.
I'll see if we can work around it in ext4 for starters, at least.
Joy. ext4 used to work around it already, but it was taken out because it
wasn't aesthetically pleasing, or something.
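One way such a workaround could look is sketched below. This is not the patch that went into rawhide, and all names are hypothetical. The idea is to round the address down to a word boundary, shift the bit indices by the same amount, and clamp whatever the underlying search returns. raw_next_zero_bit() stands in for an optimized search that wants a word-aligned address and may answer past size; a little-endian machine is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a word-at-a-time search that does not clamp its result. */
static unsigned long raw_next_zero_bit(const unsigned char *base,
                                       unsigned long size, unsigned long start)
{
    unsigned long word_idx, bit;
    uint64_t word;

    for (word_idx = start / 64; word_idx * 64 < size; word_idx++) {
        memcpy(&word, base + word_idx * 8, sizeof(word));
        bit = (word_idx == start / 64) ? start % 64 : 0;
        for (; bit < 64; bit++)
            if (!((word >> bit) & 1))
                return word_idx * 64 + bit;   /* may be >= size */
    }
    return size;
}

/* Wrapper: align the address down, adjust the indices, clamp the result. */
static unsigned long safe_next_zero_bit(const unsigned char *addr,
                                        unsigned long max, unsigned long start)
{
    /* Number of bits between the aligned address and addr. */
    unsigned long fix = ((uintptr_t)addr % sizeof(uint64_t)) * 8;
    unsigned long ret;

    assert(start <= max);

    addr -= fix / 8;
    ret = raw_next_zero_bit(addr, max + fix, start + fix) - fix;
    return ret > max ? max : ret;
}

/* Same layout as the test; uint64_t storage keeps the buffer aligned. */
static unsigned long run_workaround_demo(unsigned long max)
{
    static uint64_t storage[1024];            /* 8192 bytes */
    unsigned char *buf = (unsigned char *)storage;

    memset(buf, 0, sizeof(storage));
    memset(buf, 0xFF, 4098);

    return safe_next_zero_bit(buf + 4092, max, 0);
}
```

In this sketch run_workaround_demo(16) returns 16 and only loads the last word of the first page, so the search never touches the next page. When the requested range does extend past the boundary, as in run_workaround_demo(40), the raw search reports bit 48 and the clamp turns that into 40.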
I've committed a patch to rawhide which should fix this (along with a few other
Next kernel build should have it...
Thanks for pointing it out,
kernel-2.6.25-0.61.rc2.git4.fc9 works fine again. Thanks!