clumens & jeremy both hit this... see soon-to-be-attached jpegs
Created attachment 309103: first part of oops
Created attachment 309105: 2nd part of oops
Created attachment 309106: 3rd part of oops
See also: http://www.kerneloops.org/search.php?search=do_split&btnG=Function+Search
actually I'll take this, I think it's my fault and I can reproduce it :)
I had a hunch that it might be gcc's fault; all the oopsing kernels were built on shiny new 4.3.1, I tested 4.3.0 and had no problems. Thanks to Roland for all his help looking into this one....

<roland> the bug is that for ptr[-1].size it went from *(short*)&ptr[-1].size to *(long*)&ptr[-1].size
<roland> it's gcc's fault

I'll get a proper gcc bug report filed tonight or tomorrow... in the meantime looks like gcc 4.3.1 in rawhide is slightly busted...

-Eric
This is with:

[root@magnesium ~]# rpm -q gcc
gcc-4.3.1-1.i386
[root@magnesium ~]# gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-cpu=generic --build=i386-redhat-linux
Thread model: posix
gcc version 4.3.1 20080609 (Red Hat 4.3.1-1) (GCC)
Created attachment 309166: preprocessed namei.i from 2.6.26-0.57.rc5.git3.fc10.i686
Created attachment 309167: do_split disassembly from 4.3.0
Created attachment 309168: do_split disassembly from 4.3.1
The interesting bit:

        for (i = count-1; i >= 0; i--) {
                /* is more than half of this entry in 2nd half of the block? */
                if (size + map[i].size/2 > blocksize/2)
 906:   8b 7d a0                mov    -0x60(%ebp),%edi
 909:   31 f6                   xor    %esi,%esi
 90b:   31 d2                   xor    %edx,%edx
 90d:   8b 45 d4                mov    -0x2c(%ebp),%eax
 910:   8b 5d 98                mov    -0x68(%ebp),%ebx
 913:   d1 ef                   shr    %edi
 915:   8d 4c 18 fe             lea    -0x2(%eax,%ebx,1),%ecx
 919:   66 8b 19                mov    (%ecx),%bx

The only difference between compilers seems to be %bx vs. %ebx on this last line.

map[i].size is a u16, and it looks like what is happening is that if it loads 4 bytes instead of 2, it crosses the page boundary and we go "BUG: unable to handle kernel paging request at <first byte in next page>"

Thanks,
-Eric
What exact gcc options were used to compile namei.i?
Sorry, knew I was forgetting something: gcc -Wp,-MD,/root/ext3/.namei.o.d -nostdinc -isystem /usr/lib/gcc/i386-redhat-linux/4.3.1/include -D__KERNEL__ -Iinclude -include include/linux/autoconf.h -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os -fno-stack-protector -m32 -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 -march=i686 -mtune=generic -mtune=generic -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Iinclude/asm-x86/mach-generic -Iinclude/asm-x86/mach-default -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -Wdeclaration-after-statement -Wno-pointer-sign -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(namei)" -D"KBUILD_MODNAME=KBUILD_STR(ext3)" -c -o /root/ext3/namei.o /root/ext3/namei.c
Ah that was namei.o; here's namei.i just to be exact about what you asked: gcc -E -Wp,-MD,/root/ext3/.namei.i.d -nostdinc -isystem /usr/lib/gcc/i386-redhat-linux/4.3.1/include -D__KERNEL__ -Iinclude -include include/linux/autoconf.h -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os -fno-stack-protector -m32 -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 -march=i686 -mtune=generic -mtune=generic -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Iinclude/asm-x86/mach-generic -Iinclude/asm-x86/mach-default -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -Wdeclaration-after-statement -Wno-pointer-sign -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(namei)" -D"KBUILD_MODNAME=KBUILD_STR(ext3)" -o /root/ext3/namei.i /root/ext3/namei.c
Caused by http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=135124
*** Bug 451573 has been marked as a duplicate of this bug. ***
*** Bug 451546 has been marked as a duplicate of this bug. ***
*** Bug 451487 has been marked as a duplicate of this bug. ***
Jakub, any ETA on a fix for this? Should we un-tag gcc 4.3.1 from rawhide for now? Thanks, -Eric
Meanwhile, as a workaround for rawhide installs, use ext2 instead of ext3 or ext4; this hits ext4 filesystems as well.
Actually any ext* filesystem which enables the dir_index feature is likely susceptible; another workaround would be to turn this feature off. -Eric
Should be fixed in gcc-4.3.1-3.
WORKSFORME, I rebuilt the latest kernel w/ this version, did a big yum update, no problems. I think 2.6.26-0.93.rc8.fc10 should be the first kernel built with this. Thanks! -Eric