Bug 451068 - ext3: oops in do_split, miscompilation with gcc 4.3.1
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 451487 451546 451573
Depends On:
Blocks:
Reported: 2008-06-12 17:01 UTC by Eric Sandeen
Modified: 2008-06-26 04:43 UTC (History)
CC List: 11 users

Fixed In Version: 4.3.1-3
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-06-26 04:43:42 UTC


Attachments
first part of oops (1.03 MB, image/jpeg)
2008-06-12 17:02 UTC, Eric Sandeen
2nd part of oops (1.04 MB, image/jpeg)
2008-06-12 17:03 UTC, Eric Sandeen
3rd part of oops (1.04 MB, image/jpeg)
2008-06-12 17:03 UTC, Eric Sandeen
preprocessed namei.i from 2.6.26-0.57.rc5.git3.fc10.i686 (715.85 KB, text/plain)
2008-06-13 04:19 UTC, Eric Sandeen
do_split disassembly from 4.3.0 (19.25 KB, text/plain)
2008-06-13 04:21 UTC, Eric Sandeen
do_split disassembly from 4.3.1 (19.25 KB, text/plain)
2008-06-13 04:22 UTC, Eric Sandeen


Links
System                   ID     Priority  Status  Summary  Last Updated
GNU Compiler Collection  36533  None      None    None     Never

Description Eric Sandeen 2008-06-12 17:01:47 UTC
clumens & jeremy both hit this... see soon-to-be-attached jpegs

Comment 1 Eric Sandeen 2008-06-12 17:02:42 UTC
Created attachment 309103
first part of oops

Comment 2 Eric Sandeen 2008-06-12 17:03:12 UTC
Created attachment 309105
2nd part of oops

Comment 3 Eric Sandeen 2008-06-12 17:03:34 UTC
Created attachment 309106
3rd part of oops

Comment 5 Eric Sandeen 2008-06-12 20:34:31 UTC
actually I'll take this, I think it's my fault and I can reproduce it :)

Comment 6 Eric Sandeen 2008-06-12 22:15:36 UTC
I had a hunch that it might be gcc's fault: all the oopsing kernels were built
with shiny new 4.3.1, while a build I tested with 4.3.0 had no problems.

Thanks to Roland for all his help looking into this one....

<roland> the bug is that for ptr[-1].size it went from *(short*)&ptr[-1].size to
*(long*)&ptr[-1].size 
<roland> it's gcc's fault

I'll get a proper gcc bug report filed tonight or tomorrow... in the meantime
looks like gcc 4.3.1 in rawhide is slightly busted...

-Eric
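
To make Roland's diagnosis above concrete, here is a minimal sketch of what widening the access means. The struct layout is an assumption modelled on ext3's dx_map_entry (a 32-bit hash followed by two 16-bit fields, with size last); the type and helper names are illustrative only and are not taken from the attached namei.i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, modelled on ext3's dx_map_entry; not copied from namei.i. */
struct map_entry {
    uint32_t hash;
    uint16_t offs;
    uint16_t size;   /* last field: the final 2 bytes of the entry */
};

/*
 * Hypothetical helper: how many bytes does a load of `width` bytes,
 * starting at the `size` field, read beyond the end of its own entry?
 */
size_t bytes_past_entry(size_t width)
{
    size_t field_end;

    assert(width > 0);
    field_end = offsetof(struct map_entry, size) + width;
    return field_end > sizeof(struct map_entry)
               ? field_end - sizeof(struct map_entry)
               : 0;
}
```

Under this assumed layout, a 2-byte load of size stays inside the entry, whereas a 4-byte load from the same address reads 2 bytes beyond it. That only matters when the entry is the last thing in mapped memory, which is the situation described in the later comments.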

Comment 7 Eric Sandeen 2008-06-13 04:16:49 UTC
This is with:

[root@magnesium ~]# rpm -q gcc
gcc-4.3.1-1.i386

[root@magnesium ~]# gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
--enable-bootstrap --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk
--disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
--with-cpu=generic --build=i386-redhat-linux
Thread model: posix
gcc version 4.3.1 20080609 (Red Hat 4.3.1-1) (GCC) 



Comment 8 Eric Sandeen 2008-06-13 04:19:47 UTC
Created attachment 309166
preprocessed namei.i from 2.6.26-0.57.rc5.git3.fc10.i686

Comment 9 Eric Sandeen 2008-06-13 04:21:53 UTC
Created attachment 309167
do_split disassembly from 4.3.0

Comment 10 Eric Sandeen 2008-06-13 04:22:37 UTC
Created attachment 309168
do_split disassembly from 4.3.1

Comment 11 Eric Sandeen 2008-06-13 04:28:44 UTC
The interesting bit:

        for (i = count-1; i >= 0; i--) {
                /* is more than half of this entry in 2nd half of the block? */
                if (size + map[i].size/2 > blocksize/2)
     906:       8b 7d a0                mov    -0x60(%ebp),%edi
     909:       31 f6                   xor    %esi,%esi
     90b:       31 d2                   xor    %edx,%edx
     90d:       8b 45 d4                mov    -0x2c(%ebp),%eax
     910:       8b 5d 98                mov    -0x68(%ebp),%ebx
     913:       d1 ef                   shr    %edi
     915:       8d 4c 18 fe             lea    -0x2(%eax,%ebx,1),%ecx
     919:       66 8b 19                mov    (%ecx),%bx

The only difference between the compilers seems to be on this last line: a
16-bit load into %bx (shown above) vs. a 32-bit load into %ebx.

map[i].size is a u16, and it looks like what is happening is that when the code
loads 4 bytes instead of 2, the read crosses the page boundary and we go "BUG:
unable to handle kernel paging request at <first byte in next page>"

Thanks,
-Eric
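
The page-crossing condition described in this comment can be checked with a little address arithmetic. This is only a sketch under stated assumptions: a 4 KiB page size (the usual value on i386) and a hypothetical helper name. It models the addresses involved, not the kernel's fault handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u  /* assumption: 4 KiB pages, as on i386 */

/* True if a load of `width` bytes starting at `addr` spans more than one page. */
bool load_crosses_page(uintptr_t addr, size_t width)
{
    uintptr_t first_page, last_page;

    assert(width > 0);
    first_page = addr / PAGE_SIZE_ASSUMED;
    last_page = (addr + width - 1) / PAGE_SIZE_ASSUMED;
    return first_page != last_page;
}
```

For a u16 stored in the last two bytes of a page (offset 4094), a 2-byte load stays within the page, while a 4-byte load from the same address also touches the first two bytes of the following page. If that page is not mapped, the access faults at its first byte, which is consistent with the address reported in the oops.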

Comment 12 Jakub Jelinek 2008-06-13 06:48:29 UTC
What exact gcc options were used to compile namei.i?

Comment 13 Eric Sandeen 2008-06-13 13:18:11 UTC
Sorry, knew I was forgetting something:

  gcc -Wp,-MD,/root/ext3/.namei.o.d  -nostdinc -isystem
/usr/lib/gcc/i386-redhat-linux/4.3.1/include -D__KERNEL__ -Iinclude  -include
include/linux/autoconf.h -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
-fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os  
-fno-stack-protector -m32 -msoft-float -mregparm=3 -freg-struct-return
-mpreferred-stack-boundary=2  -march=i686 -mtune=generic -mtune=generic
-ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe
-Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2
-mno-3dnow -Iinclude/asm-x86/mach-generic -Iinclude/asm-x86/mach-default
-fno-omit-frame-pointer -fno-optimize-sibling-calls -g
-Wdeclaration-after-statement -Wno-pointer-sign    -DMODULE -D"KBUILD_STR(s)=#s"
-D"KBUILD_BASENAME=KBUILD_STR(namei)"  -D"KBUILD_MODNAME=KBUILD_STR(ext3)" -c -o
/root/ext3/namei.o /root/ext3/namei.c


Comment 14 Eric Sandeen 2008-06-13 13:19:18 UTC
Ah that was namei.o; here's namei.i just to be exact about what you asked:

  gcc -E -Wp,-MD,/root/ext3/.namei.i.d  -nostdinc -isystem
/usr/lib/gcc/i386-redhat-linux/4.3.1/include -D__KERNEL__ -Iinclude  -include
include/linux/autoconf.h -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
-fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os  
-fno-stack-protector -m32 -msoft-float -mregparm=3 -freg-struct-return
-mpreferred-stack-boundary=2  -march=i686 -mtune=generic -mtune=generic
-ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe
-Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2
-mno-3dnow -Iinclude/asm-x86/mach-generic -Iinclude/asm-x86/mach-default
-fno-omit-frame-pointer -fno-optimize-sibling-calls -g
-Wdeclaration-after-statement -Wno-pointer-sign    -DMODULE -D"KBUILD_STR(s)=#s"
-D"KBUILD_BASENAME=KBUILD_STR(namei)"  -D"KBUILD_MODNAME=KBUILD_STR(ext3)"   -o
/root/ext3/namei.i /root/ext3/namei.c

Comment 15 Jakub Jelinek 2008-06-13 17:21:07 UTC
Caused by
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=135124


Comment 16 Chris Lumens 2008-06-16 12:45:36 UTC
*** Bug 451573 has been marked as a duplicate of this bug. ***

Comment 17 Chris Lumens 2008-06-16 13:13:45 UTC
*** Bug 451546 has been marked as a duplicate of this bug. ***

Comment 18 Chris Lumens 2008-06-16 13:21:01 UTC
*** Bug 451487 has been marked as a duplicate of this bug. ***

Comment 19 Eric Sandeen 2008-06-17 02:58:19 UTC
Jakub, any ETA on a fix for this?  Should we un-tag gcc 4.3.1 from rawhide for now?

Thanks,
-Eric

Comment 20 G.Wolfe Woodbury 2008-06-19 21:29:42 UTC
Meanwhile, as a workaround for rawhide installs, use ext2 instead of ext3 or ext4.

The bug hits ext4 filesystems as well.

Comment 21 Eric Sandeen 2008-06-19 21:33:44 UTC
Actually any ext* filesystem which enables the dir_index feature is likely
susceptible; another workaround would be to turn this feature off.

-Eric

Comment 22 Jakub Jelinek 2008-06-25 09:51:37 UTC
Should be fixed in gcc-4.3.1-3.

Comment 23 Eric Sandeen 2008-06-26 04:43:42 UTC
WORKSFORME, I rebuilt the latest kernel w/ this version, did a big yum update,
no problems.

I think 2.6.26-0.93.rc8.fc10 should be the first kernel built with this.

Thanks!

-Eric

