Bug 451068

Summary: ext3: oops in do_split, miscompilation with gcc 4.3.1
Product: [Fedora] Fedora
Reporter: Eric Sandeen <esandeen>
Component: gcc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: medium
Version: rawhide
CC: arekm, atkac, clumens, jmccann, jmtaylor90, katzj, petersen, redwolfe, sangu.fedora, thethirddoorontheleft, yaneti
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: 4.3.1-3
Doc Type: Bug Fix
Last Closed: 2008-06-26 00:43:42 EDT
Attachments:
Description                                                Flags
first part of oops                                         none
2nd part of oops                                           none
3rd part of oops                                           none
preprocessed namei.i from 2.6.26-0.57.rc5.git3.fc10.i686   none
do_split disassembly from 4.3.0                            none
do_split disassembly from 4.3.1                            none

Description Eric Sandeen 2008-06-12 13:01:47 EDT
clumens and jeremy both hit this oops; see the soon-to-be-attached JPEGs.
Comment 1 Eric Sandeen 2008-06-12 13:02:42 EDT
Created attachment 309103
first part of oops
Comment 2 Eric Sandeen 2008-06-12 13:03:12 EDT
Created attachment 309105
2nd part of oops
Comment 3 Eric Sandeen 2008-06-12 13:03:34 EDT
Created attachment 309106
3rd part of oops
Comment 5 Eric Sandeen 2008-06-12 16:34:31 EDT
Actually, I'll take this; I think it's my fault, and I can reproduce it :)
Comment 6 Eric Sandeen 2008-06-12 18:15:36 EDT
I had a hunch that it might be gcc's fault: all the oopsing kernels were built
with the shiny new 4.3.1, while a kernel I built with 4.3.0 had no problems.

Thanks to Roland for all his help looking into this one....

<roland> the bug is that for ptr[-1].size it went from *(short*)&ptr[-1].size to
*(long*)&ptr[-1].size 
<roland> it's gcc's fault
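
As a rough illustration of what Roland describes, the sketch below counts how many bytes a load from a trailing 16-bit field reads past the end of its entry when the load is widened. The struct name, the field order, and the helper bytes_past_entry() are assumptions made for illustration; the real definition is in the preprocessed namei.i attached to this bug.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the map entry; the field order is an
 * assumption, chosen so that `size` is the last two bytes of the entry. */
struct map_entry {
    uint32_t hash;
    uint16_t offs;
    uint16_t size;
};

/* Number of bytes a load of `width` bytes starting at the `size` field
 * reads past the end of the entry (0 if the load stays inside it). */
static size_t bytes_past_entry(size_t width)
{
    size_t end = offsetof(struct map_entry, size) + width;

    return end > sizeof(struct map_entry)
        ? end - sizeof(struct map_entry)
        : 0;
}
```

Under these assumptions, bytes_past_entry(2) is 0 for the correct short-sized load and bytes_past_entry(4) is 2 for the widened one, so the wide load touches memory that does not belong to the entry.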

I'll get a proper gcc bug report filed tonight or tomorrow. In the meantime, it
looks like gcc 4.3.1 in rawhide is slightly busted...

-Eric
Comment 7 Eric Sandeen 2008-06-13 00:16:49 EDT
This is with:

[root@magnesium ~]# rpm -q gcc
gcc-4.3.1-1.i386

[root@magnesium ~]# gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
--enable-bootstrap --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk
--disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
--with-cpu=generic --build=i386-redhat-linux
Thread model: posix
gcc version 4.3.1 20080609 (Red Hat 4.3.1-1) (GCC) 

Comment 8 Eric Sandeen 2008-06-13 00:19:47 EDT
Created attachment 309166
preprocessed namei.i from 2.6.26-0.57.rc5.git3.fc10.i686
Comment 9 Eric Sandeen 2008-06-13 00:21:53 EDT
Created attachment 309167
do_split disassembly from 4.3.0
Comment 10 Eric Sandeen 2008-06-13 00:22:37 EDT
Created attachment 309168
do_split disassembly from 4.3.1
Comment 11 Eric Sandeen 2008-06-13 00:28:44 EDT
The interesting bit:

        for (i = count-1; i >= 0; i--) {
                /* is more than half of this entry in 2nd half of the block? */
                if (size + map[i].size/2 > blocksize/2)
     906:       8b 7d a0                mov    -0x60(%ebp),%edi
     909:       31 f6                   xor    %esi,%esi
     90b:       31 d2                   xor    %edx,%edx
     90d:       8b 45 d4                mov    -0x2c(%ebp),%eax
     910:       8b 5d 98                mov    -0x68(%ebp),%ebx
     913:       d1 ef                   shr    %edi
     915:       8d 4c 18 fe             lea    -0x2(%eax,%ebx,1),%ecx
     919:       66 8b 19                mov    (%ecx),%bx

The only difference between compilers seems to be %bx vs. %ebx on this last line.

map[i].size is a u16. It looks like a 4-byte load in place of the 2-byte one
reads past the end of the field; when the field sits in the last two bytes of a
page, the load crosses the page boundary and we go "BUG: unable to handle
kernel paging request at <first byte in next page>"
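
The page-boundary arithmetic can be sketched as follows. This is only an illustration: the 4096-byte page size and the helper name crosses_page() are assumptions, not taken from the kernel source, and any address passed in is hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for i386 */

/* Returns 1 if an access of `width` bytes starting at `addr` touches
 * the page after the one containing `addr`, 0 otherwise. */
static int crosses_page(uintptr_t addr, unsigned int width)
{
    unsigned int offset_in_page = (unsigned int)(addr % SKETCH_PAGE_SIZE);

    return offset_in_page + width > SKETCH_PAGE_SIZE;
}
```

For a field in the last two bytes of a page, for example the hypothetical address 0x1ffe, crosses_page(0x1ffe, 2) is 0 while crosses_page(0x1ffe, 4) is 1. If the next page is not mapped, the wider load faults on its first byte.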

Thanks,
-Eric
Comment 12 Jakub Jelinek 2008-06-13 02:48:29 EDT
What exact gcc options were used to compile namei.i?
Comment 13 Eric Sandeen 2008-06-13 09:18:11 EDT
Sorry, knew I was forgetting something:

  gcc -Wp,-MD,/root/ext3/.namei.o.d  -nostdinc -isystem
/usr/lib/gcc/i386-redhat-linux/4.3.1/include -D__KERNEL__ -Iinclude  -include
include/linux/autoconf.h -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
-fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os  
-fno-stack-protector -m32 -msoft-float -mregparm=3 -freg-struct-return
-mpreferred-stack-boundary=2  -march=i686 -mtune=generic -mtune=generic
-ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe
-Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2
-mno-3dnow -Iinclude/asm-x86/mach-generic -Iinclude/asm-x86/mach-default
-fno-omit-frame-pointer -fno-optimize-sibling-calls -g
-Wdeclaration-after-statement -Wno-pointer-sign    -DMODULE -D"KBUILD_STR(s)=#s"
-D"KBUILD_BASENAME=KBUILD_STR(namei)"  -D"KBUILD_MODNAME=KBUILD_STR(ext3)" -c -o
/root/ext3/namei.o /root/ext3/namei.c
Comment 14 Eric Sandeen 2008-06-13 09:19:18 EDT
Ah that was namei.o; here's namei.i just to be exact about what you asked:

  gcc -E -Wp,-MD,/root/ext3/.namei.i.d  -nostdinc -isystem
/usr/lib/gcc/i386-redhat-linux/4.3.1/include -D__KERNEL__ -Iinclude  -include
include/linux/autoconf.h -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
-fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os  
-fno-stack-protector -m32 -msoft-float -mregparm=3 -freg-struct-return
-mpreferred-stack-boundary=2  -march=i686 -mtune=generic -mtune=generic
-ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe
-Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2
-mno-3dnow -Iinclude/asm-x86/mach-generic -Iinclude/asm-x86/mach-default
-fno-omit-frame-pointer -fno-optimize-sibling-calls -g
-Wdeclaration-after-statement -Wno-pointer-sign    -DMODULE -D"KBUILD_STR(s)=#s"
-D"KBUILD_BASENAME=KBUILD_STR(namei)"  -D"KBUILD_MODNAME=KBUILD_STR(ext3)"   -o
/root/ext3/namei.i /root/ext3/namei.c
Comment 15 Jakub Jelinek 2008-06-13 13:21:07 EDT
Caused by
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=135124
Comment 16 Chris Lumens 2008-06-16 08:45:36 EDT
*** Bug 451573 has been marked as a duplicate of this bug. ***
Comment 17 Chris Lumens 2008-06-16 09:13:45 EDT
*** Bug 451546 has been marked as a duplicate of this bug. ***
Comment 18 Chris Lumens 2008-06-16 09:21:01 EDT
*** Bug 451487 has been marked as a duplicate of this bug. ***
Comment 19 Eric Sandeen 2008-06-16 22:58:19 EDT
Jakub, any ETA on a fix for this?  Should we un-tag gcc 4.3.1 from rawhide for now?

Thanks,
-Eric
Comment 20 G.Wolfe Woodbury 2008-06-19 17:29:42 EDT
Meanwhile, as a workaround for rawhide installs, use ext2 instead of ext3 or ext4.

It hits ext4 filesystems as well.
Comment 21 Eric Sandeen 2008-06-19 17:33:44 EDT
Actually any ext* filesystem which enables the dir_index feature is likely
susceptible; another workaround would be to turn this feature off.

-Eric
Comment 22 Jakub Jelinek 2008-06-25 05:51:37 EDT
Should be fixed in gcc-4.3.1-3.
Comment 23 Eric Sandeen 2008-06-26 00:43:42 EDT
WORKSFORME: I rebuilt the latest kernel with this version and did a big yum
update with no problems.

I think 2.6.26-0.93.rc8.fc10 should be the first kernel built with this.

Thanks!

-Eric