Bug 456101

Summary: F10 pv_ops xen: ext3:do_split() oops during yum update on i686
Product: [Fedora] Fedora Reporter: Mark McLoughlin <markmc>
Component: kernel-xen Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: rawhide CC: esandeen, jakub, xen-maint
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-07-21 16:54:54 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 442569    
Attachments:
Description Flags
config-2.6.27-0.2.rc0.git6.fc10.i686.xen none

Description Mark McLoughlin 2008-07-21 15:20:09 UTC
With kernel-xen-2.6.26-0.1.rc6.git2.fc10.i686

Seen this a few times on i686 now, but not on x86_64. I don't have a better
reproducer than "during a yum update":

BUG: unable to handle kernel paging request at c7553000
IP: [<e08a0109>] :ext3:do_split+0x1f4/0x41f
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: bridge bnep rfcomm l2cap bluetooth autofs4 sunrpc ipt_REJECT
nf_conntrack_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp
nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables
ipv6 loop dm_multipath pcspkr xen_netfront dm_snapshot dm_zero dm_mirror dm_log
dm_mod xen_blkfront ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
[last unloaded: microcode]

Pid: 1878, comm: yum Tainted: G        W (2.6.26-0.1.rc6.git2.fc10.i686.xen #1)
EIP: 0061:[<e08a0109>] EFLAGS: 00210206 CPU: 0
EIP is at do_split+0x1f4/0x41f [ext3]
EAX: c7552b20 EBX: 000004e0 ECX: c7552ffe EDX: 00000000
ESI: 00000000 EDI: 00000800 EBP: d7ce3da8 ESP: d7ce3d30
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
Process yum (pid: 1878, ti=d7ce3000 task=d95fc7a0 task.ti=d7ce3000)
Stack: df205000 d7ce3e3c dfaaaee4 db473000 000004e0 d7ce3d8c 00001000 ccd0e820 
       d6127de0 c7553000 df204000 c7552000 0000009c c7552ff8 4ab2d388 00000fd4 
       d7ce3dac e08a03a0 df204000 c7552b20 00000014 64c74e0c 8f48d87c 00000002 
Call Trace:
 [<e08a03a0>] ? add_dirent_to_buf+0x6c/0x26c [ext3]
 [<e08a0968>] ? ext3_add_entry+0x3c8/0x787 [ext3]
 [<c0650015>] ? _spin_unlock+0x1d/0x20
 [<c040484f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<c044b567>] ? lock_acquire+0x84/0x90
 [<e08a1ef2>] ? ext3_rename+0x1b5/0x434 [ext3]
 [<c0499011>] ? vfs_rename+0x273/0x3d2
 [<c0650015>] ? _spin_unlock+0x1d/0x20
 [<c049a704>] ? sys_renameat+0x188/0x1f3
 [<c064ef35>] ? mutex_unlock+0x8/0xa
 [<c040484f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<c064ef25>] ? __mutex_unlock_slowpath+0x105/0x10d
 [<c064ef35>] ? mutex_unlock+0x8/0xa
 [<c064ef35>] ? mutex_unlock+0x8/0xa
 [<c04090bf>] ? do_IRQ+0xac/0xc5
 [<c05582c2>] ? xen_evtchn_do_upcall+0xe4/0x111
 [<c049a781>] ? sys_rename+0x12/0x14
 [<c0406bfa>] ? syscall_call+0x7/0xb
 =======================
Code: 89 56 04 89 79 f8 89 f1 3b 4d d4 77 d7 85 c0 74 07 8b 75 bc 31 c0 eb ee
8b 7d a0 31 f6 31 d2 8b 45 d4 8b 5d 98 d1 ef 8d 4c 18 fe <8b> 19 83 e9 08 89
d8 66 d1 e8 0f b7 c0 8d 04 02 39 f8 77 08 0f
EIP: [<e08a0109>] do_split+0x1f4/0x41f [ext3] SS:ESP 0069:d7ce3d30
---[ end trace 4eaa2a86a8e2da22 ]---


The git tree corresponding to this build is:

http://git.et.redhat.com/?p=linux-2.6-fedora-pvops.git;a=commit;h=8e2c4a66e3132aa5e5209906484f3a0ab50e7a44

Comment 1 Mark McLoughlin 2008-07-21 15:20:09 UTC
Created attachment 312270 [details]
config-2.6.27-0.2.rc0.git6.fc10.i686.xen

Comment 2 Mark McLoughlin 2008-07-21 16:01:08 UTC
See also bug #451068 and:

  http://www.kerneloops.org/search.php?search=do_split

That was a gcc bug supposedly fixed by gcc-4.3.1-3

However, looking at:

http://kojipkgs.fedoraproject.org/packages/kernel-xen-2.6/2.6.27/0.2.rc0.git6.fc10/data/logs/i686/root.log

this package was built with gcc-4.3.1-4



Comment 3 Jeremy Fitzhardinge 2008-07-21 16:04:27 UTC
So, to be clear, this oops happens only:
 - under Xen
 - on i386
 - in this place
?

The fault address looks perfectly reasonable, so I assume it's some kind of
use-after-free detected by DEBUG_PAGEALLOC.  I'll try to reproduce it, but at
first look it doesn't seem terribly Xen-specific.

Comment 4 Mark McLoughlin 2008-07-21 16:10:59 UTC
Yep, never mind this one, Jeremy - most probably a gcc bug

Comment 5 Mark McLoughlin 2008-07-21 16:35:34 UTC
Looking at the spot where it was previously mis-compiled, we don't seem to have
the same issue:

    72e1:       8b 7d a0                mov    -0x60(%ebp),%edi
    72e4:       31 f6                   xor    %esi,%esi
    72e6:       31 d2                   xor    %edx,%edx
    72e8:       8b 45 d4                mov    -0x2c(%ebp),%eax
    72eb:       8b 5d 98                mov    -0x68(%ebp),%ebx
    72ee:       d1 ef                   shr    %edi
    72f0:       8d 4c 18 fe             lea    -0x2(%eax,%ebx,1),%ecx
    72f4:       66 8b 19                mov    (%ecx),%bx

With the previous gcc-4.3.1 bug, this last line was:

    7109:       8b 19                   mov    (%ecx),%ebx

i.e. %ebx vs. %bx was apparently the problem previously
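
The registers in comment #0 appear consistent with that: ECX is c7552ffe, the
fault address is c7553000, and the Code: line marks "<8b> 19" (the 32-bit form)
as the faulting instruction. A rough sketch of the arithmetic, assuming 4 KiB
pages; the helper name pages_touched is made up for illustration:

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages on i686


def pages_touched(addr, width):
    """Return the base address of every page that an access of
    `width` bytes starting at `addr` touches."""
    first = addr & ~(PAGE_SIZE - 1)
    last = (addr + width - 1) & ~(PAGE_SIZE - 1)
    return list(range(first, last + PAGE_SIZE, PAGE_SIZE))


ecx = 0xc7552ffe  # ECX from the oops in comment #0

# 16-bit load, mov (%ecx),%bx: reads c7552ffe..c7552fff
word_load = [hex(p) for p in pages_touched(ecx, 2)]

# 32-bit load, mov (%ecx),%ebx: reads c7552ffe..c7553001
long_load = [hex(p) for p in pages_touched(ecx, 4)]

print("16-bit load touches:", word_load)
print("32-bit load touches:", long_load)
```

Under these assumptions only the 32-bit load reaches the page at c7553000,
which would explain a fault there if that page is not mapped.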


Comment 6 Mark McLoughlin 2008-07-21 16:54:54 UTC
Bah, this seems to have been a total mixup:

(In reply to comment #0)
> With kernel-xen-2.6.26-0.1.rc6.git2.fc10.i686
...
> Pid: 1878, comm: yum Tainted: G        W (2.6.26-0.1.rc6.git2.fc10.i686.xen #1)

I should have been running kernel-xen-2.6.27-0.2.rc0.git6.fc10.i686
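
A mixup like this can be spotted by comparing the release string in the oops
with the kernel that was meant to be under test. A minimal sketch; the regular
expression and the helper name oops_kernel_version are assumptions for
illustration and not part of any existing tool, and the expected release is
taken from the name of the attached config:

```python
import re


def oops_kernel_version(line):
    """Extract the kernel release from a 'Pid: ... (release #N)' oops line.

    Returns None if the line does not match the assumed layout.
    """
    m = re.search(r"\((\S+) #\d+\)", line)
    return m.group(1) if m else None


oops_line = ("Pid: 1878, comm: yum Tainted: G        W "
             "(2.6.26-0.1.rc6.git2.fc10.i686.xen #1)")
expected = "2.6.27-0.2.rc0.git6.fc10.i686.xen"  # from the attached config name

running = oops_kernel_version(oops_line)
if running != expected:
    print("oops came from %s, expected %s" % (running, expected))
```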