Bug 177644

Summary: BUG() at include/asm/xor.h:633! in domU
Product: [Fedora] Fedora
Reporter: Stephen Tweedie <sct>
Component: kernel-xen
Assignee: Juan Quintela <quintela>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: high
Version: rawhide
CC: chrisw, justin.conover
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-03-06 01:28:06 UTC

Description Stephen Tweedie 2006-01-12 17:02:07 UTC
Description of problem:

Loading the md raid modules in a domU triggers a kernel BUG:

kernel BUG at include/asm/xor.h:633!

Version-Release number of selected component (if applicable):
kernel-xen-guest-2.6.15-1.29_FC5

How reproducible:
100%

Steps to Reproduce:
1. Run Jeremy's anaconda wrapper script to install a new domU. It is available at
http://people.redhat.com/~katzj/xenguest-install.py
2. When the installer reaches the disk probe stage, the kernel oopses.
  
Actual results:
kernel BUG at include/asm/xor.h:633!
invalid opcode: 0000 [#1]
SMP
Modules linked in: xor raid1 raid0 xennet sr_mod sd_mod scsi_mod cdrom squashfs
loop nfs nfs_acl lockd sunrpc vfat fat cramfs
CPU:    0
EIP:    0061:[<d412bb5e>]    Not tainted VLI
EFLAGS: 00010246   (2.6.15-1.29_FC5guest)
EIP is at xor_sse_2+0x1fc/0x20a [xor]
eax: 00000000   ebx: 00000000   ecx: d1c98000   edx: 00000000
esi: d1c95000   edi: 00000000   ebp: d412e5c4   esp: d22fff20
ds: 007b   es: 007b   ss: 0069
Process loader (pid: 148, threadinfo=d22fe000 task=d245e000)
Stack: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       00000000 00000000 d412d4d5 00001000 d1c94000 d1c97000 ffff009a d1c97000
Call Trace:
 [<d412d4d5>] do_xor_speed+0x3e/0x8b [xor]
 [<d412d650>] calibrate_xor_block+0x12e/0x1f4 [xor]
 [<c012f49c>] sys_init_module+0xda/0x1fa
 [<c0106b01>] syscall_call+0x7/0xb
Code: 00 00 81 c6 00 01 00 00 81 c1 00 01 00 00 4a 0f 85 78 fe ff ff 0f ae f8 0f
10 04 24 0f 10 4c 24 10 0f 10 54 24 20 0f 10 5c 24 30 <0f> 0b 79 02 26 d7 12 d4
83 c4 40 5b 5e c3 57 56 53 83 ec 40 8b
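
For anyone reading the dump above: the bytes at the faulting EIP (marked <0f>) can be decoded by hand. The sketch below assumes the usual i386 2.6 BUG() layout with CONFIG_DEBUG_BUGVERBOSE, i.e. a ud2 instruction (0f 0b) followed by a 16-bit line number and a 32-bit pointer to the file name, both little-endian. The function name decode_bug_trap is made up for illustration; this is not part of any kernel tooling.

```python
import struct


def decode_bug_trap(code_line):
    """Decode the BUG() record at the faulting EIP of an oops 'Code:' dump.

    Assumes the i386 layout: ud2 (0f 0b), 16-bit line, 32-bit file pointer.
    Returns (line, file_pointer), or None if the bytes do not look like that.
    """
    tokens = code_line.replace("Code:", "").split()
    # The byte at the faulting EIP is wrapped in angle brackets, e.g. <0f>.
    idx = next((i for i, t in enumerate(tokens) if t.startswith("<")), None)
    if idx is None:
        return None
    raw = bytes(int(t.strip("<>"), 16) for t in tokens[idx:idx + 8])
    if len(raw) < 8 or raw[:2] != b"\x0f\x0b":
        return None
    # Little-endian: unsigned 16-bit line number, unsigned 32-bit pointer.
    line, file_ptr = struct.unpack("<HI", raw[2:8])
    return line, file_ptr


if __name__ == "__main__":
    dump = ("Code: 0f 10 54 24 20 0f 10 5c 24 30 "
            "<0f> 0b 79 02 26 d7 12 d4 83 c4 40 5b 5e c3")
    line, file_ptr = decode_bug_trap(dump)
    print("BUG() line %d, file name pointer 0x%08x" % (line, file_ptr))
```

On the dump above, the bytes 79 02 decode to line 633, which matches the xor.h:633 in the BUG message, so the trap is the BUG() itself rather than a stray invalid opcode.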


Expected results:
No oops.

Comment 1 Stephen Tweedie 2006-02-07 22:40:52 UTC
Fixed in current rawhide.

Comment 2 Stephen Tweedie 2006-02-20 13:34:01 UTC
Looks like we didn't pick up the fix, or lost it: we need to track this down as
it is reported to still be present in kernel*-1.1955.

Comment 3 Justin Conover 2006-02-22 22:49:00 UTC
(In reply to comment #2)
> Looks like we didn't pick up the fix, or lost it: we need to track this down as
> it is reported to still be present in kernel*-1.1955.

If I comment out the raid5 /dev/md2 /home entry in /etc/fstab, the xen kernel boots.

# lsmod | grep raid
raid1                  24769  2
# modprobe raid5
FATAL: Error inserting raid5
(/lib/modules/2.6.15-1.1955_FC5hypervisor/kernel/drivers/md/raid5.ko): Unknown
symbol in module, or unknown parameter (see dmesg)

# dmesg | grep raid5
raid5: automatically using best checksumming function: pIII_sse
raid5: Unknown symbol xor_block
raid5: Unknown symbol xor_block
raid5: Unknown symbol xor_block

It is an i386/SMP box.
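
The modprobe failure above is consistent with the original oops: raid5 needs xor_block, and the xor module that should provide it presumably never finished loading, since the BUG fires in its init path. A minimal sketch for pulling unresolved symbols out of saved dmesg text follows. The function name unresolved_symbols is hypothetical, and the pattern only assumes the "module: Unknown symbol name" format seen in this report.

```python
import re
from collections import defaultdict

# Matches e.g. "raid5: Unknown symbol xor_block", with an optional
# kernel log-level prefix such as "<4>".
UNKNOWN_SYMBOL = re.compile(r"^(?:<\d>)?(\w+): Unknown symbol (\w+)")


def unresolved_symbols(dmesg_text):
    """Map each module name to the sorted list of symbols it failed to resolve."""
    missing = defaultdict(set)
    for line in dmesg_text.splitlines():
        match = UNKNOWN_SYMBOL.match(line.strip())
        if match:
            missing[match.group(1)].add(match.group(2))
    return {module: sorted(symbols) for module, symbols in missing.items()}


if __name__ == "__main__":
    sample = (
        "raid5: automatically using best checksumming function: pIII_sse\n"
        "raid5: Unknown symbol xor_block\n"
        "raid5: Unknown symbol xor_block\n"
    )
    print(unresolved_symbols(sample))
```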

Comment 4 Brian Brock 2006-02-23 14:17:32 UTC
I'm seeing this in 2.6.15-1.1975_FC5guest when running the installer via
xenguest-install.py.  The BUG occurs during the probe for a DHCP address on eth0
and is fatal to anaconda.

Comment 5 Brian Brock 2006-02-23 14:50:45 UTC
Correction: according to the call trace, the BUG actually occurs when raid5 is
loaded, as the previous comments indicate.  Anaconda loads raid5 after network
initialization.

Call Trace:
 [<d10b9b13>] do_xor_speed+0x42/0x90 [xor]
 [<d10b9c94>] calibrate_xor_block+0x133/0x13f [xor]
 [<c0133600>] sys_init_module+0x16c9/0x1863
 [<c013b71e>] find_get_page+0x3c/0x41
 [<c026457b>] _spin_unlock+0x6/0x8
 [<c026457b>] _spin_unlock+0x6/0x8
 [<c0149d0a>] vma_link+0xbd/0xc5
 [<c0110f9a>] do_page_fault+0x1d5/0x633
 [<c0107a75>] syscall_call+0x7/0xb
Code: 00 00 81 c6 00 01 00 00 81 c2 00 01 00 00 49 0f 85 78 fe ff ff
0f ae f8 0f 10 04 24 0f 10 4c 24 10 0f 10 54 24 20 0f 10 5c 24 30
<0f> 0b 79 02 cf b6 0b d1 83 c4 40 5b 5e c3 57 56 53 83 ec 44 8b  
<4>raid5: Unknown symbol xor_block
raid6: Unknown symbol xor_block
JFS: nTxBlock = 2061, nTxLock = 16495
SGI XFS with ACLs, security attributes, large block numbers, no
debug enabled
SGI XFS Quota Management subsystem
device-mapper: 4.5.0-ioctl (2005-10-04) initialised:
dm-devel

Comment 6 Stephen Tweedie 2006-02-24 20:53:49 UTC
Still works for me on x86_64 but not on i386.  

Comment 8 Justin Conover 2006-03-05 04:38:07 UTC
Looks like the 2009 kernel is now working on xen/smp/i386

 uname -rmv
2.6.15-1.2009.4.2_FC5hypervisor #1 SMP Thu Mar 2 18:45:34 EST 2006 i686
 df -h /dev/md2
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2              688G  490G  163G  76% /home

/sbin/lsmod | grep raid
raid5                  27713  1
xor                    18505  1 raid5
raid1                  24769  2

I didn't test the last few kernels, so I'm not sure which one fixed it, but I
thought one of the recent rawhide reports about kernels mentioned something
about xen/xor.  I only had time to test it tonight.

Comment 9 Stephen Tweedie 2006-03-06 01:28:06 UTC
Thanks; I'd expect it to be fixed in this kernel but it's good to have it confirmed.