Bug 229850 - EIP in blktap
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel-xen
Version: rawhide
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Stephen Tweedie
QA Contact: Brian Brock
Depends On:
Blocks:
Reported: 2007-02-23 14:39 EST by Karl MacMillan
Modified: 2007-11-30 17:11 EST

See Also:
Fixed In Version: kernel-xen-2.6-2.6.20-2925.4.3.fc7
Doc Type: Bug Fix
Last Closed: 2007-04-27 03:34:56 EDT


Description Karl MacMillan 2007-02-23 14:39:31 EST
Description of problem:

Installing paravirt FC6 with xen kernel 2.6.19-1.2898.2.3.fc7xen reliably
results in the following kernel error:

Feb 22 12:18:38 localhost kernel: CPU:    1
Feb 22 12:18:38 localhost kernel: EIP:    0061:[<ee4ecfab>]    Not tainted VLI
Feb 22 12:18:38 localhost kernel: EFLAGS: 00010246   (2.6.19-1.2898.2.3.fc7xen #1)
Feb 22 12:18:38 localhost kernel: EIP is at dispatch_rw_block_io+0x96/0x853 [blktap]
Feb 22 12:18:38 localhost kernel: eax: ebcd3940   ebx: e77e89a4   ecx: ea81a801
  edx: 00000000
Feb 22 12:18:38 localhost kernel: esi: ee27e550   edi: d40b7fbc   ebp: e77e89b4
  esp: d40b7b9c
Feb 22 12:18:38 localhost kernel: ds: 007b   es: 007b   ss: 0069
Feb 22 12:18:39 localhost kernel: Process xvd 1 (pid: 4302, ti=d40b7000
task=c1bfacf0 task.ti=d40b7000)
Feb 22 12:18:39 localhost kernel: Stack: 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 
Feb 22 12:18:39 localhost kernel:        d40b7f1c 0000000a 00000000 ea81a800
d40b7f50 e77e89a4 00000001 00000000 
Feb 22 12:18:39 localhost kernel:        00000000 00000000 0b000000 000000d1
00041d25 00000000 00000016 00000016 
Feb 22 12:18:39 localhost kernel: Call Trace:
Feb 22 12:18:39 localhost kernel:  [<ee4ee0aa>] tap_blkif_schedule+0x29f/0x3df
[blktap]
Feb 22 12:18:39 localhost kernel:  [<c0431928>] kthread+0xc0/0xec
Feb 22 12:18:39 localhost kernel:  [<c040580f>] kernel_thread_helper+0x7/0x10
Feb 22 12:18:39 localhost kernel:  =======================
Feb 22 12:18:39 localhost kernel: Code: 50 c7 44 24 70 00 00 00 00 81 38 00 00
ad de 74 10 ff 44 24 70 83 c0 04 83 7c 24 70 20 74 0c eb e8 81 7c 24 70 00 00 ad
de 75 0d 
<0f> 0b b3 04 19 e5 4e ee e9 80 07 00 00 8b 44 24 30 8a 40 01 0f 
Feb 22 12:18:39 localhost kernel: EIP: [<ee4ecfab>]
dispatch_rw_block_io+0x96/0x853 [blktap] SS:ESP 0069:d40b7b9c
Feb 22 12:31:34 localhost kernel:  <6>xenbr0: port 3(vif1.0) entering disabled state
Feb 22 12:31:34 localhost kernel: device vif1.0 left promiscuous mode
Feb 22 12:31:34 localhost kernel: xenbr0: port 3(vif1.0) entering disabled state
Feb 22 12:31:34 localhost kernel: BUG: unable to handle kernel paging request at
virtual address d3151008
Feb 22 12:31:34 localhost kernel:  printing eip:
Feb 22 12:31:34 localhost kernel: c0459a05
Feb 22 12:31:34 localhost kernel: 10e1b000 -> *pde = 00000000:0cf55001
Feb 22 12:31:34 localhost kernel: 11955000 -> *pme = 00000000:0208f067
Feb 22 12:31:34 localhost kernel: 0008f000 -> *pte = 00000000:0b751061
Feb 22 12:31:34 localhost kernel: Oops: 0003 [#2]
Feb 22 12:31:34 localhost kernel: SMP 
Feb 22 12:31:34 localhost kernel: last sysfs file: /class/net/eth0/carrier

This is on a dual Xeon running 32bit (dell precision workstation 470) using
virt-manager to do the install over ftp. The virtual disk is a regular file.
This happens while anaconda is formatting the filesystem (normally) or
installing packages (once).


Version-Release number of selected component (if applicable):

kernel-xen-2.6.18-1.2849.fc6
xen-devel-3.0.4-6.fc7
xen-libs-3.0.4-6.fc7
xen-3.0.4-6.fc7
kernel-xen-2.6.19-1.2898.2.3.fc7

libvirt-0.2.0-3.fc7
virt-manager-0.3.1-2.fc7
libvirt-python-0.2.0-3.fc7
python-virtinst-0.101.0-2.fc7

This also happened with a previous version of xen and libvirt/virt-manager.

How reproducible:

Always

Steps to Reproduce:

Create a paravirt domain and install FC6 over ftp / http with virt-manager.
Comment 1 Karl MacMillan 2007-02-23 14:42:18 EST
Forgot to add:

The system doesn't lock up at this point, but the guest domain is mainly
unresponsive. The guest can't be stopped or destroyed either via virt-manager or
xm. Eventually the system locks up without producing further errors.
Comment 2 Daniel Berrange 2007-02-23 14:47:14 EST
I've seen exactly the same crashes on x86_64 rawhide kernel-xen in Dom0, so I
don't think this is arch-specific. It is a little non-deterministic: I can run
existing VMs under light load without hitting it too often, but if I do a fresh
VM install it'll crash nearly every time. So I reckon there's some weird race in
the blktap code.
Comment 3 Mark McLoughlin 2007-02-28 09:05:48 EST
I'm seeing this on x86_64 rawhide, reliably happening when anaconda starts
glibc-common. This call trace looks a bit more useful:

Kernel BUG at drivers/xen/blktap/blktapmain.c:1203
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 2 
Modules linked in: xt_physdev bridge netloop netbk blktap blkbk autofs4 hidp
rfcomm l2cap bluetooth sunrpc ip_conntrack_netbios_ns xt_state ip_conntrack
nfnetlink ipt_REJECT iptable_filter ip_tables xt_tcpudp ip6t_REJECT
ip6table_filter ip6_tables x_tables ipv6 dm_multipath video sbs i2c_ec button
battery asus_acpi ac lp e1000 snd_intel8x0 snd_ac97_codec snd_ac97_bus
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
parport_pc snd_mixer_oss i2c_i801 i2c_core parport floppy serial_core shpchp
ide_cd snd_pcm snd_timer snd soundcore snd_page_alloc cdrom pcspkr sg
dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd
ehci_hcd ohci_hcd uhci_hcd
Pid: 3787, comm: xvd 1 Not tainted 2.6.19-1.2898.2.3.fc7xen #1
RIP: e030:[<ffffffff88381118>]  [<ffffffff88381118>]
:blktap:dispatch_rw_block_io+0x98/0x966
RSP: e02b:ffff8800c4121aa0  EFLAGS: 00010246
RAX: 00000000dead0000 RBX: ffff8800dcd79e80 RCX: 0000000000000001
RDX: ffff8800e3349ac0 RSI: ffff8800c4121e70 RDI: ffff8800dcd79e80
RBP: ffff8800e33496c0 R08: 0000070000000221 R09: 00000700000002ba
R10: 0000070000000321 R11: 0000070000000243 R12: ffff8800dcd79e90
R13: ffff8800e473e000 R14: 0000000000002e6c R15: ffff8800e33496c0
FS:  00002aaaaaac9fa0(0000) GS:ffffffff805b9100(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000de45e000 CR4: 0000000000002620
Process xvd 1 (pid: 3787, threadinfo ffff8800c4120000, task ffff8800e25340c0)
Stack:  0000000000000000 ffff8800e473e000 ffff8800c4121e70 ffff8800dcd79e80
 0000000000000001 0000000000000000 0000000000000000 0100000000000000
 000000000000001e 000000000000014a 0000000200000000 0000000100000001
Call Trace:
 [<ffffffff80283327>] find_busiest_group+0x1db/0x447
 [<ffffffff802629a6>] _spin_unlock_irq+0x9/0x10
 [<ffffffff80260d84>] thread_return+0x64/0xfe
 [<ffffffff8022f469>] __wake_up+0x38/0x4f
 [<ffffffff883824a0>] :blktap:tap_blkif_schedule+0x2ef/0x42f
 [<ffffffff883821b1>] :blktap:tap_blkif_schedule+0x0/0x42f
 [<ffffffff80298770>] keventd_create_kthread+0x0/0x66
 [<ffffffff80233789>] kthread+0xd0/0x100
 [<ffffffff8025ea98>] child_rip+0xa/0x12
 [<ffffffff80298770>] keventd_create_kthread+0x0/0x66
 [<ffffffff802336b9>] kthread+0x0/0x100
 [<ffffffff8025ea8e>] child_rip+0x0/0x12

Code: 0f 0b 68 31 29 38 88 c2 b3 04 e9 8a 08 00 00 48 8b 54 24 10 
RIP  [<ffffffff88381118>] :blktap:dispatch_rw_block_io+0x98/0x966
 RSP <ffff8800c4121aa0>
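
The "invalid opcode" reports are consistent with a BUG() assertion firing rather than a stray memory fault. As a minimal sketch, the Python below decodes the Code: bytes from the i386 oops in the Description and from the x86_64 oops above. It assumes the BUG() encodings used by kernels of this era (on i386, `ud2` followed by a 16-bit line number and a 32-bit file pointer; on x86_64, `ud2`, `pushq $file`, `ret $line`). The helper names are illustrative only and do not belong to any kernel tooling.

```python
import re

# "Code:" bytes copied from the i386 oops in the Description; <..> marks EIP.
I386_CODE = (
    "50 c7 44 24 70 00 00 00 00 81 38 00 00 ad de 74 10 ff 44 24 70 "
    "83 c0 04 83 7c 24 70 20 74 0c eb e8 81 7c 24 70 00 00 ad de 75 0d "
    "<0f> 0b b3 04 19 e5 4e ee e9 80 07 00 00 8b 44 24 30 8a 40 01 0f"
)
# "Code:" bytes from the x86_64 oops in Comment 3; that line starts at RIP.
X86_64_CODE = "0f 0b 68 31 29 38 88 c2 b3 04 e9 8a 08 00 00 48 8b 54 24 10"

UD2 = b"\x0f\x0b"  # the x86 undefined-instruction opcode


def parse_code_line(line):
    """Return (code_bytes, index_of_marked_byte); index is 0 without a <..> marker."""
    out, fault = [], 0
    for i, tok in enumerate(line.split()):
        m = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if m:
            fault, tok = i, m.group(1)
        out.append(int(tok, 16))
    return bytes(out), fault


def bug_line_i386(code, fault):
    """Assumed layout: ud2, then .word __LINE__, then .long __FILE__."""
    if len(code) < fault + 4 or code[fault:fault + 2] != UD2:
        return None
    return int.from_bytes(code[fault + 2:fault + 4], "little")


def bug_line_x86_64(code, fault):
    """Assumed layout: ud2, pushq $file (0x68 imm32), ret $line (0xc2 imm16)."""
    if len(code) < fault + 10 or code[fault:fault + 2] != UD2:
        return None
    if code[fault + 2] != 0x68 or code[fault + 7] != 0xC2:
        return None
    return int.from_bytes(code[fault + 8:fault + 10], "little")


def find_le32(code, value):
    """Offsets at which a 32-bit little-endian constant appears in the bytes."""
    needle = value.to_bytes(4, "little")
    return [i for i in range(len(code) - 3) if code[i:i + 4] == needle]


i386_bytes, i386_fault = parse_code_line(I386_CODE)
x64_bytes, x64_fault = parse_code_line(X86_64_CODE)
print("i386 BUG line:", bug_line_i386(i386_bytes, i386_fault))      # 1203
print("x86_64 BUG line:", bug_line_x86_64(x64_bytes, x64_fault))    # 1203
print("0xdead0000 offsets:", find_le32(i386_bytes, 0xDEAD0000))     # [11, 37]
```

Under those assumptions both traces decode to line 1203, which matches "blktapmain.c:1203" in the BUG message, so the Description's oops appears to be the same assertion even though its log excerpt lacks the "kernel BUG at" line. The constant 0xdead0000 that the i386 code compares against just before the trap is also the value held in RAX in the x86_64 register dump.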
Comment 4 Mark McLoughlin 2007-02-28 09:08:03 EST
Worryingly, if I then destroy the guest, Dom0 oopses and dies too.
Comment 5 Mark McLoughlin 2007-02-28 10:06:48 EST
Okay, found it ... the problem seems to be that some csets are being merged into
blktap.c, but not blktapmain.c

In this case, we're missing:

  http://lists.xensource.com/archives/html/xen-changelog/2006-11/msg00464.html

I've tested kernel-xen-2.6.19-1.2898.2.3.fc7 with the missing patch and a
paravirt install completes successfully

So, a couple of other things we should do:

  - re-submit the blktap modular build fix upstream to help prevent this
    kind of merge error:

      http://lists.xensource.com/archives/html/xen-devel/2006-09/msg00859.html

  - review the other differences between blktap.c and blktapmain.c - currently
    it looks like it might just be devfs removals we made ourselves
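
The review suggested in the last item could start from a plain textual diff of the two files. Here is a minimal sketch using Python's standard difflib. The two short sample inputs and the function name `fix_from_upstream` are invented for illustration; in practice the strings would be read from blktap.c and blktapmain.c in the kernel-xen source tree.

```python
import difflib


def changed_lines(old_text, new_text, old_name="blktapmain.c", new_name="blktap.c"):
    """Return only the added/removed lines of a unified diff (no context, no headers)."""
    diff = list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_name, tofile=new_name, lineterm="", n=0,
    ))
    body = diff[2:]  # drop the "---"/"+++" file header lines
    return [line for line in body if not line.startswith("@@")]


# Hypothetical stand-ins for the two source files (not real blktap code).
FORKED = "static int setup(void)\n{\n\treturn 0;\n}\n"
UPSTREAM = "static int setup(void)\n{\n\tfix_from_upstream();\n\treturn 0;\n}\n"

for line in changed_lines(FORKED, UPSTREAM):
    print(line)
```

Lines prefixed with "+" exist only in the second file, so with blktapmain.c as the first argument they are candidates for changesets that reached blktap.c but not the modular copy. Lines prefixed with "-" would be local changes such as the devfs removals.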
Comment 6 Tatsuro Enokura 2007-03-20 06:00:18 EDT
Description of problem:

  Installing paravirt Fedora7 test2 with xen kernel 2.6.19-1.2898.2.3.fc7xen
  reliably results in the following kernel error:

Mar  8 14:57:56 coolmint kernel: kernel BUG at drivers/xen/blktap/blktapmain.c:1203!
Mar  8 14:57:56 coolmint kernel: invalid opcode: 0000 [#1]
Mar  8 14:57:56 coolmint kernel: SMP 
Mar  8 14:57:56 coolmint kernel: last sysfs file:
/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/irq
Mar  8 14:57:56 coolmint kernel: Modules linked in: xt_physdev iptable_filter
ip_tables x_tables i915 drm bridge netloop 
netbk blktap blkbk autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_mod
video sbs i2c_ec button battery asus_acpi 
ac ipv6 lp snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device parport_pc 
snd_pcm_oss snd_mixer_oss ide_cd parport i2c_i801 irda cdrom snd_pcm serio_raw
sky2 sg crc_ccitt i2c_core snd_timer 
pcspkr snd soundcore iTCO_wdt snd_page_alloc serial_core joydev ata_piix libata
sd_mod scsi_mod ext3 jbd ehci_hcd 
ohci_hcd uhci_hcd
Mar  8 14:57:56 coolmint kernel: CPU:    1
Mar  8 14:57:56 coolmint kernel: EIP:    0061:[<ee510fab>]    Not tainted VLI
Mar  8 14:57:56 coolmint kernel: EFLAGS: 00010246   (2.6.19-1.2898.2.3.fc7xen #1)
Mar  8 14:57:56 coolmint kernel: EIP is at dispatch_rw_block_io+0x96/0x853 [blktap]
Mar  8 14:57:56 coolmint kernel: eax: e34d1e40   ebx: ebde79a4   ecx: e8f00801 
 edx: 00000000
Mar  8 14:57:56 coolmint kernel: esi: ee554b38   edi: ec3ddfbc   ebp: ebde79b4 
 esp: ec3ddb9c
Mar  8 14:57:56 coolmint kernel: ds: 007b   es: 007b   ss: 0069
Mar  8 14:57:56 coolmint kernel: Process xvd 1 (pid: 3234, ti=ec3dd000
task=e855a590 task.ti=ec3dd000)
Mar  8 14:57:56 coolmint kernel: Stack: 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 
Mar  8 14:57:56 coolmint kernel:        ec3ddc9c 00000000 00000000 e8f00800
ec3ddf50 ebde79a4 00000001 00000000 
Mar  8 14:57:56 coolmint kernel:        00000000 00000000 01000000 000000b0
004f2755 00000000 00000002 00000002 
Mar  8 14:57:56 coolmint kernel: Call Trace:
Mar  8 14:57:56 coolmint kernel:  [<ee5120aa>] tap_blkif_schedule+0x29f/0x3df
[blktap]
Mar  8 14:57:56 coolmint kernel:  [<c0431928>] kthread+0xc0/0xec
Mar  8 14:57:56 coolmint kernel:  [<c040580f>] kernel_thread_helper+0x7/0x10
Mar  8 14:57:56 coolmint kernel:  =======================
Mar  8 14:57:56 coolmint kernel: Code: 50 c7 44 24 70 00 00 00 00 81 38 00 00 ad
de 74 10 ff 44 24 70 83 c0 04 83 7c 24 
70 20 74 0c eb e8 81 7c 24 70 00 00 ad de 75 0d <0f> 0b b3 04 19 25 51 ee e9 80
07 00 00 8b 44 24 30 8a 40 01 0f 
Mar  8 14:57:56 coolmint kernel: EIP: [<ee510fab>]
dispatch_rw_block_io+0x96/0x853 [blktap] SS:ESP 0069:ec3ddb9c


This is on a Core Duo running 32bit (Fujitsu FMV-S8225) using
virt-install to do the install over http. The virtual disk is a regular file.
This happens while anaconda is formatting the filesystem or installing packages.


Version-Release number of selected component (if applicable):
  xen-3.0.4-7.fc7
  xen-libs-3.0.7-9.fc7
  xen-devel-3.0.7-9.fc7
  kernel-xen-2.6.19-1.2898.2.3.fc7

  libvirt: 0.2.0(revision: 1.445)
  virt-install: 0.3.1(changeset 117: 2e5b60ecbd93)


How reproducible:
  Always

Steps to Reproduce:
  Create a paravirt domain and install Fedora7 test2 over ftp / http with
  virt-install.

  virt-install --name=F7test2_PV --file=/root/F7test2_PV.img --file-size=5 \
    --ram=512 --paravirt --location=http://10.131.236.20/f7test2_x86 --nographics
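
To tell whether a reproduction attempt hit this bug rather than some other failure, Dom0's kernel log can be scanned for the signature seen in the traces above. This is a minimal sketch: the two patterns are taken from the logs in this report, and the sample lines are copies of them. Reading from a file such as /var/log/messages is an assumption about where syslog writes kernel messages on the host.

```python
import re

# Patterns drawn from the oops output quoted in this report.
SIGNATURES = [
    re.compile(r"kernel BUG at drivers/xen/blktap/blktapmain\.c:\d+", re.IGNORECASE),
    re.compile(r"dispatch_rw_block_io\+0x[0-9a-f]+/0x[0-9a-f]+"),
]


def blktap_bug_hits(lines):
    """Return the log lines that match any known signature of this failure."""
    return [line.rstrip("\n") for line in lines
            if any(p.search(line) for p in SIGNATURES)]


SAMPLE = [
    "Mar  8 14:57:56 coolmint kernel: kernel BUG at drivers/xen/blktap/blktapmain.c:1203!",
    "Mar  8 14:57:56 coolmint kernel: invalid opcode: 0000 [#1]",
    "Mar  8 14:57:56 coolmint kernel: EIP is at dispatch_rw_block_io+0x96/0x853 [blktap]",
    "Feb 22 12:31:34 localhost kernel: device vif1.0 left promiscuous mode",
]

hits = blktap_bug_hits(SAMPLE)
print(len(hits), "matching lines")
```

Because the function accepts any iterable of strings, an open log file can be passed in directly in place of `SAMPLE`.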
Comment 8 Mark McLoughlin 2007-04-27 03:34:56 EDT
Should be fixed in rawhide
