Red Hat Bugzilla – Bug 847548
kernel BUG at include/linux/scatterlist.h:67 (vp_set / virtscsi_init / virtscsi_complete_free) kernel panics when virtio-scsi module is loaded
Last modified: 2013-01-09 07:06:52 EST
Description of problem:
Running libguestfs with latest qemu + kernel from F18.
Immediately after the virtio-scsi module is loaded,
the kernel panics.
febootstrap: internal insmod virtio_scsi.ko
[ 1.743146] ------------[ cut here ]------------
[ 1.744012] kernel BUG at include/linux/scatterlist.h:67!
[ 1.744012] invalid opcode: 0000 [#1] SMP
[ 1.744012] Modules linked in: virtio_scsi(+) virtio_blk virtio_rng virtio_balloon virtio_mmio sparse_keymap rfkill sym53c8xx scsi_transport_spi crc8 crc_ccitt crc_itu_t libcrc32c
[ 1.744012] CPU 0
[ 1.744012] Pid: 1, comm: init Not tainted 3.6.0-0.rc1.git3.2.bz844485.2.fc19.x86_64 #1 Bochs Bochs
[ 1.744012] RIP: 0010:[<ffffffffa00647d9>] [<ffffffffa00647d9>] virtscsi_kick_event+0xd9/0xe0 [virtio_scsi]
[ 1.744012] RSP: 0018:ffff88001ea01b48 EFLAGS: 00010286
[ 1.744012] RAX: ffffea00006c1400 RBX: ffff88001b050bd8 RCX: 0000000087654321
[ 1.744012] RDX: ffff88001bffb7b0 RSI: 0000000000000000 RDI: ffff88001b050cd8
[ 1.744012] RBP: ffff88001ea01b98 R08: ffffffff81d27c40 R09: 0000000000000002
[ 1.744012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88001b050cd0
[ 1.744012] R13: 0000000000000cd8 R14: ffff88001bffb7b0 R15: ffffffff81cb4ac0
[ 1.744012] FS: 0000000000715880(0063) GS:ffff88001ee00000(0000) knlGS:0000000000000000
[ 1.744012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1.744012] CR2: 00007f3a431ce000 CR3: 000000001b741000 CR4: 00000000000006f0
[ 1.744012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.744012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1.744012] Process init (pid: 1, threadinfo ffff88001ea00000, task ffff88001e9b8000)
[ 1.744012] Stack:
[ 1.744012] ffffffffa00660e8 ffffffffa00674f8 ffff88001bffb7b0 ffffffff81cb4ac0
[ 1.744012] ffff88001ea01b98 ffffffff813ecf14 0000000000000001 ffff88001b050c40
[ 1.744012] ffff88001b050bd8 ffff88001bffb7b0 ffff88001ea01c48 ffffffffa0064a42
[ 1.744012] Call Trace:
[ 1.744012] [<ffffffff813ecf14>] ? vp_set+0x54/0x70
[ 1.744012] [<ffffffffa0064a42>] virtscsi_init+0x262/0x270 [virtio_scsi]
[ 1.744012] [<ffffffffa00644c0>] ? virtscsi_complete_free+0x30/0x30 [virtio_scsi]
[ 1.744012] [<ffffffffa0064120>] ? virtscsi_vq_done+0x60/0x60 [virtio_scsi]
[ 1.744012] [<ffffffffa0064530>] ? virtscsi_ctrl_done+0x70/0x70 [virtio_scsi]
[ 1.744012] [<ffffffffa0065625>] virtscsi_probe+0xa7/0x1a4 [virtio_scsi]
[ 1.744012] [<ffffffff813ed1b0>] ? vp_reset+0x90/0x90
[ 1.744012] [<ffffffff813ebfc0>] virtio_dev_probe+0xe0/0x150
[ 1.744012] [<ffffffff8143eceb>] driver_probe_device+0x8b/0x390
[ 1.744012] [<ffffffff8143f09b>] __driver_attach+0xab/0xb0
[ 1.744012] [<ffffffff8143eff0>] ? driver_probe_device+0x390/0x390
[ 1.744012] [<ffffffff8143cc85>] bus_for_each_dev+0x55/0x90
[ 1.744012] [<ffffffff8143e65e>] driver_attach+0x1e/0x20
[ 1.744012] [<ffffffff8143e280>] bus_add_driver+0x1b0/0x2a0
[ 1.744012] [<ffffffffa006a000>] ? 0xffffffffa0069fff
[ 1.744012] [<ffffffff8143f797>] driver_register+0x77/0x170
[ 1.744012] [<ffffffff81161520>] ? mempool_kmalloc+0x20/0x20
[ 1.744012] [<ffffffffa006a000>] ? 0xffffffffa0069fff
[ 1.744012] [<ffffffff813ec250>] register_virtio_driver+0x20/0x30
[ 1.744012] [<ffffffffa006a088>] init+0x88/0x1000 [virtio_scsi]
[ 1.744012] [<ffffffff8100212a>] do_one_initcall+0x12a/0x180
[ 1.744012] [<ffffffff810e3c46>] sys_init_module+0x156/0x2290
[ 1.744012] [<ffffffff813650a0>] ? ddebug_proc_open+0xd0/0xd0
[ 1.744012] [<ffffffff816dc870>] ? _raw_spin_unlock_irq+0x30/0x50
[ 1.744012] [<ffffffff816e5869>] system_call_fastpath+0x16/0x1b
[ 1.744012] Code: 8b bb a0 00 00 00 e8 17 7e 38 e1 4c 89 f6 4c 89 ef e8 cc 80 67 e1 44 89 e0 48 8b 5d e0 4c 8b 65 e8 4c 8b 6d f0 4c 8b 75 f8 c9 c3 <0f> 0b 0f 0b 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48
[ 1.744012] RIP [<ffffffffa00647d9>] virtscsi_kick_event+0xd9/0xe0 [virtio_scsi]
[ 1.744012] RSP <ffff88001ea01b48>
[ 1.810755] ---[ end trace 4edd72ac44d1feb2 ]---
[ 1.812192] init (1) used greatest stack depth: 3736 bytes left
[ 1.813394] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 1.814336] Rebooting in 1 seconds..libguestfs: child_cleanup: 0x2549d70: child process died
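For triage, note that the bug title lists functions from the call trace, while the faulting instruction is reported on the RIP line (virtscsi_kick_event). Trace entries prefixed with "?" are addresses found on the stack that could not be confirmed as part of the call chain. A minimal sketch of pulling the key fields out of an oops like the one above; the helper name `summarize_oops` and the regular expressions are illustrative assumptions, not part of any existing tool:

```python
import re

# Three lines copied verbatim from the log above.
OOPS = """\
[ 1.744012] kernel BUG at include/linux/scatterlist.h:67!
[ 1.744012] RIP: 0010:[<ffffffffa00647d9>] [<ffffffffa00647d9>] virtscsi_kick_event+0xd9/0xe0 [virtio_scsi]
[ 1.744012] [<ffffffffa0064a42>] virtscsi_init+0x262/0x270 [virtio_scsi]
"""

# "kernel BUG at <file>:<line>!"
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")

# "RIP: <seg>:[<addr>] [<addr>] <func>+0x<off>/0x<size> [<module>]"
RIP_RE = re.compile(
    r"RIP: \S+ (?:\[<[0-9a-f]+>\] )+"
    r"(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def summarize_oops(text):
    """Return the BUG location and the faulting function (hypothetical helper)."""
    bug = BUG_RE.search(text)
    rip = RIP_RE.search(text)
    return {
        "file": bug.group("file") if bug else None,
        "line": int(bug.group("line")) if bug else None,
        "function": rip.group("func") if rip else None,
        "module": rip.group("module") if rip else None,
    }


if __name__ == "__main__":
    print(summarize_oops(OOPS))
```

The sketch reads only the BUG and RIP lines; it deliberately ignores the call trace, since the "?" entries there are not reliable.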
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run 'libguestfs-test-tool'
The qemu command line we're using is:
-global virtio-blk-pci.scsi=off \
-device virtio-scsi-pci,id=scsi \
-drive file=/tmp/libguestfs-test-tool-sda-D1C0Dp,format=raw,id=hd0,if=none \
-device scsi-hd,drive=hd0 \
-drive file=/var/tmp/.guestfs-1000/root.3411,snapshot=on,id=appliance,if=none,cache=unsafe \
-device scsi-hd,drive=appliance \
-machine accel=kvm:tcg \
-m 500 \
-device virtio-serial \
-serial stdio \
-device sga \
-chardev socket,path=/tmp/libguestfspKdIKD/guestfsd.sock,id=channel0 \
-device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
-kernel /var/tmp/.guestfs-1000/kernel.3411 \
-initrd /var/tmp/.guestfs-1000/initrd.3411 \
-append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm '
Still getting this with guest kernel 3.6.0-0.rc6.1.fc18.x86_64
and qemu-1.2-0.3.20120806git3e430569.fc19.x86_64. Note I can
reproduce this using a regular guest as well as with libguestfs.
Reassigning to the kernel, since changing back to kernel 3.5.0
makes the bug go away.
(In reply to comment #2)
> Still getting this with guest kernel 3.6.0-0.rc6.1.fc18.x86_64
I believe I meant to write kernel 3.6.0-0.rc1.git6.1.fc18.x86_64
In any case, I'll retest with the latest kernel from Koji.
Same problem occurs with 3.6.0-0.rc2.git0.2.fc18.x86_64. Stack trace
is identical to above.
Created attachment 605707
I'm trying out this patch.
Posted on LKML:
https://lkml.org/lkml/2012/8/20/365
(In reply to comment #7)
I'll get this in later today. Somewhat unfortunately, I doubt it will show up in the Alpha since we're in freeze and they're only taking blocker+NTH bugs.
(In reply to comment #8)
> (In reply to comment #7)
> > https://lkml.org/lkml/2012/8/20/365
> I'll get this in later today. Somewhat unfortunately, I doubt it will show
> up in the Alpha since we're in freeze and they're only taking blocker+NTH
> bugs.
- In Rawhide, this prevents anyone from using virtio-scsi. That's
serious, but hopefully you can get this into the Rawhide kernel
so we should be OK.
- In Fedora 18, this *doesn't* affect anything because the
BUG_ON is an integrity check which only kicks in when debugging
is enabled (disabled in Fedora 18 kernels, I think). Although
the warning happens because a structure isn't initialized, in fact
this doesn't cause a problem -- I tested that.
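The behaviour described in the two points above (an integrity check that fires only when debugging is enabled, triggered by a structure that was never initialized) can be modelled in a few lines. This is a userspace sketch only: the names `Entry`, `init_entry`, `set_buf` and `BugOn` are hypothetical, the marker value is taken from RCX in the register dump on the assumption that it is the expected magic, and the report does not say whether the attached patch has this exact shape.

```python
SG_MAGIC = 0x87654321  # assumed expected marker; the same value appears in RCX above


class BugOn(Exception):
    """Stand-in for a kernel BUG(); the real kernel would oops instead."""


class Entry:
    """Stand-in for a structure that was allocated but never initialized."""

    def __init__(self, garbage):
        self.magic = garbage  # whatever happened to be in memory
        self.buf = None


def init_entry(entry):
    """Initialize the structure, stamping the marker the debug check looks for."""
    entry.magic = SG_MAGIC
    entry.buf = None


def set_buf(entry, buf, debug):
    """Attach a buffer; with debug=True, first verify the entry was initialized."""
    if debug and entry.magic != SG_MAGIC:
        raise BugOn("entry used before initialization")
    entry.buf = buf


def try_set(entry, debug):
    """Return "ok" if the buffer was attached, "BUG" if the check fired."""
    try:
        set_buf(entry, b"event", debug)
        return "ok"
    except BugOn:
        return "BUG"


if __name__ == "__main__":
    print(try_set(Entry(garbage=0), debug=True))   # uninitialized, debug build
    print(try_set(Entry(garbage=0), debug=False))  # uninitialized, non-debug build
    fixed = Entry(garbage=0)
    init_entry(fixed)
    print(try_set(fixed, debug=True))              # initialized first
```

In this model the missing initialization is harmless without the check, which matches the observation above, and the fix is to initialize the structure before use rather than to disable the check.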
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > https://lkml.org/lkml/2012/8/20/365
> > I'll get this in later today. Somewhat unfortunately, I doubt it will show
> > up in the Alpha since we're in freeze and they're only taking blocker+NTH
> > bugs.
> - In Rawhide, this prevents anyone from using virtio-scsi. That's
> serious, but hopefully you can get this into the Rawhide kernel
> so we should be OK.
We haven't been building kernels for rawhide/f19 (git master branch) explicitly because the f18 branch is identical thus far. We rely on inheritance to get the kernels into rawhide.
> - In Fedora 18, this *doesn't* affect anything because the
> BUG_ON is an integrity check which only kicks in when debugging
> is enabled (disabled in Fedora 18 kernels, I think). Although
> the warning happens because a structure isn't initialized, in fact
> this doesn't cause a problem -- I tested that.
Nope. Debugging is enabled in F18. We always ship Alpha with a debug kernel.
Anyway, the patch is committed to the f18/master branches now.
Hmm, I wonder if this counts as an NTH bug ...
> In general, nice-to-have bugs are usually bugs for which
> an update is not an optimal solution,
Yes: guest will not even boot, so update is not possible.
> and for which the fix
> is reasonably small and testable (this consideration becomes
> progressively more important as a release nears, so bugs may
> be downgraded from nice-to-have status late in the release
> process if it transpires that the fix is complex and hard to test).
Yes: fix is a one-liner, and well understood / tested.
> Types of bugs which are typically likely to be accepted as
> nice-to-have bugs include:
> * bugs which constitute infringements of the desktop-
> related Fedora_Release_Criteria as applied to
> non-default desktops
> * bugs which result in a system being unable to
> reach a graphical environment
Yes: F18 guest using virtio-scsi will not even boot unless this
patch has been applied to the kernel.
> * significant installer bugs which do not meet the
> criteria to be blocker bugs
I'll likely be building a kernel with this fix later today. If it gets accepted via NTH, I can use that build in the update instead of the one currently queued.
(In reply to comment #10)
> We haven't been building kernels for rawhide/f19 (git master branch)
> explicitly because the f18 branch is identical thus far. We rely on
> inheritance to get the kernels into rawhide.
There's a build of kernel-3.6.0-0.rc2.git1.2.fc18 which contains
this fix (http://koji.fedoraproject.org/koji/buildinfo?buildID=349638).
However that build isn't included/inherited when I build against
Rawhide. Instead I'm still getting the old broken 3.6.0-0.rc1.git6.1.fc18
(see: http://kojipkgs.fedoraproject.org//work/tasks/3038/4413038/root.log).
I've waited over 12 hours and there have been multiple rawhide repo builds
in that time.
Any idea what's going on?
(In reply to comment #13)
> (In reply to comment #10)
> > We haven't been building kernels for rawhide/f19 (git master branch)
> > explicitly because the f18 branch is identical thus far. We rely on
> > inheritance to get the kernels into rawhide.
> There's a build of kernel-3.6.0-0.rc2.git1.2.fc18 which contains
> this fix (http://koji.fedoraproject.org/koji/buildinfo?buildID=349638).
> However that build isn't included/inherited when I build against
> Rawhide. Instead I'm still getting the old broken 3.6.0-0.rc1.git6.1.fc18
> (see: http://kojipkgs.fedoraproject.org//work/tasks/3038/4413038/root.log).
> I've waited over 12 hours and there have been multiple rawhide repo builds
> in that time.
> Any idea what's going on?
Yeah, we're in Alpha freeze.
The f19/rawhide koji tags inherit from the f18 koji tag. Builds done against f18 go into f18-updates-candidate and we have to file bodhi updates to get builds into the f18 tag from there. However, since we're in Alpha freeze, only blocker and NTH bugs are making it out of updates-testing and into the f18 tag. Since nothing new is going into the f18 tag, nothing is being inherited into rawhide.
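The tag flow described above can be sketched as a toy model. It is a deliberate simplification and does not use the Koji or Bodhi APIs: the functions `latest_visible` and `push_update` are hypothetical, the updates-testing stage is collapsed into a single step, and only the tag names and build versions come from this report.

```python
OLD = "kernel-3.6.0-0.rc1.git6.1.fc18"  # broken build still seen in Rawhide
NEW = "kernel-3.6.0-0.rc2.git1.2.fc18"  # build containing the fix


def make_state():
    """Builds per tag (newest last), plus the f19 -> f18 inheritance link."""
    builds = {"f18-updates-candidate": [NEW], "f18": [OLD], "f19": []}
    parent = {"f19": "f18"}
    return builds, parent


def latest_visible(tag, builds, parent):
    """Newest build seen from `tag`, falling back to inherited tags."""
    while tag is not None:
        if builds.get(tag):
            return builds[tag][-1]
        tag = parent.get(tag)
    return None


def push_update(builds, build, frozen, accepted_exception):
    """Move a build from the candidate tag into f18, unless the freeze blocks it.

    `accepted_exception` stands for "accepted as a blocker or NTH bug".
    """
    if frozen and not accepted_exception:
        return False
    builds["f18-updates-candidate"].remove(build)
    builds["f18"].append(build)
    return True


if __name__ == "__main__":
    builds, parent = make_state()
    print(latest_visible("f19", builds, parent))  # old build, inherited from f18
    push_update(builds, NEW, frozen=True, accepted_exception=False)
    print(latest_visible("f19", builds, parent))  # unchanged during the freeze
    push_update(builds, NEW, frozen=True, accepted_exception=True)
    print(latest_visible("f19", builds, parent))  # new build, now inherited
```

In this model, repeated Rawhide repo builds change nothing while the freeze holds, because the fixed build never leaves the candidate tag.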
It'll get there eventually. That's also why I haven't closed this bug yet.
kernel-3.6.0-0.rc2.git2.1.fc18, grub2-2.00-5.fc18, pesign-0.10-4.fc18 has been submitted as an update for Fedora 18.
Discussed at the 2012-08-22 NTH review meeting. We agreed that on merit this bug doesn't quite rank NTH status: the impact is nasty, but it's in a pretty obscure configuration and we're trying to be strict about kernel NTH bugs. We think it'd be acceptable in an Alpha to document this issue and have anyone who wants to use the Alpha in this specific configuration take care to install a kernel from updates.
However, in practice, the kernel build that fixes this is likely to make Alpha anyhow due to #849244 and #850003 being accepted as NTH. So don't worry about the process wankery. =)
kernel-3.6.0-0.rc2.git2.1.fc18, grub2-2.00-5.fc18, pesign-0.10-4.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.