Bug 847548 - kernel BUG at include/linux/scatterlist.h:67 (vp_set / virtscsi_init / virtscsi_complete_free) kernel panics when virtio-scsi module is loaded
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Whiteboard: RejectedNTH
Reported: 2012-08-12 17:40 EDT by Richard W.M. Jones
Modified: 2013-01-09 07:06 EST
Doc Type: Bug Fix
Last Closed: 2012-08-23 20:12:16 EDT
Type: Bug

Attachments
0001-SCSI-virtio-scsi-Initialize-scatterlist-structure.patch (986 bytes, patch)
2012-08-20 09:24 EDT, Richard W.M. Jones

Description Richard W.M. Jones 2012-08-12 17:40:40 EDT
Description of problem:

Running libguestfs with latest qemu + kernel from F18.
Immediately after the virtio-scsi module is loaded,
the kernel panics.

febootstrap: internal insmod virtio_scsi.ko
[    1.743146] ------------[ cut here ]------------
[    1.744012] kernel BUG at include/linux/scatterlist.h:67!
[    1.744012] invalid opcode: 0000 [#1] SMP 
[    1.744012] Modules linked in: virtio_scsi(+) virtio_blk virtio_rng virtio_balloon virtio_mmio sparse_keymap rfkill sym53c8xx scsi_transport_spi crc8 crc_ccitt crc_itu_t libcrc32c
[    1.744012] CPU 0 
[    1.744012] Pid: 1, comm: init Not tainted 3.6.0-0.rc1.git3.2.bz844485.2.fc19.x86_64 #1 Bochs Bochs
[    1.744012] RIP: 0010:[<ffffffffa00647d9>]  [<ffffffffa00647d9>] virtscsi_kick_event+0xd9/0xe0 [virtio_scsi]
[    1.744012] RSP: 0018:ffff88001ea01b48  EFLAGS: 00010286
[    1.744012] RAX: ffffea00006c1400 RBX: ffff88001b050bd8 RCX: 0000000087654321
[    1.744012] RDX: ffff88001bffb7b0 RSI: 0000000000000000 RDI: ffff88001b050cd8
[    1.744012] RBP: ffff88001ea01b98 R08: ffffffff81d27c40 R09: 0000000000000002
[    1.744012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88001b050cd0
[    1.744012] R13: 0000000000000cd8 R14: ffff88001bffb7b0 R15: ffffffff81cb4ac0
[    1.744012] FS:  0000000000715880(0063) GS:ffff88001ee00000(0000) knlGS:0000000000000000
[    1.744012] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    1.744012] CR2: 00007f3a431ce000 CR3: 000000001b741000 CR4: 00000000000006f0
[    1.744012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.744012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    1.744012] Process init (pid: 1, threadinfo ffff88001ea00000, task ffff88001e9b8000)
[    1.744012] Stack:
[    1.744012]  ffffffffa00660e8 ffffffffa00674f8 ffff88001bffb7b0 ffffffff81cb4ac0
[    1.744012]  ffff88001ea01b98 ffffffff813ecf14 0000000000000001 ffff88001b050c40
[    1.744012]  ffff88001b050bd8 ffff88001bffb7b0 ffff88001ea01c48 ffffffffa0064a42
[    1.744012] Call Trace:
[    1.744012]  [<ffffffff813ecf14>] ? vp_set+0x54/0x70
[    1.744012]  [<ffffffffa0064a42>] virtscsi_init+0x262/0x270 [virtio_scsi]
[    1.744012]  [<ffffffffa00644c0>] ? virtscsi_complete_free+0x30/0x30 [virtio_scsi]
[    1.744012]  [<ffffffffa0064120>] ? virtscsi_vq_done+0x60/0x60 [virtio_scsi]
[    1.744012]  [<ffffffffa0064530>] ? virtscsi_ctrl_done+0x70/0x70 [virtio_scsi]
[    1.744012]  [<ffffffffa0065625>] virtscsi_probe+0xa7/0x1a4 [virtio_scsi]
[    1.744012]  [<ffffffff813ed1b0>] ? vp_reset+0x90/0x90
[    1.744012]  [<ffffffff813ebfc0>] virtio_dev_probe+0xe0/0x150
[    1.744012]  [<ffffffff8143eceb>] driver_probe_device+0x8b/0x390
[    1.744012]  [<ffffffff8143f09b>] __driver_attach+0xab/0xb0
[    1.744012]  [<ffffffff8143eff0>] ? driver_probe_device+0x390/0x390
[    1.744012]  [<ffffffff8143cc85>] bus_for_each_dev+0x55/0x90
[    1.744012]  [<ffffffff8143e65e>] driver_attach+0x1e/0x20
[    1.744012]  [<ffffffff8143e280>] bus_add_driver+0x1b0/0x2a0
[    1.744012]  [<ffffffffa006a000>] ? 0xffffffffa0069fff
[    1.744012]  [<ffffffff8143f797>] driver_register+0x77/0x170
[    1.744012]  [<ffffffff81161520>] ? mempool_kmalloc+0x20/0x20
[    1.744012]  [<ffffffffa006a000>] ? 0xffffffffa0069fff
[    1.744012]  [<ffffffff813ec250>] register_virtio_driver+0x20/0x30
[    1.744012]  [<ffffffffa006a088>] init+0x88/0x1000 [virtio_scsi]
[    1.744012]  [<ffffffff8100212a>] do_one_initcall+0x12a/0x180
[    1.744012]  [<ffffffff810e3c46>] sys_init_module+0x156/0x2290
[    1.744012]  [<ffffffff813650a0>] ? ddebug_proc_open+0xd0/0xd0
[    1.744012]  [<ffffffff816dc870>] ? _raw_spin_unlock_irq+0x30/0x50
[    1.744012]  [<ffffffff816e5869>] system_call_fastpath+0x16/0x1b
[    1.744012] Code: 8b bb a0 00 00 00 e8 17 7e 38 e1 4c 89 f6 4c 89 ef e8 cc 80 67 e1 44 89 e0 48 8b 5d e0 4c 8b 65 e8 4c 8b 6d f0 4c 8b 75 f8 c9 c3 <0f> 0b 0f 0b 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 
[    1.744012] RIP  [<ffffffffa00647d9>] virtscsi_kick_event+0xd9/0xe0 [virtio_scsi]
[    1.744012]  RSP <ffff88001ea01b48>
[    1.810755] ---[ end trace 4edd72ac44d1feb2 ]---
[    1.812192] init (1) used greatest stack depth: 3736 bytes left
[    1.813394] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.813394] 
[    1.814336] Rebooting in 1 seconds..libguestfs: child_cleanup: 0x2549d70: child process died

Version-Release number of selected component (if applicable):

kernel-3.6.0-0.rc1.git3.2.bz844485.2.fc19.x86_64
qemu-1.2-0.1.20120806git3e430569.fc18.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Run 'libguestfs-test-tool'
Comment 1 Richard W.M. Jones 2012-08-12 17:44:38 EDT
The qemu command line we're using is:

/usr/bin/qemu-kvm \
    -global virtio-blk-pci.scsi=off \
    -nodefconfig \
    -nodefaults \
    -nographic \
    -device virtio-scsi-pci,id=scsi \
    -drive file=/tmp/libguestfs-test-tool-sda-D1C0Dp,format=raw,id=hd0,if=none \
    -device scsi-hd,drive=hd0 \
    -drive file=/var/tmp/.guestfs-1000/root.3411,snapshot=on,id=appliance,if=none,cache=unsafe \
    -device scsi-hd,drive=appliance \
    -machine accel=kvm:tcg \
    -m 500 \
    -no-reboot \
    -no-hpet \
    -device virtio-serial \
    -serial stdio \
    -device sga \
    -chardev socket,path=/tmp/libguestfspKdIKD/guestfsd.sock,id=channel0 \
    -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
    -kernel /var/tmp/.guestfs-1000/kernel.3411 \
    -initrd /var/tmp/.guestfs-1000/initrd.3411 \
    -append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm '
Comment 2 Richard W.M. Jones 2012-08-20 08:37:55 EDT
Still getting this with guest kernel 3.6.0-0.rc6.1.fc18.x86_64
and qemu-1.2-0.3.20120806git3e430569.fc19.x86_64.  Note I can
reproduce this using a regular guest as well as with libguestfs.

Reassigning to the kernel, since changing back to kernel 3.5.0
makes the bug go away.
Comment 3 Richard W.M. Jones 2012-08-20 08:58:50 EDT
(In reply to comment #2)
> Still getting this with guest kernel 3.6.0-0.rc6.1.fc18.x86_64

I believe I meant to write kernel 3.6.0-0.rc1.git6.1.fc18.x86_64

In any case, I'll retest with the latest kernel from Koji.
Comment 4 Richard W.M. Jones 2012-08-20 09:14:53 EDT
Same problem occurs with 3.6.0-0.rc2.git0.2.fc18.x86_64.  Stack trace
is identical to above.
Comment 5 Richard W.M. Jones 2012-08-20 09:24:46 EDT
Created attachment 605707 [details]
0001-SCSI-virtio-scsi-Initialize-scatterlist-structure.patch

I'm trying out this patch.
Comment 6 Richard W.M. Jones 2012-08-20 10:05:10 EDT
Fixed :-)

Posted on LKML.
Comment 7 Richard W.M. Jones 2012-08-20 17:27:42 EDT
https://lkml.org/lkml/2012/8/20/365
Comment 8 Josh Boyer 2012-08-21 10:16:02 EDT
(In reply to comment #7)
> https://lkml.org/lkml/2012/8/20/365

I'll get this in later today.  Somewhat unfortunately, I doubt it will show up in the Alpha since we're in freeze and they're only taking blocker+NTH bugs.
Comment 9 Richard W.M. Jones 2012-08-21 10:37:47 EDT
(In reply to comment #8)
> (In reply to comment #7)
> > https://lkml.org/lkml/2012/8/20/365
> 
> I'll get this in later today.  Somewhat unfortunately, I doubt it will show
> up in the Alpha since we're in freeze and they're only taking blocker+NTH
> bugs.

So:

 - In Rawhide, this prevents anyone from using virtio-scsi.  That's
   serious, but hopefully you can get this into the Rawhide kernel
   so we should be OK.

 - In Fedora 18, this *doesn't* affect anything because the
   BUG_ON is an integrity check which only kicks in when debugging
   is enabled (disabled in Fedora 18 kernels, I think).  Although
   the warning happens because a structure isn't initialized, in fact
   this doesn't cause a problem -- I tested that.
Comment 10 Josh Boyer 2012-08-21 10:50:02 EDT
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > https://lkml.org/lkml/2012/8/20/365
> > 
> > I'll get this in later today.  Somewhat unfortunately, I doubt it will show
> > up in the Alpha since we're in freeze and they're only taking blocker+NTH
> > bugs.
> 
> So:
> 
>  - In Rawhide, this prevents anyone from using virtio-scsi.  That's
>    serious, but hopefully you can get this into the Rawhide kernel
>    so we should be OK.

We haven't been building kernels for rawhide/f19 (git master branch) explicitly because the f18 branch is identical thus far.  We rely on inheritance to get the kernels into rawhide.
>
>  - In Fedora 18, this *doesn't* affect anything because the
>    BUG_ON is an integrity check which only kicks in when debugging
>    is enabled (disabled in Fedora 18 kernels, I think).  Although
>    the warning happens because a structure isn't initialized, in fact
>    this doesn't cause a problem -- I tested that.

Nope.  Debugging is enabled in F18.  We always ship Alpha with a debug kernel.

Anyway, the patch is committed to the f18/master branches now.
Comment 11 Richard W.M. Jones 2012-08-21 11:14:01 EDT
Hmm, I wonder if this counts as an NTH bug ...

From: https://fedoraproject.org/wiki/QA:SOP_nth_bug_process
> In general, nice-to-have bugs are usually bugs for which
> an update is not an optimal solution,

Yes: guest will not even boot, so update is not possible.

> and for which the fix
> is reasonably small and testable (this consideration becomes
> progressively more important as a release nears, so bugs may
> be downgraded from nice-to-have status late in the release
> process if it transpires that the fix is complex and hard to test).

Yes: fix is a one-liner, and well understood / tested.

> 
> Types of bugs which are typically likely to be accepted as
> nice-to-have bugs include:
> 
>   * bugs which constitute infringements of the desktop-
>     related Fedora_Release_Criteria as applied to
>     non-default desktops
>   * bugs which result in a system being unable to
>     reach a graphical environment

Yes: F18 guest using virtio-scsi will not even boot unless this
patch has been applied to the kernel.

>   * significant installer bugs which do not meet the
>     criteria to be blocker bugs
Comment 12 Josh Boyer 2012-08-21 12:13:08 EDT
I'll likely be building a kernel with this fix later today.  If it gets accepted via NTH, I can use that build in the update instead of the one currently queued.
Comment 13 Richard W.M. Jones 2012-08-22 05:36:43 EDT
(In reply to comment #10)
> We haven't been building kernels for rawhide/f19 (git master branch)
> explicitly because the f18 branch is identical thus far.  We rely on
> inheritance to get the kernels into rawhide.

There's a build of kernel-3.6.0-0.rc2.git1.2.fc18 which contains
this fix (http://koji.fedoraproject.org/koji/buildinfo?buildID=349638).

However that build isn't included/inherited when I build against
Rawhide.  Instead I'm still getting the old broken 3.6.0-0.rc1.git6.1.fc18
(see: http://kojipkgs.fedoraproject.org//work/tasks/3038/4413038/root.log).
I've waited over 12 hours and there have been multiple rawhide repo builds
in that time.

Any idea what's going on?
Comment 14 Josh Boyer 2012-08-22 09:22:13 EDT
(In reply to comment #13)
> (In reply to comment #10)
> > We haven't been building kernels for rawhide/f19 (git master branch)
> > explicitly because the f18 branch is identical thus far.  We rely on
> > inheritance to get the kernels into rawhide.
> 
> There's a build of kernel-3.6.0-0.rc2.git1.2.fc18 which contains
> this fix (http://koji.fedoraproject.org/koji/buildinfo?buildID=349638).
> 
> However that build isn't included/inherited when I build against
> Rawhide.  Instead I'm still getting the old broken 3.6.0-0.rc1.git6.1.fc18
> (see: http://kojipkgs.fedoraproject.org//work/tasks/3038/4413038/root.log).
> I've waited over 12 hours and there have been multiple rawhide repo builds
> in that time.
> 
> Any idea what's going on?

Yeah, we're in Alpha freeze.

The f19/rawhide koji tags inherit from the f18 koji tag.  Builds done against f18 go into f18-updates-candidate and we have to file bodhi updates to get builds into the f18 tag from there.  However, since we're in Alpha freeze, only blocker and NTH bugs are making it out of updates-testing and into the f18 tag.  Since nothing new is going into the f18 tag, nothing is being inherited into rawhide.

It'll get there eventually.  That's also why I haven't closed this bug yet.
Comment 15 Fedora Update System 2012-08-22 12:06:56 EDT
kernel-3.6.0-0.rc2.git2.1.fc18, grub2-2.00-5.fc18, pesign-0.10-4.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.6.0-0.rc2.git2.1.fc18,grub2-2.00-5.fc18,pesign-0.10-4.fc18
Comment 16 Adam Williamson 2012-08-22 14:07:07 EDT
Discussed at the 2012-08-22 NTH review meeting. We agreed that, on its merits, this bug doesn't quite reach NTH status: the impact is nasty, but it only affects a pretty obscure configuration, and we're trying to be strict about kernel NTH bugs. We think it would be acceptable in an Alpha to document the issue and have anyone who wants to use the Alpha in this specific configuration take care to install a kernel from updates.

However, in practice, the kernel build that fixes this is likely to make Alpha anyhow due to #849244 and #850003 being accepted as NTH. So don't worry about the process wankery. =)
Comment 17 Fedora Update System 2012-08-23 20:12:16 EDT
kernel-3.6.0-0.rc2.git2.1.fc18, grub2-2.00-5.fc18, pesign-0.10-4.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.
