Description of problem:

Running libguestfs with latest qemu + kernel from F18. Immediately after the virtio-scsi module is loaded, the kernel panics.

febootstrap: internal insmod virtio_scsi.ko
[ 1.743146] ------------[ cut here ]------------
[ 1.744012] kernel BUG at include/linux/scatterlist.h:67!
[ 1.744012] invalid opcode: 0000 [#1] SMP
[ 1.744012] Modules linked in: virtio_scsi(+) virtio_blk virtio_rng virtio_balloon virtio_mmio sparse_keymap rfkill sym53c8xx scsi_transport_spi crc8 crc_ccitt crc_itu_t libcrc32c
[ 1.744012] CPU 0
[ 1.744012] Pid: 1, comm: init Not tainted 3.6.0-0.rc1.git3.2.bz844485.2.fc19.x86_64 #1 Bochs Bochs
[ 1.744012] RIP: 0010:[<ffffffffa00647d9>] [<ffffffffa00647d9>] virtscsi_kick_event+0xd9/0xe0 [virtio_scsi]
[ 1.744012] RSP: 0018:ffff88001ea01b48 EFLAGS: 00010286
[ 1.744012] RAX: ffffea00006c1400 RBX: ffff88001b050bd8 RCX: 0000000087654321
[ 1.744012] RDX: ffff88001bffb7b0 RSI: 0000000000000000 RDI: ffff88001b050cd8
[ 1.744012] RBP: ffff88001ea01b98 R08: ffffffff81d27c40 R09: 0000000000000002
[ 1.744012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88001b050cd0
[ 1.744012] R13: 0000000000000cd8 R14: ffff88001bffb7b0 R15: ffffffff81cb4ac0
[ 1.744012] FS: 0000000000715880(0063) GS:ffff88001ee00000(0000) knlGS:0000000000000000
[ 1.744012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1.744012] CR2: 00007f3a431ce000 CR3: 000000001b741000 CR4: 00000000000006f0
[ 1.744012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.744012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1.744012] Process init (pid: 1, threadinfo ffff88001ea00000, task ffff88001e9b8000)
[ 1.744012] Stack:
[ 1.744012] ffffffffa00660e8 ffffffffa00674f8 ffff88001bffb7b0 ffffffff81cb4ac0
[ 1.744012] ffff88001ea01b98 ffffffff813ecf14 0000000000000001 ffff88001b050c40
[ 1.744012] ffff88001b050bd8 ffff88001bffb7b0 ffff88001ea01c48 ffffffffa0064a42
[ 1.744012] Call Trace:
[ 1.744012] [<ffffffff813ecf14>] ? vp_set+0x54/0x70
[ 1.744012] [<ffffffffa0064a42>] virtscsi_init+0x262/0x270 [virtio_scsi]
[ 1.744012] [<ffffffffa00644c0>] ? virtscsi_complete_free+0x30/0x30 [virtio_scsi]
[ 1.744012] [<ffffffffa0064120>] ? virtscsi_vq_done+0x60/0x60 [virtio_scsi]
[ 1.744012] [<ffffffffa0064530>] ? virtscsi_ctrl_done+0x70/0x70 [virtio_scsi]
[ 1.744012] [<ffffffffa0065625>] virtscsi_probe+0xa7/0x1a4 [virtio_scsi]
[ 1.744012] [<ffffffff813ed1b0>] ? vp_reset+0x90/0x90
[ 1.744012] [<ffffffff813ebfc0>] virtio_dev_probe+0xe0/0x150
[ 1.744012] [<ffffffff8143eceb>] driver_probe_device+0x8b/0x390
[ 1.744012] [<ffffffff8143f09b>] __driver_attach+0xab/0xb0
[ 1.744012] [<ffffffff8143eff0>] ? driver_probe_device+0x390/0x390
[ 1.744012] [<ffffffff8143cc85>] bus_for_each_dev+0x55/0x90
[ 1.744012] [<ffffffff8143e65e>] driver_attach+0x1e/0x20
[ 1.744012] [<ffffffff8143e280>] bus_add_driver+0x1b0/0x2a0
[ 1.744012] [<ffffffffa006a000>] ? 0xffffffffa0069fff
[ 1.744012] [<ffffffff8143f797>] driver_register+0x77/0x170
[ 1.744012] [<ffffffff81161520>] ? mempool_kmalloc+0x20/0x20
[ 1.744012] [<ffffffffa006a000>] ? 0xffffffffa0069fff
[ 1.744012] [<ffffffff813ec250>] register_virtio_driver+0x20/0x30
[ 1.744012] [<ffffffffa006a088>] init+0x88/0x1000 [virtio_scsi]
[ 1.744012] [<ffffffff8100212a>] do_one_initcall+0x12a/0x180
[ 1.744012] [<ffffffff810e3c46>] sys_init_module+0x156/0x2290
[ 1.744012] [<ffffffff813650a0>] ? ddebug_proc_open+0xd0/0xd0
[ 1.744012] [<ffffffff816dc870>] ? _raw_spin_unlock_irq+0x30/0x50
[ 1.744012] [<ffffffff816e5869>] system_call_fastpath+0x16/0x1b
[ 1.744012] Code: 8b bb a0 00 00 00 e8 17 7e 38 e1 4c 89 f6 4c 89 ef e8 cc 80 67 e1 44 89 e0 48 8b 5d e0 4c 8b 65 e8 4c 8b 6d f0 4c 8b 75 f8 c9 c3 <0f> 0b 0f 0b 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48
[ 1.744012] RIP [<ffffffffa00647d9>] virtscsi_kick_event+0xd9/0xe0 [virtio_scsi]
[ 1.744012] RSP <ffff88001ea01b48>
[ 1.810755] ---[ end trace 4edd72ac44d1feb2 ]---
[ 1.812192] init (1) used greatest stack depth: 3736 bytes left
[ 1.813394] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 1.813394]
[ 1.814336] Rebooting in 1 seconds..libguestfs: child_cleanup: 0x2549d70: child process died

Version-Release number of selected component (if applicable):
kernel-3.6.0-0.rc1.git3.2.bz844485.2.fc19.x86_64
qemu-1.2-0.1.20120806git3e430569.fc18.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run 'libguestfs-test-tool'
The qemu command line we're using is:

/usr/bin/qemu-kvm \
    -global virtio-blk-pci.scsi=off \
    -nodefconfig \
    -nodefaults \
    -nographic \
    -device virtio-scsi-pci,id=scsi \
    -drive file=/tmp/libguestfs-test-tool-sda-D1C0Dp,format=raw,id=hd0,if=none \
    -device scsi-hd,drive=hd0 \
    -drive file=/var/tmp/.guestfs-1000/root.3411,snapshot=on,id=appliance,if=none,cache=unsafe \
    -device scsi-hd,drive=appliance \
    -machine accel=kvm:tcg \
    -m 500 \
    -no-reboot \
    -no-hpet \
    -device virtio-serial \
    -serial stdio \
    -device sga \
    -chardev socket,path=/tmp/libguestfspKdIKD/guestfsd.sock,id=channel0 \
    -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
    -kernel /var/tmp/.guestfs-1000/kernel.3411 \
    -initrd /var/tmp/.guestfs-1000/initrd.3411 \
    -append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm '
Still getting this with guest kernel 3.6.0-0.rc6.1.fc18.x86_64 and qemu-1.2-0.3.20120806git3e430569.fc19.x86_64. Note I can reproduce this using a regular guest as well as with libguestfs. Reassigning to the kernel, since changing back to kernel 3.5.0 makes the bug go away.
(In reply to comment #2)
> Still getting this with guest kernel 3.6.0-0.rc6.1.fc18.x86_64

I believe I meant to write kernel 3.6.0-0.rc1.git6.1.fc18.x86_64

In any case, I'll retest with the latest kernel from Koji.
Same problem occurs with 3.6.0-0.rc2.git0.2.fc18.x86_64. Stack trace is identical to above.
Created attachment 605707
0001-SCSI-virtio-scsi-Initialize-scatterlist-structure.patch

I'm trying out this patch.
Fixed :-) Posted on LKML.
https://lkml.org/lkml/2012/8/20/365
(In reply to comment #7)
> https://lkml.org/lkml/2012/8/20/365

I'll get this in later today. Somewhat unfortunately, I doubt it will show up in the Alpha since we're in freeze and they're only taking blocker+NTH bugs.
(In reply to comment #8)
> (In reply to comment #7)
> > https://lkml.org/lkml/2012/8/20/365
>
> I'll get this in later today. Somewhat unfortunately, I doubt it will show
> up in the Alpha since we're in freeze and they're only taking blocker+NTH
> bugs.

So:

- In Rawhide, this prevents anyone from using virtio-scsi. That's serious, but hopefully you can get this into the Rawhide kernel so we should be OK.

- In Fedora 18, this *doesn't* affect anything because the BUG_ON is an integrity check which only kicks in when debugging is enabled (disabled in Fedora 18 kernels, I think). Although the warning happens because a structure isn't initialized, in fact this doesn't cause a problem -- I tested that.
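A minimal userspace sketch of the integrity check discussed above, in C. It assumes that the kernel's scatterlist debugging option (CONFIG_DEBUG_SG, as far as I know) compares a magic field against SG_MAGIC (0x87654321, the same value visible in RCX in the trace) before a buffer is attached, and that sg_init_one() is the call that stamps the magic. The struct layout and all model_* names are hypothetical stand-ins for illustration, not kernel code and not the attached patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same value as the kernel's SG_MAGIC constant. */
#define MODEL_SG_MAGIC 0x87654321UL

/* Hypothetical, simplified stand-in for struct scatterlist. */
struct model_sg {
    unsigned long magic;  /* only meaningful when debugging is on */
    void *buf;
    size_t len;
};

/*
 * Models sg_set_buf(): attach a buffer to an entry that is assumed to be
 * initialized already. With debug on, an entry without the magic is
 * rejected; this returns false where a debug kernel would hit BUG_ON.
 */
static bool model_sg_set_buf(struct model_sg *sg, void *buf, size_t len,
                             bool debug)
{
    if (debug && sg->magic != MODEL_SG_MAGIC)
        return false;
    sg->buf = buf;
    sg->len = len;
    return true;
}

/*
 * Models sg_init_one(): clear the entry, stamp the magic, then attach
 * the buffer. After this call the debug check always passes.
 */
static bool model_sg_init_one(struct model_sg *sg, void *buf, size_t len,
                              bool debug)
{
    memset(sg, 0, sizeof(*sg));
    sg->magic = MODEL_SG_MAGIC;
    return model_sg_set_buf(sg, buf, len, debug);
}
```

In this model, calling model_sg_set_buf() on a struct that still holds stack garbage fails only when debug is true, which corresponds to the crash appearing only on kernels built with the debug check. Going through model_sg_init_one() first succeeds in both modes.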
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > https://lkml.org/lkml/2012/8/20/365
> >
> > I'll get this in later today. Somewhat unfortunately, I doubt it will show
> > up in the Alpha since we're in freeze and they're only taking blocker+NTH
> > bugs.
>
> So:
>
> - In Rawhide, this prevents anyone from using virtio-scsi. That's
> serious, but hopefully you can get this into the Rawhide kernel
> so we should be OK.

We haven't been building kernels for rawhide/f19 (git master branch) explicitly because the f18 branch is identical thus far. We rely on inheritance to get the kernels into rawhide.

>
> - In Fedora 18, this *doesn't* affect anything because the
> BUG_ON is an integrity check which only kicks in when debugging
> is enabled (disabled in Fedora 18 kernels, I think). Although
> the warning happens because a structure isn't initialized, in fact
> this doesn't cause a problem -- I tested that.

Nope. Debugging is enabled in F18. We always ship Alpha with a debug kernel.

Anyway, the patch is committed to the f18/master branches now.
Hmm I wonder if this counts as a NTH bug ...

From: https://fedoraproject.org/wiki/QA:SOP_nth_bug_process

> In general, nice-to-have bugs are usually bugs for which
> an update is not an optimal solution,

Yes: guest will not even boot, so update is not possible.

> and for which the fix
> is reasonably small and testable (this consideration becomes
> progressively more important as a release nears, so bugs may
> be downgraded from nice-to-have status late in the release
> process if it transpires that the fix is complex and hard to test).

Yes: fix is a one-liner, and well understood / tested.

>
> Types of bugs which are typically likely to be accepted as
> nice-to-have bugs include:
>
> * bugs which constitute infringements of the desktop-related
> Fedora_Release_Criteria as applied to
> non-default desktops
> * bugs which result in a system being unable to
> reach a graphical environment

Yes: F18 guest using virtio-scsi will not even boot unless this patch has been applied to the kernel.

> * significant installer bugs which do not meet the
> criteria to be blocker bugs
I'll likely be building a kernel with this fix later today. If it gets accepted via NTH, I can use that build in the update instead of the one currently queued.
(In reply to comment #10)
> We haven't been building kernels for rawhide/f19 (git master branch)
> explicitly because the f18 branch is identical thus far. We rely on
> inheritance to get the kernels into rawhide.

There's a build of kernel-3.6.0-0.rc2.git1.2.fc18 which contains this fix (http://koji.fedoraproject.org/koji/buildinfo?buildID=349638).

However that build isn't included/inherited when I build against Rawhide. Instead I'm still getting the old broken 3.6.0-0.rc1.git6.1.fc18 (see: http://kojipkgs.fedoraproject.org//work/tasks/3038/4413038/root.log). I've waited over 12 hours and there have been multiple rawhide repo builds in that time.

Any idea what's going on?
(In reply to comment #13)
> (In reply to comment #10)
> > We haven't been building kernels for rawhide/f19 (git master branch)
> > explicitly because the f18 branch is identical thus far. We rely on
> > inheritance to get the kernels into rawhide.
>
> There's a build of kernel-3.6.0-0.rc2.git1.2.fc18 which contains
> this fix (http://koji.fedoraproject.org/koji/buildinfo?buildID=349638).
>
> However that build isn't included/inherited when I build against
> Rawhide. Instead I'm still getting the old broken 3.6.0-0.rc1.git6.1.fc18
> (see: http://kojipkgs.fedoraproject.org//work/tasks/3038/4413038/root.log).
> I've waited over 12 hours and there have been multiple rawhide repo builds
> in that time.
>
> Any idea what's going on?

Yeah, we're in Alpha freeze. The f19/rawhide koji tags inherit from the f18 koji tag. Builds done against f18 go into f18-updates-candidate and we have to file bodhi updates to get builds into the f18 tag from there. However, since we're in Alpha freeze, only blocker and NTH bugs are making it out of updates-testing and into the f18 tag. Since nothing new is going into the f18 tag, nothing is being inherited into rawhide.

It'll get there eventually. That whole reason is why I haven't closed this bug yet either.
kernel-3.6.0-0.rc2.git2.1.fc18, grub2-2.00-5.fc18, pesign-0.10-4.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/kernel-3.6.0-0.rc2.git2.1.fc18,grub2-2.00-5.fc18,pesign-0.10-4.fc18
Discussed at 2012-08-22 NTH review meeting. We agreed that on merit this bug doesn't quite rank NTH status: the impact is nasty, but it's in a pretty obscure configuration and we're trying to be strict about kernel NTH bugs. We think it'd be acceptable in an Alpha to document this issue and have anyone who wants to use the Alpha in this specific configuration take care to install a kernel from updates.

However, in practice, the kernel build that fixes this is likely to make Alpha anyhow due to #849244 and #850003 being accepted as NTH. So don't worry about the process wankery. =)
kernel-3.6.0-0.rc2.git2.1.fc18, grub2-2.00-5.fc18, pesign-0.10-4.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.