Bug 1434462 - frequent kernel panic during VM boot
Status: CLOSED DUPLICATE of bug 1430297
Product: Fedora
Classification: Fedora
Component: kernel
Version: 26
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks: AlphaBlocker/F26AlphaBlocker
Reported: 2017-03-21 10:41 EDT by Kamil Páral
Modified: 2017-03-23 17:14 EDT
CC List: 17 users

Last Closed: 2017-03-23 17:14:46 EDT
Type: Bug


Attachments
vm.xml (5.40 KB, text/plain), 2017-03-21 10:42 EDT, Kamil Páral
rpm-qa (51.57 KB, text/plain), 2017-03-21 10:42 EDT, Kamil Páral
panic1.txt (38.34 KB, text/plain), 2017-03-21 10:42 EDT, Kamil Páral
panic2.txt (41.88 KB, text/plain), 2017-03-21 10:42 EDT, Kamil Páral
panic3.txt (3.56 KB, text/plain), 2017-03-21 10:42 EDT, Kamil Páral
dusty-panic1.txt (2.20 KB, text/plain), 2017-03-23 14:17 EDT, Dusty Mabe
dusty-panic2.txt (3.39 KB, text/plain), 2017-03-23 14:17 EDT, Dusty Mabe

Description Kamil Páral 2017-03-21 10:41:08 EDT
Description of problem:
I have a default installation of F26 Workstation Live in a virt-manager VM. When I boot the installed system, it very often (roughly 30-50% of attempts, by my estimate) hits a kernel panic during boot and fails to boot. I see one of three things: the kernel panic printed to tty1; a black screen where I can't do anything; or "A start job is running for udev Wai…e Initialization (1min 16s / 3min)", which then times out (caused by an earlier panic).

There seem to be multiple tracebacks during each boot. This is the first one from the first boot attempt:

[    1.944550] general protection fault: 0000 [#1] SMP
[    1.945084] Modules linked in: virtio_net virtio_blk virtio_rng virtio_console qxl drm_kms_helper ttm drm crc32c_intel serio_raw qemu_fw_cfg virtio_pci ata_generic virtio_ring pata_acpi virtio
[    1.946812] CPU: 1 PID: 383 Comm: dracut-pre-pivo Not tainted 4.11.0-0.rc2.git2.2.fc26.x86_64 #1
[    1.947702] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[    1.948574] task: ffff8c063655cb00 task.stack: ffffb70880730000
[    1.949183] RIP: 0010:kmem_cache_alloc+0x84/0x1b0
[    1.949655] RSP: 0018:ffffb70880733d68 EFLAGS: 00010246
[    1.950194] RAX: 2d316f6974726976 RBX: 00000000014000c0 RCX: 0000000000000000
[    1.950917] RDX: 00000000000000e3 RSI: 00000000014000c0 RDI: 000000000001d8a0
[    1.951629] RBP: ffffb70880733d98 R08: ffff8c067fd1d8a0 R09: ffff8c0636b14440
[    1.952345] R10: ffff8c06366a87c0 R11: ffff8c06366eac40 R12: 00000000014000c0
[    1.953062] R13: ffff8c067d098dc0 R14: 2d316f6974726976 R15: ffff8c067d098dc0
[    1.953781] FS:  00007f01fda21b40(0000) GS:ffff8c067fd00000(0000) knlGS:0000000000000000
[    1.954607] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.955188] CR2: 000055ebcf4d4018 CR3: 0000000036384000 CR4: 00000000003406e0
[    1.955917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.956630] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    1.957346] Call Trace:
[    1.957603]  ? copy_process.part.30+0xab1/0x1d40
[    1.958082]  copy_process.part.30+0xab1/0x1d40
[    1.958533]  ? _do_fork+0xd7/0x390
[    1.958886]  ? selinux_inode_permission+0xe2/0x1b0
[    1.959364]  _do_fork+0xd7/0x390
[    1.959682]  SyS_clone+0x19/0x20
[    1.960011]  do_syscall_64+0x67/0x170
[    1.960407]  entry_SYSCALL64_slow_path+0x25/0x25
[    1.960905] RIP: 0033:0x7f01fd0d495d
[    1.961294] RSP: 002b:00007fffe99acac0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[    1.962151] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f01fd0d495d
[    1.962905] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[    1.963613] RBP: 00007fffe99acb00 R08: 0000000000000000 R09: 00007f01fda21b40
[    1.964339] R10: 00007f01fda21e10 R11: 0000000000000246 R12: 0000000000000000
[    1.965089] R13: 00007fffe99acbb0 R14: 0000000000000000 R15: 00000000ffffffff
[    1.965874] Code: 49 83 78 10 00 4d 8b 30 0f 84 fe 00 00 00 4d 85 f6 0f 84 f5 00 00 00 49 63 47 20 49 8b 3f 4c 01 f0 40 f6 c7 0f 0f 85 21 01 00 00 <48> 8b 18 48 8d 4a 01 4c 89 f0 65 48 0f c7 0f 0f 94 c0 84 c0 74 
[    1.967857] RIP: kmem_cache_alloc+0x84/0x1b0 RSP: ffffb70880733d68
[    1.968534] ---[ end trace 6da8659d65b4fdb4 ]---


This is the first traceback from the second boot attempt:

[    2.064734] general protection fault: 0000 [#1] SMP
[    2.065348] Modules linked in: virtio_net virtio_console virtio_blk virtio_rng qxl drm_kms_helper ttm crc32c_intel drm serio_raw qemu_fw_cfg virtio_pci ata_generic virtio_ring virtio pata_acpi
[    2.066935] CPU: 1 PID: 277 Comm: systemd-udevd Not tainted 4.11.0-0.rc2.git2.2.fc26.x86_64 #1
[    2.067750] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[    2.067803] EXT4-fs (vda2): mounted filesystem with ordered data mode. Opts: (null)
[    2.069335] task: ffff9ff676b7a580 task.stack: ffffb77880628000
[    2.069886] RIP: 0010:__kmalloc+0xa4/0x210
[    2.070264] RSP: 0018:ffffb7788062bd78 EFLAGS: 00010246
[    2.070759] RAX: 2d316f6974726976 RBX: 00000000014080c0 RCX: ffffffffaef2f200
[    2.071511] RDX: 00000000000000a6 RSI: 0000000000000000 RDI: 000000000001caa0
[    2.072414] RBP: ffffb7788062bda8 R08: ffff9ff6bfd1caa0 R09: ffff9ff6bd002000
[    2.073306] R10: 0000000000000000 R11: 0000000000000000 R12: 2d316f6974726976
[    2.074490] R13: 00000000014080c0 R14: 0000000000000410 R15: ffff9ff6bd002000
[    2.076150] FS:  00007fa8c3eac8c0(0000) GS:ffff9ff6bfd00000(0000) knlGS:0000000000000000
[    2.077344] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.078117] CR2: 00007ffe185daff8 CR3: 0000000036b92000 CR4: 00000000003406e0
[    2.079504] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    2.081197] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    2.082486] Call Trace:
[    2.082939]  ? sk_prot_alloc+0x86/0x130
[    2.083413]  sk_prot_alloc+0x86/0x130
[    2.083919]  sk_alloc+0x31/0x230
[    2.084339]  __netlink_create+0x37/0xc0
[    2.084867]  netlink_create+0x116/0x240
[    2.085442]  __sock_create+0xe3/0x200
[    2.085962]  SyS_socket+0x55/0xb0
[    2.086403]  do_syscall_64+0x67/0x170
[    2.086914]  entry_SYSCALL64_slow_path+0x25/0x25
[    2.087641] RIP: 0033:0x7fa8c2b0afa7
[    2.088260] RSP: 002b:00007ffe185dccd8 EFLAGS: 00000286 ORIG_RAX: 0000000000000029
[    2.089619] RAX: ffffffffffffffda RBX: 000055bf64340860 RCX: 00007fa8c2b0afa7
[    2.091501] RDX: 000000000000000f RSI: 0000000000080803 RDI: 0000000000000010
[    2.094067] RBP: 000055bf643310e0 R08: 000055bf64340860 R09: 00000000000000fd
[    2.095912] R10: 000055bf626dd520 R11: 0000000000000286 R12: 00000000ffffffff
[    2.097110] R13: 0000000000000000 R14: 000055bf64340230 R15: 0000000000000556
[    2.098127] Code: 49 83 78 10 00 4d 8b 20 0f 84 03 01 00 00 4d 85 e4 0f 84 fa 00 00 00 49 63 41 20 49 8b 39 4c 01 e0 40 f6 c7 0f 0f 85 5f 01 00 00 <48> 8b 18 48 8d 4a 01 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 74 
[    2.100267] RIP: __kmalloc+0xa4/0x210 RSP: ffffb7788062bd78
[    2.101261] ---[ end trace 570401f32630b870 ]---


This is the first (and only) traceback from the third boot attempt:

[    1.803931] general protection fault: 0000 [#1] SMP
[    1.804630] Modules linked in: virtio_console(+) virtio_blk virtio_rng qxl crc32c_intel drm_kms_helper serio_raw ttm drm virtio_pci virtio_ring virtio ata_generic pata_acpi qemu_fw_cfg
[    1.809650] CPU: 1 PID: 290 Comm: systemd-udevd Not tainted 4.11.0-0.rc2.git2.2.fc26.x86_64 #1
[    1.813569] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[    1.817512] task: ffff946876a10000 task.stack: ffffba1940604000
[    1.817516] RIP: 0010:vp_modern_find_vqs+0x39/0x70 [virtio_pci]
[    1.817517] RSP: 0018:ffffba1940607a68 EFLAGS: 00010282
[    1.817518] RAX: ffffba19403d5000 RBX: 2d316f6974726976 RCX: 0000000000000000
[    1.817518] RDX: 00000000000000fc RSI: ffffba19403d501c RDI: 0000000000000001
[    1.817519] RBP: ffffba1940607a88 R08: 000000000001c960 R09: ffffffffa841f129
[    1.817519] R10: ffffe2bd41f25640 R11: 0000000000000000 R12: ffff946876340b08
[    1.817519] R13: 0000000000000000 R14: ffff946876340800 R15: 000000000000001f
[    1.817520] FS:  00007fa22bb378c0(0000) GS:ffff9468bfd00000(0000) knlGS:0000000000000000
[    1.817521] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.817521] CR2: 00007f285417f000 CR3: 00000000364ef000 CR4: 00000000003406e0
[    1.817524] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.817524] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    1.817525] Call Trace:
[    1.817529]  init_vqs+0x1a0/0x2e0 [virtio_console]
[    1.817531]  virtcons_probe+0xb9/0x360 [virtio_console]
[    1.817533]  virtio_dev_probe+0x144/0x1e0 [virtio]
[    1.817535]  driver_probe_device+0x106/0x450
[    1.817537]  __driver_attach+0xa4/0xe0
[    1.817538]  ? driver_probe_device+0x450/0x450
[    1.817538]  bus_for_each_dev+0x6e/0xb0
[    1.817539]  driver_attach+0x1e/0x20
[    1.817540]  bus_add_driver+0x1d0/0x270
[    1.817542]  ? virtio_cons_early_init+0x1d/0x1d [virtio_console]
[    1.817543]  driver_register+0x60/0xe0
[    1.817544]  ? virtio_cons_early_init+0x1d/0x1d [virtio_console]
[    1.817545]  register_virtio_driver+0x20/0x30 [virtio]
[    1.817547]  init+0x9f/0xfe3 [virtio_console]
[    1.817548]  do_one_initcall+0x50/0x1a0
[    1.817549]  ? free_hot_cold_page+0x19a/0x300
[    1.817551]  ? kmem_cache_alloc_trace+0x15f/0x1c0
[    1.817552]  ? do_init_module+0x27/0x1e6
[    1.817553]  do_init_module+0x5f/0x1e6
[    1.817554]  load_module+0x22b7/0x2820
[    1.817555]  ? __symbol_put+0x60/0x60
[    1.817557]  SYSC_init_module+0x16f/0x1a0
[    1.817559]  SyS_init_module+0xe/0x10
[    1.817561]  do_syscall_64+0x67/0x170
[    1.817563]  entry_SYSCALL64_slow_path+0x25/0x25
[    1.817564] RIP: 0033:0x7fa22a7953da
[    1.817564] RSP: 002b:00007ffd9b028ed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[    1.817566] RAX: ffffffffffffffda RBX: 0000563dc91893f0 RCX: 00007fa22a7953da
[    1.817566] RDX: 00007fa22b2ca9c5 RSI: 000000000000b37b RDI: 0000563dc99985d0
[    1.817567] RBP: 00007fa22b2ca9c5 R08: 0000563dc918bd30 R09: 00007ffd9b0280eb
[    1.817568] R10: 00007fa22aa4fb00 R11: 0000000000000246 R12: 0000563dc99985d0
[    1.817569] R13: 0000563dc9185270 R14: 0000000000020000 R15: 0000563dc7aa9fca
[    1.817570] Code: 54 53 49 89 fe e8 78 0d 00 00 85 c0 41 89 c5 75 44 49 8b 9e 08 03 00 00 4d 8d a6 08 03 00 00 4c 39 e3 74 31 49 8b 86 38 03 00 00 <0f> b7 7b 28 48 8d 70 16 e8 ba 2d 29 e8 49 8b 86 38 03 00 00 bf 
[    1.817593] RIP: vp_modern_find_vqs+0x39/0x70 [virtio_pci] RSP: ffffba1940607a68
[    1.817654] ---[ end trace b2ba09fd6ab60155 ]---

After this one, I saw:
[   ***] A start job is running for udev Wai…e Initialization (1min 16s / 3min)

In this case, the system booted once the 3-minute timeout expired.
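
One detail stands out across the three traces above: the same value, 2d316f6974726976, appears in RAX and R14 in the first trace, in RAX and R12 in the second, and in RBX in the third. The sketch below decodes it as the bytes it would occupy in memory on little-endian x86_64. The function name decode_le_hex is made up for illustration. The result is printable ASCII, which would be consistent with (though not proof of) string data having been written over a pointer.

```bash
#!/usr/bin/env bash
# Decode a 64-bit register value (hex, as printed in an oops) into the
# bytes it would occupy in memory on a little-endian machine.
# decode_le_hex is a hypothetical helper name, not an existing tool.
decode_le_hex() {
    local hex=$1 out="" i
    # Walk the hex string from the last byte pair to the first,
    # which reverses the byte order.
    for (( i = ${#hex} - 2; i >= 0; i -= 2 )); do
        out+=$(printf "\\x${hex:i:2}")
    done
    printf '%s\n' "$out"
}

decode_le_hex 2d316f6974726976   # prints: virtio1-
```

The decoded text looks like the start of a virtio-related name, which lines up with the third trace faulting inside virtio_pci.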



Version-Release number of selected component (if applicable):
kernel-4.11.0-0.rc2.git2.2.fc26.x86_64

How reproducible:
30-50%

Steps to Reproduce:
1. Create a VM in virt-manager (SPICE, QXL, 2 cores, 2 GB RAM).
2. Install Workstation Live Alpha RC1 and fully update it.
3. Boot the system several times and watch for boots that fail.
4. To capture the traceback, replace the "rhgb quiet" kernel options with "console=ttyS0" and connect to the console with "virsh console VM".
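
Once the console output from the last step has been saved to a file, the faulting function of each oops can be pulled out with a small helper. This is a minimal sketch: the function name fault_sites and the file name console.log are assumptions, and it relies on the "RIP: 0010:" line format seen in the traces above.

```bash
#!/usr/bin/env bash
# Print the kernel-mode faulting symbol for each oops in a saved console
# capture. fault_sites is a hypothetical helper name.
fault_sites() {
    # "RIP: 0010:" marks the kernel-mode instruction pointer in these
    # traces; user-mode "RIP: 0033:" lines and the summary "RIP: symbol"
    # lines do not match. Strip the prefix and the +offset/size suffix
    # so that only the symbol name is left.
    grep -o 'RIP: 0010:[^ ]*' "$1" | sed -e 's/^RIP: 0010://' -e 's/+.*$//'
}

# Example (console.log is an assumed file name):
#   fault_sites console.log
```

Run over captures from several failed boots, this makes it easy to see that the fault lands in a different function each time.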
Comment 1 Kamil Páral 2017-03-21 10:42:15 EDT
Created attachment 1265097
vm.xml
Comment 2 Kamil Páral 2017-03-21 10:42:23 EDT
Created attachment 1265098
rpm-qa
Comment 3 Kamil Páral 2017-03-21 10:42:38 EDT
Created attachment 1265099
panic1.txt
Comment 4 Kamil Páral 2017-03-21 10:42:47 EDT
Created attachment 1265100
panic2.txt
Comment 5 Kamil Páral 2017-03-21 10:42:55 EDT
Created attachment 1265101
panic3.txt
Comment 6 Kamil Páral 2017-03-21 14:34:36 EDT
Proposing as an Alpha blocker under:
"A system installed with a release-blocking desktop must boot to a log in screen where it is possible to log in to a working desktop using a user account created during installation or a 'first boot' utility."
https://fedoraproject.org/wiki/Fedora_26_Alpha_Release_Criteria#Expected_installed_system_boot_behavior

This assumes that other people can reproduce the issue and that I'm not the only one affected. Also note that this might actually be a duplicate of bug 1433899.
Comment 7 Kamil Páral 2017-03-22 06:32:28 EDT
I forgot to add that my host system is F25:
kernel-4.10.4-200.fc25.x86_64
libvirt-2.2.0-2.fc25.x86_64
virt-manager-1.4.1-1.fc25.noarch
qemu-system-x86-2.7.1-4.fc25.x86_64
Comment 8 Kamil Páral 2017-03-22 07:32:46 EDT
The above was on a VM updated from updates-testing, but after some attempts I also reproduced the panic with a completely clean Workstation Live Alpha RC1.1 installation (no updates).
Comment 9 Jan Sedlák 2017-03-22 07:45:04 EDT
It happened to me too: a kernel panic right at the start of boot.
Comment 10 Kamil Páral 2017-03-22 08:17:41 EDT
The third traceback in comment 0 seems to be the same as in bug 1430297.
Comment 11 Stephen Gallagher 2017-03-22 08:38:32 EDT
I can also confirm this with WS Live Alpha RC1.2. I booted four times: the first boot, with the disk check, succeeded; the second and third, without the disk check, each led to a kernel panic; the fourth, also without the disk check, succeeded.
Comment 12 Stephen Gallagher 2017-03-22 09:13:51 EDT
I cannot reproduce this on an *installed* VM of Workstation. I can only get it to panic when booting the Live media. I've been rebooting the installed OS repeatedly for a while now (at least ten boots) and I haven't encountered the panic. I did hit the three-minute timeout that Kamil mentioned twice, but it finished booting.
Comment 13 Kamil Páral 2017-03-22 09:23:03 EDT
(In reply to Stephen Gallagher from comment #12)
> I did hit the three-minute timeout that Kamil mentioned twice,
> but it finished booting.

Check the logs. I can usually see a kernel panic shortly before this happens; the system just seems to recover in this case (although I saw poweroff hang when this happened).
Comment 14 Adam Williamson 2017-03-22 11:45:37 EDT
Note that we have explicit virt criteria, and they're *Beta* criteria:

https://fedoraproject.org/wiki/Fedora_26_Beta_Release_Criteria#Virtualization_requirements

"The release must be able host virtual guest instances of the same release." is the relevant one here.

This has always been held to mean that virt-only bugs can't block Alpha. It is, admittedly, a longstanding situation and came about when use of virtualization was somewhat less common than it is now, but it's how things are at present. On that basis I'm -1 Alpha blocker here, unless we adjust the criteria.
Comment 15 Mike Ruckman 2017-03-22 12:06:03 EDT
I just tested with RC1.2 on bare metal and it all worked fine. I concur with Adam.

-1 Alpha blocker.
Comment 16 Richard W.M. Jones 2017-03-22 17:26:42 EDT
FWIW I have (only once) seen the copy_process.part / _do_fork
traceback, with the latest upstream kernel from git (093b995e3b55a)
when booting the kernel virtualized under qemu-2.8.0-2.fc26.
Comment 17 Richard W.M. Jones 2017-03-22 17:31:13 EDT
A good way to see these issues is to run the following command
(all one line).  Run it as a normal non-root user:

  rm -rf /var/tmp/.guestfs-*; while LIBGUESTFS_BACKEND=direct libguestfs-test-tool -t 120 >& /tmp/log ; do echo -n . ; done

Then examine the log file containing the failure:

  cat /tmp/log
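
The one-liner above reruns the test tool, overwriting the log each time, until a run fails, so /tmp/log ends up holding the output of the failing run. The same loop can be written as a reusable function that also reports how many runs passed first. This is only a sketch: run_until_failure is a hypothetical helper name, and like the original loop it never returns if the command never fails.

```bash
#!/usr/bin/env bash
# Re-run a command until it fails, keeping only the output of the most
# recent run in the given log file.
# Usage: run_until_failure LOGFILE COMMAND [ARGS...]
run_until_failure() {
    local log=$1
    shift
    local passes=0
    # Each iteration overwrites the log, so after the loop exits the
    # log contains the output of the run that failed.
    while "$@" >"$log" 2>&1; do
        passes=$((passes + 1))
        printf '.'
    done
    printf '\nfailed after %d successful run(s); see %s\n' "$passes" "$log"
}
```

To mirror the command above, clear the cache with rm -rf /var/tmp/.guestfs-* and then run: run_until_failure /tmp/log env LIBGUESTFS_BACKEND=direct libguestfs-test-tool -t 120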
Comment 18 sumantro 2017-03-23 02:20:31 EDT
Tested Alpha RC 1.2 on bare metal and in Virtual Machine Manager; couldn't reproduce the kernel panic.
Comment 19 Kamil Páral 2017-03-23 04:51:11 EDT
(In reply to Richard W.M. Jones from comment #17)
>   rm -rf /var/tmp/.guestfs-*; while LIBGUESTFS_BACKEND=direct
> libguestfs-test-tool -t 120 >& /tmp/log ; do echo -n . ; done

I couldn't reproduce the issue on my F25 host (I assume that's because the test tool boots the same version of OS as the host system is), but Petr Schindler reproduced it on the very first run on his F26 host. Great reproducer, thanks.
Comment 20 Richard W.M. Jones 2017-03-23 05:56:36 EDT
(In reply to Kamil Páral from comment #19)
> (In reply to Richard W.M. Jones from comment #17)
> >   rm -rf /var/tmp/.guestfs-*; while LIBGUESTFS_BACKEND=direct
> > libguestfs-test-tool -t 120 >& /tmp/log ; do echo -n . ; done
> 
> I couldn't reproduce the issue on my F25 host (I assume that's because the
> test tool boots the same version of OS as the host system is), but Petr
> Schindler reproduced it on the very first run on his F26 host. Great
> reproducer, thanks.

It tests the highest numbered installed kernel (not necessarily the
running kernel).  This issue does not affect F25 kernels at all (or
any kernel < 4.11) so you would not expect to see it in F25 unless
you had installed an F26 kernel using

dnf update kernel --releasever=26 --best
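
Going by the statement above that kernels older than 4.11 are unaffected, a quick check on a version string might look like the following. It is a sketch: the function name kernel_at_least_4_11 is hypothetical, it encodes only the lower bound mentioned here, and it says nothing about which later kernels carry a fix.

```bash
#!/usr/bin/env bash
# Succeeds (exit status 0) when the given kernel version is 4.11 or newer.
# kernel_at_least_4_11 is a hypothetical helper name.
kernel_at_least_4_11() {
    local major minor
    # Split e.g. "4.11.0-0.rc2.git2.2.fc26.x86_64" on dots and keep the
    # first two fields; the remainder is discarded.
    IFS=. read -r major minor _ <<< "$1"
    (( major > 4 || (major == 4 && minor >= 11) ))
}

# The two kernel versions mentioned in this report:
for v in 4.10.4-200.fc25.x86_64 4.11.0-0.rc2.git2.2.fc26.x86_64; do
    if kernel_at_least_4_11 "$v"; then
        echo "$v: 4.11 or newer, may be affected"
    else
        echo "$v: older than 4.11, not expected to be affected"
    fi
done
```

Calling kernel_at_least_4_11 "$(uname -r)" checks the running kernel, which, as noted above, is not necessarily the kernel the test tool boots.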
Comment 21 Jan Kurik 2017-03-23 09:10:40 EDT
I concur with Adam and Mike here. If we cannot reproduce it on bare metal, I am -1 to block Alpha on this.
Comment 22 Dusty Mabe 2017-03-23 14:17 EDT
Created attachment 1265870
dusty-panic1.txt

I've seen a lot of traces from VMs. Here are a few.
Comment 23 Dusty Mabe 2017-03-23 14:17 EDT
Created attachment 1265871
dusty-panic2.txt

Another one
Comment 24 Adam Williamson 2017-03-23 17:14:46 EDT
As 1430297 is the earliest report, and we're fairly sure these are all the same problem, I'm marking this as a dupe of that. A kernel build with a potential fix is currently running; we will ask all affected people to test with that build once it's done. We can un-dupe reports later if there turn out to be separate bugs.

*** This bug has been marked as a duplicate of bug 1430297 ***
