Bug 1433899

Summary: Workstation Live panics during boot
Product: [Fedora] Fedora Reporter: Kamil Páral <kparal>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 26 CC: awilliam, cz172638, gansalmon, gmarr, ichavero, itamar, jkurik, jonathan, jsedlak, kernel-maint, madhu.chinakonda, mchehab, mruckman, robatino, sumukher
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2017-03-23 21:14:21 UTC Type: Bug
Bug Blocks: 1349184    
Attachments:
Description: kernel panic screenshot Flags: none
Description: another panic screenshot, maybe better Flags: none

Description Kamil Páral 2017-03-20 10:46:14 UTC
Created attachment 1264736 [details]
kernel panic screenshot

Description of problem:
If the media test is attempted, Workstation Live panics on boot (even before mediacheck starts). If I don't attempt the media test, the image boots fine.

Version-Release number of selected component (if applicable):
Fedora-Workstation-Live-x86_64-26_Alpha-1.1.iso
dracut-044-177.fc26.x86_64
kernel-4.11.0-0.rc2.git2.2.fc26.x86_64

How reproducible:
always

Steps to Reproduce:
1. Use a default virt-manager VM.
2. Attach Workstation Live Alpha RC1.1 and try to boot it with the media check performed.
3. An immediate kernel panic occurs.

Comment 1 Kamil Páral 2017-03-20 10:50:52 UTC
Created attachment 1264737 [details]
another panic screenshot, maybe better

Comment 2 Kamil Páral 2017-03-20 10:53:03 UTC
Both jsedlak and I reproduced this (F25 host). However, after a few attempts it started working (the mediacheck runs and completes fine, and the image boots), and we can't reproduce it anymore, even after many tries. So perhaps this is a race condition?

Comment 3 Kamil Páral 2017-03-20 10:55:27 UTC
Proposing as a conditional blocker under:
"All release-blocking images must boot in their supported configurations."
https://fedoraproject.org/wiki/Fedora_26_Alpha_Release_Criteria#Release-blocking_images_must_boot

Let's see how many people can reproduce this, and how often (please try multiple times).

Comment 4 Kamil Páral 2017-03-20 11:07:41 UTC
So, after a few minutes, jsedlak reproduced this again. Also, it sometimes seems to happen before mediacheck starts, and sometimes after it reaches 100%.

Comment 5 Jan Sedlák 2017-03-20 11:12:52 UTC
I was able to reproduce it with a serial console attached. The VM has two CPU cores. This is the console output:

[jsedlak@dhcp-28-124 ~]$ sudo virsh console fedora25
Connected to domain fedora25
Escape character is ^]
[    3.380815] dracut-pre-udev[364]: rpcbind: /run/rpcbind/rpcbind.lock: No such file or directory
[    3.742400] general protection fault: 0000 [#1] SMP
[    3.743016] Modules linked in: garp stp llc mrp virtio_blk virtio_net virtio_console crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi qemu_fw_cfg sunrpc scsi_transport_iscsi loop
[    3.744636] CPU: 1 PID: 21 Comm: rcuos/1 Not tainted 4.11.0-0.rc2.git2.2.fc26.x86_64 #1
[    3.745218] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[    3.745839] task: ffff9fe97a14a580 task.stack: ffffc1aac03c0000
[    3.746307] RIP: 0010:rcu_nocb_kthread+0x15d/0x500
[    3.746692] RSP: 0018:ffffc1aac03c3e78 EFLAGS: 00010282
[    3.747088] RAX: ff0074757074756f RBX: ffff9fe97d51a3c0 RCX: ffff9fe97a14a580
[    3.747623] RDX: 0000000080000000 RSI: 0000000000000200 RDI: ffff9fe97ed0c000
[    3.748161] RBP: ffffc1aac03c3ef8 R08: ffff9fe97ed0c600 R09: 000000018010000d
[    3.748702] R10: fffff43041fdec40 R11: 0000000000003d00 R12: 000000000000007c
[    3.749219] R13: 000000000000007c R14: ffff9fe97ed0c000 R15: 2d316f6974726976
[    3.749740] FS:  0000000000000000(0000) GS:ffff9fe97d500000(0000) knlGS:0000000000000000
[    3.750314] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.750738] CR2: 000056183737d2f8 CR3: 000000007f9cc000 CR4: 00000000003406e0
[    3.751433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.752551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.753220] Call Trace:
[    3.753517]  ? get_state_synchronize_sched+0x20/0x20
[    3.753974]  kthread+0x11e/0x140
[    3.754196]  ? kthread_park+0x90/0x90
[    3.754453]  ret_from_fork+0x2c/0x40
[    3.754803] Code: 01 00 00 00 e8 85 63 75 00 4d 8b 3e 4d 85 ff 74 ee 65 81 05 b2 a9 ef 47 00 02 00 00 49 8b 46 08 4c 89 f7 48 3d ff 0f 00 00 76 2e <ff> d0 be 00 02 00 00 48 c7 c7 ef 28 11 b8 45 8d 6c 24 01 e8 0b 
[    3.757637] RIP: rcu_nocb_kthread+0x15d/0x500 RSP: ffffc1aac03c3e78
[    3.758500] ---[ end trace 68270651d3d36818 ]---
[    3.759157] Kernel panic - not syncing: Fatal exception in interrupt
[    3.760101] Kernel Offset: 0x37000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    3.761565] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
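One way to read the register dump above, offered as an interpretation and not something stated in the report: the values in R15 and RAX are not plausible kernel addresses, and when laid out as little-endian bytes they read as ASCII text. The byte at the `<ff>` marker in the Code: line, followed by `d0`, is the x86_64 encoding of an indirect call through RAX, so the fault would be consistent with the kernel trying to call a "function pointer" that is really string data. The short Python sketch below only decodes the two register values; the helper names are made up for illustration, and the canonical-address check assumes ordinary 48-bit x86_64 addressing.

```python
import struct

# Register values copied verbatim from the oops in comment 5.
REGISTERS = {
    "R15": 0x2D316F6974726976,
    "RAX": 0xFF0074757074756F,
}


def as_little_endian_text(value: int) -> str:
    """Render a 64-bit value as the bytes it would occupy in x86_64 memory.

    Printable ASCII bytes are shown as characters, everything else as \\xNN.
    (Hypothetical helper, not part of any kernel tooling.)
    """
    raw = struct.pack("<Q", value)
    return "".join(chr(b) if 32 <= b < 127 else "\\x%02x" % b for b in raw)


def is_canonical(value: int) -> bool:
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = value >> 47
    return top == 0 or top == 0x1FFFF


if __name__ == "__main__":
    for name, value in REGISTERS.items():
        text = as_little_endian_text(value)
        print(f"{name} = {value:#018x}  bytes: {text}  canonical: {is_canonical(value)}")
```

Read together, the two values spell "virtio1-" followed by "output", which looks like a device or queue name and not a pointer. That would fit a callback structure having been overwritten or reused while still queued, but the log alone does not prove which code path is responsible.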

Comment 6 sumantro 2017-03-20 12:06:18 UTC
I have the same problem when booting Fedora 26 Alpha 1.1 in Virtual Machine Manager.

Comment 7 Kamil Páral 2017-03-20 14:15:28 UTC
Petr Schindler has hit this as well. Even though it seems like a race, it's clearly very common.

Comment 8 Adam Williamson 2017-03-20 15:33:05 UTC
https://fedoraproject.org/wiki/Fedora_26_Final_Release_Criteria#Media_consistency_verification seems like the most relevant criterion here, and is for Final.

Comment 9 Mike Ruckman 2017-03-20 16:19:44 UTC
This didn't happen on my latest bare metal installation.

Comment 10 Geoffrey Marr 2017-03-20 21:00:21 UTC
Discussed during the 2017-03-20 blocker review meeting: [1]

The decision was made to classify this bug as an AcceptedBlocker (Final) as it violates the following criteria:

"Validation of install media must work correctly for all release-blocking images."

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2017-03-20/f26-blocker-review.2017-03-20-16.06.txt

Comment 11 Kamil Páral 2017-03-22 13:15:16 UTC
See bug 1434462 comment 11, which was supposed to be posted here (Stephen talking about booting Live). Re-proposing for Alpha, since it's happening even without mediacheck. My suspicion is that this is the same bug as bug 1434462.

Comment 12 Jan Kurik 2017-03-23 13:23:36 UTC
Does it happen only in a VM? If so, we might use the same criterion as in https://bugzilla.redhat.com/show_bug.cgi?id=1434462#c14 and leave this blocker for Beta, instead of blocking Alpha.

Comment 13 Adam Williamson 2017-03-23 15:38:31 UTC
It's probably the same bug, as Kamil said. I was expecting us to wind up marking them as dupes.

Comment 14 Adam Williamson 2017-03-23 17:13:04 UTC
I also suspect https://bugzilla.redhat.com/show_bug.cgi?id=1430297 is the same bug, and they're all the same as https://bugzilla.kernel.org/show_bug.cgi?id=194911 .

Comment 15 Adam Williamson 2017-03-23 21:14:21 UTC
As 1430297 is the earliest report, and we're fairly sure these are all the same problem, marking as a dupe of that. A kernel build with a potential fix is currently running; we will ask all affected people to test with that build once it's done. We can un-dupe reports later if there turn out to be separate bugs.

*** This bug has been marked as a duplicate of bug 1430297 ***

Comment 16 Adam Williamson 2017-04-04 19:18:35 UTC
This got fixed before Alpha, so it doesn't need a commonbugs entry.