Bug 1756655

Summary: kernel-5.4.0-0.rc0.git8.1.fc32 and kernel-5.4.0-0.rc0.git7.1.fc32 hang on boot
Product: Fedora
Reporter: Kevin Fenzi <kevin>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: airlied, bskeggs, hdegoede, ichavero, itamar, jarodwilson, jcline, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, masami256, mchehab, mjg59, steved
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2019-10-02 19:27:49 UTC

Description Kevin Fenzi 2019-09-28 22:56:40 UTC
I noticed this from the rawhide composes: the Cloud Base compose failed, and it seems the boot never finishes with these kernels.

The x86_64 one has a screenshot: https://koji.fedoraproject.org/koji/taskinfo?taskID=37916159

I captured some console messages on aarch64: 

B Keyboard] on usb-0000:02:00.0-2/input0
[   40.056886] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:12]
[   40.058373] Modules linked in:
[   40.058994] irq event stamp: 32947716
[   40.059737] hardirqs last  enabled at (32947715): [<ffffa40c1771a74c>] _raw_write_unlock_irqrestore+0x74/0xa0
[   40.061709] hardirqs last disabled at (32947716): [<ffffa40c16a53734>] el1_irq+0xb4/0x1c0
[   40.063327] softirqs last  enabled at (32935094): [<ffffa40c16a51e80>] __do_softirq+0x480/0x57c
[   40.065048] softirqs last disabled at (32935087): [<ffffa40c16ade59c>] irq_exit+0x11c/0x148
[   40.066708] CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.4.0-0.rc0.git7.1.fc32.aarch64 #1
[   40.068370] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[   40.069753] Workqueue: events kmemleak_do_cleanup
[   40.070701] pstate: 60400005 (nZCv daif +PAN -UAO)
[   40.071654] pc : _raw_write_unlock_irqrestore+0x78/0xa0
[   40.072704] lr : _raw_write_unlock_irqrestore+0x74/0xa0
[   40.073747] sp : ffff800010073cb0
[   40.074415] x29: ffff800010073cb0 x28: ffffa40c1851ded8 
[   40.075479] x27: ffffa40c18331000 x26: 0000000000000000 
[   40.076542] x25: ffffa40c18331278 x24: 0000000000000000 
[   40.077631] x23: ffff0000c1173c00 x22: 0000000000000000 
[   40.078699] x21: ffffa40c16d8a4ec x20: ffffa40c18394d58 
[   40.079755] x19: 0000000000000000 x18: 0000000000000014 
[   40.080818] x17: 000000005c0a3c18 x16: 00000000ff323a58 
[   40.081880] x15: 0000000041d580da x14: 0000000059bc1e33 
[   40.082951] x13: 00000000560cec5e x12: ffffffffffffffff 
[   40.084016] x11: 0000000000000040 x10: 0101010101010101 
[   40.085080] x9 : 0000000000000002 x8 : ffffa40c18394d70 
[   40.086148] x7 : ffffa40c16d8a490 x6 : 0000000000000000 
[   40.087209] x5 : ffff800010073c58 x4 : 0000000000000003 
[   40.088288] x3 : 0000000000000002 x2 : 0000000000000007 
[   40.089353] x1 : ffff0000c0168918 x0 : ffff0000c0168000 
[   40.090420] Call trace:
[   40.090919]  _raw_write_unlock_irqrestore+0x78/0xa0
[   40.091899]  find_and_remove_object+0x8c/0xa0
[   40.092777]  delete_object_full+0x24/0x38
[   40.093585]  __kmemleak_do_cleanup+0x88/0x148
[   40.094461]  kmemleak_do_cleanup+0x60/0x80
[   40.095287]  process_one_work+0x278/0x6b0
[   40.096093]  worker_thread+0x50/0x408
[   40.096835]  kthread+0x108/0x138
[   40.097504]  ret_from_fork+0x10/0x18
[   60.696908] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 46s!
[   60.698436] Showing busy workqueues and worker pools:
[   60.699373] workqueue events: flags=0x0
[   60.700107]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=3/256
[   60.701279]     in-flight: 12:kmemleak_do_cleanup
[   60.702135]     pending: psi_avgs_work, vmstat_shepherd
[   60.703100] workqueue mm_percpu_wq: flags=0x8
[   60.703888]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[   60.704965]     pending: vmstat_update
[   60.705664] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=46s workers=2 idle: 5
[   78.986882] rcu: INFO: rcu_sched self-detected stall on CPU
[   78.988070] rcu:     0-....: (6495 ticks this GP) idle=252/1/0x4000000000000002 softirq=974/974 fqs=3246
[   78.989992]  (t=6500 jiffies g=505 q=318)
[   78.990822] Task dump for CPU 0:
[   78.991471] kworker/0:1     R  running task    13968    12      2 0x0000002a
[   78.992889] Workqueue: events kmemleak_do_cleanup
[   78.993829] Call trace:
[   78.994324]  dump_backtrace+0x0/0x148
[   78.995053]  show_stack+0x24/0x30
[   78.995716]  sched_show_task+0x224/0x250
[   78.996505]  dump_cpu_task+0x48/0x58
[   78.997227]  rcu_dump_cpu_stacks+0xa4/0xe4
[   78.998044]  check_cpu_stall.isra.0+0x4a8/0x670
[   78.998943]  rcu_sched_clock_irq+0xdc/0x438
[   78.999778]  update_process_times+0x34/0x70
[   79.000606]  tick_sched_handle.isra.0+0x34/0x70
[   79.001525]  tick_sched_timer+0x50/0xa0
[   79.002293]  __run_hrtimer+0x12c/0x560
[   79.003043]  __hrtimer_run_queues+0x80/0xe0
[   79.003875]  hrtimer_interrupt+0xec/0x248
[   79.004672]  arch_timer_handler_virt+0x3c/0x50
[   79.005556]  handle_percpu_devid_irq+0xe4/0x438
[   79.006464]  generic_handle_irq+0x34/0x50
[   79.007262]  __handle_domain_irq+0x6c/0xc0
[   79.008074]  gic_handle_irq+0x58/0xa8
[   79.008805]  el1_irq+0xf4/0x1c0
[   79.009434]  _raw_write_unlock_irqrestore+0x78/0xa0
[   79.010400]  find_and_remove_object+0x8c/0xa0
[   79.011277]  delete_object_full+0x24/0x38
[   79.012074]  __kmemleak_do_cleanup+0x88/0x148
[   79.012937]  kmemleak_do_cleanup+0x60/0x80
[   79.013749]  process_one_work+0x278/0x6b0
[   79.014550]  worker_thread+0x50/0x408
[   79.015278]  kthread+0x108/0x138
[   79.015932]  ret_from_fork+0x10/0x18
[  104.056878] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:12]
[  104.058568] Modules linked in:
[  104.059308] irq event stamp: 113632378
[  104.060196] hardirqs last  enabled at (113632377): [<ffffa40c1771a74c>] _raw_write_unlock_irqrestore+0x74/0xa0
[  104.062509] hardirqs last disabled at (113632378): [<ffffa40c16a53734>] el1_irq+0xb4/0x1c0
[  104.064557] softirqs last  enabled at (58956150): [<ffffa40c16a51e80>] __do_softirq+0x480/0x57c
[  104.066739] softirqs last disabled at (58956077): [<ffffa40c16ade59c>] irq_exit+0x11c/0x148
[  104.068852] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G             L    5.4.0-0.rc0.git7.1.fc32.aarch64 #1
[  104.071292] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  104.073074] Workqueue: events kmemleak_do_cleanup
[  104.074232] pstate: 60400005 (nZCv daif +PAN -UAO)
[  104.075464] pc : _raw_write_unlock_irqrestore+0x78/0xa0
[  104.076771] lr : _raw_write_unlock_irqrestore+0x74/0xa0
[  104.078041] sp : ffff800010073cb0
[  104.078869] x29: ffff800010073cb0 x28: ffffa40c1851ded8 
[  104.080203] x27: ffffa40c18331000 x26: 0000000000000000 
[  104.081524] x25: ffffa40c18331278 x24: 0000000000000000 
[  104.082890] x23: ffff0000c1173c00 x22: 0000000000000000 
[  104.084225] x21: ffffa40c16d8a4ec x20: ffffa40c18394d58 
[  104.085549] x19: 0000000000000000 x18: 0000000000000014 
[  104.086859] x17: 000000005c0a3c18 x16: 00000000ff323a58 
[  104.088141] x15: 0000000041d580da x14: 0000000059bc1e33 
[  104.089456] x13: 00000000560cec5e x12: ffffffffffffffff 
[  104.090765] x11: 0000000000000040 x10: 0101010101010101 
[  104.092101] x9 : 0000000000000002 x8 : ffffa40c18394d70 
[  104.093433] x7 : ffffa40c16d8a490 x6 : 0000000000000000 
[  104.094676] x5 : ffff800010073c58 x4 : 0000000000000003 
[  104.095914] x3 : 0000000000000002 x2 : 0000000000000007 
[  104.097147] x1 : ffff0000c0168918 x0 : ffff0000c0168000 
[  104.098384] Call trace:
[  104.098978]  _raw_write_unlock_irqrestore+0x78/0xa0
[  104.100126]  find_and_remove_object+0x8c/0xa0
[  104.101163]  delete_object_full+0x24/0x38
[  104.102105]  __kmemleak_do_cleanup+0x88/0x148
[  104.103157]  kmemleak_do_cleanup+0x60/0x80
[  104.104130]  process_one_work+0x278/0x6b0
[  104.105084]  worker_thread+0x50/0x408
[  104.105956]  kthread+0x108/0x138
[  104.106733]  ret_from_fork+0x10/0x18

Also, it fails to boot on my laptop. I can get a screenshot if it helps. 

I suspect this won't be too hard to duplicate. :) 

If it would help, I'm happy to provide more info...

Comment 1 Jeremy Cline 2019-10-02 14:37:24 UTC
Okay, I tracked down the issue: it only impacts debug kernels, so the rc1 kernel would work regardless. I've picked up the proposed upstream patch as well, so the debug kernel should also boot (it does for me, anyway). The build is https://koji.fedoraproject.org/koji/taskinfo?taskID=38006261.
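
An aside on what the traces above appear to show (my reading, not confirmed in this report): the stuck worker is in kmemleak's cleanup path (kmemleak_do_cleanup -> __kmemleak_do_cleanup -> delete_object_full -> find_and_remove_object), tearing down tracked objects one at a time under the global kmemleak lock. On a debug kernel that tracks a very large number of allocations, a single-CPU loop like that can plausibly run long enough to trip the soft-lockup watchdog and the RCU stall detector. The sketch below is a minimal userspace analogue of that pattern, not the kernel code: the function names mirror the backtrace, but the data structure, the bodies, and the object count are simplified and made up purely for illustration.

/*
 * Build: cc -O2 -pthread cleanup_analogue.c
 *
 * Userspace analogue of the pattern in the backtrace above, NOT kernel code.
 * kmemleak's real cleanup looks each object up in a tree under a global
 * write lock with IRQs disabled; here the "tree" is a plain list and the
 * numbers are arbitrary. Everything runs single-threaded; the lock traffic
 * only mirrors the structure of the loop.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct object {
	struct object *next;
};

static struct object *object_list;                /* all "tracked" objects */
static pthread_rwlock_t object_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Rough stand-in for delete_object_full()/find_and_remove_object():
 * unlink one object under the global writer lock, then free it. */
static void delete_object_full(struct object *obj)
{
	pthread_rwlock_wrlock(&object_lock);
	object_list = obj->next;
	pthread_rwlock_unlock(&object_lock);
	free(obj);
}

/* Rough stand-in for __kmemleak_do_cleanup(): delete every tracked object,
 * one lock round-trip per object, with no voluntary yield anywhere. */
static void do_cleanup(void)
{
	while (object_list)
		delete_object_full(object_list);
}

int main(void)
{
	const long nr_objects = 10 * 1000 * 1000; /* arbitrary: just "a lot" */
	struct timespec start, end;

	for (long i = 0; i < nr_objects; i++) {
		struct object *obj = malloc(sizeof(*obj));
		if (!obj)
			abort();
		obj->next = object_list;
		object_list = obj;
	}

	clock_gettime(CLOCK_MONOTONIC, &start);
	do_cleanup();
	clock_gettime(CLOCK_MONOTONIC, &end);

	printf("deleted %ld objects in %.2f s\n", nr_objects,
	       (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9);
	return 0;
}

The point is only the shape of the loop: millions of lock/unlock/free round-trips with no voluntary reschedule. In userspace the scheduler hides this; in the kernel's cleanup worker it is exactly what the soft-lockup and workqueue-lockup messages are complaining about.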